I am new to Flink.
I was looking into the code to understand how Flink takes
full snapshots and incremental snapshots with RocksDB.
What I understood:
1. For a full snapshot, we call the RocksDB snapshot API, which
basically gives us a point-in-time handle that we can iterate over
to read the entries in the RocksDB instance. We iterate over every
entry one by one and serialize it to some distributed file system.
Similarly, when restoring from a full snapshot, we read the file to
get every entry and apply the entries to the RocksDB instance one by
one to fully reconstruct the DB instance.
2. For an incremental snapshot, on the other hand, we rely on the
RocksDB Checkpoint API to materialize the SST files in a local
directory and then copy them to HDFS/S3.
Similarly, on restore, we copy the SST files to a local directory
and open a RocksDB instance on that directory.
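To make the full-snapshot path in item 1 concrete, here is a minimal Python model of it. It is not Flink or RocksDB code: a plain dict stands in for the RocksDB instance, `sorted(db.items())` stands in for iterating over a RocksDB snapshot, and an in-memory byte buffer stands in for the file on the distributed file system. The length-prefixed record format and all function names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterable, Tuple

# Hypothetical record format: 4-byte big-endian length, then the raw bytes.
_LEN = struct.Struct(">I")


def write_full_snapshot(entries: Iterable[Tuple[bytes, bytes]], out: BinaryIO) -> int:
    """Serialize every (key, value) entry one by one; return how many were written."""
    count = 0
    for key, value in entries:
        out.write(_LEN.pack(len(key)) + key)
        out.write(_LEN.pack(len(value)) + value)
        count += 1
    return count


def _read_exact(inp: BinaryIO, n: int) -> bytes:
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated snapshot stream")
    return data


def restore_full_snapshot(inp: BinaryIO) -> Dict[bytes, bytes]:
    """Read entries one by one and apply each to a fresh, empty store."""
    db: Dict[bytes, bytes] = {}
    while True:
        header = inp.read(_LEN.size)
        if not header:  # clean end of stream
            return db
        if len(header) != _LEN.size:
            raise EOFError("truncated snapshot stream")
        key = _read_exact(inp, _LEN.unpack(header)[0])
        value = _read_exact(inp, _LEN.unpack(_read_exact(inp, _LEN.size))[0])
        db[key] = value  # plays the role of a put() into the new RocksDB instance


if __name__ == "__main__":
    live_db = {b"user:2": b"bob", b"user:1": b"alice"}
    # sorted(...) stands in for a point-in-time snapshot iterator; unlike a
    # RocksDB snapshot, it copies the data instead of pinning a version.
    view = sorted(live_db.items())
    buf = io.BytesIO()
    written = write_full_snapshot(view, buf)
    live_db[b"user:3"] = b"carol"  # later writes are not part of the snapshot
    buf.seek(0)
    restored = restore_full_snapshot(buf)
    print(written, restored == dict(view))  # prints: 2 True
```

Note that the restore cost in this model grows with the number of entries, since every record is decoded and re-inserted individually.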
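The incremental path in item 2 can be modeled in the same spirit; again this is a hypothetical sketch, not Flink's code. The key assumption, which is what makes it incremental, is that SST files are immutable, so a checkpoint only needs to upload files the remote store has not seen yet and record a manifest of every file it depends on; restore copies the listed files back into a local directory. Dicts stand in for the local checkpoint directory and for HDFS/S3, all names are illustrative, and the metadata files a real RocksDB checkpoint directory also contains (such as MANIFEST, CURRENT and OPTIONS) are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins: file name -> file contents.
LocalDir = Dict[str, bytes]
RemoteStore = Dict[str, bytes]


def incremental_checkpoint(checkpoint_dir: LocalDir, remote: RemoteStore) -> Tuple[List[str], List[str]]:
    """Upload files not already in `remote`; return (manifest, newly_uploaded).

    `checkpoint_dir` plays the role of the directory produced by the RocksDB
    Checkpoint API: the complete set of immutable files for this point in time.
    """
    uploaded = []
    for name in sorted(checkpoint_dir):
        # In this model a file name is assumed to identify immutable content,
        # so a file that was uploaded before never needs to be sent again.
        if name not in remote:
            remote[name] = checkpoint_dir[name]
            uploaded.append(name)
    manifest = sorted(checkpoint_dir)  # everything needed to reopen the DB
    return manifest, uploaded


def restore_incremental(manifest: List[str], remote: RemoteStore) -> LocalDir:
    """Copy every file named in the manifest into a fresh local directory."""
    return {name: remote[name] for name in manifest}


if __name__ == "__main__":
    remote_store: RemoteStore = {}
    cp1 = {"000001.sst": b"a..m", "000002.sst": b"n..z"}
    m1, up1 = incremental_checkpoint(cp1, remote_store)
    # After a compaction, 000002.sst is gone and 000003.sst is new.
    cp2 = {"000001.sst": b"a..m", "000003.sst": b"n..z merged"}
    m2, up2 = incremental_checkpoint(cp2, remote_store)
    print(up1, up2)  # prints: ['000001.sst', '000002.sst'] ['000003.sst']
    print(restore_incremental(m2, remote_store) == cp2)  # prints: True
```

Unlike the record stream produced by a full snapshot, what gets stored here is the store's own native file layout plus a manifest, and restore is a file copy rather than a re-insertion of every entry.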
My questions:
1. Why did we take two different approaches, using different
RocksDB APIs?
We could have used the RocksDB Checkpoint API for full snapshots as
well.
2. Is there any specific reason to use the RocksDB Snapshot API
rather than the RocksDB Checkpoint API for full snapshots?
I am sure I am missing some important point, and I am really curious
to know what it is.
Any explanation would be really helpful. Thanks in advance.