Chapter 6: Storage Layer
In the previous chapter, Chapter 5: Transaction Execution, we saw how ResilientDB takes agreed-upon transactions and executes them using specialists like KVExecutor or UTXOExecutor. For example, the KVExecutor handles a command like Set("myKey", "myValue").
But where does that myValue actually get stored so that it isn’t lost when the computer restarts? And how can we retrieve it later using Get("myKey")?
Welcome to Chapter 6! We’re diving into the Storage Layer. This is the component that handles the actual saving and loading of data – the blockchain state and transaction history.
Think of the Storage Layer as the filing cabinet or hard drive of ResilientDB. After the committee (Chapter 3) agrees on a decision and the specialist (Chapter 5) knows what change to make, the Storage Layer is responsible for actually writing it down and putting it in the right place.
Why Do We Need a Storage Layer?
Imagine you’re using ResilientDB as a super-reliable notepad. You tell it to remember “Meeting at 3 PM”.
- The command gets sent (Chapter 1), communicated (Chapter 2), agreed upon (Chapter 3), collected (Chapter 4), and the execution logic (Chapter 5) knows it needs to store this note.
- But if the computer running ResilientDB shuts down, how does it remember “Meeting at 3 PM” when it starts back up?
We need persistence – the ability to save data in a way that survives restarts. This usually means writing to a hard disk drive (HDD) or solid-state drive (SSD).
However, sometimes, especially during testing or for specific high-speed scenarios, we might prioritize speed over durability. We might want to store data directly in the computer’s fast memory (RAM), even if it gets lost on restart.
The Storage Layer provides an abstraction to handle both needs: reliable disk storage and fast memory storage.
The Blueprint: The Storage Interface
ResilientDB defines a standard “blueprint” for how storage should work, called the Storage interface (defined in chain/storage/storage.h). This interface lists the basic operations any storage engine must support:
- SetValue(key, value): store a piece of data (the value) under a name (the key).
- GetValue(key): retrieve the data associated with a key.
- Other operations, such as reading ranges of keys or handling versioned data.
Analogy: The Storage interface is like the job description for a librarian. It says the librarian must be able to check books in (SetValue) and check books out (GetValue), but it doesn’t specify how they organize the shelves or which specific building they work in.
// Simplified view from chain/storage/storage.h
namespace resdb {

class Storage {
 public:
  Storage() = default;
  virtual ~Storage() = default;  // Virtual destructor is important for interfaces

  // Store the key-value pair. Returns 0 on success.
  virtual int SetValue(const std::string& key, const std::string& value) = 0;

  // Retrieve the value for the given key. Returns empty string if not found.
  virtual std::string GetValue(const std::string& key) = 0;

  // === Other methods omitted for simplicity ===
  // virtual std::string GetAllValues() = 0;
  // virtual std::string GetRange(...) = 0;
  // virtual int SetValueWithVersion(...) = 0;
  // virtual std::pair<std::string, int> GetValueWithVersion(...) = 0;
  // virtual ... GetHistory(...) = 0;
};

}  // namespace resdb
This code defines the basic contract. Any class that claims to be a Storage provider must implement each of these pure virtual functions (the = 0 marks them as pure virtual, meaning the interface supplies no default behavior).
The Implementations: Different Ways to Store Data
ResilientDB comes with different implementations (librarians who follow the same job description) of the Storage interface:
- ResLevelDB (Disk Storage):
  - What it is: This uses LevelDB, a fast key-value storage library developed by Google that writes data to disk.
  - Analogy: This is like a traditional library with physical shelves. When you check in a book (SetValue), the librarian carefully files it on the correct shelf. It stays there even if the library closes overnight (persistence). Checking out (GetValue) involves finding it on the shelf.
  - Pros: Data is persistent (survives restarts). Well suited for production use where durability is crucial.
  - Cons: Disk access is slower than memory access.
- MemoryDB (Memory Storage):
  - What it is: This stores everything directly in the computer’s RAM using standard data structures like maps.
  - Analogy: This is like a librarian using only sticky notes on their desk. Checking in (SetValue) means writing on a new note and sticking it somewhere. Checking out (GetValue) is quick – just glance at the notes. But when the librarian goes home (the program stops), the desk is cleared and all notes are lost (non-persistent).
  - Pros: Very fast access because it avoids slow disk operations. Useful for testing or scenarios where data loss on restart is acceptable.
  - Cons: Data is lost when the program stops or restarts.
How is Storage Used? (Example: KVExecutor)
Remember the KVExecutor from Chapter 5? It needs to store its key-value data somewhere. When it is created, it is handed an instance of a Storage object, which could be a ResLevelDB or a MemoryDB depending on the configuration.
// Simplified from executor/kv/kv_executor.cpp
#include "executor/kv/kv_executor.h"
#include "chain/storage/storage.h"  // Include the Storage interface

// Constructor - takes ownership of a Storage object
KVExecutor::KVExecutor(std::unique_ptr<Storage> storage)
    : storage_(std::move(storage)) {}  // Store the passed-in storage object

// Method to handle a SET command
void KVExecutor::Set(const std::string& key, const std::string& value) {
  // Call SetValue on the storage object.
  // We don't know or care here whether it's LevelDB or MemoryDB!
  storage_->SetValue(key, value);
}

// Method to handle a GET command
std::string KVExecutor::Get(const std::string& key) {
  // Call GetValue on the storage object
  return storage_->GetValue(key);
}
The KVExecutor code doesn’t need to know the specific type of storage. It just calls the methods defined in the Storage interface (storage_->SetValue, storage_->GetValue). This makes the KVExecutor flexible – it can work with any storage backend that follows the Storage blueprint.
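In practice, swapping the backend only changes how the executor is wired up. Below is a minimal sketch of that wiring, assuming the factory functions NewMemoryDB() and NewResLevelDB() shown later in this chapter; the exact namespaces and constructor arguments in the real code may differ.
// A hypothetical wiring sketch - not ResilientDB's actual setup code (namespaces simplified).
#include <memory>
#include "chain/storage/leveldb.h"
#include "chain/storage/memory_db.h"
#include "executor/kv/kv_executor.h"

void WiringExample() {
  // Fast, non-persistent backend - handy for unit tests.
  KVExecutor test_executor(resdb::storage::NewMemoryDB());

  // Persistent disk backend - what a production replica would use.
  // (Assumes NewResLevelDB accepts just a path; the real factory takes more options.)
  KVExecutor prod_executor(resdb::storage::NewResLevelDB("/tmp/resdb_data"));

  // Same calls, different storage engines underneath.
  test_executor.Set("note", "Meeting at 3 PM");
  prod_executor.Set("note", "Meeting at 3 PM");
}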
Internal Implementation Walkthrough
Let’s see what happens when KVExecutor calls storage_->SetValue("myKey", "myValue"):
1. KVExecutor Call: The KVExecutor invokes the SetValue method on the storage_ object it holds.
2. Polymorphism (Magic Dispatch): Because SetValue is a virtual function in the Storage base class, the C++ runtime looks at the actual type of the storage_ object (is it a ResLevelDB or a MemoryDB?) and calls the SetValue implementation specific to that type.
3. Specific Implementation Executes:
   - If storage_ is a ResLevelDB: ResLevelDB::SetValue is called. It hands the data to the LevelDB library so it can be written to disk (often via a write batch for efficiency).
   - If storage_ is a MemoryDB: MemoryDB::SetValue is called. It simply updates its internal in-memory map (e.g., kv_map_["myKey"] = "myValue").
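The dispatch itself is ordinary C++ virtual-function behavior. Here is a toy illustration (not ResilientDB code) of why the caller never needs a cast or a type check:
// Toy example of virtual dispatch through the Storage interface.
#include <memory>
#include <string>
#include "chain/storage/memory_db.h"
#include "chain/storage/storage.h"

void StoreNote(resdb::Storage* storage) {
  // The same line works for ResLevelDB and MemoryDB:
  // the virtual call picks the right SetValue at runtime.
  storage->SetValue("myKey", "myValue");
}

void DispatchDemo() {
  std::unique_ptr<resdb::Storage> db = resdb::storage::NewMemoryDB();
  StoreNote(db.get());  // Routed to MemoryDB::SetValue.
}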
In short, the call flows from the KVExecutor through the abstract Storage interface and is routed to whichever concrete implementation (ResLevelDB or MemoryDB) was configured.
Diving Deeper into Code
Let’s look at simplified versions of the implementations.
ResLevelDB (Disk Storage):
This class wraps the actual LevelDB library (leveldb::DB).
// Simplified from chain/storage/leveldb.h and chain/storage/leveldb.cpp
#include <memory>
#include <string>
#include "chain/storage/storage.h"
#include "leveldb/db.h"           // LevelDB's main header
#include "leveldb/write_batch.h"

namespace resdb {
namespace storage {

class ResLevelDB : public Storage {  // Inherits from the Storage interface
 public:
  ResLevelDB(const std::string& path /* , config ... */) {
    // Open (or create) the LevelDB database files at the given path.
    leveldb::Options options;
    options.create_if_missing = true;
    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, path, &db);
    // Handle errors...
    db_ = std::unique_ptr<leveldb::DB>(db);  // Manage the DB pointer
  }

  // Implement SetValue using LevelDB
  int SetValue(const std::string& key, const std::string& value) override {
    // Add the operation to a batch for efficiency.
    batch_.Put(key, value);
    // Flush the batch to disk once it is large enough.
    if (batch_.ApproximateSize() >= write_batch_size_) {
      leveldb::Status status = db_->Write(leveldb::WriteOptions(), &batch_);
      batch_.Clear();  // Start a fresh batch after flushing
      return status.ok() ? 0 : -1;
    }
    return 0;  // Success (still buffered in the batch)
  }

  // Implement GetValue using LevelDB
  std::string GetValue(const std::string& key) override {
    std::string value;
    leveldb::Status status = db_->Get(leveldb::ReadOptions(), key, &value);
    return status.ok() ? value : "";  // Return the value, or empty if not found
  }

 private:
  std::unique_ptr<leveldb::DB> db_;  // The actual LevelDB object
  leveldb::WriteBatch batch_;        // Buffer for pending writes
  size_t write_batch_size_ = 1;      // Flush threshold (configurable in the real code)
};

// Factory function to create a LevelDB storage instance
std::unique_ptr<Storage> NewResLevelDB(const std::string& path /* , ... */) {
  return std::make_unique<ResLevelDB>(path);
}

}  // namespace storage
}  // namespace resdb
This shows ResLevelDB implementing SetValue and GetValue by calling the corresponding methods on the underlying leveldb::DB object. It uses a WriteBatch to group multiple writes together before sending them to the disk, which is much more efficient than issuing a separate disk write for every key.
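If you have never used LevelDB directly, the batching pattern that ResLevelDB relies on looks roughly like this standalone sketch (plain LevelDB API, independent of ResilientDB; the database path is just an example):
// Standalone LevelDB write-batch sketch, not ResilientDB code.
#include <cassert>
#include <string>
#include "leveldb/db.h"
#include "leveldb/write_batch.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = true;

  leveldb::DB* db = nullptr;
  leveldb::Status status = leveldb::DB::Open(options, "/tmp/leveldb_demo", &db);
  assert(status.ok());

  // Group several writes into one batch, then commit them together.
  leveldb::WriteBatch batch;
  batch.Put("myKey", "myValue");
  batch.Put("otherKey", "otherValue");
  status = db->Write(leveldb::WriteOptions(), &batch);
  assert(status.ok());

  // Read one value back.
  std::string value;
  status = db->Get(leveldb::ReadOptions(), "myKey", &value);
  assert(status.ok() && value == "myValue");

  delete db;
  return 0;
}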
MemoryDB (Memory Storage):
This implementation is much simpler, using standard C++ maps.
// Simplified from chain/storage/memory_db.h and chain/storage/memory_db.cpp
#include <memory>
#include <string>
#include <unordered_map>  // A hash map for the key-value data
#include "chain/storage/storage.h"

namespace resdb {
namespace storage {

class MemoryDB : public Storage {  // Inherits from the Storage interface
 public:
  MemoryDB() {}  // Constructor is trivial

  // Implement SetValue using an in-memory map
  int SetValue(const std::string& key, const std::string& value) override {
    // Just insert or update the key in the map
    kv_map_[key] = value;
    return 0;  // Always succeeds in this simple model
  }

  // Implement GetValue using an in-memory map
  std::string GetValue(const std::string& key) override {
    auto it = kv_map_.find(key);  // Search the map
    if (it != kv_map_.end()) {
      return it->second;  // Found: return the value
    }
    return "";  // Not found: return an empty string
  }

 private:
  // The actual storage: a map from string keys to string values
  std::unordered_map<std::string, std::string> kv_map_;
};

// Factory function to create a MemoryDB storage instance
std::unique_ptr<Storage> NewMemoryDB() {
  return std::make_unique<MemoryDB>();
}

}  // namespace storage
}  // namespace resdb
Here, SetValue just assigns the value to the key in kv_map_, and GetValue looks the key up in the map. It’s fast but, crucially, kv_map_ exists only in RAM and disappears when the program stops.
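That makes MemoryDB convenient for quick, self-contained experiments. A minimal sketch (a plain main() rather than ResilientDB’s actual test harness):
// Quick MemoryDB exercise through the Storage interface.
#include <cassert>
#include <memory>
#include "chain/storage/memory_db.h"
#include "chain/storage/storage.h"

int main() {
  std::unique_ptr<resdb::Storage> storage = resdb::storage::NewMemoryDB();

  // Data written here lives only as long as this process.
  storage->SetValue("myKey", "myValue");
  assert(storage->GetValue("myKey") == "myValue");

  // A key that was never set comes back as an empty string.
  assert(storage->GetValue("missing") == "");
  return 0;
}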
Choosing Your Storage
So, which one should you use?
- For production deployments where you need your data to be safe and survive restarts, you’ll almost always use ResLevelDB.
- For unit testing components like KVExecutor, using MemoryDB can make tests much faster because they never touch the disk.
- For performance testing or specific applications where maximum speed is needed and persistence isn’t the top priority, MemoryDB might be considered. A simple backend-selection sketch follows this list.
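In code, the choice usually boils down to a single factory call at startup. Here is a hypothetical selector, assuming the factory functions shown earlier (in the real system the decision is driven by configuration, not a boolean flag):
// Hypothetical helper: pick a storage backend at startup.
#include <memory>
#include <string>
#include "chain/storage/leveldb.h"
#include "chain/storage/memory_db.h"
#include "chain/storage/storage.h"

std::unique_ptr<resdb::Storage> MakeStorage(bool persistent, const std::string& path) {
  if (persistent) {
    // Durable: survives restarts, but every write eventually touches the disk.
    return resdb::storage::NewResLevelDB(path);  // argument list simplified
  }
  // Fast but volatile: fine for tests and benchmarks.
  return resdb::storage::NewMemoryDB();
}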
How do you tell ResilientDB which storage to use? This is typically done through configuration files, which we’ll explore in Chapter 8: ResilientDB Configuration (ResDBConfig).
Conclusion
You’ve now learned about the Storage Layer, the foundation where ResilientDB keeps its data!
- We saw that the Storage Layer acts as the filing cabinet or hard drive, responsible for saving and retrieving the blockchain state.
- The Storage interface provides a standard blueprint for storage operations (SetValue, GetValue).
- ResLevelDB is an implementation using LevelDB for persistent disk storage (durable but slower).
- MemoryDB is an implementation using in-memory maps for fast, temporary storage (quick, but data is lost on restart).
- Components like KVExecutor interact with the Storage interface, allowing them to work with either disk or memory storage without changing their own code.
Now that we know how data is stored, how does ResilientDB ensure that even if a replica crashes, it can recover its state correctly? How does it periodically save snapshots of its state to avoid replaying the entire transaction history from the beginning? That’s where Checkpointing and Recovery come in, the topic of our next chapter!
Next: Chapter 7: Checkpointing & Recovery
Generated by AI Codebase Knowledge Builder