Chapter 2: Network Communication

In Chapter 1: Client Interaction, we learned how you, as a user or application, can talk to ResilientDB using special “Client” tools. You package your request (like storing data) and send it off using something like KVClient.

But where does that request go? And how do the different computers (replicas) that make up the ResilientDB network talk to each other to process your request and agree on the result?

Welcome to Chapter 2! Here, we’ll explore the Network Communication layer. This is the backbone that allows all the different parts of ResilientDB to connect and exchange messages.

Think of it like the postal service and telephone network combined for ResilientDB:

Postal Service: It needs to reliably send letters (messages like transaction proposals or votes) to the correct addresses (other replicas).
Telephone Network: It needs a way to receive incoming calls (client requests or messages from other replicas) so the system can answer them.

This chapter focuses on two key components:

ReplicaCommunicator: Handles sending messages between the ResilientDB replicas. (The Postal Service)
ServiceNetwork: Handles listening for and receiving incoming messages, both from clients and other replicas. (The Telephone Network Operator)

Why is Network Communication Important?

ResilientDB is a distributed system. This means it doesn’t run on just one computer. Instead, a team of computers (replicas) work together. To work together, they must communicate constantly over the network.

Imagine you and your friends are trying to decide on a movie to watch tonight.

You send a message proposing “Let’s watch SciFi Movie X!” (like a client sending a transaction).
Your friends need to receive this message.
They discuss amongst themselves, sending messages back and forth like “I vote yes for SciFi Movie X!” or “How about Comedy Movie Y?” (like replicas exchanging votes for consensus).
Eventually, enough friends agree, and someone sends a final message: “Okay, SciFi Movie X it is!” (like a replica sending a response back to the client).

Without a reliable way to send and receive these messages, the group could never agree! ResilientDB needs its own robust communication system for the replicas to coordinate.

Sending Messages Between Replicas: `ReplicaCommunicator`

The ReplicaCommunicator is responsible for sending messages from one replica to other replicas in the network. It’s like the postal service handling outgoing mail.

Key Jobs:

Knowing Addresses: It knows the network addresses (IP and port) of all the other replicas it needs to talk to, based on the system configuration (ResilientDB Configuration (ResDBConfig)).
Sending Methods: It provides ways to send messages:
- Broadcast: Send the same message to all other replicas (like sending a party invitation to everyone).
- Targeted Send: Send a message to one specific replica (like sending a private note).
Reliability (Under the Hood): It often uses lower-level tools (like NetChannel or AsyncReplicaClient) to handle the actual network sending, potentially retrying if a message fails to send initially.
Batching (Optional): For efficiency, it might collect several small messages going to the same destination and send them together in one larger “package” (BatchQueue).

Analogy: ReplicaCommunicator is the mailroom clerk. You give the clerk a letter (message) and tell them who it’s for (one replica or everyone). The clerk figures out the addresses and makes sure the letters get sent out.

Simplified Usage Example (Conceptual):

Imagine Replica A needs to tell all other replicas about a new transaction proposal.


// Inside Replica A's code (conceptual)
#include "platform/networkstrate/replica_communicator.h"
#include "platform/proto/resdb.pb.h" // Contains message definitions like 'Request'
 
// Assume 'replica_communicator' is already set up with network info
// Assume 'my_transaction_proposal' is a 'Request' message object
 
// Broadcast the proposal to all other replicas
replica_communicator->BroadCast(my_transaction_proposal);
 
// Or, send a specific message just to Replica B (identified by node_id 2)
// Assume 'private_message' is another 'Request' object
replica_communicator->SendMessage(private_message, /* node_id = */ 2);

This code shows how simple it is to use the ReplicaCommunicator. You create your message (my_transaction_proposal or private_message) and call either BroadCast or SendMessage. The communicator handles the rest!

Listening for Incoming Messages: `ServiceNetwork`

While ReplicaCommunicator handles sending messages, ServiceNetwork handles receiving them. Each replica runs a ServiceNetwork instance that acts like its ear to the world (or network, in this case!).

Key Jobs:

Listening: It opens a network “port” (like a specific phone number) and listens for incoming network connections.
Accepting Connections: When a client or another replica tries to connect and send a message, ServiceNetwork accepts the connection.
Receiving Data: It reads the incoming message data from the network connection.
Passing the Message: Once a complete message is received, it doesn’t process the message itself. Instead, it passes the message off to the appropriate internal component (represented by ServiceInterface) for actual processing. This processing might involve adding the transaction to a queue (Message/Transaction Collection (TransactionCollector / MessageManager)) or handling a consensus vote (Consensus Management (ConsensusManager)).

Analogy: ServiceNetwork is like the central telephone operator for a company (a ResilientDB replica).

It listens for the phone ringing (incoming network connections).
It answers the call (accepts the connection).
It takes the message (“Please connect me to Sales,” or the actual transaction data).
It transfers the call or message to the correct department (ServiceInterface).

Simplified Setup Example:

Setting up the ServiceNetwork usually happens when a ResilientDB replica starts.


// Simplified from platform/networkstrate/service_network.cpp
#include "platform/networkstrate/service_network.h"
#include "platform/config/resdb_config.h" // For configuration
#include "platform/networkstrate/service_interface.h" // Interface for processing
 
// Assume 'config' holds this replica's network info (IP, port)
// Assume 'my_service_handler' is an object implementing ServiceInterface
// (This object knows what to *do* with received messages)
std::unique_ptr<resdb::ServiceInterface> my_service_handler = ...;
 
// Create the ServiceNetwork
resdb::ServiceNetwork service_network(config, std::move(my_service_handler));
 
// Start listening in the background
service_network.Run(); // This starts the listening process

This code creates the ServiceNetwork, telling it where to listen (config) and who to pass messages to (my_service_handler). Calling Run() starts the listening loop.

How They Work Together: A Simple Flow

Let’s trace a very simplified path of a client request:

Client Sends: You use KVClient (from Chapter 1) to send a Set("mykey", "myvalue") request. The client library sends this message over the network to one of the ResilientDB replicas (let’s say Replica A).
Replica A Receives: Replica A’s ServiceNetwork is listening. It receives the incoming connection and the Set request message.
Replica A Passes Message: ServiceNetwork passes the Set request to its internal ServiceInterface. This might trigger the consensus process.
Replica A Broadcasts: To get agreement, Replica A’s consensus logic uses ReplicaCommunicator to broadcast the Set request (or a proposal based on it) to Replica B and Replica C.
Replicas B & C Receive: The ServiceNetwork on Replica B and Replica C receive the broadcasted message from Replica A.
Replicas B & C Process: They pass the message to their own ServiceInterface handlers to process (e.g., validate the request, prepare to vote).
Replicas B & C Send Votes: Replicas B and C use their ReplicaCommunicator to send “vote” messages back to Replica A (and possibly others).
Replica A Receives Votes: Replica A’s ServiceNetwork receives the incoming vote messages.
Replica A Finalizes: The votes are passed internally. Once enough votes arrive, Replica A knows the request is agreed upon. It might then execute the command (Transaction Execution (TransactionManager / TransactionExecutor)) and store the data (Storage Layer (Storage / LevelDB / MemoryDB)).
Replica A Responds to Client: Finally, Replica A might use its network connection (managed potentially via NetChannel originating from the initial ServiceNetwork interaction) to send a “Success” response back to the original client.

This diagram simplifies heavily, but it shows the interplay: ServiceNetwork receives, ReplicaCommunicator sends between replicas.

Under the Hood: Code Glimpses

Let’s peek at some simplified code structures.

1. ReplicaCommunicator Sending a Message:

The ReplicaCommunicator might use a helper class like NetChannel (which we saw briefly in Chapter 1) or a dedicated AsyncReplicaClient to talk to a specific replica.


// Simplified from platform/networkstrate/replica_communicator.cpp
 
// Method to send a message to a list of specific replicas
int ReplicaCommunicator::SendMessageInternal(
    const google::protobuf::Message& message,
    const std::vector<ReplicaInfo>& replicas) {
 
  int success_count = 0;
  for (const auto& replica : replicas) {
    // Get a network channel/client for the target replica's IP and port
    // This might create a temporary connection or use a persistent one.
    std::unique_ptr<NetChannel> client = GetClient(replica.ip(), replica.port());
    if (client == nullptr) {
      LOG(WARNING) << "Could not create client for " << replica.ip();
      continue;
    }
 
    // Optionally add a digital signature if configured
    if (verifier_ != nullptr) {
      client->SetSignatureVerifier(verifier_);
    }
 
    // Use the client/channel to send the message (already formatted)
    if (client->SendRawMessage(message) == 0) { // 0 means success
      success_count++;
    } else {
      LOG(ERROR) << "Failed to send message to " << replica.ip();
    }
    // The client/channel might be closed here if not using long connections
  }
  return success_count;
}

This shows ReplicaCommunicator iterating through target replicas, getting a communication channel (NetChannel) for each, and using that channel’s SendRawMessage method.

2. ServiceNetwork Receiving a Message:

ServiceNetwork often uses an underlying “Acceptor” component (Acceptor or AsyncAcceptor) to handle the low-level network listening and connection acceptance.

Acceptor (Simpler, Blocking Style): Listens for a connection, accepts it, reads one message, and puts the message (and the connection socket if a reply is needed) onto a queue (input_queue_).


// Simplified concept from platform/rdbc/acceptor.cpp
 
void Acceptor::Run() {
  LOG(INFO) << "Acceptor starting...";
  while (IsRunning()) {
    // 1. Wait for and accept a new incoming connection
    // 'socket_' is the main listening socket
    auto client_socket = socket_->Accept();
    if (client_socket == nullptr) {
      continue; // No connection yet, or error
    }
 
    // 2. Read the data (message) from the connected client
    std::unique_ptr<DataInfo> request_info = std::make_unique<DataInfo>();
    int ret = client_socket->Recv(&request_info->buff, &request_info->data_len);
    if (ret <= 0) {
       // Error reading or client disconnected
       client_socket->Close();
       continue;
    }
 
    // 3. Package the client socket and the received data
    std::unique_ptr<QueueItem> item = std::make_unique<QueueItem>();
    item->socket = std::move(client_socket); // Keep socket for potential reply
    item->data = std::move(request_info);
 
    // 4. Push the item onto the input queue for ServiceNetwork's workers
    input_queue_->Push(std::move(item));
  }
}

ServiceNetwork Processing from Queue: Worker threads in ServiceNetwork then pick items from this queue.


// Simplified concept from platform/networkstrate/service_network.cpp
 
void ServiceNetwork::InputProcess() {
  LOG(INFO) << "ServiceNetwork worker starting...";
  while (IsRunning()) {
    // 1. Wait for and pop an item from the input queue
    // (This item was put there by the Acceptor)
    std::unique_ptr<QueueItem> item = input_queue_->Pop(1000); // Wait up to 1s
    if (item == nullptr) {
      continue; // No message received in the timeout period
    }
 
    // 2. Extract the socket and data
    auto client_socket = (item->socket == nullptr) ? nullptr : std::move(item->socket);
    auto request_data = std::move(item->data);
 
    // 3. Create a context (includes the client connection info)
    std::unique_ptr<Context> context = std::make_unique<Context>();
    if(client_socket != nullptr) {
       // Wrap the socket in a NetChannel for potential replies
       context->client = std::make_unique<NetChannel>(std::move(client_socket), true);
    }
 
    // 4. Pass the context and the received data to the actual service logic
    if (request_data) {
       // 'service_' is the ServiceInterface implementation provided during setup
       service_->Process(std::move(context), std::move(request_data));
    }
  }
}

This shows the hand-off: Acceptor receives raw data and puts it on a queue. ServiceNetwork’s worker threads take items from the queue and pass them to the service_ object (which implements ServiceInterface) for actual application-level processing.

AsyncAcceptor (More Complex, Non-Blocking): Uses asynchronous I/O (like Boost.Asio) to handle many connections concurrently without blocking threads per connection. When data arrives, it calls a callback function (AcceptorHandler in ServiceNetwork) which then usually puts the data onto the input_queue_. The end result for InputProcess is similar.

Conclusion

You’ve now seen the vital communication layer of ResilientDB!

We learned that ReplicaCommunicator acts like the postal service, responsible for sending messages (like proposals and votes) between the different replica computers. It knows the addresses and can broadcast or send targeted messages.
We saw that ServiceNetwork acts like the telephone network operator, responsible for listening for and receiving incoming messages, whether from clients or other replicas. It passes these messages on for processing.
These two components work together, using lower-level network tools, to allow the distributed replicas of ResilientDB to coordinate effectively.

So, the replicas can now send and receive messages reliably. But what exactly are they saying to each other? How do they use these messages to agree on the order of transactions and the state of the database? That’s the magic of consensus, and it’s the topic of our next chapter!

Next: Chapter 3: Consensus Management

Generated by AI Codebase Knowledge Builder