Inter-component communication

Genode provides three principal mechanisms for inter-component communication, namely synchronous remote procedure calls (RPC), asynchronous notifications, and shared memory. Section Synchronous remote procedure calls (RPC) describes synchronous RPC as the most prominent one. In addition to transferring information across component boundaries, the RPC mechanism provides the means for delegating capabilities and thereby authority throughout the system.

The RPC mechanism closely resembles the semantics of a function call where the control is transferred from the caller to the callee until the function returns. As discussed in Section Client-server relationship, there are situations where the provider of information does not wish to depend on the recipient to return control. Such situations are addressed by the means of an asynchronous notification mechanism explained in Section Asynchronous notifications.

Neither synchronous RPC nor asynchronous notifications are suitable for transferring large bulks of information between components. RPC messages are strictly bound to a small size and asynchronous notifications do not carry any payload at all. This is where shared memory comes into play. By sharing memory between components, large bulks of information can be propagated without the active participation of the kernel. Section Shared memory explains the procedure of establishing shared memory between components.

Each of the three basic mechanisms is rarely found in isolation. Most inter-component interactions are a combination of these mechanisms. Section Asynchronous state propagation introduces a pattern for propagating state information by combining asynchronous notifications with RPC. Section Synchronous bulk transfer shows how synchronous RPC can be combined with shared memory to transfer large bulks of information in a synchronous way. Section Asynchronous bulk transfer - packet streams combines asynchronous notifications with shared memory to largely decouple producers and consumers of high-throughput data streams.


Synchronous remote procedure calls (RPC)

Section Capability invocation introduced remote procedure calls (RPC) as Genode's fundamental mechanism to delegate authority between components. It introduced the terminology for RPC objects, capabilities, object identities, and entrypoints. It also outlined the flow of control between a client, the kernel, and a server during an RPC call. This section complements Section Capability invocation with the information of how the mechanism presents itself at the C++ language level. It first introduces the layered structure of the RPC mechanism and the notion of typed capabilities. After presenting the class structure of an RPC server, it shows how those classes interact when RPC objects are created and called.

Typed capabilities

Figure 1 img/rpc_layers
Layered architecture of the RPC mechanism

Figure 1 depicts the software layers of the RPC mechanism.

Kernel inter-process-communication (IPC) mechanism

At the lowest level, the kernel's IPC mechanism is used to transfer messages back and forth between client and server. The actual mechanism largely differs between the various kernels supported by Genode. Chapter Under the hood gives insights into the functioning of the IPC mechanism as used on specific kernels. Genode's capability-based security model is based on the presumption that the kernel protects object identities as kernel objects, allows user-level components to refer to kernel objects via capabilities, and supports the delegation of capabilities between components using the kernel's IPC mechanism. At the kernel-interface level, the kernel is not aware of language semantics like the C++ type system. From the kernel's point of view, an object identity merely exists and can be referred to, but has no type.

IPC library

The IPC library introduces a kernel-independent programming interface that is needed to implement the principal semantics of clients and servers. For each kernel supported by Genode, there exists a distinct IPC library that uses the respective kernel mechanism. The IPC library introduces the notions of untyped capabilities, message buffers, IPC clients, and IPC servers.

An untyped capability is the representation of a Genode capability at the C++ language level. It consists of the local name of the referred-to object identity as well as a means to manage the lifetime of the capability, i.e., a reference counter. The exact representation of an untyped capability depends on the kernel used.

A message buffer is a statically sized buffer that carries the payload of an IPC message. It distinguishes two types of payload, namely raw data and capabilities. Payloads of both kinds can be simultaneously present. A message buffer can carry up to 1 KiB of raw data and up to four capabilities. Prior to issuing the kernel IPC operation, the IPC library translates the message-buffer content to the format understood by the kernel's IPC operation.
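The constraints described above can be pictured with a small sketch. The following is not Genode's actual message-buffer class but a self-contained model that captures the two payload kinds and their static limits; all names are made up for illustration:

```cpp
#include <cstddef>
#include <cstring>

/* illustrative model of a statically sized IPC message buffer that
 * distinguishes raw data from capability payloads (names and layout
 * are invented, not Genode's actual implementation) */
struct Msgbuf
{
    static constexpr std::size_t DATA_SIZE = 1024; /* up to 1 KiB raw data */
    static constexpr unsigned    MAX_CAPS  = 4;    /* up to four capabilities */

    char          data[DATA_SIZE];
    std::size_t   data_used = 0;

    unsigned long caps[MAX_CAPS];  /* local names of contained capabilities */
    unsigned      caps_used = 0;

    /* append raw data, rejecting payloads that exceed the static limit */
    bool append(void const *src, std::size_t len)
    {
        if (data_used + len > DATA_SIZE) return false;
        std::memcpy(data + data_used, src, len);
        data_used += len;
        return true;
    }

    /* append a capability, up to the per-message maximum */
    bool append_cap(unsigned long local_name)
    {
        if (caps_used == MAX_CAPS) return false;
        caps[caps_used++] = local_name;
        return true;
    }
};
```

Prior to the kernel IPC operation, the IPC library would translate such a buffer into the kernel-specific message format.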

The client side of the communication channel executes an IPC call operation with a destination capability, a send buffer, and a receive buffer as arguments. The send buffer contains the RPC function arguments, which can comprise plain data as well as capabilities. The IPC library transfers these arguments to the server via a platform-specific kernel operation and waits for the server's response. The response is returned to the caller as new content of the receive buffer.

At the server side of the communication channel, an entrypoint thread executes the IPC reply and IPC reply-and-wait operations to interact with potentially many clients. Analogously to the client, it uses two message buffers, a receive buffer for incoming requests and a send buffer for delivering the reply of the last request. For each entrypoint, there exists an associated untyped capability that is created with the entrypoint. This capability can be combined with an IPC client object to perform calls to the server. The IPC reply-and-wait operation delivers the content of the reply buffer to the last caller and then waits for a new request using a platform-specific kernel operation. Once unblocked by the kernel, it returns the arguments for the new request in the request buffer. The server does not obtain any form of client identification along with an incoming message that could be used to implement server-side access-control policies. Instead, access control is solely performed by the kernel on the invocation of capabilities. If a request was delivered to the server, the client has – by definition – a capability for communicating with the server and thereby the authority to perform the request.

RPC stub code

The RPC stub code complements the IPC library with the semantics of RPC interfaces and RPC functions. An RPC interface is an abstract C++ class with the declarations of the functions callable by RPC clients. Thereby each RPC interface is represented as a C++ type. The declarations are accompanied by annotations that allow the C++ compiler to generate the so-called RPC stub code on both the client side and server side. Genode uses C++ templates to generate the stub code, which avoids the crossing of a language barrier when designing RPC interfaces and alleviates the need for code-generating tools in addition to the compiler.

The client-side stub code translates C++ method calls to IPC-library operations. Each RPC function of an RPC interface has an associated opcode (according to the order of RPC functions). This opcode along with the method arguments are inserted into the IPC client's send buffer. Vice versa, the stub code translates the content of the IPC client's receive buffer to return values of the method invocation.

The server-side stub code implements the so-called dispatch function, which takes the IPC server's receive buffer, translates the message into a proper C++ method call, calls the corresponding server-side function of the RPC interface, and translates the function results into the IPC server's send buffer.
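The interplay of client-side and server-side stub code can be illustrated with a compact, self-contained model. In Genode, this code is generated from annotated RPC interface declarations; here, a hypothetical terminal-like interface with two RPC functions is marshalled by hand, and the kernel's IPC transfer is replaced by a direct function call. All names (Msg, Terminal_interface, and so on) are invented for this sketch:

```cpp
/* message buffer carrying an opcode, one argument, and the result */
struct Msg { unsigned opcode; int arg; int result; };

/* abstract RPC interface, shared by client and server side */
struct Terminal_interface
{
    virtual int write(int c) = 0;
    virtual int read()       = 0;
    virtual ~Terminal_interface() { }
};

/* server-side stub: translates a message into a proper method call */
int dispatch(Terminal_interface &obj, Msg &msg)
{
    switch (msg.opcode) {
    case 0: return obj.write(msg.arg);
    case 1: return obj.read();
    }
    return -1;
}

/* client-side stub: translates method calls into messages */
struct Terminal_client : Terminal_interface
{
    Msg                &_buf;     /* stands in for the IPC message buffer */
    Terminal_interface &_server;  /* stands in for the kernel IPC transport */

    Terminal_client(Msg &buf, Terminal_interface &server)
    : _buf(buf), _server(server) { }

    int _call(unsigned opcode, int arg)
    {
        _buf = Msg { opcode, arg, 0 };
        /* in reality, a platform-specific kernel IPC call happens here */
        _buf.result = dispatch(_server, _buf);
        return _buf.result;
    }

    /* each RPC function has an opcode according to its declaration order */
    int write(int c) override { return _call(0, c); }
    int read()       override { return _call(1, 0); }
};

/* example server-side implementation of the interface */
struct Terminal_component : Terminal_interface
{
    int last = 0;
    int write(int c) override { last = c; return c; }
    int read()       override { return last; }
};
```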

RPC object and client object

Thanks to the RPC stub code, the server-side implementation of an RPC object comes down to the implementation of the abstract interface of the corresponding RPC interface. When an RPC object is associated with an entrypoint, the entrypoint creates a unique capability for the given RPC object. RPC objects are typed with their corresponding RPC interface. This C++ type information is propagated to the corresponding capabilities. For example, when associating an RPC object that implements the LOG-session interface with an entrypoint, the resulting capability is a LOG-session capability.

This capability represents the authority to invoke the functions of the RPC object. On the client side, the client object plays the role of a proxy of the RPC object within the client's component. Thereby, the client becomes able to interact with the RPC object in a natural manner.

Sessions and connections

Section Services and sessions introduced sessions between client and server components as the basic building blocks of system composition. At the server side each session is represented by an RPC object that implements the session interface. At the client side, an open session is represented by a connection object. The connection object encapsulates the session arguments and also represents a client object to interact with the session.

Figure 2 img/capability_types
Fundamental capability types

As depicted in Figure 1, capabilities are associated with types on all levels above the IPC library. Because the IPC library is solely used by the RPC stub code but not at the framework's API level, capabilities appear as being C++ type safe, even across component boundaries. Each RPC interface implicitly defines a corresponding capability type. Figure 2 shows the inheritance graph of Genode's most fundamental capability types.
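The idea of attaching interface types to otherwise untyped capabilities can be sketched in a few lines of C++. The class and member names below are illustrative, not Genode's actual definitions; the point is that a function expecting a LOG-session capability rejects capabilities of other types at compile time:

```cpp
/* an untyped capability merely holds the local name of an object identity */
struct Untyped_capability { unsigned long local_name; };

/* typed capability: same data, plus a phantom interface type */
template <typename RPC_INTERFACE>
struct Capability : Untyped_capability
{
    Capability() : Untyped_capability { 0 } { }

    /* conversion from the untyped form is explicit, as performed by
       the entrypoint when handing out a capability */
    explicit Capability(Untyped_capability cap) : Untyped_capability(cap) { }
};

/* hypothetical session interfaces */
struct Log_session   { };
struct Timer_session { };

/* accepts LOG-session capabilities only; passing a
   Capability<Timer_session> would be rejected by the compiler */
bool valid_log_session(Capability<Log_session> cap)
{
    return cap.local_name != 0;
}
```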

Server-side class structure

Figure 3 img/rpc_classes
Server-side structure of the RPC mechanism

Figure 3 gives an overview of the C++ classes that are involved at the server side of the RPC mechanism. As described in Section Capability invocation, each entrypoint maintains a so-called object pool. The object pool contains references to RPC objects associated with the entrypoint. When receiving an RPC request along with the local name of the invoked object identity, the entrypoint uses the object pool to look up the corresponding RPC object. As seen in the figure, the RPC object is a class template parametrized with its RPC interface. When instantiated, the dispatch function is generated by the C++ compiler according to the RPC interface.
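A toy model of the object pool may help to illustrate the lookup step. The names below are invented; in particular, real Genode object pools use capability-based keys and proper synchronization rather than a plain std::map:

```cpp
#include <map>

/* common base of all RPC objects managed by an entrypoint */
struct Rpc_object_base
{
    virtual int dispatch(unsigned opcode, int arg) = 0;
    virtual ~Rpc_object_base() { }
};

struct Object_pool
{
    std::map<unsigned long, Rpc_object_base *> _objects;
    unsigned long _next_name = 1;

    /* associate an RPC object, returning the local name that stands in
       for the derived capability handed out by the real manage function */
    unsigned long manage(Rpc_object_base &obj)
    {
        unsigned long name = _next_name++;
        _objects[name] = &obj;
        return name;
    }

    /* lookup performed by the entrypoint on an incoming request */
    Rpc_object_base *lookup(unsigned long name)
    {
        auto it = _objects.find(name);
        return (it == _objects.end()) ? nullptr : it->second;
    }
};

/* trivial RPC object used for illustration */
struct Adder : Rpc_object_base
{
    int dispatch(unsigned, int arg) override { return arg + 1; }
};
```

A failed lookup corresponds to the error reply described in the invocation procedure below.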


RPC-object creation

Figure 4 shows the procedure of creating a new RPC object. The server component has already created an entrypoint, which, in turn, created its corresponding object pool.

Figure 4 img/new_rpc_obj_seq
Creation of a new RPC object
  1. The server component creates an instance of an RPC object. "RPC object" denotes an object that inherits the RPC object class template typed with the RPC interface and that implements the virtual functions of this interface. By inheriting the RPC object class template, it gets equipped with a dispatch function for the given RPC interface.

    Note that a single entrypoint can be used to manage any number of RPC objects of arbitrary types.

  2. The server component associates the RPC object with the entrypoint by calling the entrypoint's manage function with the RPC object as argument. The entrypoint responds to this call by allocating a new object identity using a session to core's PD service (Section Protection domains (PD)). For allocating the new object identity, the entrypoint specifies the untyped capability of its IPC server as argument. Core's PD service responds to the request by instructing the kernel to create a new object identity associated with the untyped capability. Thereby, the kernel creates a new capability that is derived from the untyped capability. When invoked, the derived capability refers to the same IPC server as the original untyped capability. But it represents a distinct object identity. The IPC server retrieves the local name of this object identity when called via the derived capability. The entrypoint stores the association of the derived capability with the RPC object in the object pool.

  3. The entrypoint hands out the derived capability as return value of the manage function. At this step, the derived capability is converted into a typed capability with its type corresponding to the type of the RPC object that was specified as argument. This way, the link between the types of the RPC object and the corresponding capability is preserved at the C++ language level.

  4. The server delegates the capability to another component, e.g., as payload of a remote procedure call. At this point, the client receives the authority to call the RPC object.

RPC-object invocation

Figure 5 shows the flow of execution when a client calls an RPC object by invoking a capability.

Figure 5 img/call_rpc_obj_seq
Invocation of an RPC object
  1. The client invokes the given capability using an instance of an RPC client object, which uses the IPC library to invoke the kernel's IPC mechanism. The kernel delivers the request to the IPC server that belongs to the invoked capability and wakes up the corresponding entrypoint. On reception of the request, the entrypoint obtains the local name of the invoked object identity.

  2. The entrypoint uses the local name of the invoked object identity as a key into its object pool to look up the matching RPC object. If the lookup fails, the entrypoint replies with an error.

  3. If the matching RPC object was found, the entrypoint calls the RPC object's dispatch method. This method is implemented by the server-side stub code. It converts the content of the receive buffer of the IPC server to a method call. That is, it obtains the opcode of the RPC function from the receive buffer to decide which method to call, and supplies the arguments according to the definition in the RPC interface.

  4. On the return of the RPC function, the RPC stub code populates the send buffer of the IPC server with the function results and invokes the kernel's reply operation via the IPC library. Thereby, the entrypoint becomes ready to serve the next request.

  5. When delivering the reply to the client, the kernel resumes the execution of the client, which can pick up the results of the RPC call.


Asynchronous notifications

The synchronous RPC mechanism described in the previous section is not sufficient to cover all forms of inter-component interactions. It shows its limitations in the following situations.

Waiting for multiple conditions

In principle, the RPC mechanism can be used by an RPC client to block for a condition at a server. For example, a timer server could provide a blocking sleep function that, when called by a client, blocks the client for a certain amount of time. However, if the client wanted to respond to multiple conditions such as a timeout, incoming user input, and network activity, it would need to spawn one thread for each condition where each thread would block for a different condition. If one condition triggers, the respective thread would resume its execution and respond to the condition. However, because all threads could potentially be woken up independently from each other – as their execution depends only on their respective condition – they need to synchronize access to shared state. Consequently, components that need to respond to multiple conditions would not only waste threads but also suffer from synchronization overhead.

At the server side, the approach of blocking RPC calls is equally bad in the presence of multiple clients. For example, a timer service with the above outlined blocking interface would need to spawn one thread per client.

Signaling events to untrusted parties

With merely synchronous RPC, a server cannot deliver sporadic events to its clients. If the server wanted to inform one of its clients about such an event, it would need to act as a client itself by performing an RPC call to its own client. However, by performing an RPC call, the caller passes the control of execution to the callee. In the case of a server that serves multiple clients, it would put the availability of the server at the discretion of all its clients, which is unacceptable.

A similar situation is the interplay between a parent and a child where the parent does not trust its child but still wishes to propagate sporadic events to the child.

The solution to those problems is the use of asynchronous notifications, also named signals. Figure 6 shows the interplay between two components. The component labeled as signal handler responds to potentially many external conditions propagated as signals. The component labeled as signal producer triggers a condition. Note that both can be arbitrary components.


Figure 6 img/signal_seq
Interplay between signal producer and signal handler

Signal-context creation and delegation

The upper part of Figure 6 depicts the steps needed by a signal handler to become able to receive asynchronous notifications.

  1. Each Genode component is equipped with at least one initial entrypoint that responds to incoming RPC requests as well as asynchronous notifications. Similar to how it can handle requests for an arbitrary number of RPC objects, it can receive signals from many different sources. Within the signal-handler component, each source is represented as a so-called signal context. A component that needs to respond to multiple conditions creates one signal context for each condition. In the figure, a signal context "c" is created.

  2. The signal-handler component associates the signal context with its entrypoint via the manage method. Analogous to the way RPC objects are associated with entrypoints, the manage method returns a capability for the signal context. Under the hood, the entrypoint uses core's PD service to create this kind of capability.

  3. As for regular capabilities, a signal-context capability can be delegated to other components. Thereby, the authority to trigger signals for the associated context is delegated.

Triggering signals

The lower part of Figure 6 illustrates the use of a signal-context capability by the signal producer.

  1. Now in possession of the signal-context capability, the signal producer creates a so-called signal transmitter for the capability. The signal transmitter can be used to trigger a signal by calling the submit method. This method returns immediately. In contrast to a remote procedure call, the submission of a signal is a fire-and-forget operation.

  2. At the time the signal producer submits the first signal, the signal handler is not yet ready to handle it because it is still busy with other things. Once the signal handler becomes ready to receive a new signal, the pending signal is delivered, which triggers the execution of the corresponding signal-handler method. Note that signals are not buffered. If signals are triggered at a high rate, multiple signals may result in only a single execution of the signal handler. For this reason, the handler cannot infer the number of events from the number of signal-handler invocations. In situations where such information is needed, the signal handler must retrieve it via another mechanism such as an RPC call that queries the most current status of the server that produced the signals.

  3. After handling the first batch of signals, the signal handler component blocks and becomes ready for another signal or RPC request. This time, no signals are immediately pending. After a while, however, the signal producer submits another signal, which eventually triggers another execution of the signal handler.
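The non-buffering nature of signals described in step 2 can be modeled in a few lines. In this illustrative sketch (not the Genode API), repeated submissions before the handler runs collapse into a single pending bit and hence a single handler execution:

```cpp
#include <atomic>

/* toy model of signal coalescing: submissions set a pending flag,
 * and one dispatch clears it, however many submissions occurred */
struct Signal_context
{
    std::atomic<bool> pending { false };
    unsigned handled = 0;

    /* producer side: fire and forget, never blocks */
    void submit() { pending.store(true); }

    /* handler side: returns true if the handler method actually ran */
    bool dispatch()
    {
        if (!pending.exchange(false))
            return false;
        handled++;  /* one invocation, regardless of the number of submits */
        return true;
    }
};
```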

In contrast to remote procedure calls, signals carry no payload. If signals carried any payload, this payload would need to be buffered somewhere. Regardless of where this information is buffered, the buffer could overrun if signals are submitted at a higher rate than they are handled. There are two conceivable ways to deal with this situation. The first option would be to drop the payload once the buffer overruns, which would make the mechanism non-deterministic and is hardly desirable. The second option would be to sacrifice the fire-and-forget semantics at the producer side, blocking the producer when the buffer is full. However, this approach would put the liveness of the producer at the whim of the signal handler. Consequently, signals are void of any payload.


Shared memory

Figure 7 img/shared_memory_seq
Establishing shared memory between client and server. The server interacts with core's PD service. Both client and server interact with the region maps of their respective PD sessions at core.

By sharing memory between components, large amounts of information can be propagated across protection-domain boundaries without the active involvement of the kernel.

Sharing memory between components raises a number of questions. First, Section Resource trading explained that physical memory resources must be explicitly assigned to components either by their respective parents or by the means of resource trading. This raises the question of which component is bound to pay for the memory shared between multiple components. Second, unlike traditional operating systems where different programs can refer to globally visible files and thereby establish shared memory by mapping a prior-agreed file into their respective virtual memory spaces, Genode does not have a global name space. How do components refer to the to-be-shared piece of memory? Figure 7 answers these questions by showing the sequence of establishing shared memory between a server and its client. The diagram depicts a client, core, and a server. The notion of a client-server relationship is intrinsic to the shared-memory mechanism. When establishing shared memory between components, the components' roles as client and server must be clearly defined.

  1. The server interacts with core's PD service to allocate a new RAM dataspace. Because the server uses its own PD session for that allocation, the dataspace is paid for by the server. At first glance, this seems contradictory to the principle that clients should have to pay for using services as discussed in Section Trading memory between clients and servers. However, this is not the case. By establishing the client-server relationship, the client has transferred a budget of RAM to the server via the session-quota mechanism. So the client already paid for the memory. Still, it is the server's responsibility to limit the size of the allocation to the client's session quota.

    Because the server allocates the dataspace, it is the owner of the dataspace. Hence, the lifetime of the dataspace is controlled by the server.

    Core's PD service returns a dataspace capability as the result of the allocation.

  2. The server makes the content of the dataspace visible in its virtual address space by attaching the dataspace within the region map of its PD session. The server refers to the dataspace via the dataspace capability as returned from the prior allocation. When attaching the dataspace to the server's region map, core's PD service maps the dataspace content at a suitable virtual-address range that is not occupied with existing mappings and returns the base address of the occupied range to the server. Using this base address and the known dataspace size, the server can safely access the dataspace content by reading from or writing to its virtual memory.

  3. The server delegates the authority to use the dataspace to the client. This delegation can happen in different ways, e.g., the client could request the dataspace capability via an RPC function at the server. But the delegation could also involve further components that transitively delegate the dataspace capability. Therefore, the delegation operation is depicted as a dashed line.

  4. Once the client has obtained the dataspace capability, it can use the region map of its own PD session to make the dataspace content visible in its address space. Note that even though both client and server use core's PD service, each component uses a different session. Analogous to the server, the client receives a client-local address within its virtual address space as the result of the attach operation.

  5. After the client has attached the dataspace within its region map, both client and server can access the shared memory using their respective virtual addresses.

In contrast to the server, the client is not in control over the lifetime of the dataspace. In principle, the server, as the owner of the dataspace, could free the dataspace at its PD session at any time and thereby revoke the corresponding memory mappings in all components that attached the dataspace. The client has to trust the server with respect to its liveness, which is consistent with the discussion in Section Client-server relationship. A well-behaving server should tie the lifetime of a shared-memory dataspace to the lifetime of the client session. When the server frees the dataspace at its PD session, core implicitly detaches the dataspace from all region maps. Thereby the dataspace will become inaccessible to the client.
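The roles of dataspaces and region maps in this procedure can be mimicked by a simple model. The following sketch is not the Genode API: core hands out dataspace "capabilities" as plain indices, and region maps attach them at component-local addresses. It merely illustrates that attaching the same dataspace in two region maps yields two local addresses backed by the same memory:

```cpp
#include <cstddef>
#include <map>
#include <vector>

/* stands in for core's PD service allocating RAM dataspaces */
struct Core
{
    std::vector<std::vector<char>> _dataspaces;

    /* returns an index that stands in for a dataspace capability */
    std::size_t alloc(std::size_t size)
    {
        _dataspaces.emplace_back(size, 0);
        return _dataspaces.size() - 1;
    }

    std::vector<char> &content(std::size_t cap) { return _dataspaces.at(cap); }
};

/* stands in for the region map of one component's PD session */
struct Region_map
{
    Core &_core;
    std::map<std::size_t, std::size_t> _attached;  /* local addr -> cap */
    std::size_t _next_addr = 0x1000;               /* trivial layout policy */

    Region_map(Core &core) : _core(core) { }

    /* attach a dataspace, returning a component-local base address */
    std::size_t attach(std::size_t cap)
    {
        std::size_t addr = _next_addr;
        _next_addr += _core.content(cap).size();
        _attached[addr] = cap;
        return addr;
    }

    /* access the dataspace content via the local address */
    char *at(std::size_t addr) { return _core.content(_attached.at(addr)).data(); }
};
```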

Asynchronous state propagation

In many cases, the mere information that a signal occurred is insufficient to handle the signal in a meaningful manner. For example, a component that registers a timeout handler at a timer server will eventually receive a timeout. But in order to handle the timeout properly, it needs to know the actual time. The time could not be delivered along with the timeout because signals cannot carry any payload. But the timeout handler may issue a subsequent RPC call to the timer server for requesting the time.

Another example of this combination of asynchronous notifications and remote procedure calls is the resource-balancing protocol described in Section Dynamic resource balancing.

Synchronous bulk transfer

The synchronous RPC mechanism described in Section Synchronous remote procedure calls (RPC) enables components to exchange information via a kernel operation. In contrast to shared memory, the kernel plays an active role by copying information (and delegating capabilities) between the communication partners. Most kernels impose a restriction onto the maximum message size. To comply with all kernels supported by Genode, RPC messages must not exceed a size of 1 KiB. In principle, larger payloads could be transferred as a sequence of RPCs. But since each RPC implies the costs of two context switches, this approach is not suitable for transferring large bulks of data. But by combining synchronous RPC with shared memory, these costs can be mitigated.

Figure 8 img/sync_bulk_seq
Transferring bulk data by combining synchronous RPC with shared memory

Figure 8 shows the procedure of transferring large bulk data using shared memory as a communication buffer while using synchronous RPCs for arbitrating the use of the buffer. The upper half of the figure depicts the setup phase that needs to be performed only once. The lower half exemplifies an operation where the client transfers a large amount of data to the server, which processes the data before transferring a large amount of data back to the client.

  1. At session-creation time, the server allocates the dataspace, which represents the designated communication buffer. The steps resemble those described in Section Shared memory. The server uses session quota provided by the client for the allocation. This way, the client is able to aid the dimensioning of the dataspace by supplying an appropriate amount of session quota to the server. Since the server performed the allocation, the server is in control of the lifetime of the dataspace.

  2. After the client has established a session to the server, it initially queries the dataspace capability from the server using a synchronous RPC and attaches the dataspace to its own address space. After this step, both client and server can read and write the shared communication buffer.

  3. Initially the client plays the role of the user of the dataspace. The client writes the bulk data into the dataspace. Naturally, the maximum amount of data is limited by the dataspace size.

  4. The client performs an RPC call to the server. Thereby, it hands over the role of the dataspace user to the server. Note that this handover is not enforced. The client's PD retains the right to access the dataspace, e.g., via another thread running in the same PD.

  5. On reception of the RPC, the server becomes active. It reads and processes the bulk data, and writes its results to the dataspace. The server must not assume to be the exclusive user of the dataspace. A misbehaving client may change the buffer content at any time. Therefore, the server must take appropriate precautions. In particular, if the data must be validated at the server side, the server must copy the data from the shared dataspace to a private buffer before validating and using it.

  6. Once the server has finished processing the data and written the results to the dataspace, it replies to the RPC. Thereby, it hands back the role of the user of the dataspace to the client.

  7. The client resumes its execution with the return of the RPC call, and can read the result of the server-side operation from the dataspace.

The RPC call may be used for carrying control information. For example, the client may provide the amount of data to process, or the server may provide the amount of data produced.
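A schematic rendering of this procedure, with the kernel IPC replaced by a plain function call, may look as follows. The shared dataspace is modeled by a global buffer, and the RPC carries only the byte count as control information; all names are invented for illustration:

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

char shared_buffer[1024];  /* stands in for the attached shared dataspace */

/* server side: processes len bytes in place, returns the result length;
 * the function call models the RPC that hands over the user role */
std::size_t process_rpc(std::size_t len)
{
    for (std::size_t i = 0; i < len; i++)
        shared_buffer[i] = (char)std::toupper((unsigned char)shared_buffer[i]);
    return len;
}

/* client side: steps 3 to 7 of the procedure above */
std::size_t client_transfer(char const *data, std::size_t len)
{
    std::memcpy(shared_buffer, data, len);  /* write bulk data (step 3) */
    std::size_t result = process_rpc(len);  /* RPC handover (steps 4 to 6) */
    return result;                          /* read results (step 7) */
}
```

Only the length travels through the (modeled) RPC; the payload never leaves the shared buffer.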

Asynchronous bulk transfer - packet streams

The packet-stream interface complements the facilities for the synchronous data transfer described in Sections Synchronous remote procedure calls (RPC) and Synchronous bulk transfer with a mechanism that carries payload over a shared memory block and employs an asynchronous data-flow protocol. It is designed for large bulk payloads such as network traffic, block-device data, video frames, and USB URB payloads.

Figure 9 img/packet_stream
Life cycle of a data packet transmitted over the packet-stream interface

As illustrated in Figure 9, the communication buffer consists of three parts: a submit queue, an acknowledgement queue, and a bulk buffer. The submit queue contains packets generated by the source to be processed by the sink. The acknowledgement queue contains packets that are processed and acknowledged by the sink. The bulk buffer contains the actual payload. The assignment of packets to bulk-buffer regions is performed by the source.

A packet is represented by a packet descriptor that refers to a portion of the bulk buffer and contains additional control information. Such control information may include an opcode and further arguments interpreted at the sink to perform an operation on the supplied packet data. Either the source or the sink is in charge of handling a given packet at a given time. At the points 1, 2, and 5, the packet is owned by the source. At the points 3 and 4, the packet is owned by the sink. Putting a packet descriptor in the submit or acknowledgement queue represents a handover of responsibility. The life cycle of a single packet looks as follows:

  1. The source allocates a region of the bulk buffer for storing the packet payload (packet alloc). It then requests the local pointer to the payload (packet content) and fills the packet with data.

  2. The source submits the packet to the submit queue (submit packet).

  3. The sink requests a packet from the submit queue (get packet), determines the local pointer to the payload (packet content), and processes the contained data.

  4. After having finished the processing of the packet, the sink acknowledges the packet (acknowledge packet), placing the packet into the acknowledgement queue.

  5. The source reads the packet from the acknowledgement queue and releases the packet (release packet). Thereby, the region of the bulk buffer that was used by the packet becomes marked as free.
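The five steps above can be sketched as a self-contained model. Queue-capacity checks, the corner-case signals, and the actual bulk-buffer allocator are omitted, and the names only loosely follow Genode's packet-stream API:

```cpp
#include <cstddef>
#include <queue>
#include <vector>

/* a packet descriptor refers to a region of the bulk buffer */
struct Packet_descriptor { std::size_t offset; std::size_t size; };

/* toy model of one packet stream shared by source and sink */
struct Packet_stream
{
    std::vector<char>             bulk_buffer;
    std::queue<Packet_descriptor> submit_queue;
    std::queue<Packet_descriptor> ack_queue;
    std::size_t                   _alloc_offset = 0;  /* trivial allocator */

    Packet_stream(std::size_t bulk_size) : bulk_buffer(bulk_size, 0) { }

    /* source side (steps 1 and 2) */
    Packet_descriptor alloc_packet(std::size_t size)
    {
        Packet_descriptor p { _alloc_offset, size };
        _alloc_offset += size;
        return p;
    }
    char *packet_content(Packet_descriptor p) { return &bulk_buffer[p.offset]; }
    void submit_packet(Packet_descriptor p)   { submit_queue.push(p); }

    /* sink side (steps 3 and 4) */
    Packet_descriptor get_packet()
    {
        Packet_descriptor p = submit_queue.front();
        submit_queue.pop();
        return p;
    }
    void acknowledge_packet(Packet_descriptor p) { ack_queue.push(p); }

    /* source side again (step 5) */
    Packet_descriptor release_packet()
    {
        Packet_descriptor p = ack_queue.front();
        ack_queue.pop();
        return p;  /* a real implementation would free the buffer region */
    }
};
```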

This protocol has four corner cases that are handled by signals:

Saturated submit queue

Under this condition, the source is not able to submit another packet and may decide to block. Once the sink observes such a condition being cleared - that is when removing a packet from a formerly saturated submit queue - it delivers a ready-to-submit signal to wake up the source.

Submit queue is empty

Whenever the source places a packet into an empty submit queue, it assumes that the sink may have blocked for the arrival of new packets and delivers a packet-avail signal to wake up the sink.

Saturated acknowledgement queue

Unless the acknowledgement queue has enough capacity for another acknowledgement, the sink is unable to make progress and may therefore block. Once the source consumes an acknowledgement from a formerly saturated acknowledgement queue, it notifies the sink about the cleared condition by delivering a ready-to-ack signal.

Acknowledgement queue is empty

In this case, the source may block until the sink places another acknowledged packet into the formerly empty acknowledgement queue and delivers an ack-avail signal.

If bidirectional data exchange between a client and a server is desired, there are two approaches:

One stream of operations

If data transfers in either direction are triggered by the client only, a single packet stream where the client acts as the source and the server represents the sink can accommodate transfers in both directions. For example, the block session interface (Section Block) represents read and write requests as packet descriptors. The allocation of the operation's read or write buffer within the bulk buffer is performed by the client, being the source of the stream of operations. For write operations, the client populates the write buffer with the to-be-written information before submitting the packet. When the server processes the incoming packets, it distinguishes the read and write operations using the control information given in the packet descriptor. For a write operation, it processes the information contained in the packet. For a read operation, it populates the packet with new information before acknowledging the packet.

Two streams of data

If data transfers in both directions can be triggered independently by the client and the server, two packet streams can be used. For example, the NIC session interface (Section NIC) uses one packet stream for incoming and one for outgoing network traffic. For outgoing traffic, the client plays the role of the source. For incoming traffic, the server (such as a NIC driver) is the source.