Middleware

Both low-level and high-level approaches to abstracting distribution have limited usability. Low-level distribution platforms like PVM still require the developer to deal with many details of the interaction between components, such as synchronization and the packing of complex data into simple transferable chunks. High-level platforms, on the other hand, are more or less by definition tightly integrated into a particular programming environment and limited to one programming language. Before a distributed system can be implemented, developers must familiarize themselves with a new language's syntax and programming paradigm.

The middleware approach lies in between. The idea is to extend an existing programming language by introducing a new layer ``in the middle,'' between the application and the network, that hides the complexity of communication and data transfer. To the developer, the invocation of a remote procedure should appear no different from the invocation of a local one. In terms of figure 2.1, middleware provides the distributed system with both a presentation layer and a session layer, automating the encoding and transfer of parameters.

Many middleware architectures exist; those in common use today include Remote Procedure Calls (RPC), the Common Object Request Broker Architecture (CORBA), the Distributed Component Object Model (DCOM) and Java Remote Method Invocation (RMI).

Java RMI [14] recycles many of the ideas introduced by RPC and CORBA but is tied to a single programming language. Although RMI is widely deployed today, interest in it is fading as CORBA becomes a core feature of Java [69]. DCOM [54] is widespread in the Windows world, but proprietary in design; a comparison of CORBA and DCOM can be found in [44]. The remaining two, RPC and CORBA, are examined in greater detail below.

Remote Procedure Call

Remote procedure calls (RPC) were proposed as early as 1975 [73]; their history is documented in [68]. A first implementation, by Xerox, appeared in 1984 [6]. The following text describes Sun RPC, released by Sun Microsystems, Inc., which is the most widely used implementation today. A different RPC mechanism is available as part of the Open Software Foundation's Distributed Computing Environment (DCE RPC).

RPC allows the implementation of Client/Server distributed systems: clients can connect to a remote server and invoke one of the services it provides. On both the client and the server side, the invocation of the remote service appears as a normal procedure call. The client can pass a number of parameters, which are transferred to the server side. The service evaluates the parameters and returns a result, which is transferred back and returned as the result of the remote procedure call.

**Figure 2.3:** Service Interface Description in the RPC Language (include/account.x)

In order to use remote procedure calls, the collection of services provided by a server program first has to be described using the RPC Language. Each server is assigned both a name and a unique program number, and all services are declared with their full parameter lists. The RPC Language provides both simple types like octets or integers, and complex types like structures, enumerations or sequences.

The rpcgen program then reads an RPC Language file and produces both declarations and code for the C programming language: a header file with C type definitions for all the types used in the RPC Language file, and two source code files containing a client stub and a server stub. If the interface defines its own types, rpcgen also generates a file with the XDR routines that encode them.

Before invoking services, however, client and server have to find each other using the portmapper (more recently called rpcbind), a permanently running daemon. The server stub code generated by rpcgen registers the server with the portmapper upon startup. Clients need to know a server's program number, as given in the RPC Language file, and the name of the Internet host the server is running on. They then contact that host's portmapper to acquire a client handle, which subsequently serves as a reference to the server.
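The portmapper's role can be pictured as a table that maps a (program number, version, protocol) triple to the port a server listens on. The C fragment below is a toy, in-process model of that table, not Sun RPC's actual interface (real clients usually obtain a handle through clnt_create(), which consults the remote portmapper for them); the functions pm_register() and pm_lookup() and the program number used in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* IP protocol numbers, used to tell the transports apart. */
enum { PROTO_TCP = 6, PROTO_UDP = 17 };

/* One entry of the (hypothetical) registration table. */
struct mapping {
    unsigned long  prog, vers;
    unsigned int   proto;
    unsigned short port;
};

#define MAX_MAPPINGS 64
static struct mapping table[MAX_MAPPINGS];
static size_t n_mappings;

static int same_key(const struct mapping *m, unsigned long prog,
                    unsigned long vers, unsigned int proto)
{
    return m->prog == prog && m->vers == vers && m->proto == proto;
}

/* Called on behalf of a server at startup; returns 1 on success,
 * 0 if the triple is already registered or the table is full. */
int pm_register(unsigned long prog, unsigned long vers,
                unsigned int proto, unsigned short port)
{
    assert(port != 0);                 /* 0 means "not registered" below */
    for (size_t i = 0; i < n_mappings; i++)
        if (same_key(&table[i], prog, vers, proto))
            return 0;
    if (n_mappings == MAX_MAPPINGS)
        return 0;
    table[n_mappings++] = (struct mapping){ prog, vers, proto, port };
    return 1;
}

/* Called on behalf of a client; returns the port, or 0 if unknown. */
unsigned short pm_lookup(unsigned long prog, unsigned long vers,
                         unsigned int proto)
{
    for (size_t i = 0; i < n_mappings; i++)
        if (same_key(&table[i], prog, vers, proto))
            return table[i].port;
    return 0;
}
```

This sketch refuses a second registration of the same triple, but, like any scheme keyed on numbers alone, it cannot tell whether two unrelated programs merely happen to share a program number.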

The client program then performs a local procedure call into the client stub, which has the same signature^2.1 as the service itself, taking only the client handle as an additional parameter. The client stub transparently communicates the service's parameters to the server program by sending an RPC request. On the server side, the server stub decodes this request and in turn performs a local procedure call into the user-provided service implementation. The service's result is returned the same way.
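The call path through both stubs can be simulated in a single process. In this minimal sketch, assuming a hypothetical deposit service, the "network" is a direct function call and the client handle is a function pointer standing in for the transport. The names deposit_1, server_stub and deposit_impl are illustrative, although the _1 suffix echoes rpcgen's habit of appending the version number to client stub names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical service arguments: current balance and amount. */
typedef struct {
    int32_t balance;
    int32_t amount;
} deposit_args;

/* User-provided service implementation on the server side. */
int32_t deposit_impl(const deposit_args *a)
{
    return a->balance + a->amount;
}

/* Marshalling helpers: 32-bit values, most significant byte first. */
static void put32(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (unsigned char)(u >> 24);
    p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);
    p[3] = (unsigned char)u;
}

static int32_t get32(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                 (uint32_t)p[2] << 8  | (uint32_t)p[3];
    return (int32_t)u;   /* assumes two's complement, as on common platforms */
}

/* Server stub: decode the request, call the implementation locally,
 * and encode its result into the reply. */
void server_stub(const unsigned char *request, unsigned char *reply)
{
    deposit_args a = { get32(request), get32(request + 4) };
    put32(reply, deposit_impl(&a));
}

/* The "client handle": here simply the transport function to use. */
typedef void (*transport_fn)(const unsigned char *request,
                             unsigned char *reply);

/* Client stub: the service's signature plus the handle. */
int32_t deposit_1(const deposit_args *a, transport_fn handle)
{
    unsigned char request[8], reply[4];
    assert(handle != NULL);
    put32(request, a->balance);
    put32(request + 4, a->amount);
    handle(request, reply);   /* stands in for "send request, await reply" */
    return get32(reply);
}
```

For example, deposit_1 called with a balance of 100, an amount of 25 and server_stub as the handle returns 125, exactly what a direct call to deposit_impl would return.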

Data is encoded using the External Data Representation (XDR) standard [66], the same encoding used by the Parallel Virtual Machine (see section 2.2.2), which ensures interoperability between different hardware architectures.
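XDR's rules for the two most common cases are short enough to write down: a 32-bit integer occupies four bytes, most significant byte first, and a string is sent as its length followed by its bytes, zero-padded to a multiple of four. The sketch below encodes both into a caller-supplied buffer. The helpers xdr_put_uint(), xdr_put_int() and xdr_put_string() are hypothetical; the real XDR library exposes routines such as xdr_int() that operate on an XDR stream handle instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unsigned 32-bit integer: four bytes, big-endian. Returns bytes written. */
size_t xdr_put_uint(unsigned char *buf, uint32_t u)
{
    buf[0] = (unsigned char)(u >> 24);
    buf[1] = (unsigned char)(u >> 16);
    buf[2] = (unsigned char)(u >> 8);
    buf[3] = (unsigned char)u;
    return 4;
}

/* Signed integer: same layout, two's complement bit pattern. */
size_t xdr_put_int(unsigned char *buf, int32_t v)
{
    return xdr_put_uint(buf, (uint32_t)v);
}

/* String: 4-byte length, the bytes, then zero padding to a multiple
 * of four. The buffer must hold at least 4 + len + 3 bytes. */
size_t xdr_put_string(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);
    size_t pad = (4 - len % 4) % 4;
    assert(len <= UINT32_MAX);
    size_t n = xdr_put_uint(buf, (uint32_t)len);
    memcpy(buf + n, s, len);
    memset(buf + n + len, 0, pad);
    return n + len + pad;
}
```

For example, the string "abcde" becomes 12 bytes: the length 5 in four bytes, the five characters, and three bytes of padding.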

RPC demonstrates the important middleware concept of separating interface and implementation. From the abstract interface declaration, the middleware generates code that presents the application with the usual local procedure call semantics and automates data transfer and synchronization, so that a remote procedure behaves as the application would expect a local procedure to behave.

Because of its heritage in the Unix environment, rpcgen only produces stub code for the C programming language. However, since the encoding of parameters is known and the format of RPC messages is well documented [65], both interoperability with other programming languages and implementations over networks other than the original TCP/IP are possible, as demonstrated by PVM.

The Network File System [8] is a successful and widespread example of a system that uses RPC as its distribution platform.

RPC can be used over both TCP and UDP. Over TCP, RPC guarantees exactly-once semantics,^2.2 i.e. the remote procedure is called exactly once. This is not the case over UDP: the client stub resends a request after a timeout, so invocations have at-least-once semantics unless the server activates a duplicate request cache. NFS uses RPC over UDP for increased efficiency and must therefore use idempotent procedures, which can safely be called more than once. A distributed system that does not want to be concerned with execution semantics and idempotence will naturally choose the TCP transport.
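The difference between plain at-least-once execution and a duplicate request cache can be seen in a few lines. This is a minimal sketch under simplifying assumptions: each request carries a transaction id (xid) that the client stub reuses when it retransmits, and the cache is keyed on the xid alone, whereas a real cache would also compare the client's address and the procedure. The procedure deposit() is a hypothetical, deliberately non-idempotent example.

```c
#include <assert.h>
#include <stdint.h>

static int32_t balance;   /* server-side state */

/* Hypothetical non-idempotent procedure: each execution changes state. */
static int32_t deposit(int32_t amount)
{
    balance += amount;
    return balance;
}

/* Duplicate request cache: one slot per xid modulo CACHE_SLOTS; a newer
 * request simply overwrites an older entry in the same slot. */
#define CACHE_SLOTS 8
struct cache_entry { uint32_t xid; int32_t reply; int valid; };
static struct cache_entry cache[CACHE_SLOTS];

/* Server-side handling of one (possibly retransmitted) request. */
int32_t handle_request(uint32_t xid, int32_t amount, int use_cache)
{
    struct cache_entry *e = &cache[xid % CACHE_SLOTS];
    assert(amount >= 0);                 /* deposits only */
    if (use_cache && e->valid && e->xid == xid)
        return e->reply;                 /* retransmission: replay reply */
    int32_t r = deposit(amount);         /* new request: execute it */
    if (use_cache)
        *e = (struct cache_entry){ xid, r, 1 };
    return r;
}

/* Reset the server between experiments. */
void server_reset(void)
{
    balance = 0;
    for (int i = 0; i < CACHE_SLOTS; i++)
        cache[i].valid = 0;
}
```

Without the cache, a retransmitted deposit with the same xid is executed twice; with it, the server replays the stored reply. An idempotent procedure, such as reading a block of a file, would need neither.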

One crucial limitation of Sun RPC is its procedure orientation, which enforces a stateless protocol. A service cannot identify the client [68] and therefore cannot associate state information with a transaction. To emulate a stateful protocol, the state has to be sent back and forth between client and service as an additional parameter (the Account data in figure 2.3).
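Passing state as a parameter means that every call carries the complete record and returns an updated copy, which the client must keep for the next call. The sketch below assumes a hypothetical Account structure; its fields are invented for illustration and are not those of figure 2.3.

```c
#include <assert.h>

/* Hypothetical account record; the fields are illustrative only. */
typedef struct {
    int id;
    int balance;
} Account;

/* Stateless service: the result depends only on the parameters.
 * The withdrawal is refused (balance unchanged) if funds are
 * insufficient. */
Account withdraw(Account acct, int amount)
{
    assert(amount >= 0);
    if (amount <= acct.balance)
        acct.balance -= amount;
    return acct;
}
```

The server never stores the record; if the client discarded the returned copy, the updated balance would be lost.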

Servers can be either single-threaded, where procedure calls on the server side are serialized, or multi-threaded, where requests are dispatched in parallel. Unfortunately, when asked to produce thread-safe code, rpcgen assumes multi-threading on the client side, too. In thread-safe code, client stubs and service procedures take an additional parameter for the result, so the client-side interface is incompatible with the single-threaded version.
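The incompatibility stems from how a stub hands back its result. The sketch below contrasts the two styles with a hypothetical square procedure: the first returns a pointer to static storage, the pattern used by single-threaded rpcgen output, and the second takes a caller-supplied result buffer as the additional parameter. Real thread-safe stubs also take the client handle and return a status code; those details are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded style: the result lives in static storage and the
 * stub returns a pointer to it, so a later call overwrites the value. */
int *square_1(const int *arg)
{
    static int result;
    result = *arg * *arg;
    return &result;
}

/* Thread-safe style: the caller supplies the result buffer as an extra
 * parameter, so concurrent calls share no storage. Returns 1 on success. */
int square_1_mt(const int *arg, int *result)
{
    assert(arg != NULL && result != NULL);
    *result = *arg * *arg;
    return 1;
}
```

With the static result, a second call silently overwrites the value the first caller received; this is harmless in a single-threaded client but a data race in a multi-threaded one.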

On the client side, remote procedure calls are always synchronous and block until the service's result is received. It is also impossible to mix client and server functionality: while a server can act as a client and perform remote procedure calls itself in order to service a request, a client program cannot offer services of its own. Cooperation is therefore strictly divided between dedicated service providers and service users.

Lastly, services are identified by a number only. This number is chosen in the RPC Language file and hard-coded into the generated code by rpcgen. Developers usually choose a program number arbitrarily, so if a different server happens to use the same program number, clients might address a completely different server than intended. Name clashes are much easier to avoid in a namespace of human-readable, descriptive plain-text names.

Despite its shortcomings, RPC is an important member of the middleware family: it already uses a separate interface declaration and generated stub code, and it predates other middleware by half a decade.

CORBA

The Common Object Request Broker Architecture [63] by the Object Management Group was first published in 1991. Because CORBA is the focus of this thesis, the entire next chapter is devoted to the Object Request Broker, the central part of the architecture. This section presents the basic design of CORBA and compares it with RPC as middleware.

The basic idea of CORBA is much the same as for RPC. But whereas RPC is based on a procedural language and allows remote procedure calls, CORBA is based on object-oriented programming and allows remote method invocations on objects. Orfali's definition of an object [44] is short and rather vague: ``Objects are a blob of intelligence,'' expressing that an object provides methods on the outside, but that its internals are unknown. Cardelli and Wegner [9] are more precise and consider a language object-oriented if and only if it satisfies the following requirements:

1. It supports objects that are data abstractions with an interface of named operations and a hidden local state.
2. Objects have an associated type [class].
3. Types may inherit attributes from supertypes.

Note that the first and last items combined express polymorphism: an object of a subtype can substitute for an object of its supertype, because all attributes are inherited. The same properties must be expected from an object-oriented distribution platform, and they are indeed realized in CORBA. The first item addresses one of RPC's weaknesses (see section 2.4.1) by providing stateful services. Servers contain a number of objects that can be addressed individually, and the operations on these objects need to be implemented only once.
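The three requirements quoted above can be mimicked even in C, which helps to show what an object-oriented platform adds to plain procedures. In this hypothetical sketch, each object pairs its local state with a pointer to its type's table of operations; two objects share one implementation, and a subtype overrides an operation while client code written against the supertype keeps working. The names account, plain_ops and bonus_ops are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct account;

/* A type is represented by its table of named operations. */
struct account_ops {
    int (*deposit)(struct account *self, int amount);
};

/* An object: a pointer to its type plus local state, which by
 * convention only the type's operations touch. */
struct account {
    const struct account_ops *ops;
    int balance;
};

/* Each operation is implemented once per type and shared by all
 * objects of that type. */
static int plain_deposit(struct account *self, int amount)
{
    self->balance += amount;
    return self->balance;
}

/* A hypothetical subtype that reuses the plain behavior and
 * overrides deposit() to add a bonus of one unit. */
static int bonus_deposit(struct account *self, int amount)
{
    return plain_deposit(self, amount + 1);
}

const struct account_ops plain_ops = { plain_deposit };
const struct account_ops bonus_ops = { bonus_deposit };

/* Client code written against the supertype; works for any subtype. */
int deposit(struct account *obj, int amount)
{
    assert(obj != NULL && obj->ops != NULL);
    return obj->ops->deposit(obj, amount);
}
```

Two plain accounts keep separate balances while sharing plain_deposit, and an object of the bonus subtype can be passed wherever a plain account is expected.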

Like RPC, CORBA uses a declarative language, the Interface Definition Language (IDL), to describe an object's interface. As with RPC, an IDL compiler uses this description to generate stubs for the client side and skeletons for the server side. With this generated code, remote method invocations look like invocations on a local object, at least in an object-oriented programming language like C++. Also as with RPC, parameters are encoded in a hardware-independent format, here called CDR (Common Data Representation).
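On the server side, a skeleton must map an incoming request to the right object and operation. In CORBA's GIOP protocol, a request carries the target's object key and the operation's name as a string, in contrast to RPC's numeric procedure identifiers. The following C sketch models that dispatch in-process; the object keys, operation names and the dispatch() function are hypothetical, and a real ORB would raise an exception rather than return an error code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical servant objects, addressed individually by key. */
struct account { const char *key; long balance; };

static struct account objects[] = { { "acct-1", 0 }, { "acct-2", 0 } };

/* Operations are implemented once and applied to whichever object
 * the request targets. */
static long op_deposit(struct account *self, long arg)
{
    self->balance += arg;
    return self->balance;
}

static long op_balance(struct account *self, long arg)
{
    (void)arg;   /* unused */
    return self->balance;
}

/* The skeleton's table: operation name to implementation. */
static const struct {
    const char *name;
    long (*fn)(struct account *self, long arg);
} skeleton[] = {
    { "deposit", op_deposit },
    { "balance", op_balance },
};

/* Returns 0 and stores the result on success, -1 for an unknown
 * object key or operation name. */
int dispatch(const char *key, const char *op, long arg, long *result)
{
    assert(result != NULL);
    for (size_t i = 0; i < sizeof objects / sizeof objects[0]; i++) {
        if (strcmp(objects[i].key, key) != 0)
            continue;
        for (size_t j = 0; j < sizeof skeleton / sizeof skeleton[0]; j++)
            if (strcmp(skeleton[j].name, op) == 0) {
                *result = skeleton[j].fn(&objects[i], arg);
                return 0;
            }
        return -1;   /* object found, operation unknown */
    }
    return -1;       /* object unknown */
}
```

Each object keeps its own state, yet the deposit operation is written only once; descriptive operation names also make accidental clashes less likely than arbitrary numbers.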

CDR differs from XDR in that it allows the data stream to use either big-endian or little-endian encoding for numerical values, so that no conversion is needed if sender and recipient have the same endianness. XDR, in contrast, always uses big-endian encoding, so both ends must convert if their hardware uses the ``wrong'' endianness.
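The two strategies can be contrasted in a few lines of C. This sketch assumes that a message carries a byte-order flag (GIOP places such a flag in its message header): a CDR receiver swaps bytes only when the sender's order differs from its own, whereas an XDR receiver on a little-endian host always swaps, because the wire format is fixed as big-endian. The function names are hypothetical, and "raw" denotes the four received bytes read as a 32-bit integer in the host's own order.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-order flag carried with each message (CDR style). */
enum { BIG_ENDIAN_FLAG = 0, LITTLE_ENDIAN_FLAG = 1 };

/* Determine the host's byte order at run time. */
int host_byte_order(void)
{
    const uint16_t probe = 1;
    /* On a little-endian host the low-order byte is stored first. */
    return *(const unsigned char *)&probe == 1 ? LITTLE_ENDIAN_FLAG
                                               : BIG_ENDIAN_FLAG;
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* CDR: the sender writes in its native order and sets the flag; the
 * receiver converts only if the flag differs from its own order. */
uint32_t cdr_receive(uint32_t raw, int sender_order)
{
    assert(sender_order == BIG_ENDIAN_FLAG ||
           sender_order == LITTLE_ENDIAN_FLAG);
    return sender_order == host_byte_order() ? raw : swap32(raw);
}

/* XDR: the wire is always big-endian, so a little-endian receiver
 * converts every value, even if the sender was little-endian too. */
uint32_t xdr_receive(uint32_t raw)
{
    return host_byte_order() == BIG_ENDIAN_FLAG ? raw : swap32(raw);
}
```

Between two little-endian hosts, cdr_receive returns its input unchanged, while the XDR path swaps once on sending and once on receiving.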

The phrase ``Object-oriented RPC'' nicely summarizes CORBA's key ideas, as CORBA was certainly designed with the experience gathered from RPC in mind. Yet the phrase is an unfair understatement, ignoring many features and considerations that do not exist in RPC.

Still, CORBA is based on a Client/Server design. Clients operate on objects, but they need not be objects themselves: an object-oriented programming language and an object-oriented distribution platform do not by themselves imply an object-oriented design [7].

Remote method invocations cause a request to be sent from the client to the server, and the client usually waits synchronously until the reply is received.

The CORBA location forwarding mechanism provides basic support for coarse-grained server mobility, but a vendor-specific forwarding service (an Implementation Repository) is necessary on the server side to employ the feature [15].
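Location forwarding can be pictured as a retry loop in the client: instead of a result, the server (or an Implementation Repository acting on its behalf) may reply with a new address, and the client re-sends the request there. In GIOP this is a reply with status LOCATION_FORWARD that carries a new object reference. The sketch below simulates the exchange in-process with hypothetical names, using plain host names as addresses and a hop limit to guard against forwarding loops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum reply_status { REPLY_OK, REPLY_LOCATION_FORWARD };

struct reply {
    enum reply_status status;
    const char *forward_to;   /* valid if status == REPLY_LOCATION_FORWARD */
    long result;              /* valid if status == REPLY_OK */
};

/* Simulated network: the object has moved from "hostA" to "hostB". */
struct reply send_request(const char *address)
{
    if (strcmp(address, "hostA") == 0)
        return (struct reply){ REPLY_LOCATION_FORWARD, "hostB", 0 };
    return (struct reply){ REPLY_OK, NULL, 42 };
}

#define MAX_FORWARDS 4   /* guard against forwarding loops */

/* Invoke at `address`, following forwards transparently. Returns 0 on
 * success and reports the address that finally answered, so later
 * calls can go there directly; returns -1 if the hop limit is hit. */
int invoke(const char *address, long *result, const char **final_address)
{
    for (int hops = 0; hops <= MAX_FORWARDS; hops++) {
        struct reply r = send_request(address);
        if (r.status == REPLY_OK) {
            *result = r.result;
            *final_address = address;
            return 0;
        }
        assert(r.forward_to != NULL);
        address = r.forward_to;   /* retry at the new location */
    }
    return -1;
}
```

Because the retry happens inside the client-side machinery, the application only sees the final result, which is what makes the mechanism usable for moving servers without changing clients.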


Frank Pilhofer
1999-06-23