Programming Distributed Systems with YAMI4

3.1 Distributed Systems Are Dynamically Typed

It is widely accepted that statically typed languages are less error-prone and give more help in finding common programming errors related to data manipulation. In fact, the languages known as "system-programming languages" tend to be statically typed - this is the case for all the languages that are currently supported in YAMI4.

Static type systems provide many benefits, but there is one important condition that must be met for them to be useful: all components that take part in data manipulation or data exchange have to be consistent, in the sense that they must use the same definitions of all involved data structures.

The consistency between components is easy to achieve if the whole system is built with a tool that is able to check all artifacts against the given set of definitions. Compilers and linkers can do that reasonably well for a single program - even if the program is logically made of many components (modules, classes, functions, etc.), there is always some way to ensure that the same data structure definitions are used everywhere. Most importantly, when the structure definition is modified, the tool-chain can ensure that relevant components are updated to take these modifications into account.

The development workflow that was invented for single programs is rarely practical in distributed systems, where various components of the system are themselves programs. Even though each such program can be self-consistent with regard to the data structures that are in use, ensuring this consistency for the whole system is much more challenging.

An obvious solution is to try to scale up the same approach that worked for single programs - that is, to find a tool that will oversee the build phase of the whole system and ensure that a single definition of all involved data structures is used by all components.

Many communication frameworks are designed around this concept, with some form of Interface Definition Language (IDL) as a vehicle for defining the data structures and invocation signatures that are used for component interactions. Depending on the actual goal, the IDL can be either language-neutral to allow its use with different programming languages - CORBA IDL is a popular example - or it can actually be a subset of the chosen programming language for mono-lingual systems - Java RMI and the Ada Distributed Systems Annex use this approach. In any case, the IDL specification files are just extensions to the build process, and the approach remains conceptually valid only as long as the build phase of the whole system can be managed as a single activity.

The problem with distributed systems is that this single-phase-build assumption is very often not true - and the bigger the system is in terms of its scale of deployment, the less likely it is that the build procedures for the whole will be managed centrally.

Not only is it difficult to control all possible applications in a distributed system, but in practice it might even be impossible to convince all interested parties to upgrade their software at the same time.

A simple and at the same time the most spectacular example of how challenging this might be is the World Wide Web, with its several versions of HTML (and other) standards on the server side and several versions of browsers on the client side - obviously, even if there were conceptually only one "specification" of the web, the reality would involve the coexistence of different historical snapshots of that specification. This unavoidable coexistence of inconsistent descriptions is what violates the simple single-phase-build assumption for distributed systems of any non-trivial size.

A reasonable solution to this problem is to be permissive in the handling of messages and data structures. This means that the components that receive messages and data from their remote counterparts have to be flexible in accepting and interpreting the message content.
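As an illustration of such permissive handling, the following minimal sketch shows a receiver that accepts messages from senders of two different "versions". It is not YAMI4 code: the message is modelled as a plain Java map, and the field names (id, price, currency, comment) as well as the default currency are assumptions made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

class PermissiveReceiver {

    // Hypothetical message layout: a flat map from field names to values.
    // Older senders provide "id" and "price" only;
    // newer senders may add an optional "currency" field.
    static String describeOrder(Map<String, Object> message) {
        Object id = message.get("id");
        if (!(id instanceof Integer)) {
            // A required field is missing or has an unexpected type.
            throw new IllegalArgumentException("missing or invalid 'id'");
        }
        Object price = message.get("price");
        if (!(price instanceof Number)) {
            throw new IllegalArgumentException("missing or invalid 'price'");
        }
        // Optional field: fall back to a default when an older sender omits it.
        Object currency = message.getOrDefault("currency", "EUR");
        // Fields that this receiver does not know about are simply ignored.
        return "order " + id + ": " + ((Number) price).doubleValue() + " " + currency;
    }

    public static void main(String[] args) {
        Map<String, Object> older = new HashMap<>();
        older.put("id", 7);
        older.put("price", 12.5);

        Map<String, Object> newer = new HashMap<>(older);
        newer.put("currency", "USD");
        newer.put("comment", "unknown to this receiver");

        System.out.println(describeOrder(older)); // order 7: 12.5 EUR
        System.out.println(describeOrder(newer)); // order 7: 12.5 USD
    }
}
```

The receiver insists only on what it really needs, supplies defaults for what may be absent and ignores what it does not understand, so senders and receivers can be upgraded independently.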

In statically typed systems this solution usually takes the form of general types or unions that can act as containers for whatever data is needed. It is also not uncommon to see a similar approach taken with regard to message names, where a single "statically typed" function has a string parameter describing the actual action to take - in Java terms a combination of these two approaches might in the extreme case look like this:

interface Server {
    Object execute(String actionName, Object[] parameters);
}

This example might look extreme, but in fact it is not even artificial - real-life interfaces exist with similar signatures.

Obviously, the above is just a poor man's attempt to build a dynamic type system on top of a static one.
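To make this concrete, here is a minimal sketch of what using such an interface tends to look like. The CalculatorServer class and its "add" action are hypothetical and exist only for this illustration; the point is that both calls in main compile, while the mistake in the second one is detected only at run time.

```java
import java.util.Arrays;

class DynamicOverStatic {

    interface Server {
        Object execute(String actionName, Object[] parameters);
    }

    // Hypothetical implementation: all type checking is done by hand.
    static class CalculatorServer implements Server {
        @Override
        public Object execute(String actionName, Object[] parameters) {
            switch (actionName) {
                case "add":
                    if (parameters.length != 2
                            || !(parameters[0] instanceof Integer)
                            || !(parameters[1] instanceof Integer)) {
                        throw new IllegalArgumentException(
                            "add expects two integers, got "
                            + Arrays.toString(parameters));
                    }
                    return (Integer) parameters[0] + (Integer) parameters[1];
                default:
                    throw new IllegalArgumentException(
                        "unknown action: " + actionName);
            }
        }
    }

    public static void main(String[] args) {
        Server server = new CalculatorServer();

        // Compiles and works, but the cast is the caller's responsibility.
        int sum = (Integer) server.execute("add", new Object[] {2, 3});
        System.out.println(sum); // 5

        // Also compiles; the compiler cannot see that "2" is not an integer.
        try {
            server.execute("add", new Object[] {"2", 3});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The static type system checks only the outer signature here; everything that matters to the application is verified manually inside the implementation.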

YAMI4 recognizes this problem and admits that a well-supported dynamic type system is more appropriate as a foundation for a messaging framework. It is always possible to build a static system on top of a dynamic one with the help of code generators, but recovering flexibility from a static framework is much more difficult. This approach is implemented in YAMI4 in the sense that the application designer and developer have two layers of type support: