What Is Wrong With IDL

Introduction

It seems to be widely accepted that statically typed languages are less error-prone and better at catching common programming errors related to data manipulation. In fact, the languages known as "system-programming languages" tend to be statically typed. This is at least the case for Ada, C++ and Java, and static type safety is often cited as an advantage of these languages.

Static type systems provide many benefits, but there is one important condition that must be met for them to be useful: all components that take part in data manipulation or data exchange have to be consistent, in the sense that they must use the same definitions of all involved data structures.

The consistency between components is easy to achieve if the whole system is built with a tool that is able to check all artifacts against the given set of definitions. Compilers and linkers can do that reasonably well for a single program - even if the program is logically made of many components (modules, classes, functions, etc.), there is always some way to ensure that the same data structure definitions are used everywhere. Most importantly, when the structure definition is modified, the tool-chain can ensure that relevant components are updated to take these modifications into account.

Problem - it does not scale up for separate programs

The development workflow that was invented for single programs is rarely practical in distributed systems, where various components of the system are themselves programs. Even though each such program can be self-consistent with regard to the data structures that are in use, ensuring this consistency for the whole system is much more challenging.

An obvious solution is to try to scale up the same approach that worked for single programs - that is, to find a tool that oversees the build phase of the whole system and ensures that a single definition of all involved data structures is used by all components.

Many communication frameworks are designed around this concept, with some form of Interface Definition Language (IDL) as a vehicle for defining the data structures and invocation signatures that are used for component interactions. Depending on the actual goal, the IDL can be either language-neutral to allow its use with different programming languages - CORBA IDL is a popular example - or it can be a subset of the chosen programming language for mono-lingual systems - Java RMI and the Ada Distributed Systems Annex use this approach. In any case, the IDL specification files are just extensions to the build process, which remains conceptually valid as long as the build phase of the whole system can be managed as a single activity.

The problem with distributed systems is that this single-phase-build assumption is very often not true - and the bigger the system is in terms of its scale of deployment, the less likely it is that the build procedures for the whole will be managed centrally.

Not only is it difficult to control all possible applications in a distributed system, but in practice it might even be impossible to convince all interested parties to upgrade their software at the same time, whenever modifications are introduced to the interface descriptions.

A simple and at the same time most spectacular example of how challenging this might be is the World Wide Web, with its several versions of HTML (and other) standards on the server side and several versions of browsers on the client side. Even if there were conceptually only one "specification" of the web, the reality would involve the coexistence of different historical snapshots of that specification. This unavoidable coexistence of inconsistent descriptions is what violates the simple single-phase-build assumption for distributed systems of any non-trivial size.

What might be the practical manifestation of this problem? The author evaluated one of the many commercial communication products that use IDL as the vehicle for enforcing static type safety and, for the sake of experiment, prepared a client-server pair of programs that used inconsistent IDL descriptions of the exchanged data structure. During the test the server program simply crashed due to a memory allocation exception in its marshalling routines. Granted, this is largely a quality-of-implementation issue that the communication framework might handle much better, but in this particular case a small IDL inconsistency was enough to bring the server down. The inconsistency involved a different order of fields in the data structure, and it need not even be the result of a malicious attack - such an inconsistency could easily be introduced into the system as a deployment mistake.
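One plausible mechanism behind such a crash can be sketched in a few lines of Java. This is not the evaluated product's code, and the author does not describe its wire format; the length-prefixed encoding, the two-field structure and all names below (FieldOrderMismatch, encodeNameThenId, nameLengthSeenByServer, isPlausibleLength) are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FieldOrderMismatch {

    // Client's view of the hypothetical structure: { string name; int id; }
    // Strings are assumed to be sent as a 4-byte length followed by UTF-8 bytes.
    static byte[] encodeNameThenId(String name, int id) {
        byte[] text = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(4 + text.length + 4);
        out.putInt(text.length);
        out.put(text);
        out.putInt(id);
        return out.array();
    }

    // Server's view of the same structure: { int id; string name; }
    // Returns the string length the server believes it has to allocate.
    static int nameLengthSeenByServer(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        in.getInt();        // taken to be "id", actually the client's length prefix
        return in.getInt(); // taken to be the length, actually the first four text bytes
    }

    // A defensive check that a marshalling routine could apply before allocating.
    static boolean isPlausibleLength(int claimed, int bytesRemaining) {
        return claimed >= 0 && claimed <= bytesRemaining;
    }

    public static void main(String[] args) {
        byte[] wire = encodeNameThenId("Zoe Smith", 7);
        int claimed = nameLengthSeenByServer(wire);
        System.out.println("message size in bytes: " + wire.length);
        System.out.println("string length seen by server: " + claimed);
        System.out.println("plausible: " + isPlausibleLength(claimed, wire.length - 8));
    }
}
```

With this input the message is 17 bytes long, yet the server reads the characters "Zoe " as a length of roughly 1.5 billion. A routine that allocates a buffer of the claimed size without a check like isPlausibleLength may fail in exactly the way described above.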

What is particularly striking in this example is that what was supposed to prevent errors actually created a severe security and reliability hole.

Distributed systems are dynamically typed

A reasonable solution to this problem is to be explicitly permissive in the handling of messages and data structures. This means that the components that receive messages and data from their remote counterparts have to be flexible in accepting and interpreting the message content.
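As a minimal sketch of what such permissiveness could mean in practice, the receiver below treats a message as a map of named fields, ignores fields it does not know, and supplies a default for an optional field that is absent. The message layout and all names (PermissiveReceiver, describeOrder, "item", "quantity", "giftWrap") are hypothetical and not taken from any particular framework.

```java
import java.util.HashMap;
import java.util.Map;

final class PermissiveReceiver {

    // Interprets an "order" message without assuming a fixed layout.
    static String describeOrder(Map<String, Object> message) {
        Object item = message.get("item");
        if (!(item instanceof String)) {
            // Only the field this receiver cannot work without is mandatory.
            return "rejected: 'item' is missing or not a string";
        }
        // Optional field: older senders may not provide it, so default to 1.
        Object quantity = message.get("quantity");
        int count = (quantity instanceof Number) ? ((Number) quantity).intValue() : 1;
        // Any other fields in the message are simply ignored.
        return count + " x " + item;
    }

    public static void main(String[] args) {
        Map<String, Object> fromNewerSender = new HashMap<>();
        fromNewerSender.put("item", "widget");
        fromNewerSender.put("quantity", 3);
        fromNewerSender.put("giftWrap", true); // unknown to this receiver
        System.out.println(describeOrder(fromNewerSender)); // prints "3 x widget"

        Map<String, Object> fromOlderSender = new HashMap<>();
        fromOlderSender.put("item", "widget");
        System.out.println(describeOrder(fromOlderSender)); // prints "1 x widget"
    }
}
```

The receiver still validates what it needs, but a sender built against a newer or older description of the message does not bring it down.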

With a statically typed system this solution usually takes the form of general types or unions that can act as containers for whatever data is needed. It is also not uncommon to see a similar approach taken with regard to message names, where a single "statically typed" function has a string parameter describing the actual action to take - in Java terms a combination of these two approaches might in the extreme case look like this:

interface Server {
    Object execute(String actionName, Object[] parameters);
}

This example might look extreme, but in fact it is not even artificial - real-life interfaces exist with similar signatures.
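To make the cost of such a signature concrete, here is a hedged sketch of one way it could be implemented. The Server interface is repeated from the text; the DispatchingServer class and the "add" action are hypothetical. Note that every check the compiler would normally perform (does the action exist, are the parameters of the right type and number) now happens at run time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

interface Server {
    Object execute(String actionName, Object[] parameters);
}

final class DispatchingServer implements Server {

    // Action names are plain strings, so a typo in a caller compiles fine.
    private final Map<String, Function<Object[], Object>> actions = new HashMap<>();

    DispatchingServer() {
        actions.put("add", DispatchingServer::add);
    }

    private static Object add(Object[] p) {
        // These casts fail with ClassCastException only when the call is made.
        int left = (Integer) p[0];
        int right = (Integer) p[1];
        return left + right;
    }

    @Override
    public Object execute(String actionName, Object[] parameters) {
        Function<Object[], Object> action = actions.get(actionName);
        if (action == null) {
            throw new IllegalArgumentException("unknown action: " + actionName);
        }
        return action.apply(parameters);
    }
}
```

A call such as execute("add", new Object[] {2, 3}) returns an Integer wrapped as Object, while execute("ad", ...) or a call with string parameters is rejected only once it reaches the server.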

Obviously, the above is just a poor man's attempt to build a dynamic type system on top of a static one. But is this a proper approach?

It can be argued that a real solution to this problem is to openly recognize that statically typed approaches do not work very well at the distributed scale of deployment, and to admit that a well-supported dynamic type system is a more appropriate foundation for a messaging framework. Interestingly, it is always possible to build a static system on top of a dynamic one with the help of code generators, but recovering the missing flexibility (or even reliability!) from a static framework is much more difficult.
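The last point can be illustrated with a small sketch, assuming a dynamic message is nothing more than a set of named values. DynamicMessage stands in for the dynamic layer, and Position is the kind of typed wrapper a code generator might emit on top of it; both classes and their field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Dynamic layer: any field name, any value.
final class DynamicMessage {
    private final Map<String, Object> fields = new HashMap<>();

    DynamicMessage set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    Object get(String name) {
        return fields.get(name);
    }
}

// Static layer: what a generator might produce from a description
// such as "Position { latitude: double, longitude: double }".
final class Position {
    final double latitude;
    final double longitude;

    Position(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    DynamicMessage toMessage() {
        return new DynamicMessage().set("latitude", latitude).set("longitude", longitude);
    }

    // Validation happens once, at the boundary; unknown fields are ignored.
    static Position fromMessage(DynamicMessage message) {
        return new Position(requireDouble(message, "latitude"),
                            requireDouble(message, "longitude"));
    }

    private static double requireDouble(DynamicMessage message, String name) {
        Object value = message.get(name);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException("field '" + name + "' is missing or not numeric");
        }
        return ((Number) value).doubleValue();
    }
}
```

Application code works with Position and keeps the benefits of static checking, while a mismatch between sender and receiver surfaces as an ordinary, recoverable exception at the boundary rather than as a failure inside the marshalling layer.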