How Small Is Your Middleware?

Small. Lightweight. Lean.

Certainly, the words "small", "lightweight", "footprint", etc. can have different meanings depending on context, but there are some objective values that can be measured with regard to communication platforms. These are related to network and computing resources that are used by the middleware system in order to deliver the expected service. This article presents some basic measurements that were performed for YAMI4.

YAMI4 is a messaging solution for distributed systems that was designed with control and monitoring systems in mind. Such systems are typically embedded, where processing resources are much smaller than those found in big server boxes. YAMI4 was designed to fit such constrained environments and there are two ways in which this was achieved:

The YAMI4 API was divided into two parts: core and general-purpose. The core part manages the most fundamental communication functionality, including the connection pool, message queues and the frame assembly engine. This layer, even though quite simplified, is perfectly sufficient for use in custom applications. The general-purpose layer, on the other hand, builds on top of the core services and adds thread management and a higher-level interface that makes it somewhat easier to use. This division of the API makes it possible to address the limitations of the most constrained environments with the core library, while still offering a higher-level version of the interface where it can be afforded.
The YAMI4 library was designed and implemented without any external dependencies, which made it possible to retain complete control over how memory is allocated and what data structures are used. In addition, the core library can use a pre-allocated memory block and operate entirely within that block, which means that the memory dynamics of the communication subsystem do not interfere with the other parts of the application.

To put things in practical terms, the measurements described below were performed on a 64-bit Linux system with version 1.6.0 of the YAMI4 library and the example programs that are included in the distribution package.

What to measure?

Measuring memory consumption is not straightforward on systems with virtual memory management, but some simple methods give a good idea of how a program uses its memory. On Linux, one way is to use the information that the kernel exposes in the /proc filesystem, in particular the /proc/PID/status file, which reports the current size of the virtual memory allocated for program data (the VmData field). Another useful way is to run the program under a monitoring tool such as valgrind and forcibly terminate it at the point of interest - this forces the tool to report the memory blocks that are not yet deallocated (reported as "leaked" or still reachable), which basically reflects the amount of memory that the program has allocated for its own data structures.
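As an illustration of the first method, the VmData field can be read programmatically. The following is only a minimal sketch in Python, not part of YAMI4 or of the original measurements; the function names parse_vm_data_kb and vm_data_kb are hypothetical, and the code assumes the usual Linux layout of the status file, where the field appears on a line such as "VmData:     276 kB".

```python
def parse_vm_data_kb(status_text):
    """Return the VmData value (in kB) found in the text of a
    /proc/PID/status file, or None if the field is missing."""
    for line in status_text.splitlines():
        if line.startswith("VmData:"):
            # Assumed line layout: "VmData:", the number, then the unit "kB".
            fields = line.split()
            return int(fields[1])
    return None


def vm_data_kb(pid="self"):
    """Read VmData for the given process ID (Linux only).
    The default "self" refers to the calling process."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_data_kb(status_file.read())
```

Calling vm_data_kb(1234) would report the value for the process with PID 1234 (a made-up number), provided that the caller is allowed to read that process's status file.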

For measurements, the calculator example was taken from the YAMI4 distribution, both in the core and the general-purpose versions.

The core version

The calculator example in the core version does not use any additional threads and all activities related to messaging are performed in the context of the main program thread. This allows the server program to minimize its memory requirements, as there is no need for additional thread stack blocks. In this version, before accepting any incoming message, the YAMI4 calculator server consumes 276kB of virtual memory (this also includes the memory used by the C runtime library) and valgrind reports... 950 bytes in reachable blocks. Yes, the dynamic data structures used by the idle server are below 1kB.

When the server is in the middle of processing an incoming client message, the reported value for VmData is still 276kB, while valgrind reports 1668 bytes in reachable blocks. Yes, that is still below 2kB for the server that processes a single small client request.

Of course, the run-time data structures do not represent the whole footprint of the library, and the size of the binary can be treated as a footprint contributor as well. The core library binary amounts to only 260kB, and taking into account that YAMI4 has no external dependencies other than the C runtime, it is really just 260kB, no strings attached. As a result, the statically linked calculator server is a lean 121kB binary file. This size corresponds to plain compilation and link steps, which leave debugging symbols in the executable file - if these symbols are not needed, they can be removed with the standard strip command. In the case of the calculator server example, stripping reduces the already small executable size to 84kB.

The C++ general-purpose version

The general-purpose API is easier to use due to the fact that the library takes the responsibility for thread management, so that the application code does not have to deal with processing I/O events. The ease of use is an added value, but the price for this is increased use of computing resources - obviously, some additional threads are allocated, and the memory-related consequence of this is that each additional thread has a block of memory generously allocated for its stack region.

The calculator server has 3 threads: the main program thread (that does nothing), the I/O worker thread that processes I/O events and the dispatcher thread that delivers incoming messages to the application. The two additional threads account for the reported VmData value of 16MB - that is, 8MB for each thread. This value can be tuned, but the important thing to note is that this is a virtual memory allocation and does not reflect the actual memory consumption, which is typically much smaller. When the server program is run with valgrind, the reported sizes of reachable memory blocks are 3429 bytes for an idle server and 4646 bytes in the middle of processing the incoming message. Yes, this means that even at the level of the general-purpose API, the calculator server consumes less than 5kB of memory for its internal data structures.
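The stack arithmetic above can be checked with a few lines of code. This is only a sketch in Python with hypothetical helper names; it assumes a Linux system with glibc, where the default stack size of a new thread is usually taken from the soft RLIMIT_STACK limit (commonly 8MB). Whether this was the exact mechanism on the measured system is an assumption.

```python
import resource

MB = 1024 * 1024


def reserved_stack_bytes(extra_threads, stack_size_bytes):
    """Virtual memory reserved for the stacks of the additional threads."""
    return extra_threads * stack_size_bytes


def default_stack_limit_bytes():
    """Soft stack limit of the current process, or None if it is unlimited.
    Assumption: the thread library derives its default stack size from it."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


# Two additional threads with 8MB of stack each, as in the text above.
print(reserved_stack_bytes(2, 8 * MB) // MB, "MB reserved for thread stacks")
```

On such a system the reservation can typically be reduced by lowering the stack limit before the program starts (for example with the shell's ulimit -s), which is one way of tuning the value mentioned above.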

The size of the library binary for the C++ general-purpose layer is only 380kB - again, there are no hidden dependencies to account for, and the statically linked calculator server fits in a 266kB binary file, which can be stripped down to 183kB.
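To put the effect of stripping into proportion, the reported executable sizes of both calculator servers can be compared with a short calculation. The sketch below is in Python; the helper name strip_saving_percent is hypothetical, and the only inputs are the sizes quoted in this article.

```python
def strip_saving_percent(before_kb, after_kb):
    """Relative size reduction achieved by stripping, in percent,
    rounded to one decimal place."""
    return round(100.0 * (before_kb - after_kb) / before_kb, 1)


# Executable sizes in kB (before, after stripping), as reported in the article.
sizes = {"core": (121, 84), "general-purpose": (266, 183)}

for name, (before, after) in sizes.items():
    saving = strip_saving_percent(before, after)
    print(f"{name}: {before}kB -> {after}kB, saving {saving}%")
```

In both cases the debugging symbols account for roughly 30% of the unstripped executable.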

Is it lightweight enough?

It should be clear that with such low memory consumption and the possibility to implement a fully-functional server without additional threads, YAMI4 can easily fit even in the most constrained environments. In practical terms, YAMI4 itself adds very little to the resource requirements of the target platform - the library expects a POSIX-compliant platform, and any hardware that is currently POSIX-capable has enough room for YAMI4 as a middleware for fully-functional embedded applications. It should be no surprise, then, that YAMI4 was recently verified to work correctly on the Raspberry Pi.

So - how small is your middleware?