Avoiding Destruction Races

Introduction

A problem found in many C++ libraries is that the design of the library API itself can expose user code to vulnerabilities related to object lifetime management.

Imagine the following library API pattern:

class Resource
{
public:
    void doSomething();
};

class ResourceManager
{
public:
    Resource * locateResource();
};

What is important in the above example is the combination of two API assumptions:

The resource manager has the ultimate control over the lifetime of its resources.
The manager leases references to its internally managed resources to the user code.

This API pattern has several variants, including the case where the library calls the user code through a callback interface and passes the resource reference as a parameter, for example:

class Callback
{
public:
    virtual ~Callback() = default;  // polymorphic interface needs a virtual destructor
    virtual void callMe(Resource * r) = 0;
};

The above API patterns can be found in many libraries encapsulating interactions with external resources - database libraries, connection pools, networking and even GUI frameworks are all good candidates. Interestingly, even the C++ standard library contains a variant of this scheme in its handling of iterators as references to data structure nodes managed by containers. In all these cases the common characteristic of the API is that there is a manager object that deals with the lifetime of some entities and the low-level reference to those entities is leased to the user code with no restrictions on the usage scope.

The problem

The problem is that with such an API pattern it is very difficult to establish a link between the manager object and the user code that would allow the two to share a common view of how long a given resource remains valid.

In the case of STL containers and their iterators, the problem is "solved" by convention and by precise documentation of when a given iterator might become invalid, leaving it to the programmer to be careful enough not to violate the rules. The trouble is that a violation usually leads to a failure that cannot be recovered from at the program level: so-called undefined behavior, which can either corrupt data or kill the program immediately.

A similar problem appears in just about any other library that relies on this API scheme, as there is no way for the user code to figure out whether the referenced resource is still alive or not.

This problem is relatively easy to manage in single-threaded code, but it can become a real challenge in multi-threaded systems, where the manager object can decide to deactivate a given resource autonomously, with no relation to the actions performed by the user code. A possible outcome is that the manager deactivates the resource just when the user code is about to use it - a kind of problem that is particularly difficult to reproduce and debug.

This kind of problem can be described as a destruction race: the outcome of destroying a resource is unpredictable in a multi-threaded context where other threads still refer to that resource and are likely to use it.

The main source of the problem is, however, quite simple: it is the fact that the user code is given unrestricted access to the resource by means of some low-level reference.

It is worth noting that managed languages with built-in garbage collectors mitigate this problem by removing the undefined-behavior aspect of using deactivated resources in such API schemes. Even though resource invalidation is still a problem (after all, the garbage collector cannot keep a broken network connection alive, for example), an attempt to use such a resource can be turned into a recoverable failure.

Use descriptors instead of low-level references

The solution to this problem in C++ (and in other languages with similar resource management approaches) can be very simple. Undefined behavior is a direct result of the unrestricted use of low-level references (in this context, pointers included), so perhaps by removing low-level references from the API the library can avoid the risk of triggering undefined behavior.

The idea of replacing low-level references with something more controllable is not new and can take various forms. One of them is reference counting, where the library and the user code are jointly responsible for the resource lifetime through smart pointers or similar constructs. This is not a bad idea, although in this approach the resource itself needs to be extended to also cover the additional state of being "alive but useless", so that any usage attempt after the resource has been invalidated can be properly recognized and reported. Not every resource can be easily adapted to this approach, and in some cases an additional proxy object may be required to hold that additional state.
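As a minimal sketch of the reference-counting variant, assuming std::shared_ptr as the smart pointer, the hypothetical Resource below carries an extra validity flag that models the "alive but useless" state. The manager and the user share ownership, so deactivation never frees memory under the user's feet: it only flips the flag, and later calls fail with an exception instead of undefined behavior. The names Resource, ResourceManager, invalidate() and deactivate() are illustrative, and the manager's own bookkeeping is left unsynchronized to keep the sketch short.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical resource extended with an "alive but useless" state.
class Resource
{
public:
    explicit Resource(std::string name) : name_(std::move(name)) {}

    // Called by the manager when it decides to deactivate the resource.
    void invalidate() { valid_ = false; }

    bool isValid() const { return valid_; }

    // Every operation checks the extra state first, so use after
    // invalidation is reported as an error instead of undefined behavior.
    std::string doSomething() const
    {
        if (!valid_)
        {
            throw std::logic_error("resource has been invalidated");
        }
        return "used " + name_;
    }

private:
    std::string name_;
    std::atomic<bool> valid_{true};  // atomic: may be flipped from another thread
};

// Hypothetical manager sharing ownership of its resource with user code.
class ResourceManager
{
public:
    std::shared_ptr<Resource> locateResource()
    {
        if (!resource_)
        {
            resource_ = std::make_shared<Resource>("r1");
        }
        return resource_;
    }

    // Marks the resource as useless and drops the manager's reference; the
    // object itself lives on until the last user-held shared_ptr goes away.
    void deactivate()
    {
        if (resource_)
        {
            resource_->invalidate();
            resource_.reset();
        }
    }

private:
    std::shared_ptr<Resource> resource_;
};
```

Note that the check in doSomething() does not stop an operation that passed the check just before invalidation; what shared ownership guarantees is only that such an operation still runs on a live object.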

Another way of handling this problem is with the use of resource descriptors.

A resource descriptor is not a reference - not even a "fat" one. It is a kind of lightweight token that is given to the user code so that it can later identify the resource in question by presenting that token whenever some operation is requested.

The resource descriptor is an idea heavily used in Unix-like systems to manage interactions with files, sockets and other external resources. A descriptor does not refer to the given resource, but it still identifies it, and this is what makes it different from any pointer-like construct.

As an easy application of resource descriptors, the first API example above could be transformed in the following way:

class ResourceDescriptor
{
    // some lightweight and copyable content
};

class Manager
{
public:
    // find the resource
    ResourceDescriptor locateResource();

    // operate on the resource
    void doSomething(ResourceDescriptor res);
};

It should be noted that the resource as an independent entity has disappeared from the API completely and its functionality has been incorporated into the manager interface. In other words, actions are no longer performed on the resource directly; rather, the manager object is asked to operate on the given resource on behalf of the user. This is an extremely important modification, as it makes it possible to fully encapsulate resource management within the manager object.

Because the resource as an entity is no longer directly accessible from the user code, the manager object can freely decide when to deactivate (or even reactivate) the given resource, which in a multi-threaded context means that all related synchronization is encapsulated in a single place. There is no risk of destroying something that is just about to be used, as both the destruction and the usage attempt can be properly synchronized. The only remaining task is to detect attempts to use a resource that is no longer valid, which can easily be done in the same encapsulated context without the risk of triggering undefined behavior. This turns the dreaded destruction race into a recoverable error - a very welcome improvement.
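A minimal sketch of such a manager follows, assuming a single std::mutex as the one synchronization point and, as one possible choice among several, a never-reused 64-bit id kept in a std::unordered_map as the descriptor content. Resources are modelled as plain strings, and the names Manager, locateResource(), doSomething() and deactivate() are illustrative. Because deactivation and use take the same lock, a stale descriptor is simply reported by a false return value.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical descriptor: an opaque id that the manager never hands out twice.
struct ResourceDescriptor
{
    std::uint64_t id;
};

class Manager
{
public:
    // Registers a resource (modelled here as just a name) and returns its descriptor.
    ResourceDescriptor locateResource(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::uint64_t id = nextId_++;
        assert(nextId_ != 0 && "id space exhausted");
        resources_.emplace(id, name);
        return ResourceDescriptor{id};
    }

    // Operates on the resource on behalf of the user. A stale descriptor is
    // reported by returning false; the resource cannot be destroyed while
    // the lock is held, so there is no window for a destruction race.
    bool doSomething(ResourceDescriptor res, std::string& result)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = resources_.find(res.id);
        if (it == resources_.end())
        {
            return false;
        }
        result = "used " + it->second;
        return true;
    }

    // May be called from any thread; serialized with doSomething() by the same mutex.
    void deactivate(ResourceDescriptor res)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        resources_.erase(res.id);
    }

private:
    std::mutex mutex_;
    std::uint64_t nextId_ = 1;
    std::unordered_map<std::uint64_t, std::string> resources_;
};
```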

Descriptor's anatomy

What should constitute a resource descriptor?

There are many options, and the only design goal is that the descriptor should be lightweight enough to support return by value and copy semantics.

The example of Unix file descriptors is particularly lean - it is difficult to invent anything lighter than an integer value. However, integers used as indices into an internally managed array have a serious drawback: they do not protect against accidental reuse. In other words, a stale descriptor value might happen to be equal to a descriptor value created later for a different resource, which makes error detection more difficult.
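POSIX file descriptors show both sides of this trade-off. The sketch below uses POSIX rather than standard C++ (pipe(), close() and read() from unistd.h), and its two function names are hypothetical. It first checks that reading from a closed descriptor fails with EBADF, an ordinary recoverable error, and then shows the reuse problem: POSIX specifies that pipe() returns the two lowest available descriptor numbers, so a stale value is handed out again for a brand-new pipe. It assumes no other thread opens descriptors concurrently.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Using a descriptor after close() is detected: read() fails with EBADF.
bool closedDescriptorReportsEbadf()
{
    int fds[2];
    if (pipe(fds) != 0)
    {
        return false;
    }
    close(fds[0]);
    close(fds[1]);

    char buf[1];
    errno = 0;
    const ssize_t n = read(fds[0], buf, sizeof buf);
    return n == -1 && errno == EBADF;
}

// The same integer value is reused: pipe() takes the two lowest free numbers,
// so the stale descriptor now identifies a different, unrelated pipe.
bool staleDescriptorIsReused()
{
    int first[2];
    if (pipe(first) != 0)
    {
        return false;
    }
    const int stale = first[0];
    close(first[0]);
    close(first[1]);

    int second[2];
    if (pipe(second) != 0)
    {
        return false;
    }
    const bool reused = (second[0] == stale || second[1] == stale);
    close(second[0]);
    close(second[1]);
    return reused;
}
```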

A simple technique that prevents the accidental reuse of invalidated descriptors is to extend them with some form of version information. This can be as simple as a sequentially generated number that is matched against the version currently stored in the internally managed array. A simple resource descriptor that acts as a versioned array index can be implemented in the following way:

#include <cstdint>

class ResourceDescriptor
{
public:
    // ...

private:
    int resourceIndex;      // index into the manager's internal array
    std::uint64_t version;  // resource version number (unsigned 64-bit: overflow wraps instead of being undefined)
};

Such a resource descriptor has the following (hopefully obvious) lifetime pattern:

When the user asks for the resource, the manager creates a descriptor identifying the internally managed resource together with its current version. This descriptor is passed to the user.
The user provides the descriptor to the manager each time an operation is requested. The manager uses the descriptor's fields to figure out which resource is being referred to and to verify that the descriptor is not older than the resource itself.
Whenever the manager decides to deactivate any given resource, it marks its array slot with a different (possibly just increased) version number, so that all existing descriptors that identify that resource become "out of date" and therefore invalid.
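The lifetime pattern above can be sketched as follows, assuming a std::vector of slots that are reused after deactivation and a std::uint64_t version per slot. The names Manager, createResource(), doSomething() and deactivate() are hypothetical, resources are modelled as plain strings, and the sketch is single-threaded; in a multi-threaded manager every member function would take the same lock. A stale descriptor yields std::nullopt (C++17) instead of reaching the new occupant of its slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical versioned descriptor: slot index plus the version it was issued for.
struct ResourceDescriptor
{
    std::size_t resourceIndex;
    std::uint64_t version;
};

class Manager
{
public:
    // Step 1: place the resource in a free slot (reusing deactivated ones)
    // and hand out a descriptor carrying the slot's current version.
    ResourceDescriptor createResource(const std::string& name)
    {
        for (std::size_t i = 0; i < slots_.size(); ++i)
        {
            if (!slots_[i].active)
            {
                slots_[i].active = true;
                slots_[i].name = name;
                return ResourceDescriptor{i, slots_[i].version};
            }
        }
        slots_.push_back(Slot{name, 0, true});
        return ResourceDescriptor{slots_.size() - 1, 0};
    }

    // Step 2: every operation validates the index and the version first.
    std::optional<std::string> doSomething(ResourceDescriptor res) const
    {
        if (!isCurrent(res))
        {
            return std::nullopt;  // recoverable: the descriptor is out of date
        }
        return "used " + slots_[res.resourceIndex].name;
    }

    // Step 3: bumping the version makes all existing descriptors for the slot stale.
    void deactivate(ResourceDescriptor res)
    {
        if (isCurrent(res))
        {
            Slot& slot = slots_[res.resourceIndex];
            slot.active = false;
            ++slot.version;
        }
    }

private:
    struct Slot
    {
        std::string name;
        std::uint64_t version;
        bool active;
    };

    bool isCurrent(ResourceDescriptor res) const
    {
        return res.resourceIndex < slots_.size()
            && slots_[res.resourceIndex].active
            && slots_[res.resourceIndex].version == res.version;
    }

    std::vector<Slot> slots_;
};
```

In use, a descriptor issued before deactivation stays rejected even after its slot has been handed to a new resource, which is exactly the accidental-reuse case that plain integer indices cannot detect.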

There are several implementation details that can influence the design of the resource descriptor, such as whether it should be immutable or how to provide access to its fields so that the manager can read them. These considerations have no impact on the general workings of the descriptor, except that only the resource manager should be allowed to create and set new descriptor values.
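One way to honor the rule that only the manager creates descriptors is to make the descriptor's constructor private and grant access through a friend declaration. In this minimal sketch the issue() and versionOf() members and the field names are hypothetical. User code can copy, assign and compare descriptors but can neither forge one nor modify its fields, and a static_assert documents that forging does not compile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

class Manager;  // hypothetical manager, defined below

// Users can copy and compare descriptors, but only Manager (a friend)
// can construct one or read its fields.
class ResourceDescriptor
{
public:
    bool operator==(const ResourceDescriptor& other) const
    {
        return resourceIndex_ == other.resourceIndex_ && version_ == other.version_;
    }
    bool operator!=(const ResourceDescriptor& other) const { return !(*this == other); }

private:
    friend class Manager;

    ResourceDescriptor(std::size_t index, std::uint64_t version)
        : resourceIndex_(index), version_(version) {}

    std::size_t resourceIndex_;
    std::uint64_t version_;
};

// User code cannot create descriptors out of thin air.
static_assert(!std::is_constructible<ResourceDescriptor, std::size_t, std::uint64_t>::value,
              "only the manager may create descriptors");

class Manager
{
public:
    ResourceDescriptor issue(std::size_t index, std::uint64_t version) const
    {
        return ResourceDescriptor(index, version);
    }

    std::uint64_t versionOf(const ResourceDescriptor& d) const { return d.version_; }
};
```

Making the fields const instead would also prevent modification, but it would disable copy assignment, which the "lightweight and copyable" goal argues against.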

An important issue, however, is the capacity of the version information. A 64-bit value makes it practically impossible for any program to exhaust its capacity by repeated resource renewal - especially when the resource itself is coupled with external or I/O entities, whose renewal is comparatively slow.
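A rough worked estimate illustrates the claim, under the deliberately pessimistic assumption of one billion renewals per second (far faster than any resource tied to I/O could be renewed) and an unsigned 64-bit version counter:

```latex
\frac{2^{64}}{10^{9}\,\mathrm{s}^{-1}}
  \approx \frac{1.845 \times 10^{19}}{10^{9}\,\mathrm{s}^{-1}}
  = 1.845 \times 10^{10}\,\mathrm{s}
  \approx \frac{1.845 \times 10^{10}\,\mathrm{s}}{3.156 \times 10^{7}\,\mathrm{s/year}}
  \approx 585\ \mathrm{years}
```

A signed long long, whose positive range is 2^63 - 1, halves this to roughly 292 years, which is still far beyond the lifetime of any realistic process.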

Performance considerations

Obviously, replacing direct low-level pointers with an additional lookup via descriptors might affect the performance of individual operations. The severity of this impact, however, depends on the cost of the operations themselves.

The performance impact can be particularly visible when using the resource is very cheap - for example, replacing plain STL iterators with version-aware node descriptors can have a visible influence on an application that performs many node accesses. In such applications the potential loss of performance is the price that has to be paid for safety. On the other hand, for resources associated with physical external entities (database connections, network channels, etc.), using a resource descriptor instead of a plain low-level resource pointer is a relatively cheap way to improve program safety with no measurable impact on performance.
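To make the per-access cost concrete, here is a minimal sketch of a version-aware cursor over a hypothetical CheckedVector wrapper (not a standard container). It conservatively treats every push_back as invalidating all cursors, which is stricter than the actual std::vector rules. Each access then pays for a version comparison and a bounds check on top of what a raw iterator dereference costs: negligible next to an I/O operation, but potentially noticeable in a tight loop over many nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container wrapper: every structural change bumps a version,
// so cursors created before the change are detected as stale.
template <typename T>
class CheckedVector
{
public:
    struct Cursor
    {
        std::size_t index;
        std::uint64_t version;
    };

    void push_back(const T& value)
    {
        data_.push_back(value);
        ++version_;  // may reallocate, so conservatively invalidate all cursors
    }

    Cursor cursorAt(std::size_t index) const { return Cursor{index, version_}; }

    // The extra work compared to a raw iterator: one version comparison
    // and one bounds check per access.
    const T& get(Cursor c) const
    {
        if (c.version != version_ || c.index >= data_.size())
        {
            throw std::out_of_range("stale or invalid cursor");
        }
        return data_[c.index];
    }

private:
    std::vector<T> data_;
    std::uint64_t version_ = 0;
};
```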

Who is using it?

The concept of the resource descriptor is not new.

As already mentioned, descriptors are heavily used in Unix-like systems to manage access to files, sockets and other similar resources, although their provisions for error detection are quite weak, because file descriptors are "dumb" integers that can be reused.

The Ada containers library uses node descriptors instead of plain pointers: cursors identify container entries and come with strong provisions for version control, so that the concept of an invalid cursor can be implemented in a safe way.

Resource descriptors are also used in the YAMI4 core libraries, where they isolate users from the internal lifetime management of the communication channels.