
Testing Allocation Failures

Introduction

It is commonly understood that testing is a necessary part of any significant software development effort. The exact focus and the level of attention devoted to testing vary between projects and between development teams for many reasons, both technical and cultural. Opinions on the relevance of testing range from complete devotion on one side - leading to test-oriented methodologies like Test Driven Development (TDD) - to more reserved positions rooted in the practical observation that testing cannot prove the absence of bugs and relying on it often creates a false sense of security (see Mythical Code Coverage for more discussion on this). Still, some testing techniques are not only useful, but also technically interesting in their own right.

The intent of this article is to present a technique that can be used to test and assess the robustness of programs in the presence of allocation failures. This is inherently difficult to test with "standard" testing methods, as they typically focus on successful program paths and do not really address the failure-related branches, which are often hidden and not even visible in the source code.

The question is: how does your program deal with memory allocation failures? Do such failures lead to corrupted or inconsistent data structures, or to undefined behavior?

The testing technique presented below makes it possible to verify, to a large extent, that the program behaves reasonably when faced with limited memory conditions.

An example unit test

Consider this simple program:

#include <cassert>
#include <set>
#include <string>
#include <vector>

using namespace std;

vector<string> unique_sort(const vector<string> & v)
{
    set<string> s(v.begin(), v.end());
    return vector<string>(s.begin(), s.end());
}

void unit_test()
{
    vector<string> v1;
    v1.push_back("John");
    v1.push_back("Alicia");
    v1.push_back("Bob");
    v1.push_back("John");
    v1.push_back("Bob");

    vector<string> v2 = unique_sort(v1);
    assert(v2.size() == 3);
    assert(v2[0] == "Alicia");
    assert(v2[1] == "Bob");
    assert(v2[2] == "John");
}

int main()
{
    unit_test();
}

The above code contains a utility function - unique_sort - which is supposed to sort a sequence of strings and remove any duplicates, and a simple unit test that checks whether the function behaves as expected. Many more such unit tests could be written (a couple of edge cases are sketched below), but one is enough for this discussion. The program runs without triggering any assertions, which gives some level of confidence that the utility function is in fact correct.
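For illustration, the edge cases of an empty input and an input consisting only of duplicates could be covered in the same style. This additional test is hypothetical and not part of the original example:

void unit_test_edge_cases()
{
    // an empty input should produce an empty result
    vector<string> empty;
    assert(unique_sort(empty).empty());

    // duplicates only: the result should collapse to a single element
    vector<string> dups(4, "John");
    vector<string> r = unique_sort(dups);
    assert(r.size() == 1);
    assert(r[0] == "John");
}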

Memory allocation failures

The above utility function looks good and it works properly - but does it work properly always, in every imaginable environment?

It is interesting to observe that many software systems fail at run-time even though they were "exhaustively" tested (possibly with a very high level of code coverage in unit tests), simply because the development and test environments differed from the target one. The most typical problem of this kind in embedded systems is limited memory, where memory allocations might occasionally fail.

What will happen if there is not enough memory for the utility function above to complete? Obviously, the function will not be able to complete its task - but will it fail cleanly, or will it leave the program in a corrupted state on the way out?

These questions cannot be answered with simple unit tests like the one above, because the interesting part of the code is hidden - it might even be invisible, thanks to the high-level properties of the programming language. The utility function has only two lines of code (which are easy to cover with standard unit tests), but the part that is interesting from the robustness point of view is somewhere else. And if that part is not even visible, how can it be tested?

Fortunately, the C++ programming language (like several others, for example Ada) provides a way to instrument the memory allocation scheme, and this can be used to test the behavior of any part of the program under simulated memory allocation failures. This instrumentation is based on a custom implementation of the memory allocation routines - in C++ this is done by replacing the global new operator.

The following code adds the necessary instrumentation in order to measure how many memory allocations in total are needed for the unit test shown above:

#include <cstdlib>
#include <iostream>
#include <new>
// ...

using namespace std;

bool allocator_instrumented = false;
size_t allocation_counter = 0;

void * operator new(size_t requested_size)
{
    if (allocator_instrumented)
    {
        ++allocation_counter;
    }

    void * ptr = malloc(requested_size);
    if (ptr == 0)
    {
        // genuinely out of memory - report it the standard way
        throw bad_alloc();
    }

    return ptr;
}

void operator delete(void * ptr) noexcept
{
    free(ptr);
}

// unique_sort and unit_test as before
// ...

int main()
{
    allocator_instrumented = true;
    unit_test();
    allocator_instrumented = false;

    cout << allocation_counter << " allocations\n";
}

The above extension to the previous example instruments the standard allocator with a simple counter and a flag that allows switching the counting on and off at run-time. On one particular implementation the above program prints:

13 allocations

This means that the whole unit test requires 13 memory allocations, executed as calls to the standard new operator, to succeed.
The testing challenge here is to verify whether the unit test survives an allocation failure - obviously the test will not succeed in terms of normal execution, but a graceful error report should be expected instead of a random crash.

It is actually quite easy to provoke an allocation failure in place of any of the 13 allocations that are needed for the whole test. The following code is a fully automated test that performs two steps: first it measures how many allocations a successful run needs, and then it repeats the test, provoking a failure at each allocation "position" in turn:

// ...

bool allocator_instrumented = false;
size_t allocation_counter = 0;
size_t allocation_failure_position = 0; // 0 means "never fail"

void * operator new(size_t requested_size)
{
    if (allocator_instrumented)
    {
        ++allocation_counter;
        if (allocation_counter == allocation_failure_position)
        {
            // simulate failure
            throw bad_alloc();
        }
    }

    void * ptr = malloc(requested_size);
    if (ptr == 0)
    {
        // genuinely out of memory
        throw bad_alloc();
    }

    return ptr;
}

// operator delete, unique_sort and unit_test as before
// ...

int main()
{
    // measure the successful test run:
    allocator_instrumented = true;
    unit_test();

    size_t total_allocations_needed = allocation_counter;

    cout << "total allocations for successful test: "
        << total_allocations_needed << endl;

    // repeat the test with provoked failure on each "position"
    for (size_t i = 1; i <= total_allocations_needed; ++i)
    {
        cout << "testing failure at " << i << endl;

        allocation_counter = 0;
        allocation_failure_position = i;
        try
        {
            unit_test();
        }
        catch (const bad_alloc & e)
        {
            // OK, this was provoked and expected
        }
    }
}

The above program runs to completion without crashing and without any visible signs of undefined behavior, which gives some level of confidence that the tested utility function is robust in the presence of memory allocation failures: it does not leave the program in an inconsistent state and it gracefully reports allocation failures to the higher layers, where the report can be properly handled.
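As an illustration of what "properly handled" might mean in the higher layers, the following hypothetical wrapper (try_unique_sort is not part of the original example) translates the exception into a status value that the caller can act upon:

// hypothetical wrapper that reports allocation failure as a status value
bool try_unique_sort(const vector<string> & in, vector<string> & out)
{
    try
    {
        out = unique_sort(in);
        return true;
    }
    catch (const bad_alloc &)
    {
        // with C++11 move semantics the assignment itself does not allocate,
        // so bad_alloc can only come from unique_sort and out stays unchanged;
        // the caller can retry later or degrade gracefully
        return false;
    }
}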

Limitations

The above technique is quite simple and, for that very reason, limited in scope. In particular, the allocator instrumentation targets only the standard new operator and leaves other allocation schemes (like lower-level malloc) somewhat "under the radar". There are ways to work around this limitation by providing an instrumented language run-time or by replacing the memory allocator altogether, but the cleanest way to address the issue is proper planning at the beginning of the project, where a dedicated project-wide allocation function can be provided to facilitate this kind of instrumentation, as sketched below.
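As a sketch of that last idea (all names below are illustrative and not taken from any particular project), a single project-wide allocation point could look like this, with test builds injecting failures through an optional hook:

#include <cstddef>
#include <cstdlib>

namespace project
{
    // optional hook, installed only by the test harness;
    // returning false simulates an allocation failure
    typedef bool (*allocation_filter)(std::size_t size);

    allocation_filter filter = 0;

    void * allocate(std::size_t size)
    {
        if (filter != 0 && !filter(size))
        {
            return 0; // simulated failure, indistinguishable from a real one
        }

        return std::malloc(size);
    }

    void deallocate(void * ptr)
    {
        std::free(ptr);
    }
}

All code in the project would then call project::allocate and project::deallocate instead of using malloc or the new operator directly, so a single hook covers every allocation in the program at once.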

This technique has been used extensively to verify the YAMI4 library and to ensure its robust behavior even in the presence of arbitrary memory allocation failures. The complete test was much more complex than the above example, mainly because the original set of tests involved interactions between many threads of execution, but the general idea presented above is a good basis for a customized testing solution. One aspect of such a customization - making the instrumentation itself thread-safe - is sketched below.
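In a multi-threaded test the plain counters shown earlier would constitute a data race. One possible adaptation - a sketch assuming C++11, not the actual YAMI4 test code - is to make the instrumentation state atomic:

#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

std::atomic<bool> allocator_instrumented(false);
std::atomic<std::size_t> allocation_counter(0);
std::atomic<std::size_t> allocation_failure_position(0);

void * operator new(std::size_t requested_size)
{
    if (allocator_instrumented.load())
    {
        // fetch_add returns the previous value, hence the + 1
        const std::size_t position = allocation_counter.fetch_add(1) + 1;
        if (position == allocation_failure_position.load())
        {
            // simulate failure in whichever thread hit this position
            throw std::bad_alloc();
        }
    }

    void * ptr = std::malloc(requested_size);
    if (ptr == 0)
    {
        throw std::bad_alloc();
    }

    return ptr;
}

Note that with many threads the allocation order is no longer deterministic, so the failing position identifies "the n-th allocation performed anywhere in the process" rather than a fixed place in the code; repeating the whole test loop several times helps to cover more interleavings.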