Possible Syntax for C++ Lambda

Introduction

This article is a very quick and dirty draft of an idea that was discussed some time ago on comp.lang.c++.moderated, concerning the possibility of adding lambda to C++. There are many different approaches to this subject, and this article is not a formal language feature proposal - rather, it is a speculation on how the problem could be solved without abandoning the current language culture.

What is lambda

Considering that there is no such thing as lambda in C++ (as defined by the '98 standard), people have different definitions of lambda, and much of the heat in discussions comes from this single fact. What is discussed in this article has a broad context and is not limited to functions only; thus, as was pointed out, the term "lambda" could be confusing. Therefore, here is my own definition of an "anonymous entity", which also covers lambda functions:

An anonymous entity is the unnamed definition of some language entity, written at the place where it is used.

The language entity can be a type, function, procedure, object, etc. In particular, the anonymous function is meant to be more or less equivalent to what are known as lambda functions in other languages.

In C++, many things can already exist as anonymous entities, defined exactly where they are used, as the following examples show:

// temporary, unnamed object
throw MyClass();

// object of unnamed struct
struct
{
    int a, b;
} s;

In both examples above, the language entity (object or type, respectively) could be defined with an explicit name and used by that name as well:

MyClass c;
throw c;

struct S
{
    int a, b;
};
S s;

In these examples, using an anonymous entity instead of a named one brings the following benefits (not all of them are directly visible above, but relevant examples are easy to write):

the name space is not polluted with the names of entities that are supposed to be used only once
the "locality of reference" in the source code is higher
the code is potentially shorter and more expressive

The existing support for anonymous entities is not fully orthogonal. For example, a temporary object cannot be bound to a non-const reference, whereas a named non-const object can. An unnamed struct cannot be used, for example, to declare a function's return or parameter type. This, however, does not change the concept in general.

Fully orthogonal anonymous entity

The concept of anonymous entities in the language would be fully orthogonal if an unnamed entity (defined in place) could be used everywhere a named entity is allowed. I do not claim that this is fully achievable in C++, but I can imagine languages that have this property.

Informal proposal to add anonymous entity to C++

The proposed addition is not supposed to be fully orthogonal, but rather somewhat distorted with practical purposes in mind. Two kinds of anonymous entities are proposed:

anonymous function
anonymous class

Anonymous Function

An anonymous function is an unnamed function, defined at the place where a pointer to function is expected or accepted. It has the same syntax as a regular function, but has no name. From the implementation point of view, it should be replaced by the definition of a free function (as if it were defined in an unnamed namespace) with a compiler-generated unique name, and that name should be used where the anonymous function itself was used.

Examples:

atexit(
    void () { puts("Good bye!"); }
);

This should be replaced by:

namespace // unnamed
{
    void __some_unique_name()
    {
        puts("Good bye!");
    }
}

// and later:
atexit(__some_unique_name);

Another example:

std::transform(b1, e1, b2, int (int x) { return x + 1; });

More with STL:

std::for_each(b, e, void (int x) { std::cout << x << ' '; });

And more:

std::sort(b, e, bool (int a, int b) { return a < b; });

The signature of an anonymous function is the signature of the named function that results from adding the unique name to it.
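As a sketch of what the substitution might produce for the three STL calls above, the following ordinary C++ compiles today. The names generated_name_1 to generated_name_3 are hypothetical stand-ins for compiler-generated names (they avoid the reserved double-underscore prefix so that this is legal user code), and the demo function is an assumption, since b1, e1, b2, b and e are left unspecified in the examples.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <vector>

namespace // unnamed
{
    // stands in for: int (int x) { return x + 1; }
    int generated_name_1(int x) { return x + 1; }

    // stands in for: void (int x) { std::cout << x << ' '; }
    void generated_name_2(int x) { std::cout << x << ' '; }

    // stands in for: bool (int a, int b) { return a < b; }
    bool generated_name_3(int a, int b) { return a < b; }
}

// Hypothetical driver applying the three rewritten calls to concrete data.
std::vector<int> demo(std::vector<int> v)
{
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(), generated_name_1);
    std::sort(out.begin(), out.end(), generated_name_3);
    std::for_each(out.begin(), out.end(), generated_name_2); // prints the elements
    return out;
}
```

Each signature is exactly the one written in the anonymous function, so every generated name decays to an ordinary pointer to function when passed to the algorithm.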

An anonymous function can use all type names that are available (or typedefed) where it is written. This allows anonymous functions to be used within function templates, where some type names may exist as bound type variables that will be fixed when the enclosing template is instantiated. Example:

template <typename T>
void fun(std::vector<T> const &v)
{
    std::for_each(v.begin(), v.end(),
        void (T const &x) { std::cout << x << ' '; }
    );
}

This should be replaced by:

namespace // unnamed
{
    template <typename T>
    void __some_unique_name(T const &x) { std::cout << x << ' '; }
}

// and later:
template <typename T>
void fun(std::vector<T> const &v)
{
    std::for_each(v.begin(), v.end(), __some_unique_name<T>);
}

Note the <T> at the end. It is important that this T is the same T that appears in the anonymous function. If more types are used this way, then the anonymous function should be replaced by a free function template with the appropriate number of template parameters.
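A compilable version of this rewrite might look as follows. generated_name is a hypothetical stand-in for __some_unique_name, and capture_fun is a hypothetical helper that temporarily redirects std::cout so that the printed text can be inspected; neither is part of the proposal itself.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

namespace // unnamed
{
    // stands in for: void (T const &x) { std::cout << x << ' '; }
    template <typename T>
    void generated_name(T const &x) { std::cout << x << ' '; }
}

template <typename T>
void fun(std::vector<T> const &v)
{
    // <T> is the same T as the one bound by the enclosing template
    std::for_each(v.begin(), v.end(), generated_name<T>);
}

// Hypothetical helper: runs fun() with std::cout redirected into a string.
template <typename T>
std::string capture_fun(std::vector<T> const &v)
{
    std::ostringstream buffer;
    std::streambuf *old = std::cout.rdbuf(buffer.rdbuf());
    fun(v);
    std::cout.rdbuf(old); // restore the original stream buffer
    return buffer.str();
}
```

Because generated_name<T> names exactly one specialization, std::for_each can deduce its function parameter from it, just as with a non-template function.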

Performance issue: As was pointed out in the public discussion, rewriting the anonymous function so that it results in a pointer to function may impose a performance penalty, because calls through a pointer to function are not likely to be inlined. To solve this problem, the anonymous function could have an additional variant, like in:

std::sort(b, e, inline bool (int a, int b) { return a < b; });

(note the inline keyword)

Such an anonymous function could be rewritten not as a free function, but as an instance of a class with a relevant function call operator, if the context allows generic functors to be used instead of requiring pointers to functions only. This is the case with std::sort, where such a functor class could be used and would be a likely candidate for inlining. If this rewrite is not legal (for example, when only a pointer to function is accepted, as with std::qsort or any other C function), then a normal function and a pointer to it should be used.

An alternative solution to this performance problem could be to always rewrite the anonymous function as an instance of a class with a relevant function call operator (possibly delegating to a static function that actually does the job) and, in addition, a conversion operator to a pointer to that function, like here (this is a rewrite of the last example above):

namespace // unnamed
{
    struct __unique_name
    {
        static bool __invoke(int a, int b) { return a < b; }

        // for use as a functor
        bool operator()(int a, int b) const { return __invoke(a, b); }

        // for use via pointer to function (the target of a conversion
        // operator must be a pointer to function, not a function type)
        typedef bool (*PF)(int, int);
        operator PF() const { return &__unique_name::__invoke; }
    };
}

// and later:
std::sort(b, e, __unique_name());

Above, std::sort receives a value-initialized instance of the functor class and calls its operator(), which is likely to be inlined. On the other hand, if the same anonymous function were used in a context where only a pointer to function is accepted, the object would be implicitly converted to a pointer to the static __invoke function.
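The following sketch compiles the rewritten class as ordinary C++ and exercises both paths. As assumptions: generated_name and invoke replace __unique_name and __invoke (identifiers containing a double underscore are reserved for the implementation, which suits real compiler output but not user code), and call_through_pointer is a hypothetical stand-in for a C-style API that accepts only a pointer to function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace // unnamed
{
    struct generated_name
    {
        static bool invoke(int a, int b) { return a < b; }

        // for use as a functor
        bool operator()(int a, int b) const { return invoke(a, b); }

        // for use via pointer to function
        typedef bool (*PF)(int, int);
        operator PF() const { return &generated_name::invoke; }
    };
}

// Hypothetical C-style API that accepts only a plain pointer to function.
bool call_through_pointer(bool (*pf)(int, int), int a, int b)
{
    return pf(a, b);
}

// Functor path: std::sort calls operator(), which is easy to inline.
std::vector<int> sort_with_functor(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), generated_name());
    return v;
}
```

When the object is called directly, overload resolution prefers operator() over a call through the converted pointer, because binding the object to operator() needs no user-defined conversion; the conversion operator is used only where a pointer to function is actually required.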

Known problem: an anonymous function cannot be recursive, because there is no name that could be used to call it again. Some special support is needed for this, or we just agree that if a function needs to be called recursively (by name!) then it should not be unnamed.

Unnamed local functions

There is also the very interesting subject of allowing access to the local variables that exist in the scope where the anonymous function is used. Consider:

void foo()
{
    std::string message = "Good bye!";

    atexit(
        void () { std::cout << message; }
    );
}

Above, the unnamed function defined in place as the argument to atexit is supposed to have access to the variable message that was declared in the enclosing scope. There are languages that support so-called local functions, with various solutions to the problem of dangling references (a dangling reference is created when the local function is used in a place or at a time when the given variable no longer exists). These solutions range from relying on the garbage collector to keep objects alive as long as they can possibly be used, to binding the lifetime of functions to that of the scope.

The above example is dangerous, because at the time the unnamed function is called (at the end of the program), the message variable no longer exists and the unnamed function would refer to a non-existent object. That would have to be classified as undefined behaviour, and this possibility alone might be a reason to abandon the idea altogether.

Consider, however, the following variant of one of the earlier examples:

void foo()
{
    std::string separator = " ";

    std::for_each(b, e, void (int x) { std::cout << x << separator; });
}

Above, the unnamed local function used by for_each accesses the separator variable, which is guaranteed to outlive every use of the unnamed function. The above example is therefore perfectly safe and does not lead to undefined behaviour - no dangling references are created here.

The real difference between these two examples lies in the relation between the time when the function is used and the lifetime of the scope where it was defined. In the first case the unnamed local function is used after the scope where it was defined has been left (and the referenced variable has already been destroyed). In the second case the function is used only within the scope where it was defined (so all referenced variables still exist).
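One way the safe, second example could be rewritten (an assumption; the text above does not fix the mechanism) is a compiler-generated functor class that holds a reference to each enclosing variable its body uses. In this sketch foo takes a concrete vector because b and e are unspecified, and run_foo is a hypothetical helper that captures the std::cout output.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

namespace // unnamed
{
    // stands in for: void (int x) { std::cout << x << separator; }
    // The enclosing variable used by the body becomes a reference member.
    struct generated_name
    {
        std::string const &separator;
        void operator()(int x) const { std::cout << x << separator; }
    };
}

// Hypothetical version of foo(); the vector replaces the unspecified b and e.
void foo(std::vector<int> const &v)
{
    std::string separator = " ";

    // The functor (and every copy std::for_each makes of it) is used only
    // while foo() is running, so the reference to separator never dangles.
    std::for_each(v.begin(), v.end(), generated_name{separator});
}

// Hypothetical helper: runs foo() with std::cout redirected into a string.
std::string run_foo(std::vector<int> const &v)
{
    std::ostringstream buffer;
    std::streambuf *old = std::cout.rdbuf(buffer.rdbuf());
    foo(v);
    std::cout.rdbuf(old);
    return buffer.str();
}
```

The same reference-holding rewrite applied to the atexit example would compile just as easily, which is exactly why the compiler alone cannot tell the safe use from the dangerous one.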

The big question here is whether these two examples should be distinguished at the language level and whether the dangerous variant should be detected at compile time. In general, that does not seem to be possible. The possible solutions range from those involving a garbage collector (the "Java approach") to a more elaborate type system for function pointers (the "Ada approach"). Relying on a garbage collector will work only for reference-oriented types, because the relevant objects can then be kept alive as long as necessary; it will not, however, work for fundamental types and for objects with automatic storage (those created on the stack). In order to keep the existing language culture, the only solutions in C++ would be to either:

disallow access to objects from enclosing scopes - this would be a big loss in functionality,
introduce yet another undefined behaviour opportunity - this would be a reliability misfeature, although not much different from many other parts of the language,
rely on implicit copying of all referenced variables from enclosing scopes - a similar approach was presented in another article, Possible Syntax for C++ threads, where the same problem had to be solved with asynchronous thread branches.
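The third option might be sketched as follows, assuming that every referenced variable is copied into a compiler-generated functor. make_printer and generated_name are hypothetical names, and the body returns the message instead of printing it so that the result can be checked. Because the functor owns its copy, calling it after make_printer has returned is safe. Note that such a stateful object still could not be handed to atexit, which accepts only a plain pointer to function with no room for copied data.

```cpp
#include <cassert>
#include <string>

namespace // unnamed
{
    // stands in for: void () { std::cout << message; }
    // with message copied into the functor instead of referenced
    struct generated_name
    {
        std::string message; // owned copy, independent of the enclosing scope

        // Returns the text instead of printing it, so it can be checked.
        std::string operator()() const { return message; }
    };
}

// Hypothetical function playing the role of foo().
generated_name make_printer()
{
    std::string message = "Good bye!";
    return generated_name{message}; // message is copied here
} // the local message is destroyed here; the functor's copy survives
```

The price of this approach is the copy itself, and the fact that changes made to the original variable after the copy are not visible inside the function.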

Anonymous Class

Similarly to the anonymous function, an anonymous class is an unnamed class definition written at the place where the name of an already defined class is required or accepted. It should be replaced by the definition of a class (as if it were defined in an unnamed namespace) with a compiler-generated unique name, and that name should be used where the unnamed class itself was used.

Examples:

int main()
{
    struct { int a, b; } s;
}

It should be replaced by:

namespace // unnamed
{
    struct __unique_name { int a, b; };
}

int main()
{
    __unique_name s;
}

struct { void foo() { puts("foo"); } }().foo();

(note () after anonymous class - this creates a temporary object of the unnamed type)

It should be replaced by:

namespace // unnamed
{
    struct __unique_name { void foo() { puts("foo"); } };
}

// and later:
__unique_name().foo();

Something with STL:

std::sort(b, e,
    struct {
        bool operator()(int a, int b)
        { return a < b; }
    }()
);

(again note () after anonymous class)
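Applying the anonymous class rewrite to this std::sort example gives ordinary C++ along these lines; generated_name and the sorted helper are hypothetical, and the vector replaces the unspecified b and e.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

namespace // unnamed
{
    // stands in for: struct { bool operator()(int a, int b) { return a < b; } }
    struct generated_name
    {
        bool operator()(int a, int b) const { return a < b; }
    };
}

// Hypothetical helper; the vector replaces the unspecified b and e.
std::vector<int> sorted(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), generated_name()); // () creates the temporary
    return v;
}
```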

Some fun with templates and overloading:

foo(
    struct {
        template <typename T> void bar(T t) { /* ... */ }
        void bar(int i) { /* ... */ }
    }()
);

(again note () after anonymous class)
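This example depends on the rewrite placing the class at namespace scope: C++ does not allow a local class to have member templates, so the class could not simply be defined inside the enclosing function. The sketch below assumes a hypothetical foo that calls bar twice, and replaces the /* ... */ bodies with return values that show which overload is chosen.

```cpp
#include <cassert>
#include <string>

namespace // unnamed
{
    struct generated_name
    {
        // bodies return a label so the selected overload is observable
        template <typename T> std::string bar(T) { return "template"; }
        std::string bar(int) { return "int"; }
    };
}

// Hypothetical foo(): the original leaves its definition open.
template <typename Obj>
std::string foo(Obj obj, double d, int i)
{
    return obj.bar(d) + "," + obj.bar(i);
}
```

For the double argument the template is an exact match and wins over bar(int), which would need a conversion; for the int argument both candidates match exactly and the non-template overload is preferred.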

And why not:

class A
{
    // ...
    struct { int a, b; } pair;
};

or even:

struct base { virtual string what() const = 0; };

// and later:
try
{
    throw struct : base { string what() const { return "Oops!"; } }();
}
catch (base const &e)
{
    std::cerr << e.what() << std::endl;
}

(again note () after anonymous class)
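A compilable version of the rewrite for this throw example might look as follows; generated_name and throw_and_catch are hypothetical, and a virtual destructor has been added to base (the original snippet omits it), as is customary for a polymorphic base class.

```cpp
#include <cassert>
#include <string>

struct base
{
    virtual ~base() {}
    virtual std::string what() const = 0;
};

namespace // unnamed
{
    // stands in for: struct : base { string what() const { return "Oops!"; } }
    struct generated_name : base
    {
        std::string what() const { return "Oops!"; }
    };
}

// Throws the rewritten object and returns the message caught through base.
std::string throw_and_catch()
{
    try
    {
        throw generated_name();
    }
    catch (base const &e)
    {
        return e.what(); // virtual call reaches generated_name::what
    }
    return "";
}
```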

Known problem: an unnamed class has no name that could be used in its own definition, and such a name is needed to declare constructors and destructors. Some special support is needed for this, or we just agree that classes with constructors and destructors are not good candidates for anonymous classes.

Unknown problems: it is very likely that the anonymous class does not make much sense in every context where the name of a class could appear.

Note: some of the above examples of anonymous class are already legal in C++. This means that the concept is not really new - it just needs to be consistently extended.

Advantages

The advantages of the above informal proposal are:

It is not really foreign to the language - it just extends the existing mechanisms.
It does not need much new grammar and is therefore (at least I imagine it to be) relatively easy to implement - to some extent, the idea is based on text substitution only.
The anonymous function is ready to work with existing, large base of C code, where function pointers are used to implement callbacks (see above example with atexit).
It allows maximum control, because many things (like the types used in an anonymous function) are explicit.
It looks like C++, which means that it is easy to learn and adopt. Moreover, anonymous functions and anonymous classes can be easily moved or refactored to form named entities (or the other way round).

The above points should be also the test-questions for any formal proposal in this area.