Projection objects for the STL

20 May 2023

By now, everyone is familiar with the functions under algorithm, such as std::sort or std::max_element, and using them is very straightforward:

1std::vector<int> v{1,4,6,2,6,9,3};
2std::sort(v.begin(), v.end());
3auto m = std::max_element(v.begin(), v.end());

Functions that operate on containers (specifically, iterator pairs) also usually take an extra argument, Compare, that determines the way the pair-wise element comparisons for std::max_element, std::unique, etc., are made:

std::sort(v.begin(), v.end(), [](int x, int y) { return x > y; });

or, with operator objects:

std::sort(v.begin(), v.end(), std::greater<>{});

If not defined, the Compare parameter defaults to < (std::less<>). This works fine for simple types that the standard library is aware of, but if we have our own, more complicated types, it quickly becomes a lot more cumbersome.

Say we have the class Test:

1class Test
2{
3    double a_;
4    int b_;
5public:
6    Test(double a, int b) : a_{a}, b_{b} { }
7    double getA() const { return a_; }
8    int    getB() const { return b_; }
9};

Sorting by a single field in Test is a chore:

std::sort(v.begin(), v.end(), [](const Test& t1, const Test& t2) {
    return t1.getA() < t2.getA();
});

Can we do better? What I really want is to be able to just specify the accessor function and let the comparator call it.

Range algorithms like std::ranges::upper_bound, std::ranges::sort, std::ranges::max_element, etc., do something like this with an explicit projection argument:

std::ranges::sort(v, std::less<>{}, &Test::getA);

This has a couple of problems though: 1) they are not usable outside of range algorithms, 2) due to the parameter order, you have to specify the comparator function even if you just want to change the accessor.

Can we create a projection function that takes an accessor, which, for now, should be a pointer to a member function, and returns a comparator function that we can use with C++11 algorithms?

Test-specific

Of course we can; it’s not too difficult either.

1template <auto Projection>
2auto projection()
3{
4    return [](const Test& x1, const Test& x2) {
5        return (x1.*Projection)() < (x2.*Projection)();
6    };
7}

std::sort(v.begin(), v.end(), projection<&Test::getA>());

A couple of points:

We’re using pointers to member functions, which are called using a different operator: .*.
auto used as a template parameter binds to values, not types. The type of a pointer to a member function (ReturnType (Class::*) (ParameterList) CVQualifiers) is different from a pointer to a regular function (ReturnType (*) (ParameterList)), so we can’t create a type pattern to put as a template parameter to bind to the object; we need to use auto to bind to the pointer to a member function directly.
The accessor is passed as a template argument, so it’s determined at compile-time. This gives us type-checking, ensuring the compatibility between the accessor function and the container’s value type. The disadvantage is, of course, lack of dynamism. Template arguments need to be constant expressions, which can’t be mutated in runtime-space.
The lambda’s parameters are const l-value references since we don’t want Projection to be able to change the container’s object it’s operating on. This may contaminate our class with const member functions, but we should require this in order to keep the invariants of whichever algorithm we’re using projection for (sort, max_element, etc.).

Generalization

Of course, there are many things we can do immediately to generalize things. Straightaway, we can use a generic lambda, replacing Test by auto. The lambda should accept a const l-value reference instead of a forwarding reference (auto&&), since, as above, we don’t want the possibility of mutation. We can also generalize the comparator function: we should accept a comparator object, defaulting to std::less<>, allowing the user to provide their own comparator:

1template <auto Projection, typename Comparator = std::less<> >
2auto projection()
3{
4    return [](const auto& x1, const auto& x2) {
5        return Comparator{}((x1.*Projection)(),
6                            (x2.*Projection)());
7    };
8}

std::sort(v.begin(), v.end(), projection<&Test::getA, std::greater<>>());

Why restrict ourselves to pointers to member functions though? It’s not hard to imagine situations were we would like to call standalone functions. Say I have a distance function defined for Test:

double distance(const Test& t1, const Test& t2) {
    return std::hypot(t2.getA() - t1.getA(), t2.getB() - t1.getB());
}

I would like to be able to do this:

Test tx{1.0,1};
std::max_element(v.begin(), v.end(), projection<distance>(tx));

to get the vector’s element with the greatest distance to a second object, tx.

Functions are objects, so they bind to auto. Of course, our standalone functions (and even our member functions, although it’s harder to imagine this with a getter) can have take multiple arguments — and, in the case of distance, they take an extra argument (t2) in addition to the object, in the container, that we’re calling the function on (t1) — so we need to add an extra variadic template parameter to our lambda-returning function.

Our current syntax poses a problem, however: if Projection is a pointer to a member function (of an object x, taking arguments ... args), then it has to be called like (x.*Projection)(args ...); if it is a pointer to a standalone function that takes an x and additional arguments ... args, it has to be called like Projection(x, args ...).

Thankfully, the standard library has a function specifically for this use-case: std::invoke(). Our function then becomes:

1template <auto Projection, typename Comparator = std::less<>, typename ... ProjArgs>
2auto projection(ProjArgs&& ... args)
3{
4    return [... args = std::forward<decltype(args)>(args)](const auto& x1, const auto& x2) {
5        return Comparator{}(
6            std::invoke(Projection, x1, std::forward<decltype(args)>(args) ...),
7            std::invoke(Projection, x2, std::forward<decltype(args)>(args) ...));
8    };
9}

We forward the projection arguments in the lambda capture-list and then forward them again to std::invoke.

This works!

 1std::vector<Test> v{{12.2,2}, {20.4,43}, {3.1,64}, {210.32,6}, {0.11,3}};
 2Test tx{1.0,1};
 3
 4std::sort(v.begin(), v.end(), projection<&Test::getA>());
 5for (const Test& t : v) { std::cout << t << ": " << distance(tx, t) << '\n'; }
 6// prints:
 7// Test(0.11, 3): 2.18909
 8// Test(3.1, 64): 63.035
 9// Test(12.2, 2): 11.2446
10// Test(20.4, 43): 46.264
11// Test(210.32, 6): 209.38
12
13std::sort(v.begin(), v.end(), projection<distance, std::greater<>>(tx));
14for (const Test& t : v) { std::cout << t << ": " << distance(tx, t) << '\n'; }
15// prints:
16// Test(210.32, 6): 209.38
17// Test(3.1, 64): 63.035
18// Test(20.4, 43): 46.264
19// Test(12.2, 2): 11.2446
20// Test(0.11, 3): 2.18909

Conclusion

It’s strange to me why such a function isn’t in the standard library in the first place, especially in such a niche-oriented library such as C++’s. I thought about writing a standard library enhancement proposal, but I figure I’ll just get told “oh, use the range versions”. Still, I hope this is useful to someone.

The full code is on my GitHub.

Addendum: decomposing the pointer to member function type

While I was trying to come up with a pattern to bind to the pointer to member function type, so I could get the class out, I came up with this solution:

 1template <typename Return_, typename Class_, typename ... Args_>
 2struct DecompCTAD
 3{
 4    using Return = Return_;
 5    using Class  = Class_;
 6    using Arguments = std::tuple<Args_ ...>;
 7
 8    template <typename R, typename C, typename ... A>
 9    explicit DecompCTAD(R (C::*) (A ...)) {}
10
11    template <typename R, typename C, typename ... A>
12    explicit DecompCTAD(R (C::*) (A ...) const) {}
13};
14
15template <typename Return, typename Class, typename ... Arguments>
16DecompCTAD(Return (Class::*) (Arguments ...) const) -> DecompCTAD<Return, Class, Arguments ...>;
17
18template <typename Return, typename Class, typename ... Arguments>
19DecompCTAD(Return (Class::*) (Arguments ...)) -> DecompCTAD<Return, Class, Arguments ...>;

Making use of user-defined deduction guides. You use it like this:

using ptr_ret = decltype(DecompCTAD(&Test::sum))::Return;
using ptr_class = decltype(DecompCTAD(&Test::getA))::Class;

This was before I realized you can actually do this the traditional way with SFINAE:

 1template <typename>
 2struct DecompSFINAE {};
 3
 4template <typename Return_, typename Class_, typename ... Args_>
 5struct DecompSFINAE<Return_ (Class_::*) (Args_ ...)> {
 6    using Return = Return_;
 7    using Class  = Class_;
 8    using Arguments = std::tuple<Args_ ...>;
 9};
10
11template <typename Return_, typename Class_, typename ... Args_>
12struct DecompSFINAE<Return_ (Class_::*) (Args_ ...) const> {
13    using Return = Return_;
14    using Class  = Class_;
15    using Arguments = std::tuple<Args_ ...>;
16};

using ptr_ret = DecompSFINAE<decltype(&Test::sum)>::Return;
using ptr_class = DecompSFINAE<decltype(&Test::getA)>::Class;

We need to have a constructor overload and a deduction guide for the CTAD version, or partial specialization for the SFINAE version, for each different way you can qualify member functions: all possible combinations of const, volatile (even though this makes no sense in this context), &, and &&, for a total of 12 overloads/partial specializations… that’s a lot of boilerplate. You might think that using std::remove_ref() and std::remove_cv() in an implementation class would help us, but they don’t work for member function qualifiers. This seems like a real missed opportunity…

Why should we use one version over the other? To use DecompCTAD we have to construct an object in the decltype argument, whereas DecompSFINAE works purely in runtime. Are they really equivalent to one another?

Let’s look at the assembly, with no optimizations:

1int main()
2{
3    using ctad_ver = decltype(DecompCTAD(&Test::getA))::Return; // int
4    using sfinae_ver = DecompSFINAE<decltype(&Test::getA)>::Return; // int
5    ctad_ver x = 1;
6    sfinae_ver y = 1;
7}

1main:
2# compare.cpp:52:     ctad_ver x = 1;
3	movl	$1, -4(%rsp)
4# compare.cpp:53:     sfinae_ver y = 1;
5	movl	$1, -8(%rsp)
6	movl	$0, %eax
7	ret

Of course they are! Arguments to decltype are expressions which are never evaluated. The SFINAE version is closest to the style the standard library uses, so that should be more familiar to most people.

Actually, there is room for improvement to the SFINAE version: we can refactor the use of decltype to an implementation class:

template <auto ptr>
struct DecompSFINAE : DecompSFINAEImpl<decltype(ptr)> {};

This way the call site is much cleaner:

using sfinae_ver = DecompSFINAE<&Test::getA>::Arguments;

Tags: #programming #C++