I’ve uploaded some C++ code of mine to GitHub that implements a custom string pool allocator.
The basic idea is to allocate big chunks of memory, and then serve individual string allocations by carving memory out of those chunks with a simple, fast pointer increment.
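To make the idea concrete, here is a minimal sketch of such a pool. It is not the code from the repository: the class name `StringPoolSketch`, its member names, and the default chunk size are all made up for illustration, and it assumes NUL-terminated `wchar_t` strings that are never freed individually.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <memory>
#include <vector>

// Illustrative sketch only: the names and the chunk size are assumptions,
// not the API of the allocator published on GitHub.
class StringPoolSketch {
public:
    // chunkChars: capacity of each chunk, measured in wchar_t units.
    explicit StringPoolSketch(std::size_t chunkChars = 64 * 1024)
        : m_chunkChars(chunkChars) {
        assert(chunkChars > 0);
    }

    // The pool hands out raw pointers into its chunks, so copying is disabled.
    StringPoolSketch(const StringPoolSketch&) = delete;
    StringPoolSketch& operator=(const StringPoolSketch&) = delete;

    // Copies len characters from src into the pool and NUL-terminates them.
    // The returned pointer stays valid for the lifetime of the pool.
    const wchar_t* Allocate(const wchar_t* src, std::size_t len) {
        const std::size_t needed = len + 1;  // room for the terminator
        if (needed > static_cast<std::size_t>(m_end - m_next)) {
            // Not enough room left: start a new chunk. An oversized string
            // gets a chunk of exactly its own size.
            AddChunk(needed > m_chunkChars ? needed : m_chunkChars);
        }
        wchar_t* dest = m_next;
        std::wmemcpy(dest, src, len);
        dest[len] = L'\0';
        m_next += needed;  // the "allocation" is just this pointer increment
        return dest;
    }

    const wchar_t* Allocate(const wchar_t* src) {
        return Allocate(src, std::wcslen(src));
    }

    std::size_t ChunkCount() const { return m_chunks.size(); }

private:
    void AddChunk(std::size_t chars) {
        m_chunks.push_back(std::make_unique<wchar_t[]>(chars));
        m_next = m_chunks.back().get();
        m_end = m_next + chars;
    }

    std::size_t m_chunkChars;
    std::vector<std::unique_ptr<wchar_t[]>> m_chunks;  // owns all chunks
    wchar_t* m_next = nullptr;  // next free slot in the current chunk
    wchar_t* m_end = nullptr;   // one past the end of the current chunk
};
```

In this sketch, calling `pool.Allocate(L"hello")` copies the text into the current chunk and returns a `const wchar_t*` to it. All the memory is released at once when the pool is destroyed, which is the usual trade-off of this kind of allocator: there is no way to free a single string.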
There’s also a benchmark comparing this custom allocator vs. STL’s strings.

The results clearly show that both allocating strings this way and sorting them are faster than using the default std::wstring class.
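As a rough illustration of the sorting side, pooled strings can be sorted as a vector of raw pointers, so the sort only swaps pointers and never touches the character data. This is a sketch under my own assumptions, not the benchmark code from the repository: the helper name `SortPooledStrings` is hypothetical, and it uses a plain `std::wcscmp` ordinal comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cwchar>
#include <vector>

// Hypothetical helper: sorts NUL-terminated strings in place by ordinal
// comparison. Only the pointers are rearranged; the text is not copied.
inline void SortPooledStrings(std::vector<const wchar_t*>& strings) {
    // std::wcscmp requires valid, non-null pointers.
    for (const wchar_t* s : strings) {
        assert(s != nullptr);
        (void)s;  // avoids an unused-variable warning when NDEBUG is defined
    }
    std::sort(strings.begin(), strings.end(),
              [](const wchar_t* a, const wchar_t* b) {
                  return std::wcscmp(a, b) < 0;
              });
}
```

The pointers passed in must stay valid for as long as the vector is used, which with a pool means for as long as the pool itself is alive.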