Passing std::vector’s Underlying Array to C APIs

Often, there’s a need to pass some data stored as an array from C++ to C-interface APIs. The “default” first-choice STL container for storing arrays in C++ is std::vector. So, how to pass the array content managed by std::vector to a C-interface API?

The Wrong Way I saw that kind of C++ code:

// v is a std::vector<BYTE>.
// Pass it to a C-interface API: pointer + size in bytes
DoSomethingC( 
  /* Some cast, e.g.: (BYTE*) */ &v, 
  sizeof(v) 
);

That’s wrong, in two ways: for both the pointer and the size. Let’s talk the size first: sizeof(v) represents the size, in bytes, of an instance of std::vector, which is in general different from the size in bytes of the array data managed by the vector. For example, suppose that a std::vector is implemented using three pointers, e.g. to begin of data, to end of data, and to end of reserved capacity; in this case, sizeof(v) would be sizeof(pointer) * 3, which is 8 (pointer size, in bytes, in 64-bit architectures) * 3 = 24 bytes on 64-bit architectures (4*3 = 12 bytes on 32-bit).

But what the author of that piece of code actually wanted was the size in bytes of the array data managed (pointed to) by the std::vector, which you can get multiplying the vector’s element count returned from v.size() by the size in bytes of a single vector element. For a vector<BYTE>, the value returned by v.size() is just fine (in fact, sizeof(BYTE) is one).

Now let’s discuss the address (pointer) problem. “&v” points to the beginning of the std::vector’s internal representation (i.e. the internal “guts” of std::vector), which is implementation-defined, and isn’t interesting at all for the purpose of that piece of code. Actually, misinterpreting the std::vector’s internal implementation details with the array data managed by the vector is dangerous, as in case of write access the called function will end up stomping the vector’s internal state with unrelated bytes. So, on return, the vector object will be in a corrupted and unusable state, and the memory previously owned by the vector will be leaked.

In case of read access, the vector’s internal state will be read, instead of the intended actual std::vector’s array content.

The presence of a cast is also a signal that something may be wrong in the user’s code, and maybe the C++ compiler was actually helping with a warning or an error message, but it was silenced instead.

So, how to fix that? Well, the pointer to the array data managed by std::vector can be retrieved calling the vector::data() method. This method is offered in both a const version for read-only access to the vector’s content, and in a non-const version, for read-write access.

The Correct Way So, the correct code to pass the std::vector’s underlying array data to a C-interface API expecting a pointer and a size would be for the case discussed above:

DoSomethingC(v.data(), v.size());

Or, if you have e.g. a std::vector<double> and the size parameter is expressed in bytes (instead of element count):

DoSomethingC(v.data(), v.size() * sizeof(double));

An alternative syntax to calling vector::data() would be “&v[0]”, although the intent using vector::data() seems clearer to me. Moreover, vector::data() works also for empty vectors, returning nullptr in this case. Instead, “&v[0]” triggers a “vector subscript out of range” debug assertion failure in MSVC when used on an empty vector (in fact, for an empty vector it doesn’t make sense to access the first item at index zero, as the vector is empty and there’s no first item).

&v[0] on an empty vector: debug assertion failure
&v[0] on an empty vector: debug assertion failure

Leave a Reply

Your email address will not be published. Required fields are marked *