Data Structures in C++ — STL Containers

Ryonald Teofilo
8 min readDec 3, 2023

--

Source: ProgrammerCave

Understanding data structures and how they are manipulated as we add, remove and modify data is important to help make better decisions when writing code.

Choosing the right data structure depending on its purpose may help achieve optimal runtime. Although performance should always be measured, time and space complexity acts as a guide for what to expect from our code.

Due to time constraints, we can’t always profile every possible solution to a problem, and choose the best based on the results (depends on the scope and significance of the problem of course). Therefore, understanding the options you have and their costs can be a great asset to write better code.

Big O — short prerequisite

The big O notation is often used to denote the worst-case time and space complexity i.e. complexity may be faster or equal to the value.

The most common misconception of this notation is that it measures how fast or how much space an algorithm consumes. It is important to understand that big O describes how an algorithm scales.

For instance, algo A with time O(log n) scales better than an algo B with time O(n), but this doesn’t necessarily mean that operation A is faster than operation B under all circumstances (depends on factors like number of objects in container, memory locality, etc.). Therefore, it is important to always measure when it comes to performance.

Common Data Structures in C++

Source: ProgrammerCave

STL provides us with a number of data structures (known formally as containers). These general-purpose containers provide a simple way to construct and interface with common data structures in RAII style — which alleviates worry about managing resources.

Side note before we begin:

There are obviously more complex data structures not available in the STL such as tries, ropes and graphs, which we will not be looking at in this overview. But most of these contain building blocks or are extensions of the data structures we will be looking at. Thus, understanding these will definitely help with the apprehension of other more complex data structures.

Static arrays

Array elements stored in contiguous memory.

Arrays are containers that store elements in contiguous memory. This is available through std::array in the STL, which is effectively a light wrapper around a raw C array.

The main advantage of using std::array over raw C arrays is that it provides a modern interface just like any other STL containers e.g. size() and at(). It also has friendly value semantics, allowing itself to be passed or returned by value (it doesn’t decay into a pointer implicitly).

Dynamic Arrays

Copying over elements during “resizing” is O(n)

Available through std::vector, these are similar to static arrays other than being able to resize. However, this resize does come at the cost of having to allocate a larger array and copying all the elements over — this process of copying over is linear in time i.e. O(n).

More on Arrays

The main advantage of arrays is its constant O(1) access time, as each element can be accessed directly through indexing. However, random insertion and deletion are O(n), due to the need for shifting elements.

Back insertion and deletion can both be constant time, but this is amortised for insertion due to the need to resize the array which is linear in time — by doubling in size every time, it can be amortised to infinity as O(1).

The logical size and physical size or the array can be accessed through size() and capacity() respectively.

Searching arrays is also linear in time, though sorted arrays can be knocked down to O(log n) through binary search.

It is worth noting that front insertion and deletion can be improved to O(1) by using circular arrays (or ring buffers). An example of this is Boost’s Circular buffer.

Linked List

Doubly-linked list, each node holds a pointer to previous and next node.

Available through std::forward_list (singly-linked) and std::list (doubly-linked) — linked lists offer constant time insertion and deletion at the cost of knowing where you would like to operate.

Accessing an element requires traversal i.e. O(n), which is the same for searching, regardless of it being sorted.

std::deque — best of Linked List and Array

A std::deque, or a double-ended queue is most commonly implemented as a sequence of fixed-size arrays.

Source: StackOverflow

This allows it to have constant time insertions at both ends of the structure, as it avoids having to copy over elements to a larger array like std::vector has to.

This also allows O(1) access through indexing, though this is through another layer of indirection of dereferencing a pointer provided by the bookkeeping data structure. Again, this is why it is important to remember that time complexity notation refers to the scalability. If what you’re after is performance, always measure!

Essentially being linked arrays, the deque also suffers from linear time O(n) random insertion and deletion due the need to shift elements. Deques also typically have larger minimal memory cost, as a deque holding just one element has to allocate the full internal array.

Strings

Array of null-terminated (\0) characters.

Available as std::string, strings are arrays of null-terminated characters. This means they share similar time complexities on operations as arrays.

C++17’s note on strings — std::string_view :

It’s worth mentioning that from C++17 onwards, consider the usage of std::string_view to improve efficiency in your code. A std::string_view is an object that acts as a non-owning/read-only view of an existing string.

This avoids unnecessary copying or allocation of data when passing around strings, while still providing access to common string functions such as substr() and comparison operators.

An example is preferring it over const std::string& as a function parameter when passing string literals. This way, we get to avoid allocations and exceptions introduced by having to construct a std::string object first.

Example of where std::string_view would be preferred

Stacks and Queues

Stack — Last In First Out (LIFO)
Queue — First In First Out (FIFO)

The container adaptors std::stack and std::queue are LIFO and FIFO data structures respectively. These are called adaptors as they act as wrappers around the aforementioned data structures.

By default, both are implemented with std::deque, but std::stack can be created with a std::vector or std::list through its template parameter.

For completeness, std::queue couldn’t be implemented with std::vector, because front deletion isn’t constant.

Pushing and popping of data are done in constant time, with the caveat that pushing being an amortised O(1) for std::stack implement with a std::vector due to resizing.

Heaps

Heaps are implemented as an array, but could be represented in binary tree form — Source: Wikipedia

A heap is a binary tree where each parent node is smaller (min heap) / larger (max heap) or equal to its children nodes.

Available through std::priority_queue, these can be implemented with either std::vector or std::deque due to the need for O(1) random access.

Appropriately named a priority queue, it allows the queue to be sorted in order of priority with the choice of the lowest or highest being in the front (or top of the heap), no matter the order of the data being pushed.

Pushing and popping is done in O(log n) time, due to the need to heapify every time we remove or add data.

Heapifying process — Source: Codecademy

STL also provide heapifying algorithms through std::make_heap, std::push_heap and std::pop_heap.

Hash Tables

Hash table — using chaining as means to handle hashing collision.

Hash tables are available in the STL’s unordered associative containers, std::unordered_map, std::unordered_set, and their multi counterparts that allow duplicate keys — std::unordered_multiset and std::unordered_multimap .

Though hash tables are known for their O(1) lookups, it is important to be cognizant that the worst case is O(n).

Worst case for hash tables — when all elements are stored in a linked-list-like fashion, due to collision.

An example scenario (depends on implementation) might be where our hash table ends up becoming more of a linked list due to chaining as means to handle hashing collision.

The same applies to insertion and deletion, where by average it would be O(1), but in the worst case scenario, it could be linear in time O(n).

Avoiding Collisions:

Fortunately, we do not have to be hashing algorithm experts to help avoid collisions as member functions like max_load_factor() and reserve() are made available.

max_load_factor() sets the maximum load facto, which is the ratio between number of elements to number of buckets that decides when the hash table would automatically increase its number of buckets. This defaults to 1.

reserve() resizes the number of buckets to contain the specified number of elements, while still conforming to the max load factor, and also rehashes the container.

Trees

Binary tree structure.

Just like their unordered counterpart, associative containers std::map, std::set also have multi versions that support duplicate keys — std::multimap and std::multiset.

However, just like the name suggests, these containers keep stored data in a sorted order in a binary tree structure i.e. traversals will see elements in sorted order.

But just like hash tables in the worst case, the lookup time complexity of a binary tree can also be O(n), when the tree degenerates into a linear linked-list like structure.

Worst-case for binary tree, nodes form a linked-list-like structure.

This is why most implementations of these containers use some type of self-balancing binary tree like the red-black tree or AVL tree.

Red-black tree keeps tree balanced to maintain logarithmic time complexity — Source: Javatpoint

These trees adhere to specific rules to keep the tree balance, and thus maintain logarithmic time complexity for searches.

This also means that insertion and deletion happen in O(log n) time, which comes from placing the new element in the right position + rebalancing the tree.

It is important to take time to decide which data structure best fit its purpose in your code — consider where insertion and deletion most often happen? are quick lookups important? do we need these data to be sorted or does it not matter? is quick direct access is a priority?

In most cases, the more complex the data structure, the more expensive it gets (memory + runtime). So, when deciding which data structure suits best, it is always good to start from basic arrays, working your way up to more complex data structures.

Feel free to leave a comment if there are any questions or something you would like to add!

--

--