# Data Structures in C++ — STL Containers

Understanding data structures and how they are manipulated as we add, remove and modify data is important to help make better decisions when writing code.

Choosing the right data structure depending on its purpose may help achieve optimal runtime. Although **performance should always be measured**, time and space complexity acts as a guide for what to expect from our code.

Due to time constraints, we can’t always profile every possible solution to a problem, and choose the best based on the results (depends on the scope and significance of the problem of course). Therefore, understanding the options you have and their costs can be a great asset to write better code.

## Big O — short prerequisite

The big O notation is often used to denote the **worst-case** time and space complexity i.e. complexity may be *faster or equal* to the value.

The most common misconception of this notation is that it measures how fast or how much space an algorithm consumes. It is important to understand that big O describes how an algorithm **scales.**

For instance, algo A with time `O(log n)`

scales better than an algo B with time `O(n)`

, but this doesn’t necessarily mean that operation A is faster than operation B under all circumstances (depends on factors like number of objects in container, memory locality, etc.). Therefore, it is important to **always measure** when it comes to performance.

**Common Data Structures in C++**

STL provides us with a number of data structures (known formally as containers). These general-purpose containers provide a simple way to construct and interface with common data structures in RAII style — which alleviates worry about managing resources.

*Side note before we begin:*

There are obviously more complex data structures not available in the STL such as tries, ropes and graphs, which we will not be looking at in this overview. But most of these contain building blocks or are extensions of the data structures we will be looking at. Thus, understanding these will definitely help with the apprehension of other more complex data structures.

## Static arrays

Arrays are containers that store elements in contiguous memory. This is available through

in the STL, which is effectively a light wrapper around a raw C array.**std::array**

The main advantage of using **std::array*** *over raw C arrays is that it provides a modern interface just like any other STL containers e.g. `size()`

and `at()`

. It also has friendly value semantics, allowing itself to be passed or returned by value (it doesn’t decay into a pointer implicitly).

**Dynamic Arrays**

Available through

, these are similar to static arrays other than being able to resize. However, this resize does come at the cost of having to allocate a larger array and copying all the elements over — this process of copying over is linear in time i.e. **std::vector**`O(n)`

.

## More on Arrays

The main advantage of arrays is its constant `O(1)`

access time, as each element can be accessed directly through indexing. However, random insertion and deletion are `O(n)`

, due to the need for shifting elements.

Back insertion and deletion can both be constant time, but this is amortised for insertion due to the need to resize the array which is linear in time — by doubling in size every time, it can be amortised to infinity as `O(1)`

.

The logical size and physical size or the array can be accessed through `size()`

and `capacity()`

respectively.

Searching arrays is also linear in time, though sorted arrays can be knocked down to `O(log n)`

through binary search.

It is worth noting that front insertion and deletion can be improved to `O(1)`

by using *circular arrays* (or *ring buffers*). An example of this is *Boost’s Circular buffer**.*

**Linked List**

Available through

(singly-linked) and **std::forward_list**

(doubly-linked) — linked lists offer constant time insertion and deletion at the cost of knowing where you would like to operate.**std::list**

Accessing an element requires traversal i.e. `O(n)`

, which is the same for searching, regardless of it being sorted.

**std::deque**

** — best of Linked List and Array**

**std::deque**

A

, or a double-ended queue is most commonly implemented as a sequence of fixed-size arrays.**std::deque**

This allows it to have constant time insertions at both ends of the structure, as it avoids having to copy over elements to a larger array like

has to.**std::vector**

This also allows `O(1)`

access through indexing, though this is through another layer of indirection of dereferencing a pointer provided by the bookkeeping data structure. Again, this is why it is important to remember that **time complexity notation refers to the scalability**. If what you’re after is performance, always measure!

Essentially being linked arrays, the deque also suffers from linear time `O(n)`

random insertion and deletion due the need to shift elements. Deques also typically have larger minimal memory cost, as a deque holding just one element has to allocate the full internal array.

## Strings

Available as

, strings are arrays of null-terminated characters. This means they share similar time complexities on operations as arrays.**std::string**

*C++17’s note on strings — *

:**std::string_view**

It’s worth mentioning that from C++17 onwards, consider the usage of

to improve efficiency in your code. A **std::string_view**

is an object that acts as a non-owning/read-only view of an existing string.**std::string_view**

This avoids unnecessary copying or allocation of data when passing around strings, while still providing access to common string functions such as `substr()`

and comparison operators.

An example is preferring it over

as a function parameter when passing string literals. This way, we get to avoid allocations and exceptions introduced by having to construct a **const std::string&**

object first.**std::string**

## Stacks and Queues

The container adaptors

and **std::stack**

are LIFO and FIFO data structures respectively. These are called adaptors as they act as wrappers around the aforementioned data structures.**std::queue**

By default, both are implemented with

, but **std::deque**

can be created with a **std::stack**

or **std::vector**

through its template parameter.**std::list**

For completeness,

couldn’t be implemented with **std::queue**

, because front deletion isn’t constant.**std::vector**

Pushing and popping of data are done in constant time, with the caveat that pushing being an amortised `O(1)`

for

implement with a **std::stack****std::vector** due to resizing.

**Heaps**

A heap is a binary tree where each parent node is smaller (min heap) / larger (max heap) or equal to its children nodes.

Available through

, these can be implemented with either **std::priority_queue**

or **std::vector**

due to the need for **std::deque**`O(1)`

random access.

Appropriately named a priority queue, it allows the queue to be sorted in order of *priority* with the choice of the lowest or highest being in the front (or top of the heap), no matter the order of the data being pushed.

Pushing and popping is done in `O(log n)`

time, due to the need to *heapify* every time we remove or add data.

STL also provide *heapifying* algorithms through `std::make_heap`

, `std::push_heap`

and `std::pop_heap`

.

**Hash Tables**

Hash tables are available in the STL’s *unordered associative containers,*

, **std::unordered_map**

, and their multi counterparts that allow duplicate keys — **std::unordered_set****std::unordered_multiset**** **and

.**std::unordered_multimap**

Though hash tables are known for their `O(1)`

lookups, it is important to be cognizant that the worst case is `O(n)`

.

An example scenario (depends on implementation) might be where our hash table ends up becoming more of a linked list due to chaining as means to handle hashing collision.

The same applies to insertion and deletion, where by average it would be `O(1)`

, but in the worst case scenario, it could be linear in time `O(n)`

.

*Avoiding Collisions:*

Fortunately, we do not have to be hashing algorithm experts to help avoid collisions as member functions like `max_load_factor()`

and `reserve()`

are made available.

`max_load_factor()`

sets the maximum load facto, which is the ratio between number of elements to number of buckets that decides when the hash table would automatically increase its number of buckets. This defaults to 1.

`reserve()`

resizes the number of buckets to contain the specified number of elements, while still conforming to the max load factor, and also rehashes the container.

## Trees

Just like their *unordered *counterpart,* associative containers*

, **std::map****std::set*** *also have multi versions that support duplicate keys —

and **std::multimap**

.**std::multiset**

However, just like the name suggests, these containers keep stored data in a sorted order in a binary tree structure i.e. traversals will see elements in sorted order.

But just like hash tables in the worst case, the lookup time complexity of a binary tree can also be O(n), when the tree degenerates into a linear linked-list like structure.

This is why most implementations of these containers use some type of self-balancing binary tree like the *red-black tree* or *AVL tree**.*

These trees adhere to specific rules to keep the tree balance, and thus maintain logarithmic time complexity for searches.

This also means that insertion and deletion happen in `O(log n)`

time, which comes from placing the new element in the right position + rebalancing the tree.

It is important to take time to decide which data structure best fit its purpose in your code — consider where insertion and deletion most often happen? are quick lookups important? do we need these data to be sorted or does it not matter? is quick direct access is a priority?

In most cases, the more complex the data structure, the more expensive it gets (memory + runtime). So, when deciding which data structure suits best, it is always good to start from basic arrays, working your way up to more complex data structures.

Feel free to leave a comment if there are any questions or something you would like to add!