Ch9.9: deque
Overview
A ::fast_io::deque<T> (pronounced “deck”)
provides efficient insertion and removal at both ends
while still supporting random access via operator[].
1. Internal Structure
Unlike a vector, which stores all elements in one contiguous block, a deque uses a map of blocks design:
- Elements are stored in fixed-size blocks (chunks).
- A central controller maintains an array of pointers to these blocks.
- Each block holds
block_sizeelements (implementation-defined, typically large enough to fill a cache line or page).
// Conceptual layout of a deque:
//
// Controller: [ptr_to_block0] [ptr_to_block1] [ptr_to_block2] ...
// | | |
// Block 0: Block 1: Block 2:
// [e0][e1]..[eN] [e0][e1]..[eN] [e0][e1]..[eN]
When the deque needs to grow at the front, it allocates a new block and
adds a pointer to the front of the controller. No existing elements need
to be moved. The same applies at the back. This is the key advantage over
vector, which must copy all elements when growing at the front.
2. Deque-Specific Capacity
Because a deque grows at both ends, it has separate capacity for each end:
#include <fast_io_dsal/deque.h>
#include <fast_io.h>
int main() {
using namespace ::fast_io::iomnp;
::fast_io::deque<::std::size_t> dq;
// Reserve at each end independently
dq.reserve_back(1000zu);
dq.reserve_front(500zu);
println("Front capacity: ", dq.front_capacity());
println("Back capacity: ", dq.back_capacity());
// Segment introspection
println("Block size: ", decltype(dq)::block_size);
println("Segments: ", dq.segments_count());
}
3. When Is deque Superior to vector?
| Scenario | Prefer | Reason |
|---|---|---|
| Most general-purpose use | vector |
Better cache locality, simpler, faster iteration |
| Need to grow at the front | deque |
vector cannot push_front; deque does it in Θ(1) amortized |
| Need a stack or queue | deque |
Default underlying container for stack and queue |
| Very large data sets with push_back | deque |
vector reallocation copies all elements; deque only allocates a new block |
| Tight iteration performance | vector |
Contiguous memory is prefetch-friendly |
Why deque allocates more efficiently
A vector grows by doubling its entire buffer:
when capacity is exhausted, it allocates a new buffer of 2× the
current capacity and copies every existing element into it. For
10 million elements, the final reallocation moves ~5 million
elements, the one before ~2.5 million, and so on. The total number
of element moves across all reallocations is roughly equal to the final
size.
A deque grows by allocating a new block
(typically 4 KB worth of elements) and adding a pointer to the
controller. No existing elements are ever moved. Each
block is an independent, small allocation. This means:
- No copying cost — the work per insertion is just one constructor call, regardless of how many elements already exist.
- No large allocations — the allocator never needs to find a single contiguous region large enough for the entire dataset. Many small allocations are easier to satisfy than one huge one.
- No memory overshoot — a
vectorcan use up to 2× the needed memory right after a reallocation (size = N, capacity = 2N). Adequeonly allocates the blocks it actually fills.
The effect is visible in benchmarks. Pushing 10 million
::std::size_t elements:
| Container | push_back time | Total time |
|---|---|---|
vector (no reserve) |
0.187 s | 0.233 s |
vector + reserve() |
0.095 s | 0.126 s |
deque (plain) |
0.095 s | 0.146 s |
deque + resize_for_overwrite() |
0.080 s | 0.120 s |
Plain deque (0.095 s) is already 2×
faster than vector without reserve
(0.187 s), because it pays no copying cost. And deque
with pre-allocation (0.080 s) is even faster than
vector with reserve (0.095 s), because
the deque avoids the single large allocation that reserve
requires.
Drawbacks of deque
Despite its allocation advantages, deque has two significant
drawbacks that make vector the better choice in many
situations:
1. Not contiguous
A deque’s elements are spread across multiple heap-allocated blocks.
There is no single contiguous buffer you can hand to a
C API, a memcpy, SIMD intrinsics, or a library that expects
a T* with a count. The data() method does not
exist on deque for this reason.
If you need to pass elements to an API that requires contiguous memory,
you must either copy them into a temporary vector or use
the segments API to process one block at a time.
2. Large minimum footprint
Each deque allocates at least one full block, even if it contains only a
single element. The block size is typically 4 KB
(4096 bytes, chosen to match a page or cache line boundary). So a deque
holding a single ::std::size_t (8 bytes) still consumes
~4 KB of memory plus the controller overhead.
This makes deque a terrible choice as an inner
dimension in nested containers:
// BAD: 1000 inner deques, each allocating ~4 KB even if nearly empty
::fast_io::vector<::fast_io::deque<::std::size_t>> matrix(1000zu);
// Minimum memory: 1000 * 4 KB = 4 MB, even if each deque has only 1 element
// GOOD: 1000 inner vectors, each allocating only what they need
::fast_io::vector<::fast_io::vector<::std::size_t>> matrix(1000zu);
// Minimum memory: 1000 * ~24 bytes (vector overhead) = ~24 KB
As a rule of thumb:
- Use
dequeas a top-level container when you need fast growth at both ends. - Use
vectoras the inner dimension in nested containers (vector<vector<T>>,vector<array<T, N>>, etc.) to avoid the per-instance block overhead.
4. Fast Iteration with the Segments API
The regular deque iterator checks for block boundaries on every element increment:
// What deque_iterator::operator++ does internally:
if (++curr_ptr == block_end) [[unlikely]]
wrap_to_next_block(); // branch on every element
This branch is usually predicted correctly (blocks hold ~500 elements for 8-byte types), but it prevents the compiler from auto-vectorising the loop and adds latency. The segments API lets you iterate block-by-block instead, eliminating the per-element check entirely.
Segment methods
| Method | Returns | Description |
|---|---|---|
segments_count() |
size_type |
Number of blocks currently in use. |
nth_segment(pos) |
span<T> |
Mutable span over the live elements in block pos. |
const_nth_segment(pos) |
span<T const> |
Const span over the live elements in block pos. |
size_per_segment() |
size_type (static) |
Maximum elements per block (= block_size). |
size_bytes_per_segment() |
size_type (static) |
Byte capacity of a full block. |
The returned span has .data() and
.size(), so the inner loop compiles down to a plain pointer
walk — auto-vectorisable, memcpy-able, and branch-free
in the hot path. Only the first and last segments may be partial; middle
segments always contain exactly block_size elements.
Read-only iteration
#include <fast_io_dsal/deque.h>
#include <fast_io.h>
int main() {
::fast_io::deque<::std::size_t> dq;
for (::std::size_t i{}; i < 10000zu; ++i)
dq.push_back(i);
::std::size_t sum{};
// Segment iteration: no per-element bounds check
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
for (auto const e : dq.const_nth_segment(i))
{
sum += e;
}
}
::fast_io::io::println("sum = ", sum);
}
Mutable iteration
// Write through segments
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
for (auto& e : dq.nth_segment(i))
{
e *= 2zu;
}
}
Direct pointer access for memcpy / SIMD
// Get raw pointer + count from each segment
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
auto seg = dq.const_nth_segment(i);
::std::size_t const* data = seg.data();
::std::size_t count = seg.size();
// pass (data, count) to memcpy, SIMD intrinsics, etc.
}
When to use segments vs regular iteration
| Pattern | Use |
|---|---|
Simple range-for, algorithms, std::ranges |
Regular iterator: for (auto const& e : dq) |
| Inner loop performance matters (sum, transform, SIMD) | Segments API: iterate block-by-block |
Need raw pointers for memcpy or C APIs |
Segments API: seg.data() / seg.size() |
5. Iterator Invalidation vs Pointer/Reference Stability
A deque’s block-map design gives it a unique invalidation
property that differs from vector:
| Operation | Iterators | Pointers / References to elements |
|---|---|---|
push_back() / push_front() |
All invalidated. The controller may be reallocated, and iterators encode block positions. | Survive (unless the element is erased). Elements are never moved when growing at the ends — only new blocks are allocated. |
emplace_back() / emplace_front() |
All invalidated. | Survive — same reason as push_*. |
pop_back() / pop_front() |
All invalidated. | Pointers/references to the removed element are invalidated. All others survive. |
insert_index() / erase_index() (middle) |
All invalidated. | Pointers/references to erased elements are invalidated. Others may or may not survive, depending on whether elements are shifted between blocks. |
clear() / clear_destroy() |
All invalidated. | All invalidated. |
Why iterators are always invalidated: A deque iterator internally holds a pointer into the controller (the array of block pointers). When the controller grows — which can happen on any insertion at the front or back — the old controller is freed and a new one is allocated. Every iterator that pointed into the old controller is now dangling, even though the elements themselves have not moved.
Why pointers and references survive end insertions: When a deque grows at an end, it either fills spare capacity in the existing end block or allocates a brand-new block. No existing element is copied or moved. So a pointer or reference obtained before the insertion still points to the same memory location.
#include <fast_io_dsal/deque.h>
#include <fast_io.h>
int main() {
::fast_io::deque<::std::size_t> dq{10zu, 20zu, 30zu};
::std::size_t& ref = dq[1zu]; // reference to element 20
// Iterator is invalidated, but the reference survives:
dq.push_back(40zu);
// ref is still valid and refers to 20
dq.push_front(5zu);
// ref is STILL valid — elements are never moved on end insertions
::fast_io::io::println("ref = ", ref); // 20
}
Contrast with vector: A
vector stores all elements in a single contiguous array.
When it grows beyond its capacity, it allocates a new array and
moves every element. This invalidates all
iterators, pointers, and references — there is no partial
stability.
Key takeaways
- A
dequeuses a map of fixed-size blocks, not one contiguous array. - It supports Θ(1) amortized insertion at both front and back, with no element copying.
dequeis not contiguous — nodata(), no direct use with C APIs or SIMD.- Each
dequeallocates at least one ~4 KB block, even for a single element. Avoid as an inner dimension in nested containers. reserve_front()andreserve_back()control capacity independently.dequesupports random access but with slightly worse cache locality thanvector.- Prefer
vectorby default; usedequewhen you need front insertion. - Use the segments API (
segments_count(),nth_segment(),const_nth_segment()) for branch-free, auto-vectorisable iteration over a deque. dequeiterators are always invalidated on insertion, but pointers and references to existing elements survivepush_front()/push_back()— unlikevector, which invalidates everything on reallocation.