Ch9.9: deque

Overview

A ::fast_io::deque<T> (pronounced “deck”) provides efficient insertion and removal at both ends while still supporting random access via operator[].

1. Internal Structure

Unlike a vector, which stores all elements in one contiguous block, a deque uses a map of blocks design:

Elements are stored in fixed-size blocks (chunks).
A central controller maintains an array of pointers to these blocks.
Each block holds block_size elements (implementation-defined, typically large enough to fill a cache line or page).


// Conceptual layout of a deque:
//
// Controller: [ptr_to_block0] [ptr_to_block1] [ptr_to_block2] ...
//                    |               |               |
//               Block 0:        Block 1:        Block 2:
//             [e0][e1]..[eN]  [e0][e1]..[eN]  [e0][e1]..[eN]

When the deque needs to grow at the front, it allocates a new block and adds a pointer to the front of the controller. No existing elements need to be moved. The same applies at the back. This is the key advantage over vector, which must copy all elements when growing at the front.

2. Deque-Specific Capacity

Because a deque grows at both ends, it has separate capacity for each end:


#include <fast_io_dsal/deque.h>
#include <fast_io.h>

int main() {
    using namespace ::fast_io::iomnp;

    ::fast_io::deque<::std::size_t> dq;

    // Reserve at each end independently
    dq.reserve_back(1000zu);
    dq.reserve_front(500zu);

    println("Front capacity: ", dq.front_capacity());
    println("Back capacity: ", dq.back_capacity());

    // Segment introspection
    println("Block size: ", decltype(dq)::block_size);
    println("Segments: ", dq.segments_count());
}

3. When Is deque Superior to vector?

Scenario	Prefer	Reason
Most general-purpose use	`vector`	Better cache locality, simpler, faster iteration
Need to grow at the front	`deque`	vector cannot push_front; deque does it in Θ(1) amortized
Need a stack or queue	`deque`	Default underlying container for `stack` and `queue`
Very large data sets with push_back	`deque`	vector reallocation copies all elements; deque only allocates a new block
Tight iteration performance	`vector`	Contiguous memory is prefetch-friendly

Why deque allocates more efficiently

A vector grows by doubling its entire buffer: when capacity is exhausted, it allocates a new buffer of 2× the current capacity and copies every existing element into it. For 10 million elements, the final reallocation moves ~5 million elements, the one before ~2.5 million, and so on. The total number of element moves across all reallocations is roughly equal to the final size.

A deque grows by allocating a new block (typically 4 KB worth of elements) and adding a pointer to the controller. No existing elements are ever moved. Each block is an independent, small allocation. This means:

No copying cost — the work per insertion is just one constructor call, regardless of how many elements already exist.
No large allocations — the allocator never needs to find a single contiguous region large enough for the entire dataset. Many small allocations are easier to satisfy than one huge one.
No memory overshoot — a vector can use up to 2× the needed memory right after a reallocation (size = N, capacity = 2N). A deque only allocates the blocks it actually fills.

The effect is visible in benchmarks. Pushing 10 million ::std::size_t elements:

Container	push_back time	Total time
`vector` (no reserve)	0.187 s	0.233 s
`vector` + `reserve()`	0.095 s	0.126 s
`deque` (plain)	0.095 s	0.146 s
`deque` + `resize_for_overwrite()`	0.080 s	0.120 s

Plain deque (0.095 s) is already 2× faster than vector without reserve (0.187 s), because it pays no copying cost. And deque with pre-allocation (0.080 s) is even faster than vector with reserve (0.095 s), because the deque avoids the single large allocation that reserve requires.

Drawbacks of deque

Despite its allocation advantages, deque has two significant drawbacks that make vector the better choice in many situations:

1. Not contiguous

A deque’s elements are spread across multiple heap-allocated blocks. There is no single contiguous buffer you can hand to a C API, a memcpy, SIMD intrinsics, or a library that expects a T* with a count. The data() method does not exist on deque for this reason.

If you need to pass elements to an API that requires contiguous memory, you must either copy them into a temporary vector or use the segments API to process one block at a time.

2. Large minimum footprint

Each deque allocates at least one full block, even if it contains only a single element. The block size is typically 4 KB (4096 bytes, chosen to match a page or cache line boundary). So a deque holding a single ::std::size_t (8 bytes) still consumes ~4 KB of memory plus the controller overhead.

This makes deque a terrible choice as an inner dimension in nested containers:


// BAD: 1000 inner deques, each allocating ~4 KB even if nearly empty
::fast_io::vector<::fast_io::deque<::std::size_t>> matrix(1000zu);
// Minimum memory: 1000 * 4 KB = 4 MB, even if each deque has only 1 element

// GOOD: 1000 inner vectors, each allocating only what they need
::fast_io::vector<::fast_io::vector<::std::size_t>> matrix(1000zu);
// Minimum memory: 1000 * ~24 bytes (vector overhead) = ~24 KB

As a rule of thumb:

Use deque as a top-level container when you need fast growth at both ends.
Use vector as the inner dimension in nested containers (vector<vector<T>>, vector<array<T, N>>, etc.) to avoid the per-instance block overhead.

4. Fast Iteration with the Segments API

The regular deque iterator checks for block boundaries on every element increment:


// What deque_iterator::operator++ does internally:
if (++curr_ptr == block_end) [[unlikely]]
    wrap_to_next_block(); // branch on every element

This branch is usually predicted correctly (blocks hold ~500 elements for 8-byte types), but it prevents the compiler from auto-vectorising the loop and adds latency. The segments API lets you iterate block-by-block instead, eliminating the per-element check entirely.

Segment methods

Method	Returns	Description
`segments_count()`	`size_type`	Number of blocks currently in use.
`nth_segment(pos)`	`span<T>`	Mutable span over the live elements in block `pos`.
`const_nth_segment(pos)`	`span<T const>`	Const span over the live elements in block `pos`.
`size_per_segment()`	`size_type` (static)	Maximum elements per block (= `block_size`).
`size_bytes_per_segment()`	`size_type` (static)	Byte capacity of a full block.

The returned span has .data() and .size(), so the inner loop compiles down to a plain pointer walk — auto-vectorisable, memcpy-able, and branch-free in the hot path. Only the first and last segments may be partial; middle segments always contain exactly block_size elements.

Read-only iteration


#include <fast_io_dsal/deque.h>
#include <fast_io.h>

int main() {
    ::fast_io::deque<::std::size_t> dq;
    for (::std::size_t i{}; i < 10000zu; ++i)
        dq.push_back(i);

    ::std::size_t sum{};

    // Segment iteration: no per-element bounds check
    for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
    {
        for (auto const e : dq.const_nth_segment(i))
        {
            sum += e;
        }
    }

    ::fast_io::io::println("sum = ", sum);
}

Mutable iteration


// Write through segments
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
    for (auto& e : dq.nth_segment(i))
    {
        e *= 2zu;
    }
}

Direct pointer access for memcpy / SIMD


// Get raw pointer + count from each segment
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
    auto seg = dq.const_nth_segment(i);
    ::std::size_t const* data = seg.data();
    ::std::size_t count = seg.size();
    // pass (data, count) to memcpy, SIMD intrinsics, etc.
}

When to use segments vs regular iteration

Pattern	Use
Simple range-for, algorithms, `std::ranges`	Regular iterator: `for (auto const& e : dq)`
Inner loop performance matters (sum, transform, SIMD)	Segments API: iterate block-by-block
Need raw pointers for `memcpy` or C APIs	Segments API: `seg.data()` / `seg.size()`

5. Iterator Invalidation vs Pointer/Reference Stability

A deque’s block-map design gives it a unique invalidation property that differs from vector:

Operation	Iterators	Pointers / References to elements
`push_back()` / `push_front()`	All invalidated. The controller may be reallocated, and iterators encode block positions.	Survive (unless the element is erased). Elements are never moved when growing at the ends — only new blocks are allocated.
`emplace_back()` / `emplace_front()`	All invalidated.	Survive — same reason as `push_*`.
`pop_back()` / `pop_front()`	All invalidated.	Pointers/references to the removed element are invalidated. All others survive.
`insert_index()` / `erase_index()` (middle)	All invalidated.	Pointers/references to erased elements are invalidated. Others may or may not survive, depending on whether elements are shifted between blocks.
`clear()` / `clear_destroy()`	All invalidated.	All invalidated.

Why iterators are always invalidated: A deque iterator internally holds a pointer into the controller (the array of block pointers). When the controller grows — which can happen on any insertion at the front or back — the old controller is freed and a new one is allocated. Every iterator that pointed into the old controller is now dangling, even though the elements themselves have not moved.

Why pointers and references survive end insertions: When a deque grows at an end, it either fills spare capacity in the existing end block or allocates a brand-new block. No existing element is copied or moved. So a pointer or reference obtained before the insertion still points to the same memory location.


#include <fast_io_dsal/deque.h>
#include <fast_io.h>

int main() {
    ::fast_io::deque<::std::size_t> dq{10zu, 20zu, 30zu};

    ::std::size_t& ref = dq[1zu]; // reference to element 20

    // Iterator is invalidated, but the reference survives:
    dq.push_back(40zu);
    // ref is still valid and refers to 20

    dq.push_front(5zu);
    // ref is STILL valid — elements are never moved on end insertions

    ::fast_io::io::println("ref = ", ref); // 20
}

Contrast with vector: A vector stores all elements in a single contiguous array. When it grows beyond its capacity, it allocates a new array and moves every element. This invalidates all iterators, pointers, and references — there is no partial stability.

Key takeaways

A deque uses a map of fixed-size blocks, not one contiguous array.
It supports Θ(1) amortized insertion at both front and back, with no element copying.
deque is not contiguous — no data(), no direct use with C APIs or SIMD.
Each deque allocates at least one ~4 KB block, even for a single element. Avoid as an inner dimension in nested containers.
reserve_front() and reserve_back() control capacity independently.
deque supports random access but with slightly worse cache locality than vector.
Prefer vector by default; use deque when you need front insertion.
Use the segments API (segments_count(), nth_segment(), const_nth_segment()) for branch-free, auto-vectorisable iteration over a deque.
deque iterators are always invalidated on insertion, but pointers and references to existing elements survive push_front() / push_back() — unlike vector, which invalidates everything on reallocation.