Ch9.9: deque

Overview

A ::fast_io::deque<T> (pronounced “deck”) provides efficient insertion and removal at both ends while still supporting random access via operator[].

1. Internal Structure

Unlike a vector, which stores all elements in one contiguous block, a deque uses a map of blocks design:


// Conceptual layout of a deque:
//
// Controller: [ptr_to_block0] [ptr_to_block1] [ptr_to_block2] ...
//                    |               |               |
//               Block 0:        Block 1:        Block 2:
//             [e0][e1]..[eN]  [e0][e1]..[eN]  [e0][e1]..[eN]

When the deque needs to grow at the front, it allocates a new block and adds a pointer to the front of the controller. No existing elements need to be moved. The same applies at the back. This is the key advantage over vector, which must copy all elements when growing at the front.

2. Deque-Specific Capacity

Because a deque grows at both ends, it has separate capacity for each end:


#include <fast_io_dsal/deque.h>
#include <fast_io.h>

int main() {
    using namespace ::fast_io::iomnp;

    ::fast_io::deque<::std::size_t> dq;

    // Reserve at each end independently
    dq.reserve_back(1000zu);
    dq.reserve_front(500zu);

    println("Front capacity: ", dq.front_capacity());
    println("Back capacity: ", dq.back_capacity());

    // Segment introspection
    println("Block size: ", decltype(dq)::block_size);
    println("Segments: ", dq.segments_count());
}

3. When Is deque Superior to vector?

Scenario Prefer Reason
Most general-purpose use vector Better cache locality, simpler, faster iteration
Need to grow at the front deque vector cannot push_front; deque does it in Θ(1) amortized
Need a stack or queue deque Default underlying container for stack and queue
Very large data sets with push_back deque vector reallocation copies all elements; deque only allocates a new block
Tight iteration performance vector Contiguous memory is prefetch-friendly

Why deque allocates more efficiently

A vector grows by doubling its entire buffer: when capacity is exhausted, it allocates a new buffer of 2× the current capacity and copies every existing element into it. For 10 million elements, the final reallocation moves ~5 million elements, the one before ~2.5 million, and so on. The total number of element moves across all reallocations is roughly equal to the final size.

A deque grows by allocating a new block (typically 4 KB worth of elements) and adding a pointer to the controller. No existing elements are ever moved. Each block is an independent, small allocation. This means:

The effect is visible in benchmarks. Pushing 10 million ::std::size_t elements:

Container push_back time Total time
vector (no reserve) 0.187 s 0.233 s
vector + reserve() 0.095 s 0.126 s
deque (plain) 0.095 s 0.146 s
deque + resize_for_overwrite() 0.080 s 0.120 s

Plain deque (0.095 s) is already 2× faster than vector without reserve (0.187 s), because it pays no copying cost. And deque with pre-allocation (0.080 s) is even faster than vector with reserve (0.095 s), because the deque avoids the single large allocation that reserve requires.

Drawbacks of deque

Despite its allocation advantages, deque has two significant drawbacks that make vector the better choice in many situations:

1. Not contiguous

A deque’s elements are spread across multiple heap-allocated blocks. There is no single contiguous buffer you can hand to a C API, a memcpy, SIMD intrinsics, or a library that expects a T* with a count. The data() method does not exist on deque for this reason.

If you need to pass elements to an API that requires contiguous memory, you must either copy them into a temporary vector or use the segments API to process one block at a time.

2. Large minimum footprint

Each deque allocates at least one full block, even if it contains only a single element. The block size is typically 4 KB (4096 bytes, chosen to match a page or cache line boundary). So a deque holding a single ::std::size_t (8 bytes) still consumes ~4 KB of memory plus the controller overhead.

This makes deque a terrible choice as an inner dimension in nested containers:


// BAD: 1000 inner deques, each allocating ~4 KB even if nearly empty
::fast_io::vector<::fast_io::deque<::std::size_t>> matrix(1000zu);
// Minimum memory: 1000 * 4 KB = 4 MB, even if each deque has only 1 element

// GOOD: 1000 inner vectors, each allocating only what they need
::fast_io::vector<::fast_io::vector<::std::size_t>> matrix(1000zu);
// Minimum memory: 1000 * ~24 bytes (vector overhead) = ~24 KB

As a rule of thumb:

4. Fast Iteration with the Segments API

The regular deque iterator checks for block boundaries on every element increment:


// What deque_iterator::operator++ does internally:
if (++curr_ptr == block_end) [[unlikely]]
    wrap_to_next_block(); // branch on every element

This branch is usually predicted correctly (blocks hold ~500 elements for 8-byte types), but it prevents the compiler from auto-vectorising the loop and adds latency. The segments API lets you iterate block-by-block instead, eliminating the per-element check entirely.

Segment methods

Method Returns Description
segments_count() size_type Number of blocks currently in use.
nth_segment(pos) span<T> Mutable span over the live elements in block pos.
const_nth_segment(pos) span<T const> Const span over the live elements in block pos.
size_per_segment() size_type (static) Maximum elements per block (= block_size).
size_bytes_per_segment() size_type (static) Byte capacity of a full block.

The returned span has .data() and .size(), so the inner loop compiles down to a plain pointer walk — auto-vectorisable, memcpy-able, and branch-free in the hot path. Only the first and last segments may be partial; middle segments always contain exactly block_size elements.

Read-only iteration


#include <fast_io_dsal/deque.h>
#include <fast_io.h>

int main() {
    ::fast_io::deque<::std::size_t> dq;
    for (::std::size_t i{}; i < 10000zu; ++i)
        dq.push_back(i);

    ::std::size_t sum{};

    // Segment iteration: no per-element bounds check
    for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
    {
        for (auto const e : dq.const_nth_segment(i))
        {
            sum += e;
        }
    }

    ::fast_io::io::println("sum = ", sum);
}

Mutable iteration


// Write through segments
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
    for (auto& e : dq.nth_segment(i))
    {
        e *= 2zu;
    }
}

Direct pointer access for memcpy / SIMD


// Get raw pointer + count from each segment
for (::std::size_t i{}, segs{dq.segments_count()}; i != segs; ++i)
{
    auto seg = dq.const_nth_segment(i);
    ::std::size_t const* data = seg.data();
    ::std::size_t count = seg.size();
    // pass (data, count) to memcpy, SIMD intrinsics, etc.
}

When to use segments vs regular iteration

Pattern Use
Simple range-for, algorithms, std::ranges Regular iterator: for (auto const& e : dq)
Inner loop performance matters (sum, transform, SIMD) Segments API: iterate block-by-block
Need raw pointers for memcpy or C APIs Segments API: seg.data() / seg.size()

5. Iterator Invalidation vs Pointer/Reference Stability

A deque’s block-map design gives it a unique invalidation property that differs from vector:

Operation Iterators Pointers / References to elements
push_back() / push_front() All invalidated. The controller may be reallocated, and iterators encode block positions. Survive (unless the element is erased). Elements are never moved when growing at the ends — only new blocks are allocated.
emplace_back() / emplace_front() All invalidated. Survive — same reason as push_*.
pop_back() / pop_front() All invalidated. Pointers/references to the removed element are invalidated. All others survive.
insert_index() / erase_index() (middle) All invalidated. Pointers/references to erased elements are invalidated. Others may or may not survive, depending on whether elements are shifted between blocks.
clear() / clear_destroy() All invalidated. All invalidated.

Why iterators are always invalidated: A deque iterator internally holds a pointer into the controller (the array of block pointers). When the controller grows — which can happen on any insertion at the front or back — the old controller is freed and a new one is allocated. Every iterator that pointed into the old controller is now dangling, even though the elements themselves have not moved.

Why pointers and references survive end insertions: When a deque grows at an end, it either fills spare capacity in the existing end block or allocates a brand-new block. No existing element is copied or moved. So a pointer or reference obtained before the insertion still points to the same memory location.


#include <fast_io_dsal/deque.h>
#include <fast_io.h>

int main() {
    ::fast_io::deque<::std::size_t> dq{10zu, 20zu, 30zu};

    ::std::size_t& ref = dq[1zu]; // reference to element 20

    // Iterator is invalidated, but the reference survives:
    dq.push_back(40zu);
    // ref is still valid and refers to 20

    dq.push_front(5zu);
    // ref is STILL valid — elements are never moved on end insertions

    ::fast_io::io::println("ref = ", ref); // 20
}

Contrast with vector: A vector stores all elements in a single contiguous array. When it grows beyond its capacity, it allocates a new array and moves every element. This invalidates all iterators, pointers, and references — there is no partial stability.

Key takeaways