Ch4.2: ::fast_io::char_category

Overview

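The ::fast_io::char_category module provides a set of constexpr character classification and transformation functions. The classification functions determine whether a character is lowercase, uppercase, a digit, whitespace, punctuation, and so on. The transformation functions convert characters between cases and between full‑width and half‑width forms.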

All functions follow the execution charset of the program. This ensures correct behavior on both ASCII-based and EBCDIC-based systems.

1. Source charset vs execution charset

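C++ distinguishes between the source charset, the encoding in which the source file is written, and the execution charset, the encoding that character and string literals have in the compiled program.

::fast_io::char_category always uses the execution charset.

This means that a classification function interprets its argument as a value in the execution charset, regardless of how the source file was encoded.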
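
As a rough illustration of the distinction, the sketch below uses only standard C++20 and no fast_io headers. A `u8'a'` literal always has the UTF-8 value 0x61, while an ordinary `'a'` literal takes whatever value the execution charset assigns to it. The helper name `execution_charset_matches_utf8_sample` is hypothetical, and it samples only three characters, so it is a spot check rather than a full charset test.

```cpp
// u8 character literals are always UTF-8 encoded, whatever the execution charset is.
static_assert(static_cast<unsigned char>(u8'a') == 0x61);

// Hypothetical helper (not part of fast_io): compares a few ordinary literals,
// which the compiler encodes in the execution charset, with their UTF-8 values.
// It yields true on ASCII-based systems and false on EBCDIC-based ones.
constexpr bool execution_charset_matches_utf8_sample() noexcept
{
    return static_cast<unsigned char>('a') == static_cast<unsigned char>(u8'a') &&
           static_cast<unsigned char>('Z') == static_cast<unsigned char>(u8'Z') &&
           static_cast<unsigned char>('0') == static_cast<unsigned char>(u8'0');
}
```

On a platform where the helper yields false, code that hard-codes ASCII values for characters would misclassify text, which is the situation the execution-charset rule is meant to avoid.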

2. ::fast_io::char_literal_v and char_literal

The primary way to create a character constant that respects the execution charset is ::fast_io::char_literal_v. It takes a char8_t value at compile time and produces a character in the execution charset.


char c = ::fast_io::char_literal_v<u8'a', char>;

Use this form whenever you have a compile‑time character literal.

There is also a function form ::fast_io::char_literal that takes a char8_t value at runtime:


char8_t runtime_ch{/* ... */};
char c = ::fast_io::char_literal<char>(runtime_ch);

This version is more flexible but may be slower, because it must execute at runtime. Prefer char_literal_v when you have a literal known at compile time.

3. ::fast_io::arithmetic_char_literal_v and arithmetic_char_literal

::fast_io::arithmetic_char_literal behaves like char_literal, but is intended for arithmetic expressions, especially when wchar_t uses a non‑UTF execution charset.


char8_t runtime_ch{/* ... */};
char c = ::fast_io::arithmetic_char_literal<char>(runtime_ch);

The shorthand form arithmetic_char_literal_v is the compile‑time variant:


wchar_t w = ::fast_io::arithmetic_char_literal_v<u8'b', wchar_t>;

Use arithmetic_char_literal_v when you have a compile‑time char8_t literal and need a value suitable for arithmetic in the execution charset.

4. Basic classification

Each classification function takes a single character and returns true or false.


bool b1 = ::fast_io::char_category::is_c_lower('a');
bool b2 = ::fast_io::char_category::is_c_upper('A');
bool b3 = ::fast_io::char_category::is_c_digit('7');
bool b4 = ::fast_io::char_category::is_c_space(' ');

5. Using char_category with ::fast_io::string

::fast_io::string stores characters contiguously, so you can iterate through it and apply any classification function.


::fast_io::string s{"Hello World"};

for(char ch : s)
{
    if(::fast_io::char_category::is_c_lower(ch))
    {
        // lowercase letter
    }
}

6. Example: counting lowercase letters


::fast_io::string s{"Hello fast_io!"};

::std::size_t count{}; // number of lowercase letters seen so far

for(char ch : s)
{
    if(::fast_io::char_category::is_c_lower(ch))
    {
        ++count;
    }
}

7. Why not compare ranges directly?

You might try to detect lowercase letters by comparing against a range:


if(ch >= ::fast_io::char_literal_v<u8'a', char> &&
   ch <= ::fast_io::char_literal_v<u8'z', char>)
{
    ++count;
}

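Even though this uses char_literal_v and respects the execution charset for the endpoints, it still assumes that all lowercase letters form a single contiguous range. This is true for ASCII, but not for every execution charset. In EBCDIC, for example, the lowercase letters sit in three separate blocks (a to i, j to r, s to z), and the code points between those blocks are not letters, so the range test would accept them.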

Always prefer:


if(::fast_io::char_category::is_c_lower(ch))
{
    ++count;
}

is_c_lower is implemented with correct knowledge of the execution charset and does not rely on naïve range assumptions.
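
To make the failure mode concrete, the sketch below models the EBCDIC (code page 037) positions of the lowercase letters in standard C++ and compares a naive range test with a table lookup. The function names are hypothetical, and the code works on raw code-point values, so it behaves the same on any host.

```cpp
#include <array>

// EBCDIC (code page 037) code points of the 26 lowercase letters.
// They form three separate blocks rather than one contiguous run.
inline constexpr std::array<unsigned char, 26> ebcdic_lower{
    0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, // a-i
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, // j-r
    0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA8, 0xA9        // s-z
};

// Naive check: accepts everything from 'a' (0x81) through 'z' (0xA9).
constexpr bool naive_range_is_lower(unsigned char code) noexcept
{
    return code >= ebcdic_lower.front() && code <= ebcdic_lower.back();
}

// Table-based check: accepts only code points that really are letters.
constexpr bool table_is_lower(unsigned char code) noexcept
{
    for (unsigned char letter : ebcdic_lower)
    {
        if (letter == code)
        {
            return true;
        }
    }
    return false;
}

// 0x8A lies in the gap between 'i' (0x89) and 'j' (0x91).
static_assert(naive_range_is_lower(0x8A));
static_assert(!table_is_lower(0x8A));
static_assert(table_is_lower(0x91));
```

The naive range spans 41 code points (0x81 through 0xA9) but only 26 of them are letters, so 15 non-letter values would be counted as lowercase.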

8. Example: filtering alphabetic characters


::fast_io::string input{"Hello 123 World!"};
::fast_io::string letters{};

for(char ch : input)
{
    if(::fast_io::char_category::is_c_alpha(ch))
    {
        letters.push_back(ch);
    }
}

9. Transforming an entire string

The ::fast_io::char_category::ranges namespace provides functions that operate on an entire range at once.


::fast_io::string s{"Hello FAST_IO!"};

::fast_io::char_category::ranges::to_c_lower(s);

After this call, s becomes "hello fast_io!".

The other range transformations are called the same way:


::fast_io::char_category::ranges::to_c_upper(s);
::fast_io::char_category::ranges::to_c_halfwidth(s);
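
Conceptually, a range transformation applies the single-character conversion to every element in place. The sketch below shows that idea with standard C++ only. `sketch_to_lower` and `lower_in_place` are hypothetical stand-ins, not fast_io functions, and they handle only the 26 basic Latin letters. The lookup uses ordinary string literals, so it does not assume that the letters are contiguous in the execution charset.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical single-character conversion: find the character among the
// uppercase letters and return the lowercase letter at the same position.
constexpr char sketch_to_lower(char ch) noexcept
{
    constexpr std::string_view upper{"ABCDEFGHIJKLMNOPQRSTUVWXYZ"};
    constexpr std::string_view lower{"abcdefghijklmnopqrstuvwxyz"};
    auto const pos = upper.find(ch);
    return pos == std::string_view::npos ? ch : lower[pos];
}

// Hypothetical range helper: rewrites every element of the range in place.
template <typename Range>
void lower_in_place(Range& r)
{
    std::transform(r.begin(), r.end(), r.begin(), sketch_to_lower);
}
```

Any container with begin() and end() over char works with this helper, for example a std::string holding "Hello FAST_IO!".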

10. Character classification functions

Function                       Description
is_c_alnum                     Letter or digit
is_c_alpha                     Alphabetic letter
is_c_blank                     Space or tab
is_c_cntrl                     Control character
is_c_digit                     Digit 0–9
is_c_fullwidth                 Full‑width character
is_c_graph                     Visible (non‑space) character
is_c_halfwidth                 Half‑width character
is_c_lower                     Lowercase letter
is_c_print                     Printable character
is_c_punct                     Punctuation
is_c_space                     Whitespace
is_c_upper                     Uppercase letter
is_c_xdigit                    Hex digit
is_html_whitespace             HTML whitespace
is_dos_file_invalid_character  Invalid in DOS filenames

11. Character transformation functions

Function                Description
to_c_lower              Convert a single character to lowercase
to_c_upper              Convert a single character to uppercase
to_c_halfwidth          Convert a single character to half‑width
ranges::to_c_lower      Convert an entire range to lowercase
ranges::to_c_upper      Convert an entire range to uppercase
ranges::to_c_halfwidth  Convert an entire range to half‑width

12. Using char_category_family and char_category_traits

12.1 char_category_family definition


enum class char_category_family : ::std::uint_least32_t
{
    c_alnum,                    // Alphanumeric characters (letters and digits)
    c_alpha,                    // Alphabetic Character
    c_blank,                    // Space or tab
    c_cntrl,                    // Control characters (ASCII 0x00-0x1F, 0x7F or EBCDIC equivalents)
    c_digit,                    // Numeric digits (0-9)
    c_fullwidth,                // Full-width character
    c_graph,                    // Graphical characters (alphanumeric + punctuation)
    c_halfwidth,                // Half-width character
    c_lower,                    // Lowercase alphabetic characters
    c_print,                    // Printable characters (includes space)
    c_punct,                    // Punctuation characters
    c_space,                    // Whitespace characters (space, tab, newline, etc.)
    c_upper,                    // Uppercase alphabetic characters
    c_xdigit,                   // Hexadecimal digits (0-9, A-F, a-f)
    dos_file_invalid_character, // DOS Path invalid character
    html_whitespace             // HTML whitespace
};

12.2 Creating a classifier


using lower_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_lower,
        false
    >;

lower_pred pred{};
bool a = pred('a');   // true
bool b = pred('Z');   // false

12.3 Negated classifiers


using not_lower_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_lower,
        true
    >;

not_lower_pred pred{};
bool a = pred('A');   // true (because 'A' is NOT lowercase)
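
One way to picture how a family value and a negation flag can combine into a single predicate type is the standalone sketch below. It is an assumption-laden illustration in standard C++, not the definition of char_category_traits: `sketch_family` and `sketch_traits` are hypothetical names, and only two families are modeled.

```cpp
#include <string_view>

// Hypothetical two-member family, standing in for char_category_family.
enum class sketch_family { lower, digit };

// Hypothetical traits type: Family selects the test, Negate inverts it.
template <sketch_family Family, bool Negate>
struct sketch_traits
{
    // The underlying, un-negated classification.
    static constexpr bool matches(char ch) noexcept
    {
        if constexpr (Family == sketch_family::lower)
        {
            return std::string_view{"abcdefghijklmnopqrstuvwxyz"}.find(ch) !=
                   std::string_view::npos;
        }
        else
        {
            return std::string_view{"0123456789"}.find(ch) != std::string_view::npos;
        }
    }

    // Comparing with Negate flips the result when Negate is true.
    constexpr bool operator()(char ch) const noexcept
    {
        return matches(ch) != Negate;
    }
};

static_assert(sketch_traits<sketch_family::lower, false>{}('a'));
static_assert(sketch_traits<sketch_family::lower, true>{}('A'));
```

Encoding the negation in the type means generic code can be written once against a predicate and then used for both "find the first match" and "find the first non-match" without a separate wrapper.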

12.4 Using traits with ::fast_io::string


::fast_io::string s{"Hello 123 World"};

using digit_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_digit,
        false
    >;

auto it = digit_pred::find(s.begin(), s.end());
if(it != s.end())
{
    // *it is the first digit in the string
}

This gives you a flexible, generic way to classify and search text while still respecting the execution charset.
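
For comparison, the same kind of search can be written with std::find_if and a hand-written predicate. The sketch assumes that find follows the usual iterator convention of returning the end iterator when nothing matches, as the check against s.end() above suggests. `sketch_is_digit` and `find_first_digit` are hypothetical names. Digits are a special case: the C++ standard guarantees that '0' through '9' have contiguous, increasing values, so a range test is portable here even though it is not for letters.

```cpp
#include <algorithm>
#include <string>

// Portable for digits only: '0'..'9' are guaranteed contiguous in C++.
constexpr bool sketch_is_digit(char ch) noexcept
{
    return ch >= '0' && ch <= '9';
}

// Returns an iterator to the first digit in [first, last),
// or last if the range contains no digit.
template <typename Iter>
Iter find_first_digit(Iter first, Iter last)
{
    return std::find_if(first, last, sketch_is_digit);
}
```

With a std::string holding "Hello 123 World", the returned iterator points at the '1' at index 6.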

Key takeaways