Ch4.2: ::fast_io::char_category

Overview

The ::fast_io::char_category module provides a set of constexpr character classification and transformation functions. The classification functions report whether a character is lowercase, uppercase, a digit, whitespace, punctuation, and so on; the transformation functions convert a character's case or width.

All functions follow the execution charset of the program. This ensures correct behavior on both ASCII-based and EBCDIC-based systems.

1. Source charset vs execution charset

C++ distinguishes between:

source charset: how the characters in your source file are encoded
execution charset: how characters are represented in the running program

::fast_io::char_category always uses the execution charset.

This means:

If 'A' has value 0x41, ASCII rules apply.
If 'A' has its EBCDIC value (0xC1 rather than 0x41), EBCDIC rules apply.
All other charsets (UTF‑8, GB18030, Shift‑JIS, etc.) use the ASCII rule for code points [0, 127].
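
The rule above can be pictured with a small compile-time check in standard C++. This is only a sketch of the idea, not fast_io's implementation: the names charset_family_sketch and detect_execution_charset are hypothetical, and the EBCDIC branch assumes that 'A' is encoded as 0xC1, its usual EBCDIC value.

```cpp
// Hypothetical names; a standard-C++ sketch, not fast_io's implementation.
enum class charset_family_sketch
{
    ascii_compatible, // 'A' == 0x41: ASCII, UTF-8, GB18030, Shift-JIS, ...
    ebcdic,           // 'A' == 0xC1 (assumed EBCDIC value)
    unknown
};

constexpr charset_family_sketch detect_execution_charset() noexcept
{
    // 'A' is an ordinary literal, so the compiler encodes it in the execution charset.
    // Going through unsigned char keeps the comparison valid where char is signed.
    constexpr unsigned char a = static_cast<unsigned char>('A');
    if(a == 0x41)
    {
        return charset_family_sketch::ascii_compatible;
    }
    if(a == 0xC1)
    {
        return charset_family_sketch::ebcdic;
    }
    return charset_family_sketch::unknown;
}

// The sketch recognizes only these two families.
static_assert(detect_execution_charset() != charset_family_sketch::unknown,
              "execution charset not recognized by this sketch");
```

Because the check is a constant expression, the compiler can select one rule set and discard the other at compile time.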

2. ::fast_io::char_literal_v and char_literal

The primary way to create a character constant that respects the execution charset is ::fast_io::char_literal_v. It takes a char8_t value and the target character type as template arguments and yields the corresponding character in the execution charset at compile time.


char c = ::fast_io::char_literal_v<u8'a', char>;

Use this form whenever you have a compile‑time character literal.

There is also a function form ::fast_io::char_literal that takes a char8_t value at runtime:


char8_t runtime_ch{/* ... */};
char c = ::fast_io::char_literal<char>(runtime_ch);

This form is more flexible, but the conversion happens at runtime and may therefore be slower. Prefer char_literal_v whenever the character is known at compile time.
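
To see what such a conversion involves, the following standard-C++ sketch maps a UTF-8 lowercase letter to the same letter in the execution charset. It is not how fast_io implements char_literal: utf8_unit_sketch and lower_letter_in_execution_charset are hypothetical names, and the sketch handles only a through z.

```cpp
// u8'a' has type char8_t since C++20 (plain char in C++17); this alias follows the compiler.
using utf8_unit_sketch = decltype(u8'a');

// Hypothetical helper: maps a UTF-8 lowercase letter to the same letter
// in the execution charset. Returns '\0' for anything else.
constexpr char lower_letter_in_execution_charset(utf8_unit_sketch c) noexcept
{
    // An ordinary string literal is encoded by the compiler in the execution charset.
    constexpr char table[] = "abcdefghijklmnopqrstuvwxyz";
    // In UTF-8 the lowercase letters are contiguous (0x61..0x7A),
    // so the distance from u8'a' is a valid index into the table.
    if(c >= u8'a' && c <= u8'z')
    {
        return table[c - u8'a'];
    }
    return '\0';
}

static_assert(lower_letter_in_execution_charset(u8'a') == 'a', "a maps to a");
static_assert(lower_letter_in_execution_charset(u8'z') == 'z', "z maps to z");
static_assert(lower_letter_in_execution_charset(u8'!') == '\0', "non-letters are not handled");
```

On an ASCII-compatible system the function returns its input unchanged. On an EBCDIC system the same call would yield the EBCDIC code of the letter, because the compiler encodes the table literal in the execution charset.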

3. ::fast_io::arithmetic_char_literal_v and arithmetic_char_literal

::fast_io::arithmetic_char_literal behaves like char_literal, but is intended for arithmetic expressions, especially when wchar_t uses a non‑UTF execution charset.


char8_t runtime_ch{/* ... */};
char c = ::fast_io::arithmetic_char_literal<char>(runtime_ch);

The shorthand form arithmetic_char_literal_v is the compile‑time variant:


wchar_t w = ::fast_io::arithmetic_char_literal_v<u8'b', wchar_t>;

Use arithmetic_char_literal_v when you have a compile‑time char8_t literal and need a value suitable for arithmetic in the execution charset.

4. Basic classification

Each classification function takes a single character and returns true or false.


bool b1 = ::fast_io::char_category::is_c_lower('a');
bool b2 = ::fast_io::char_category::is_c_upper('A');
bool b3 = ::fast_io::char_category::is_c_digit('7');
bool b4 = ::fast_io::char_category::is_c_space(' ');

5. Using char_category with ::fast_io::string

::fast_io::string stores characters contiguously, so you can iterate through it and apply any classification function.


::fast_io::string s{"Hello World"};

for(char ch : s)
{
    if(::fast_io::char_category::is_c_lower(ch))
    {
        // lowercase letter
    }
}

6. Example: counting lowercase letters


::fast_io::string s{"Hello fast_io!"};

::std::size_t count{};

for(char ch : s)
{
    if(::fast_io::char_category::is_c_lower(ch))
    {
        ++count;
    }
}

7. Why not compare ranges directly?

You might try to detect lowercase letters by comparing against a range:


if(ch >= ::fast_io::char_literal_v<u8'a', char> &&
   ch <= ::fast_io::char_literal_v<u8'z', char>)
{
    ++count;
}

This uses char_literal_v, so the endpoints respect the execution charset, but it still assumes that the lowercase letters form a single contiguous range. That holds for ASCII but not for EBCDIC, where a-i, j-r, and s-z occupy three separate blocks (0x81-0x89, 0x91-0x99, and 0xA2-0xA9) with non-letter code points between them.
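
The gap is easy to demonstrate with the numeric code points themselves. The sketch below hard-codes the lowercase letter blocks of EBCDIC code page 037 (an assumption of this example; the function names are hypothetical) and counts how many non-letters a single range check from 'a' to 'z' would accept.

```cpp
// EBCDIC (code page 037) lowercase letters:
// a-i: 0x81-0x89, j-r: 0x91-0x99, s-z: 0xA2-0xA9.

// What a single "from a to z" comparison amounts to on such a system.
constexpr bool naive_range_check(unsigned char c) noexcept
{
    return c >= 0x81 && c <= 0xA9;
}

// A check that knows about the three separate blocks.
constexpr bool ebcdic_is_lower_sketch(unsigned char c) noexcept
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

// Counts code points accepted by the naive check that are not letters.
constexpr int count_false_positives() noexcept
{
    int n = 0;
    for(int c = 0x81; c <= 0xA9; ++c)
    {
        unsigned char const u = static_cast<unsigned char>(c);
        if(naive_range_check(u) && !ebcdic_is_lower_sketch(u))
        {
            ++n;
        }
    }
    return n;
}

// 41 code points in the range, 26 of them letters.
static_assert(count_false_positives() == 15, "15 non-letters fall inside the naive range");
```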

Always prefer:


if(::fast_io::char_category::is_c_lower(ch))
{
    ++count;
}

is_c_lower is implemented with correct knowledge of the execution charset and does not rely on naïve range assumptions.
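
For comparison, a classifier that makes no assumption about contiguity can be sketched in standard C++ by searching an ordinary string literal, which the compiler encodes in the execution charset. is_lower_sketch and count_lower_sketch are hypothetical names, and the linear search is chosen for clarity; it says nothing about how is_c_lower is actually implemented.

```cpp
#include <cstddef>

// Hypothetical charset-independent check: true if c is one of a..z
// as encoded in the execution charset.
constexpr bool is_lower_sketch(char c) noexcept
{
    constexpr char letters[] = "abcdefghijklmnopqrstuvwxyz";
    // sizeof includes the terminating '\0', which must not match.
    for(::std::size_t i{}; i != sizeof(letters) - 1; ++i)
    {
        if(letters[i] == c)
        {
            return true;
        }
    }
    return false;
}

// Counts the lowercase letters in a string literal.
template<::std::size_t N>
constexpr ::std::size_t count_lower_sketch(char const (&text)[N]) noexcept
{
    ::std::size_t count{};
    for(char ch : text)
    {
        if(is_lower_sketch(ch))
        {
            ++count;
        }
    }
    return count;
}

// e,l,l,o + f,a,s,t + i,o
static_assert(count_lower_sketch("Hello fast_io!") == 10, "ten lowercase letters");
```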

8. Example: filtering alphabetic characters


::fast_io::string input{"Hello 123 World!"};
::fast_io::string letters{};

for(char ch : input)
{
    if(::fast_io::char_category::is_c_alpha(ch))
    {
        letters.push_back(ch);
    }
}

9. Transforming an entire string

The ::fast_io::char_category::ranges namespace provides functions that operate on an entire range at once.


::fast_io::string s{"Hello FAST_IO!"};

::fast_io::char_category::ranges::to_c_lower(s);

After this call, s becomes "hello fast_io!". The other range functions are called the same way:


::fast_io::char_category::ranges::to_c_upper(s);
::fast_io::char_category::ranges::to_c_halfwidth(s);
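
Conceptually, a range function applies the single-character transformation to every element in place. The following standard-C++ sketch shows that pattern with hypothetical names (to_lower_sketch, range_to_lower_sketch, lowercase_example). It is an illustration only, not the fast_io implementation, and it handles only the letters A through Z.

```cpp
#include <cstddef>

// Hypothetical single-character transform: maps A..Z to a..z by position
// in two literals, so it does not depend on how the letters are laid out.
constexpr char to_lower_sketch(char c) noexcept
{
    constexpr char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    constexpr char lower[] = "abcdefghijklmnopqrstuvwxyz";
    for(::std::size_t i{}; i != sizeof(upper) - 1; ++i)
    {
        if(upper[i] == c)
        {
            return lower[i];
        }
    }
    return c; // everything else is left unchanged
}

// Hypothetical range form: rewrites every element of the range in place.
template<typename Range>
constexpr void range_to_lower_sketch(Range& r) noexcept
{
    for(auto& ch : r)
    {
        ch = to_lower_sketch(ch);
    }
}

// Usage: lowercase a buffer in place and inspect a few positions.
constexpr bool lowercase_example() noexcept
{
    char text[] = "Hello FAST_IO!";
    range_to_lower_sketch(text);
    return text[0] == 'h' && text[6] == 'f' && text[10] == '_' && text[13] == '!';
}

static_assert(lowercase_example(), "buffer is lowercased in place");
```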

10. Character classification functions

Function	Description
`is_c_alnum`	Letter or digit
`is_c_alpha`	Alphabetic letter
`is_c_blank`	Space or tab
`is_c_cntrl`	Control character
`is_c_digit`	Digit 0–9
`is_c_fullwidth`	Full‑width character
`is_c_graph`	Visible (non‑space) character
`is_c_halfwidth`	Half‑width character
`is_c_lower`	Lowercase letter
`is_c_print`	Printable character
`is_c_punct`	Punctuation
`is_c_space`	Whitespace
`is_c_upper`	Uppercase letter
`is_c_xdigit`	Hex digit
`is_html_whitespace`	HTML whitespace
`is_dos_file_invalid_character`	Invalid in DOS filenames

11. Character transformation functions

Function	Description
`to_c_lower`	Convert a single character to lowercase
`to_c_upper`	Convert a single character to uppercase
`to_c_halfwidth`	Convert a single character to half‑width
`ranges::to_c_lower`	Convert an entire range to lowercase
`ranges::to_c_upper`	Convert an entire range to uppercase
`ranges::to_c_halfwidth`	Convert an entire range to half‑width

12. Using char_category_family and char_category_traits

12.1 char_category_family definition


enum class char_category_family : ::std::uint_least32_t
{
    c_alnum,                    // Alphanumeric characters (letters and digits)
    c_alpha,                    // Alphabetic characters
    c_blank,                    // Space or tab
    c_cntrl,                    // Control characters (ASCII 0x00-0x1F, 0x7F or EBCDIC equivalents)
    c_digit,                    // Numeric digits (0-9)
    c_fullwidth,                // Full-width character
    c_graph,                    // Graphical characters (alphanumeric + punctuation)
    c_halfwidth,                // Half-width character
    c_lower,                    // Lowercase alphabetic characters
    c_print,                    // Printable characters (includes space)
    c_punct,                    // Punctuation characters
    c_space,                    // Whitespace characters (space, tab, newline, etc.)
    c_upper,                    // Uppercase alphabetic characters
    c_xdigit,                   // Hexadecimal digits (0-9, A-F, a-f)
    dos_file_invalid_character, // Characters that are invalid in DOS file names
    html_whitespace             // HTML whitespace
};

12.2 Creating a classifier


using lower_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_lower,
        false
    >;

lower_pred pred{};
bool a = pred('a');   // true
bool b = pred('Z');   // false

12.3 Negated classifiers


using not_lower_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_lower,
        true
    >;

not_lower_pred pred{};
bool a = pred('A');   // true (because 'A' is NOT lowercase)

12.4 Using traits with ::fast_io::string


::fast_io::string s{"Hello 123 World"};

using digit_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_digit,
        false
    >;

auto it = digit_pred::find(s.begin(), s.end());
if(it != s.end())
{
    // *it is the first digit in the string
}

This gives you a flexible, generic way to classify and search text while still respecting the execution charset.
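
The design idea behind such a traits type, a category plus a compile-time negation flag, can be sketched in standard C++ as follows. Everything here (family_sketch, belongs_to, traits_sketch, and the behavior of its find) is a hypothetical stand-in written for this example, not fast_io's definition.

```cpp
#include <cstddef>

enum class family_sketch
{
    lower,
    digit
};

// Hypothetical membership test for the two categories in this sketch.
constexpr bool belongs_to(family_sketch family, char c) noexcept
{
    if(family == family_sketch::digit)
    {
        // The C++ standard guarantees that '0'..'9' are contiguous.
        return c >= '0' && c <= '9';
    }
    // Letters carry no such guarantee, so look the character up instead.
    constexpr char letters[] = "abcdefghijklmnopqrstuvwxyz";
    for(::std::size_t i{}; i != sizeof(letters) - 1; ++i)
    {
        if(letters[i] == c)
        {
            return true;
        }
    }
    return false;
}

template<family_sketch family, bool negate>
struct traits_sketch
{
    // Predicate: membership, flipped when negate is true.
    constexpr bool operator()(char c) const noexcept
    {
        return belongs_to(family, c) != negate;
    }

    // Returns the first position that satisfies the predicate, or last.
    template<typename Iter>
    static constexpr Iter find(Iter first, Iter last) noexcept
    {
        for(; first != last; ++first)
        {
            if(traits_sketch{}(*first))
            {
                return first;
            }
        }
        return last;
    }
};

using digit_sketch = traits_sketch<family_sketch::digit, false>;
using not_lower_sketch = traits_sketch<family_sketch::lower, true>;

constexpr char sample_text[] = "Hello 123 World";

// The first digit, '1', sits at index 6.
static_assert(digit_sketch::find(sample_text, sample_text + 15) == sample_text + 6,
              "first digit is at index 6");
static_assert(not_lower_sketch{}('A'), "'A' is not lowercase");
```

Encoding the negation as a template parameter lets one type serve both "find the first match" and "find the first non-match" without a separate wrapper.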

Key takeaways

::fast_io::char_category always follows the execution charset.
EBCDIC execution charsets use EBCDIC rules; all other charsets use ASCII rules for code points [0, 127].
char_literal_v is the primary way to create execution‑charset literals from char8_t.
char_literal and arithmetic_char_literal are for runtime char8_t values.
arithmetic_char_literal_v is the compile‑time form for arithmetic expressions, especially when wchar_t uses a non‑UTF execution charset.
Classification and transformation functions are constexpr, so they can be evaluated at compile time.
Range‑based functions let you transform entire strings in place.
Avoid manual range checks; use is_c_lower and related functions instead.
char_category_family and char_category_traits let you build generic, execution‑charset‑aware predicates.