Ch4.2: ::fast_io::char_category
Overview
The ::fast_io::char_category module provides a set of
constexpr character classification and transformation functions.
These functions determine whether a character is lowercase, uppercase, a digit,
whitespace, punctuation, and more.
All functions follow the execution charset of the program. This ensures correct behavior on both ASCII-based and EBCDIC-based systems.
1. Source charset vs execution charset
C++ distinguishes between:
- source charset — how characters in your source file are encoded
- execution charset — how characters are represented at runtime
::fast_io::char_category always uses the execution charset.
This means:
- If 'A' has value 0x41, ASCII rules apply.
- If 'A' has an EBCDIC value (not 0x41), EBCDIC rules apply.
- All other charsets (UTF‑8, GB18030, Shift‑JIS, etc.) use the ASCII rule for code points [0, 127].
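The rule above can be pictured as a compile-time check on the value of 'A'. The following is a minimal sketch in standard C++ only; the names charset_family and detect_execution_charset are hypothetical and are not part of fast_io, and the sketch assumes the EBCDIC value of 'A' is 0xC1.

```cpp
// Hypothetical helper for illustration; not part of fast_io.
enum class charset_family
{
    ascii_based,
    ebcdic_based,
    unknown
};

// Inspect the execution-charset value of 'A' at compile time.
// 0x41 is the ASCII code point of 'A'; 0xC1 is its EBCDIC code point.
// The cast to unsigned char avoids sign issues where plain char is signed.
constexpr charset_family detect_execution_charset() noexcept
{
    return static_cast<unsigned char>('A') == 0x41
               ? charset_family::ascii_based
               : (static_cast<unsigned char>('A') == 0xC1
                      ? charset_family::ebcdic_based
                      : charset_family::unknown);
}
```

On a typical desktop compiler this evaluates to charset_family::ascii_based, regardless of how the source file itself is encoded.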
2. ::fast_io::char_literal_v and char_literal
The primary way to create a character constant that respects the execution
charset is ::fast_io::char_literal_v. It takes a char8_t
value at compile time and produces a character in the execution charset.
char c = ::fast_io::char_literal_v<u8'a', char>;
Use this form whenever you have a compile‑time character literal.
There is also a function form ::fast_io::char_literal that takes
a char8_t value at runtime:
char8_t runtime_ch{/* ... */};
char c = ::fast_io::char_literal<char>(runtime_ch);
This version is more flexible but may be slower, because it must execute at
runtime. Prefer char_literal_v when you have a literal known at
compile time.
3. ::fast_io::arithmetic_char_literal_v and arithmetic_char_literal
::fast_io::arithmetic_char_literal behaves like
char_literal, but is intended for arithmetic expressions,
especially when wchar_t uses a non‑UTF execution charset.
char8_t runtime_ch{/* ... */};
char c = ::fast_io::arithmetic_char_literal<char>(runtime_ch);
The shorthand form arithmetic_char_literal_v is the
compile‑time variant:
wchar_t w = ::fast_io::arithmetic_char_literal_v<u8'b', wchar_t>;
Use arithmetic_char_literal_v when you have a compile‑time
char8_t literal and need a value suitable for arithmetic in
the execution charset.
4. Basic classification
Each classification function takes a single character and returns
true or false.
bool b1 = ::fast_io::char_category::is_c_lower('a');
bool b2 = ::fast_io::char_category::is_c_upper('A');
bool b3 = ::fast_io::char_category::is_c_digit('7');
bool b4 = ::fast_io::char_category::is_c_space(' ');
5. Using char_category with ::fast_io::string
::fast_io::string stores characters contiguously, so you can
iterate through it and apply any classification function.
::fast_io::string s{"Hello World"};
for(char ch : s)
{
    if(::fast_io::char_category::is_c_lower(ch))
    {
        // lowercase letter
    }
}
6. Example: counting lowercase letters
::fast_io::string s{"Hello fast_io!"};
std::size_t count{};
for(char ch : s)
{
    if(::fast_io::char_category::is_c_lower(ch))
    {
        ++count;
    }
}
7. Why not compare ranges directly?
You might try to detect lowercase letters by comparing against a range:
if(ch >= ::fast_io::char_literal_v<u8'a', char> &&
   ch <= ::fast_io::char_literal_v<u8'z', char>)
{
    ++count;
}
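Even though this uses char_literal_v and therefore respects the execution
charset for the endpoints, it still assumes that all lowercase letters form
a single contiguous range. This is true for ASCII, but not for every
execution charset. In EBCDIC the lowercase letters are split into three
separate runs (a-i, j-r, s-z) with non‑letter code points in between, so the
range check above would also accept characters that are not letters.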
Always prefer:
if(::fast_io::char_category::is_c_lower(ch))
{
    ++count;
}
is_c_lower is implemented with correct knowledge of the execution
charset and does not rely on naïve range assumptions.
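To see concretely why the range check fails, here is a small worked sketch in standard C++. It does not use fast_io; it hard-codes the EBCDIC code points of 'a' through 'z' (an assumption written out by hand, following the common EBCDIC layout) and counts how many code points a naive range check would wrongly accept. All names are hypothetical.

```cpp
#include <cstddef>

// EBCDIC code points of 'a'..'z', written out by hand for illustration.
constexpr unsigned char ebcdic_lower[26] = {
    0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, // a-i
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, // j-r
    0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA8, 0xA9        // s-z
};

// Naive check: everything between EBCDIC 'a' (0x81) and 'z' (0xA9).
constexpr bool naive_is_lower(unsigned char c) noexcept
{
    return c >= 0x81 && c <= 0xA9;
}

// Exact check: the code point must appear in the table above.
constexpr bool table_is_lower(unsigned char c) noexcept
{
    for (unsigned char letter : ebcdic_lower)
    {
        if (letter == c)
        {
            return true;
        }
    }
    return false;
}

// Count the code points the naive check accepts that are not letters.
constexpr std::size_t count_false_positives() noexcept
{
    std::size_t n{};
    for (unsigned c = 0; c != 256; ++c)
    {
        auto const uc = static_cast<unsigned char>(c);
        if (naive_is_lower(uc) && !table_is_lower(uc))
        {
            ++n;
        }
    }
    return n;
}
```

The range 0x81 to 0xA9 spans 41 code points but contains only 26 letters, so count_false_positives() yields 15. A charset-aware classifier avoids exactly this class of error.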
8. Example: filtering alphabetic characters
::fast_io::string input{"Hello 123 World!"};
::fast_io::string letters{};
for(char ch : input)
{
    if(::fast_io::char_category::is_c_alpha(ch))
    {
        letters.push_back(ch);
    }
}
9. Transforming an entire string
The ::fast_io::char_category::ranges namespace provides
functions that operate on an entire range at once.
::fast_io::string s{"Hello FAST_IO!"};
::fast_io::char_category::ranges::to_c_lower(s);
After this call, s becomes "hello fast_io!". The uppercase and half‑width variants are called the same way and also modify s in place:
::fast_io::char_category::ranges::to_c_upper(s);
::fast_io::char_category::ranges::to_c_halfwidth(s);
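For comparison, an in-place, whole-range transformation can be approximated with the standard library alone. The sketch below is only a rough analog, not how fast_io implements ranges::to_c_lower: it uses std::tolower, which depends on the current C locale and is not constexpr. The helper name to_lower_in_place is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: a standard-library analog of an in-place
// lowercase transformation. Not part of fast_io.
inline void to_lower_in_place(std::string &s)
{
    // std::tolower requires a value representable as unsigned char,
    // so the lambda takes its argument as unsigned char.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
}
```

Calling to_lower_in_place on a std::string holding "Hello FAST_IO!" in the default "C" locale leaves it as "hello fast_io!", matching the result described above.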
10. Character classification functions
| Function | Description |
|---|---|
| is_c_alnum | Letter or digit |
| is_c_alpha | Alphabetic letter |
| is_c_blank | Space or tab |
| is_c_cntrl | Control character |
| is_c_digit | Digit 0–9 |
| is_c_fullwidth | Full‑width character |
| is_c_graph | Visible (non‑space) character |
| is_c_halfwidth | Half‑width character |
| is_c_lower | Lowercase letter |
| is_c_print | Printable character |
| is_c_punct | Punctuation |
| is_c_space | Whitespace |
| is_c_upper | Uppercase letter |
| is_c_xdigit | Hex digit |
| is_html_whitespace | HTML whitespace |
| is_dos_file_invalid_character | Invalid in DOS filenames |
11. Character transformation functions
| Function | Description |
|---|---|
| to_c_lower | Convert a single character to lowercase |
| to_c_upper | Convert a single character to uppercase |
| to_c_halfwidth | Convert a single character to half‑width |
| ranges::to_c_lower | Convert an entire range to lowercase |
| ranges::to_c_upper | Convert an entire range to uppercase |
| ranges::to_c_halfwidth | Convert an entire range to half‑width |
12. Using char_category_family and char_category_traits
12.1 char_category_family definition
enum class char_category_family : ::std::uint_least32_t
{
    c_alnum,                    // Alphanumeric characters (letters and digits)
    c_alpha,                    // Alphabetic Character
    c_blank,                    // Space or tab
    c_cntrl,                    // Control characters (ASCII 0x00-0x1F, 0x7F or EBCDIC equivalents)
    c_digit,                    // Numeric digits (0-9)
    c_fullwidth,                // Full-width character
    c_graph,                    // Graphical characters (alphanumeric + punctuation)
    c_halfwidth,                // Half-width character
    c_lower,                    // Lowercase alphabetic characters
    c_print,                    // Printable characters (includes space)
    c_punct,                    // Punctuation characters
    c_space,                    // Whitespace characters (space, tab, newline, etc.)
    c_upper,                    // Uppercase alphabetic characters
    c_xdigit,                   // Hexadecimal digits (0-9, A-F, a-f)
    dos_file_invalid_character, // DOS Path invalid character
    html_whitespace             // HTML whitespace
};
12.2 Creating a classifier
using lower_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_lower,
        false
    >;
lower_pred pred{};
bool a = pred('a'); // true
bool b = pred('Z'); // false
12.3 Negated classifiers
using not_lower_pred =
    ::fast_io::char_category::char_category_traits<
        ::fast_io::char_category::char_category_family::c_lower,
        true
    >;
not_lower_pred pred{};
bool a = pred('A'); // true (because 'A' is NOT lowercase)
12.4 Using traits with ::fast_io::string
::fast_io::string s{"Hello 123 World"};
using digit_pred =
::fast_io::char_category::char_category_traits<
::fast_io::char_category::char_category_family::c_digit,
false
>;
auto it = digit_pred::find(s.begin(), s.end());
if(it != s.end())
{
// *it is the first digit in the string
}
This gives you a flexible, generic way to classify and search text while still respecting the execution charset.
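To make the shape of such a predicate type easier to picture, here is a minimal sketch in standard C++ of a type with a compile-time negation flag, a call operator, and a static find. It is an illustration under assumptions only: predicate_sketch, is_digit_sketch, and the aliases are hypothetical names, and nothing here reflects how fast_io actually implements char_category_traits.

```cpp
#include <string>

// Hypothetical classifier. The C++ standard guarantees that '0'..'9' are
// contiguous in every execution charset, so a range check is safe for digits.
constexpr bool is_digit_sketch(char ch)
{
    return ch >= '0' && ch <= '9';
}

// Hypothetical predicate type: Classify selects the category,
// Negate flips the result at compile time.
template <bool (*Classify)(char), bool Negate>
struct predicate_sketch
{
    constexpr bool operator()(char ch) const
    {
        return Classify(ch) != Negate;
    }

    // Return an iterator to the first element that satisfies the
    // (possibly negated) predicate, or last if there is none.
    template <typename Iter>
    static constexpr Iter find(Iter first, Iter last)
    {
        for (; first != last; ++first)
        {
            if (Classify(*first) != Negate)
            {
                break;
            }
        }
        return first;
    }
};

using digit_sketch = predicate_sketch<is_digit_sketch, false>;
using not_digit_sketch = predicate_sketch<is_digit_sketch, true>;
```

With a std::string holding "Hello 123 World", digit_sketch::find(s.begin(), s.end()) stops at the '1' at index 6, while not_digit_sketch{}('A') evaluates to true.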
Key takeaways
- ::fast_io::char_category always follows the execution charset.
- EBCDIC systems use EBCDIC rules; all other systems use ASCII rules for code points [0, 127].
- char_literal_v is the primary way to create execution‑charset literals from char8_t.
- char_literal and arithmetic_char_literal are for runtime char8_t values.
- arithmetic_char_literal_v is useful for arithmetic with wide characters.
- Classification and transformation functions are constexpr and predictable.
- Range‑based functions let you transform entire strings in place.
- Avoid manual range checks; use is_c_lower and related functions instead.
- char_category_family and char_category_traits let you build generic, execution‑charset‑aware predicates.