5.2 Identifiers and Allowed Characters
In Rust, most item names (such as type names, module names, function names, and variable names) can use a wide range of Unicode characters, with a few important restrictions:
- First Character: Must be either an underscore (`_`) or a Unicode character with the XID_Start property (which includes letters from many alphabets around the world, such as Latin, Greek, and Cyrillic). A lone `_` is not an identifier; an underscore must be followed by at least one more character.
- Subsequent Characters: May include any character with the XID_Continue property, which includes `_`. This means letters, combining marks such as diacritics, and digits are allowed, but spaces, punctuation, and symbols like `#`, `?`, or `!` are not.
- Digits: Cannot appear as the first character. Raw identifiers do not lift this restriction: `r#1variable` is rejected just like `1variable`. After the first character, ASCII digits (`0-9`) and the decimal digits of other scripts are valid because they fall within XID_Continue.
- Keywords: You cannot reuse Rust keywords (like `fn`, `enum`, or `mod`) as identifiers unless you use raw identifiers (prefixing with `r#`), which lift the keyword restriction. The exceptions are `crate`, `self`, `super`, and `Self`, which cannot be used as raw identifiers.
- Length and Encoding: Rust source files must be valid UTF-8, and identifiers cannot contain whitespace. There is no explicit limit on length, although extremely long names hurt readability.
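The rules above can be illustrated with a short sketch. All names here (`describe`, `größe`, `item2`, and so on) are hypothetical and chosen only to exercise one rule each; the commented-out lines show forms the compiler rejects.

```rust
// A keyword used as a function name through the raw-identifier prefix.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn describe() -> String {
    let _scratch = 0;        // leading underscore followed by more characters
    let größe = 3;           // non-ASCII letters with XID_Start / XID_Continue
    let item2 = größe + 1;   // a digit is fine after the first character
    let r#type = "widget";   // the keyword `type` as a raw identifier

    // let 1variable = 0;    // rejected: starts with a digit
    // let r#1variable = 0;  // rejected: `r#` does not permit a leading digit
    // let my-name = 0;      // rejected: `-` is not XID_Continue

    format!("{} {} {}", r#type, item2, r#match("wid", r#type))
}

fn main() {
    println!("{}", describe());
}
```

Raw identifiers are mainly useful when calling into code written for an older edition in which a name was not yet a keyword; in new code, picking a different name is usually clearer.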
These rules let you write expressive identifiers in many languages or scripts while avoiding ambiguity in Rust syntax. For most English-based code, the practical rule is that identifiers can start with a letter or underscore, followed by letters, digits, or underscores, but Rust’s support extends well beyond ASCII.
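The practical ASCII rule can be written as a small check. This is a minimal sketch, not the compiler's lexer: the function name `is_ascii_identifier` is hypothetical, it deliberately ignores the Unicode XID tables (the standard library does not expose them), and it does not consult the keyword list, so it accepts `fn` even though that name would need the `r#` prefix in real code.

```rust
/// ASCII-only approximation of Rust's identifier rule.
/// Assumption: Unicode identifiers and keywords are out of scope here.
fn is_ascii_identifier(s: &str) -> bool {
    let mut chars = s.chars();

    // The first character must be an ASCII letter or an underscore.
    let first = match chars.next() {
        Some(c) => c,
        None => return false, // the empty string is not an identifier
    };
    if !(first.is_ascii_alphabetic() || first == '_') {
        return false;
    }

    // A lone underscore is the wildcard pattern, not an identifier.
    if s == "_" {
        return false;
    }

    // Every remaining character must be an ASCII letter, digit, or underscore.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    for candidate in ["total_count", "_tmp", "x2", "2x", "my-name", "_", ""] {
        println!("{:?}: {}", candidate, is_ascii_identifier(candidate));
    }
}
```

A check that also covers non-ASCII names would need the XID_Start and XID_Continue property tables from an external crate or from the Unicode data files.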
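By convention, keywords, primitive types (such as `i32` and `bool`), and the names of modules, functions, and variables begin with a lowercase letter; multiword names in this group use snake_case. In contrast, standard library types like `Vec` and `String`, user-defined types, and traits use UpperCamelCase, while constants and global variables (statics) are written in SCREAMING_SNAKE_CASE. The compiler does not reject names that depart from these conventions, but it does warn about them.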
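The casing conventions can be seen side by side in a short sketch. Every name below (`shape_tools`, `RegularPolygon`, `MAX_SIDES`, and the rest) is hypothetical and exists only to show which style applies to which kind of item.

```rust
mod shape_tools {
    // Module name: snake_case.

    // Constant and static: SCREAMING_SNAKE_CASE.
    pub const MAX_SIDES: u32 = 12;
    pub static DEFAULT_NAME: &str = "polygon";

    // User-defined type: UpperCamelCase; its field: snake_case.
    pub struct RegularPolygon {
        pub side_count: u32,
    }

    // Function and parameter names: snake_case.
    pub fn is_supported(polygon: &RegularPolygon) -> bool {
        polygon.side_count >= 3 && polygon.side_count <= MAX_SIDES
    }
}

fn main() {
    // Local variable: snake_case. `u32` and `bool` are lowercase primitive types.
    let pentagon = shape_tools::RegularPolygon { side_count: 5 };
    let supported: bool = shape_tools::is_supported(&pentagon);
    println!("{} supported: {}", shape_tools::DEFAULT_NAME, supported);
}
```

Renaming `RegularPolygon` to `regular_polygon` or `MAX_SIDES` to `max_sides` would still compile, but the compiler would emit a style warning for each.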