5.3 Data Types
Rust is a statically and strongly typed language, meaning that the type of each variable is known at compile time and cannot change. This ensures both performance and safety. In statically typed languages like Rust, many errors are caught early at compile time, reducing runtime errors that might otherwise occur in dynamically typed languages. Additionally, strong typing enforces that operations on data are well-defined, avoiding unexpected behavior from implicit type conversions common in weakly typed languages. These characteristics allow Rust to produce highly efficient machine code, with direct support for many of its types in primitive CPU instructions, leading to predictable performance, especially in systems programming.
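As a minimal sketch of what "no implicit type conversions" means in practice, the following program converts between integer types explicitly. The helper names `widen` and `truncate` are hypothetical and exist only for this illustration.

```rust
// Hypothetical helper: widen an i32 to an i64 without losing information.
fn widen(value: i32) -> i64 {
    // i64::from is a lossless conversion; `value as i64` would also work.
    i64::from(value)
}

// Hypothetical helper: narrow an i32 to a u8 with an explicit cast.
fn truncate(value: i32) -> u8 {
    // `as` keeps only the low 8 bits, so the loss is visible in the code.
    value as u8
}

fn main() {
    let small: i32 = 5;
    let big: i64 = 10;

    // let sum = small + big; // Would not compile: mismatched types i32 and i64
    let sum = widen(small) + big;
    println!("sum = {}", sum);

    // 300 does not fit into a u8; the explicit cast truncates it to 44.
    println!("300 as u8 = {}", truncate(300));
}
```

In C, both conversions would happen silently; in Rust, each one has to be written out, which makes lossy conversions easy to spot in a code review.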
5.3.1 Scalar Types
Rust's scalar types are the simplest types, representing single values. They are analogous to the basic types in C, with some notable differences. Rust’s scalar types are categorized as integers, floating-point numbers, booleans, and characters. Here’s how they compare to C types:
Integers
Rust offers a wide range of integer types, both signed and unsigned, similar to C but with stricter definitions of behavior. In C, integer sizes can sometimes be platform-dependent, whereas Rust defines its types clearly, ensuring predictable size and behavior across platforms.
For fixed-size integer types, Rust uses type names that specify both the size and whether the type is signed or unsigned. Signed types begin with `i`, while unsigned types begin with `u`, followed by the number of bits they occupy. The available integer types are:
- `i8`, `i16`, `i32`, `i64`, and `i128` for signed integers (ranging from 8-bit to 128-bit).
- `u8`, `u16`, `u32`, `u64`, and `u128` for unsigned integers.
By default, Rust uses the 32-bit signed integer type `i32` for integer literals when no type is annotated and none can be inferred from the context. This default strikes a balance between memory usage and performance for most use cases.
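The following sketch shows the three usual ways an integer gets its type: the `i32` default, an explicit annotation, and a suffix on the literal. The helper `default_int_size` is a hypothetical name used only here.

```rust
use std::mem::size_of_val;

// Hypothetical helper: size in bytes of an integer literal with no annotation.
fn default_int_size() -> usize {
    let value = 42; // No annotation and no other hint: falls back to i32
    size_of_val(&value)
}

fn main() {
    let a = 42; // Defaults to i32
    let b: u8 = 255; // Explicit type annotation
    let c = 1_000_000u64; // Type suffix on the literal; underscores aid readability
    let d = 0xFF_i16; // Hexadecimal literal with a suffix
    println!("{} {} {} {}", a, b, c, d);

    println!("default integer size: {} bytes", default_int_size());

    // Every integer type exposes its limits as associated constants.
    println!("i8:  {} to {}", i8::MIN, i8::MAX);
    println!("u16: {} to {}", u16::MIN, u16::MAX);
    println!("i128 max: {}", i128::MAX);
}
```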
`usize` and `isize`: Rust also provides two integer types whose width depends on the target architecture. The `usize` type is an unsigned integer, and the `isize` type is a signed integer. These types are used in situations where the size of memory addresses is important, such as array indexing and pointer arithmetic. On a 64-bit system, `usize` and `isize` are 64 bits wide, while on a 32-bit system, they are 32 bits wide. The actual size is determined by the target architecture of the compiled program. These types are particularly useful in systems programming for tasks that involve memory management or when dealing with collections where the index size is architecture-dependent. Notably, `usize` is the type used for indexing arrays and other collections in Rust, and you cannot use other integer types like `i32` for indexing without an explicit cast.
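A short sketch of the indexing rule described above. The function `element_at` is a hypothetical helper; it assumes the caller passes a non-negative index that is in range.

```rust
use std::mem::size_of;

// Hypothetical helper: index a slice with an i32 after an explicit cast.
// A negative index would wrap to a huge usize and trigger a bounds-check panic.
fn element_at(data: &[i32], index: i32) -> i32 {
    // data[index] would not compile: the index must be a usize.
    data[index as usize]
}

fn main() {
    let data = [10, 20, 30];
    println!("element 1 = {}", element_at(&data, 1));

    // Both pointer-sized types have the same width, which depends on the target.
    println!("usize: {} bytes", size_of::<usize>());
    println!("isize: {} bytes", size_of::<isize>());
}
```

On a typical 64-bit target the two sizes print as 8 bytes; on a 32-bit target they would be 4.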
Floating-Point Numbers
Rust's floating-point types follow the IEEE 754 standard, as C's `float` and `double` do on most platforms. Rust uses clear type names for its floating-point types, which specify the bit size:
- `f32` for a 32-bit floating-point number.
- `f64` for a 64-bit floating-point number (the default).
Rust defaults to `f64` (64-bit) for floating-point literals, as it provides better precision and, on modern processors, is generally about as fast as `f32`. The explicit naming of floating-point types helps avoid confusion and ensures consistent behavior across platforms.
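To make the precision difference concrete, here is a small sketch that compares `f32` and `f64` and shows why floating-point values are usually compared with a tolerance. The helpers `default_float_size` and `nearly_equal`, and the tolerance `1e-9`, are assumptions chosen for this example.

```rust
use std::mem::size_of_val;

// Hypothetical helper: size in bytes of a float literal with no annotation.
fn default_float_size() -> usize {
    let value = 2.5; // No annotation: falls back to f64
    size_of_val(&value)
}

// Hypothetical helper: compare two f64 values with a fixed tolerance.
fn nearly_equal(a: f64, b: f64) -> bool {
    (a - b).abs() < 1e-9
}

fn main() {
    // The same division carries fewer significant digits in f32 than in f64.
    let third_f32 = 1.0_f32 / 3.0;
    let third_f64 = 1.0_f64 / 3.0;
    println!("f32: {}", third_f32);
    println!("f64: {}", third_f64);
    println!("default float size: {} bytes", default_float_size());

    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    let sum = 0.1 + 0.2;
    println!("sum == 0.3?         {}", sum == 0.3);
    println!("nearly_equal(0.3)?  {}", nearly_equal(sum, 0.3));
}
```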
Booleans and Characters
- Boolean (`bool`): Rust's boolean type (`bool`) is always 1 byte in size, even though it only represents a value of `true` or `false`. While it might seem more efficient to represent a boolean as a single bit, CPUs address memory in bytes, not in individual bits. Using a full byte for a boolean simplifies memory access and allows for faster processing, particularly in situations where the boolean is stored in arrays or structs.
- Character (`char`): The character type (`char`) in Rust represents a Unicode scalar value, differing from C's `char`, which holds a single byte (enough for an ASCII character or one byte of a UTF-8 sequence). Rust's `char` is 4 bytes, allowing for full Unicode support. This means it can represent characters from virtually any language, including emoji.
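The sizes mentioned above can be checked with `std::mem::size_of`. The sketch below also shows that a `char` always occupies 4 bytes in memory, while the same character needs between 1 and 4 bytes once it is encoded as UTF-8 inside a string. The helper `utf8_len` is a hypothetical name.

```rust
use std::mem::size_of;

// Hypothetical helper: number of bytes a char needs when encoded as UTF-8.
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    println!("bool: {} byte", size_of::<bool>());
    println!("char: {} bytes", size_of::<char>());

    let letter = 'x';
    let accented = '\u{E9}'; // Latin small letter e with acute
    let crab = '\u{1F980}'; // Crab emoji

    for c in [letter, accented, crab].iter() {
        // `as u32` yields the Unicode scalar value of the character.
        println!("{} = U+{:04X}, {} UTF-8 byte(s)", c, *c as u32, utf8_len(*c));
    }
}
```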
Scalar Types Table
Rust Type | Size | Range | Equivalent C Type | Comment |
---|---|---|---|---|
i8 | 8 bits | -128 to 127 | int8_t | Signed 8-bit integer |
u8 | 8 bits | 0 to 255 | uint8_t | Unsigned 8-bit integer |
i16 | 16 bits | -32,768 to 32,767 | int16_t | Signed 16-bit integer |
u16 | 16 bits | 0 to 65,535 | uint16_t | Unsigned 16-bit integer |
i32 | 32 bits | -2,147,483,648 to 2,147,483,647 | int32_t | Signed 32-bit integer (default integer type) |
u32 | 32 bits | 0 to 4,294,967,295 | uint32_t | Unsigned 32-bit integer |
i64 | 64 bits | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | int64_t | Signed 64-bit integer |
u64 | 64 bits | 0 to 18,446,744,073,709,551,615 | uint64_t | Unsigned 64-bit integer |
isize | Platform Dependent | Varies based on architecture (32-bit or 64-bit) | intptr_t | Signed pointer-sized integer |
usize | Platform Dependent | Varies based on architecture (32-bit or 64-bit) | uintptr_t | Unsigned pointer-sized integer |
f32 | 32 bits | ~1.4E-45 to ~3.4E+38 | float | 32-bit floating point, IEEE 754 |
f64 | 64 bits | ~5E-324 to ~1.8E+308 | double | 64-bit floating point (default) |
bool | 1 byte | true or false | _Bool | Boolean type, always 1 byte |
char | 4 bytes | Unicode scalar value (0 to 0x10FFFF, excluding the surrogates 0xD800 to 0xDFFF) | None (C’s char is 1 byte) | Represents a Unicode character |
5.3.2 Primitive Compound Types: Tuple and Array
Rust also provides compound types, which allow you to group multiple values into a single type. The two most basic compound types are tuples and arrays.
Note that "tuple" and "array" are not Rust keywords, meaning they can be used as variable names.
Tuple
A tuple is a fixed-size collection of values of various types. In Rust, tuples are often used when you want to return multiple values from a function without using a struct. Since tuples may be unfamiliar to those coming from C or other languages that lack this data type, we will explore them in more detail.
Tuple Type Syntax
In Rust, a tuple's type is defined by listing the types of its elements within parentheses `()`, separated by commas. This defines the exact types and the number of elements the tuple will hold.
Example:
(i32, f64, char)
This tuple type consists of three elements:
- An `i32` (32-bit signed integer)
- An `f64` (64-bit floating-point number)
- A `char` (Unicode scalar value)
Tuple Value Syntax
To create a tuple value, you use the same parentheses `()` and provide the actual values, again separated by commas.
Example:
(500, 6.4, 'x')
This creates a tuple value with:
- The integer `500`
- The floating-point number `6.4`
- The character `'x'`
Note on Single-Element Tuples and the Unit Type:
- Singleton Tuples: To define a tuple with a single element, include a trailing comma to differentiate it from a value in parentheses.

      let single_element_tuple = (5,); // A tuple containing one element
      let not_a_tuple = (5); // Just the value 5 in parentheses

- Unit Type `()`: The unit type is a special tuple with zero elements, represented by `()`.

      let unit: () = (); // The unit type

  Functions that don't return a value actually return the unit type `()`.
Combining Type Annotation and Value Assignment
When declaring a tuple variable with an explicit type and initializing it with values, you write:

    let tuple: (i32, f64, char) = (500, 6.4, 'x');

- `let tuple:` Declares a new variable named `tuple`.
- `(i32, f64, char)` Specifies the tuple's type.
- `=` Assigns the value to the variable.
- `(500, 6.4, 'x')` Provides the tuple's initial values.

This line tells Rust to create a variable `tuple` that holds a tuple of type `(i32, f64, char)` initialized with the values `(500, 6.4, 'x')`. In this example, the tuple is initialized with constant values, but it is more common to use values evaluated at runtime.
Accessing Tuple Elements
Individual elements of a tuple are accessed using dot notation followed by the index of the element, starting from zero. The index must be an integer literal written in the source code, as in `tuple.0`; it cannot be a variable or a computed expression. As a result, you cannot loop over a tuple’s components by index, which would not be meaningful anyway because each element may be of a different type.
Example:

    let tuple: (i32, f64, char) = (500, 6.4, 'x');
    let first_element = tuple.0; // Accesses the first element (500)
    let second_element = tuple.1; // Accesses the second element (6.4)
    let third_element = tuple.2; // Accesses the third element ('x')
Mutability and Assignment of Tuple Elements
By default, variables in Rust are immutable. If you want to modify the elements of a tuple after its creation, you need to declare it as mutable using the `mut` keyword.
Example:

    let mut tuple = (500, 6.4, 'x');
    tuple.0 = 600; // Changes the first element to 600
Important Notes:
- Fixed Size and Types: Tuples have a fixed size, and their types are known at compile time. You cannot add or remove elements once the tuple is created.
- Assignment at Creation: You must provide all the values for the tuple when you create it. You cannot declare an empty tuple and fill in its elements later. This will NOT work:

      // Attempting to declare an uninitialized tuple (Not allowed)
      let mut tuple: (i32, f64, char);
      tuple.0 = 500; // Error: tuple is not initialized

- Assignment Step by Step: Rust does not allow you to build up a tuple by assigning to its individual elements after a declaration without initial values.
Destructuring Tuples
It’s not possible to loop through a tuple’s elements by index, but you can unpack or "destructure" a tuple into individual variables for easier access.
Example:

    let tuple: (i32, f64, char) = (500, 6.4, 'x');
    let (x, y, z) = tuple;
    println!("x = {}, y = {}, z = {}", x, y, z);

This binds:
- `x` to the value of `tuple.0` (500)
- `y` to the value of `tuple.1` (6.4)
- `z` to the value of `tuple.2` ('x')
Memory Layout of Tuples
- Contiguous Memory: A tuple is stored as a single block of memory that holds all of its elements.
- Element Order: Rust does not guarantee that the elements are laid out in the order in which they are written. With the default representation, the compiler is free to reorder them, typically to reduce padding.
- Alignment and Padding: Due to differing sizes and alignment requirements of the elements, padding bytes may be inserted to satisfy alignment constraints. This can lead to the tuple occupying more memory than the simple sum of the sizes of its elements.
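The effect of alignment and padding can be observed with `std::mem::size_of` and `std::mem::align_of`. The sketch below uses a hypothetical type alias `Mixed`; the exact numbers it prints depend on the compiler and target, so the comments only state what is guaranteed.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example type: two single bytes and one 4-byte integer.
type Mixed = (u8, u32, u8);

// Hypothetical helper: bytes in the tuple that belong to no element.
fn padding_bytes() -> usize {
    let sum_of_elements = size_of::<u8>() + size_of::<u32>() + size_of::<u8>();
    size_of::<Mixed>() - sum_of_elements
}

fn main() {
    // The elements alone need 1 + 4 + 1 = 6 bytes.
    println!("size of Mixed:      {} bytes", size_of::<Mixed>());
    println!("alignment of Mixed: {} bytes", align_of::<Mixed>());
    println!("padding:            {} bytes", padding_bytes());

    // The size of any type is a multiple of its alignment, so a tuple
    // containing a u32 cannot be exactly 6 bytes on targets where u32
    // is 4-byte aligned.
    assert_eq!(size_of::<Mixed>() % align_of::<Mixed>(), 0);
}
```

On common 64-bit targets this typically reports a size of 8 bytes, that is, 2 bytes of padding.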
Tuples in Functions
Tuples are often used to return multiple values from a function.
Example:

    fn calculate(x: i32, y: i32) -> (i32, i32) {
        (x + y, x * y)
    }

    let (sum, product) = calculate(5, 10);
    println!("Sum = {}, Product = {}", sum, product);

- The `calculate` function returns a tuple containing the sum and product of two numbers.
- Destructuring is used to unpack the returned tuple.
Functions will be covered in full detail in a later chapter.
Comparison to C
In C, you might use structs to group different types together. However, structs in C require you to define a new type with named fields, whereas Rust's tuples are anonymous and access their elements by position.
C Struct Example:

    struct Tuple {
        int a;
        double b;
        char c;
    };
    struct Tuple tuple = {500, 6.4, 'x'};
In C, you can assign to the fields individually after declaration because the struct has named fields.
Rust Equivalent with Structs:
If you need similar functionality in Rust (e.g., assigning values to fields individually), you might define a struct.
Rust Struct Example:

    struct TupleStruct {
        a: i32,
        b: f64,
        c: char,
    }

    let mut tuple = TupleStruct { a: 0, b: 0.0, c: '\0' };
    tuple.a = 500;
    tuple.b = 6.4;
    tuple.c = 'x';
We will cover the Rust struct type in greater detail in a later chapter.
When to Use Tuples vs. Structs
- Tuples: Best when you have a small, fixed set of elements with different types and you don't need to refer to the elements by name.
- Structs: Preferable when you need to:
  - Assign or modify fields individually after creation.
  - Access elements by names for clarity.
  - Have more complex data structures.
Traits Implemented by Tuples
Tuples implement several traits if their component types implement them. For example, if all elements implement the `Copy` trait, the tuple will also implement `Copy`.
This is useful when you need to copy tuples without moving ownership.
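As a brief preview, the sketch below contrasts a tuple whose elements are all `Copy` with one that contains a `String`, which is not `Copy`. The function `assert_copy` is a hypothetical helper whose only purpose is to make the compiler check the trait bound.

```rust
// Hypothetical helper: a call only compiles when T implements Copy.
fn assert_copy<T: Copy>(_value: &T) {}

fn main() {
    // i32, f64, and char are all Copy, so the tuple is Copy as well.
    let numbers = (1, 2.5, 'c');
    assert_copy(&numbers);
    let duplicate = numbers; // Copies the tuple
    println!("{:?} {:?}", numbers, duplicate); // Both remain usable

    // String is not Copy, so a tuple containing one is not Copy either.
    let labeled = (String::from("id"), 7);
    // assert_copy(&labeled); // Would not compile
    let moved = labeled; // Moves the tuple; `labeled` can no longer be used
    let cloned = moved.clone(); // An explicit clone duplicates the String
    println!("{:?} {:?}", moved, cloned);
}
```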
The next chapter will cover ownership, references, borrowing, and move semantics, while Rust's traits will be discussed later.
Summary
- Tuple Type Syntax: `(Type1, Type2, Type3)`
- Tuple Value Syntax: `(value1, value2, value3)`
- Declaration and Initialization: Must provide all elements at creation.

      let tuple: (i32, f64, char) = (500, 6.4, 'x');

- Mutability: Use `mut` to make the tuple mutable if you need to modify its elements.

      let mut tuple = (500, 6.4, 'x');
      tuple.0 = 600;

- Accessing Elements: Use dot notation with the index.

      let tuple = (500, 6.4, 'x');
      let value = tuple.1; // Accesses the second element

- Destructuring: Unpack the tuple into variables.

      let tuple = (500, 6.4, 'x');
      let (a, b, c) = tuple;

- Fixed Size and Types: Cannot change the size or types of a tuple after creation.
Conclusion
In Rust, tuples are a simple way to group a few values with different types.
Array
An array in Rust is a fixed-size collection of elements of the same type, much like an array in C. An array held in a local variable is stored on the stack, which makes arrays a good fit when the size is known at compile time. Rust arrays are stricter than C arrays: every index is checked against the array's length, at runtime where necessary, which prevents out-of-bounds memory access, a common source of bugs in C.
Array Type and Initialization Syntax
In Rust, you declare an array's type using square brackets `[]`, specifying the element type and the array's length.
Syntax:

    let array: [Type; Length] = [value1, value2, value3, ...];

- `[Type; Length]`: Specifies an array of elements of `Type` with a fixed `Length`.
- `[value1, value2, value3, ...]`: Provides the initial values for the array elements.

Example:

    let array: [i32; 3] = [1, 2, 3];

- `[i32; 3]`: An array of `i32` integers with 3 elements.
- `[1, 2, 3]`: Initializes the array with values `1`, `2`, and `3`.
Arrays with Arbitrary Values
In Rust, arrays can be initialized with values that are the result of expressions, not just literals or constants.
Example:

    let x = 5;
    let y = x * 2;
    let array: [i32; 3] = [x, y, x + y];
This demonstrates that you can use any valid expression to initialize the elements of an array, providing flexibility in how you construct arrays.
Initializing Arrays with Default Values
You can initialize an array where all elements have the same value using the following syntax:
let array = [initial_value; array_length];
Example:

    let zeros = [0; 5]; // Creates an array [0, 0, 0, 0, 0]
This is particularly useful when you need an array filled with a default value.
Type Inference and Initialization
Rust often allows you to omit the type annotation if it can infer the type from the context.
Example with Type Inference:

    let array = [1, 2, 3];

- Rust infers that `array` is of type `[i32; 3]` because unannotated integer literals default to `i32` and there are three of them.

Alternatively, you can give one element an explicit type suffix and let Rust infer the same type for the remaining elements:

    let array = [1u8, 2, 3];
Accessing Array Elements
To access elements of an array, you use square brackets `[]` with the index of the element, starting from zero. Arrays can be indexed by either compile-time constants or runtime-evaluated values, as long as the index is of type `usize`.

    let array: [i32; 3] = [1, 2, 3];
    let index = 1;
    let second = array[index];
    println!("Second element is {}", second);

- Indexing starts at `0`, as in C.
- Indices must be of type `usize`.
Bounds Checking
Unlike C, Rust performs runtime bounds checking on array accesses. If you attempt to access an index outside the array's bounds, the program panics: it stops with an error message instead of reading or writing invalid memory, which prevents undefined behavior.
Example of Out-of-Bounds Access:

    let array = [1, 2, 3];
    // The index is parsed at runtime, so the compiler cannot check it in advance
    let i: usize = "3".parse().expect("index should be a number");
    let invalid = array[i]; // Panics at runtime: index out of bounds

If the index is a constant that the compiler can see, such as a literal `3`, the compiler usually detects the problem and rejects the program at compile time.
To safely handle potential out-of-bounds access, you can use the `get` method, which returns an `Option<&T>`:

    let array = [1, 2, 3];
    if let Some(value) = array.get(3) {
        println!("Value: {}", value);
    } else {
        println!("Index out of bounds");
    }
Iterating Over Arrays
You can iterate over arrays using loops.
Using a `for` Loop:

    let array = [1, 2, 3];
    for element in array.iter() {
        println!("Element: {}", element);
    }

- `array.iter()`: Returns an iterator over the array's elements.

Using Indices:

    let array = [1, 2, 3];
    for i in 0..array.len() {
        println!("Element {}: {}", i, array[i]);
    }

- `0..array.len()`: Creates a range from `0` up to (but not including) `array.len()`.
- `array.len()`: Returns the number of elements in the array.
Memory Layout of Arrays
- Homogeneous Elements: Arrays contain elements of the same type, size, and alignment.
- Contiguous Memory: Stored in a single contiguous block without padding between elements (since all elements have the same alignment).
- Predictable Layout: Memory layout is straightforward because each element follows the previous one without any padding.
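These layout properties can be verified directly. The sketch below compares the size of an array with the size of its elements and measures the distance between two neighboring elements; `stride` is a hypothetical helper name.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical helper: distance in bytes between the first two elements.
fn stride(array: &[i32; 3]) -> usize {
    let first = &array[0] as *const i32 as usize;
    let second = &array[1] as *const i32 as usize;
    second - first
}

fn main() {
    let array: [i32; 3] = [1, 2, 3];

    // The array occupies exactly length * element size: no header, no padding.
    println!("array size:   {} bytes", size_of_val(&array));
    println!("element size: {} bytes", size_of::<i32>());

    // Each element starts exactly one element size after the previous one.
    println!("stride:       {} bytes", stride(&array));
}
```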
Arrays in Functions
Arrays can be passed to functions, but since the size is part of the array's type, it's often more flexible to use slices.
Example Using a Slice:

    fn sum(array: &[i32]) -> i32 {
        array.iter().sum()
    }

    let array = [1, 2, 3];
    let total = sum(&array);
    println!("Total sum is {}", total);
- Slices allow functions to accept arrays of any size, as long as the element type matches.
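To illustrate why the length being part of the type matters, the following sketch contrasts a function that takes a fixed-size array with one that takes a slice. The names `sum_three` and `sum_any` are hypothetical.

```rust
// Accepts only arrays of exactly three i32 values.
fn sum_three(array: [i32; 3]) -> i32 {
    array.iter().sum()
}

// Accepts a slice, so arrays of any length can be passed by reference.
fn sum_any(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let three = [1, 2, 3];
    let five = [1, 2, 3, 4, 5];

    println!("{}", sum_three(three));
    // println!("{}", sum_three(five)); // Would not compile: expected [i32; 3], found [i32; 5]

    // A reference to an array converts to a slice automatically.
    println!("{}", sum_any(&three));
    println!("{}", sum_any(&five));
}
```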
Slices
A slice is a view into a block of memory represented as a pointer and a length. Slices can be used to reference a portion of an array or vector.
Example of Creating a Slice from an Array:

    let array = [1, 2, 3, 4, 5];
    let slice = &array[1..4]; // Slice containing elements [2, 3, 4]

- Slices are similar to pointers in C but include length information, enhancing safety.

Slices and the use of the ampersand (`&`) to denote references will be explored in greater detail in the next chapter of the book.
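The "pointer plus length" description can be made visible with a small sketch. The helper `middle` is hypothetical, and the observation that a slice reference is twice the size of a `usize` reflects how current compilers represent slices rather than something this chapter relies on.

```rust
use std::mem::size_of;

// Hypothetical helper: everything except the first and last element.
// Panics if the slice has fewer than two elements.
fn middle(values: &[i32]) -> &[i32] {
    &values[1..values.len() - 1]
}

fn main() {
    let array = [1, 2, 3, 4, 5];
    let slice = middle(&array);

    // The slice knows its own length, unlike a raw C pointer.
    println!("{:?} has length {}", slice, slice.len());

    // Bounds are checked against that length; get returns None when out of range.
    println!("{:?}", slice.get(10));

    // A reference to an array is one pointer; a slice reference also carries a length.
    println!("array reference: {} bytes", size_of::<&[i32; 5]>());
    println!("slice reference: {} bytes", size_of::<&[i32]>());
}
```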
Mutable Arrays
By default, variables in Rust are immutable. To modify the contents of an array, declare it as mutable using the `mut` keyword.
Example:

    let mut array = [1, 2, 3];
    array[0] = 10; // Now array is [10, 2, 3]
Arrays with `const` Length
The length of an array in Rust must be a constant value known at compile time.
Example with Constant Length:

    const SIZE: usize = 3;
    let array: [i32; SIZE] = [1, 2, 3];
Multidimensional Arrays
Rust supports multidimensional arrays by nesting arrays within arrays.
Example of a 2D Array:

    let matrix: [[i32; 3]; 2] = [
        [1, 2, 3],
        [4, 5, 6],
    ];
- This creates a 2x3 matrix (2 rows, 3 columns).
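Elements of a nested array are reached with one index per dimension, row first. The sketch below reads a single element and then visits all elements with nested loops; the function `total` is a hypothetical helper.

```rust
// Hypothetical helper: sum all elements of a 2x3 matrix.
fn total(matrix: &[[i32; 3]; 2]) -> i32 {
    let mut sum = 0;
    for row in matrix.iter() {
        for value in row.iter() {
            sum += value;
        }
    }
    sum
}

fn main() {
    let matrix: [[i32; 3]; 2] = [
        [1, 2, 3],
        [4, 5, 6],
    ];

    // First index selects the row, second index selects the column.
    println!("matrix[1][2] = {}", matrix[1][2]);
    println!("rows = {}, columns = {}", matrix.len(), matrix[0].len());
    println!("total = {}", total(&matrix));
}
```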
Traits Implemented by Arrays
Arrays implement several traits if their element types implement them. For example, if the elements implement `Copy`, the array will also implement `Copy`.
Example:

    fn duplicate(array: [i32; 3]) -> [i32; 3] {
        array // Copies the array because i32 implements Copy
    }

This allows arrays to be copied by value, similar to how structs can derive the `Copy` trait.
Comparing Rust and C Arrays
- Fixed Size: Both Rust and C arrays have a fixed size known at compile time.
- Type Safety: Rust arrays are type-safe; all elements must be of the same type.
- Bounds Checking: Rust performs bounds checking at runtime, preventing out-of-bounds memory access—a common issue in C.
- Memory Location: Rust arrays are stored on the stack by default.
- Slices: Rust introduces slices for safe and flexible array access, which is not directly available in C.
Other Initialization Methods
Initializing Arrays Without Specifying All Elements:
In Rust, you must initialize all elements of an array. Unlike in C, where uninitialized elements might be set to zero or garbage values, Rust requires explicit initialization.
Example:

    let array: [i32; 5] = [1, 2, 3, 0, 0];
- Manually specify default values for unspecified elements.
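Writing out every default value becomes tedious for longer arrays. Two alternatives are sketched below: filling the array with a default first and then overwriting a prefix, and computing each element from its index with `std::array::from_fn` (available in the standard library since Rust 1.63). The function names are hypothetical.

```rust
// Hypothetical helper: start with zeros, then overwrite the first three elements.
fn first_three_then_zeros() -> [i32; 5] {
    let mut array = [0; 5];
    array[..3].copy_from_slice(&[1, 2, 3]);
    array
}

// Hypothetical helper: compute each element from its index.
fn squares() -> [i32; 5] {
    std::array::from_fn(|i| (i * i) as i32)
}

fn main() {
    println!("{:?}", first_three_then_zeros());
    println!("{:?}", squares());
}
```

In both cases every element still receives a defined value before the array can be read.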
Summary
- Declaration Syntax: `let array: [Type; Length] = [values];`
- Type Inference: Rust can infer the type and length based on the values provided.
- Initialization with Default Values: `let array = [initial_value; array_length];`
- Accessing Elements: Use `array[index]`, where `index` is of type `usize`.
- Mutability: Declare with `mut` if you need to modify elements after creation.
- Bounds Checking: Rust checks array bounds at runtime to prevent invalid access.
- Iteration: Use loops to iterate over elements or indices.
- Slices: Use slices for flexible and safe access to arrays.
When to Use Tuples, Arrays, and Vectors
Rust provides tuples, arrays, and vectors to group multiple values, each serving distinct purposes:
- Tuples: Fixed-size collections that can hold elements of different types. Ideal for grouping a small, fixed number of related values where each position has a specific meaning.
- Arrays: Fixed-size collections of elements of the same type. Suitable for handling a known number of homogeneous items, allowing efficient indexed access and iteration.
- Vectors (`Vec<T>`): Growable arrays stored on the heap. Use when you need a collection of elements of the same type, but the size can change at runtime.

Key Differences:
- Homogeneity:
  - Tuples: Heterogeneous elements (different types).
  - Arrays and Vectors: Homogeneous elements (same type).
- Size:
  - Tuples and Arrays: Fixed size known at compile time.
  - Vectors: Dynamic size that can grow or shrink at runtime.
- Usage Scenarios:
  - Tuples: Grouping related values with different types or meanings, like coordinates `(x, y)`.
  - Arrays: Collections of fixed-size homogeneous data, like days in a week.
  - Vectors: Collections where the number of elements isn't known at compile time or can change, like lines read from a file.

Examples:
- Tuple:

      let point = (10.0, 20.0); // x and y coordinates
      let (x, y) = point; // Destructure into variables

- Array:

      let weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"];
      for day in weekdays.iter() {
          println!("{}", day);
      }

- Vector:

      let mut numbers = vec![1, 2, 3];
      numbers.push(4); // Now numbers is [1, 2, 3, 4]
Choosing the Right Type:
- Use tuples when you have a small, fixed set of values with possibly different types or meanings.
- Use arrays when you have a fixed-size collection of the same type and need efficient access or iteration.
- Use vectors when dealing with a collection that can change in size.
Tuples or Arrays as Function Return Types
When a function needs to return multiple values, the choice between tuples and arrays depends on the nature of the data:
- Tuples are preferable when:
  - Returning a fixed number of values with distinct meanings.
  - Each value may represent a different concept, even if they're the same type.
  - You want to leverage destructuring for clarity.

  Example: Returning Coordinates

      fn get_coordinates() -> (f64, f64) {
          (10.0, 20.0)
      }

      let (x, y) = get_coordinates();
      println!("x = {}, y = {}", x, y);

- Arrays are suitable when:
  - Returning a fixed-size collection of homogeneous values.
  - The elements represent the same kind of data.
  - You might need to iterate over the elements.

  Example: Returning a Row of Data

      fn get_row() -> [i32; 3] {
          [1, 2, 3]
      }

      let row = get_row();
      for value in row.iter() {
          println!("{}", value);
      }
Why Choose Tuples for Coordinates:
- Semantic Clarity: Destructuring tuples into variables like `x` and `y` makes the code more readable and self-explanatory.
- Distinct Meanings: Even if `x` and `y` are the same type, they represent different dimensions.

Alternative with Structs:
For enhanced clarity and scalability, especially with more complex data, consider using a `struct`:

    struct Coordinates {
        x: f64,
        y: f64,
    }

    fn get_coordinates() -> Coordinates {
        Coordinates { x: 10.0, y: 20.0 }
    }

    let coords = get_coordinates();
    println!("x = {}, y = {}", coords.x, coords.y);

- Advantages:
  - Named Fields: Clearly indicate what each value represents.
  - Extensibility: Easy to add more fields (e.g., `z` for 3D coordinates).
  - Methods: Ability to implement associated functions or methods.
Summary:
- Use Tuples when returning multiple values with different meanings or when you want to unpack values into variables with meaningful names.
- Use Arrays when returning a collection of similar items that might be processed collectively.
- Use Structs for even greater clarity and when you might need to expand functionality.
Final Recommendation:
- For returning pairs like coordinates, tuples offer a good balance between simplicity and clarity.
- For collections of homogeneous data where iteration is needed, arrays (or vectors if the size is dynamic) are more appropriate.
By choosing the most suitable data structure, you enhance code readability, maintainability, and safety, aligning with Rust's emphasis on clarity and reliability.
Note: Using very large arrays can cause a stack overflow, as the default stack size is usually limited to a few megabytes and varies depending on the operating system. For large collections, consider using a vector (`Vec<T>`) instead.
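As a rough sketch of this advice, the program below allocates a large zero-filled buffer on the heap with `vec!` instead of declaring it as a local array. The helper `make_buffer` and the 64 MiB size are assumptions chosen for illustration; whether a local array of that size actually overflows the stack depends on the platform and its stack limit.

```rust
// Hypothetical helper: allocate a zero-filled buffer of `len` bytes on the heap.
fn make_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    const SIZE: usize = 64 * 1024 * 1024; // 64 MiB

    // A local array of this size lives on the stack and may overflow it:
    // let big = [0u8; SIZE];

    // The vector keeps only a small handle on the stack; the data is on the heap.
    let big = make_buffer(SIZE);
    println!("allocated {} bytes on the heap", big.len());
}
```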
Stack vs. Heap Allocation
Rust's primitive types—scalars, tuples, and arrays—are typically stack-allocated, providing fast access due to their predictability and locality. Rust takes advantage of direct CPU support for many of these primitive types, optimizing them for performance.
On the other hand, dynamic types like vectors (`Vec<T>`) and dynamically sized strings (`String`) use heap allocation to store their data, allowing for flexible and dynamic resizing at runtime. This heap allocation introduces overhead but is necessary for handling collections of unknown size at compile time.