5.3 Data Types

Rust is a statically and strongly typed language, meaning that the type of each variable is known at compile time and cannot change. This ensures both performance and safety. In statically typed languages like Rust, many errors are caught early at compile time, reducing runtime errors that might otherwise occur in dynamically typed languages. Additionally, strong typing enforces that operations on data are well-defined, avoiding unexpected behavior from implicit type conversions common in weakly typed languages. These characteristics allow Rust to produce highly efficient machine code, with direct support for many of its types in primitive CPU instructions, leading to predictable performance, especially in systems programming.

5.3.1 Scalar Types

Rust's scalar types are the simplest types, representing single values. They are analogous to the basic types in C, with some notable differences. Rust’s scalar types are categorized as integers, floating-point numbers, booleans, and characters. Here’s how they compare to C types:

Integers

Rust offers a wide range of integer types, both signed and unsigned, similar to C but with stricter definitions of behavior. In C, integer sizes can sometimes be platform-dependent, whereas Rust defines its types clearly, ensuring predictable size and behavior across platforms.

For fixed-size integer types, Rust uses type names that specify both the size and whether the type is signed or unsigned. Signed types begin with i, while unsigned types begin with u, followed by the number of bits they occupy. The available integer types are:

  • i8, i16, i32, i64, and i128 for signed integers (ranging from 8-bit to 128-bit).
  • u8, u16, u32, u64, and u128 for unsigned integers.

If nothing else constrains the type (a type annotation, a literal suffix such as 42u8, or the way the value is used), Rust infers the 32-bit signed integer type i32 for an integer literal. This default strikes a balance between memory usage and performance for most use cases.
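As a small illustration of the fixed ranges and the i32 default, the sketch below shows three ways a literal gets its type and what happens when a result does not fit. The helper name add_u8 is hypothetical and exists only for this example; checked_add and wrapping_add are standard methods on Rust's integer types.

```rust
/// Hypothetical helper: adds two `u8` values, returning `None` on overflow.
fn add_u8(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

fn main() {
    let inferred = 42;       // no annotation or suffix: defaults to i32
    let suffixed = 42u64;    // a type suffix selects u64
    let annotated: i16 = 42; // an annotation selects i16
    println!("{} {} {}", inferred, suffixed, annotated);

    // The range of every integer type is available as constants.
    println!("u8 range: {}..={}", u8::MIN, u8::MAX);

    // Overflow is handled explicitly instead of silently.
    println!("checked:  {:?}", add_u8(200, 100));
    println!("wrapping: {}", 250u8.wrapping_add(10));
}
```

The checked variant reports overflow as None, while the wrapping variant performs modular arithmetic on request. Which one is appropriate depends on the application.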

usize and isize: Rust introduces two integer types that are specifically tied to the architecture of the machine. The usize type is an unsigned integer, and the isize type is a signed integer. These types are used in situations where the size of memory addresses is important, such as array indexing and pointer arithmetic. On a 64-bit system, usize and isize are 64 bits wide, while on a 32-bit system, they are 32 bits wide. The actual size is determined by the target architecture of the compiled program. These types are particularly useful in systems programming for tasks that involve memory management or when dealing with collections where the index size is architecture-dependent. Notably, usize is the default type for indexing arrays and other collections in Rust, and you cannot use other integer types like i32 for indexing without an explicit cast.
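The indexing rule can be sketched as follows. The helper element_at is a hypothetical name introduced for this example; it converts a signed index with usize::try_from, which rejects negative values instead of letting them wrap around as a plain `as` cast would.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: looks up `index` (an `i32`) in `values`.
/// Returns `None` if the index is negative or past the end.
fn element_at(values: &[i32], index: i32) -> Option<i32> {
    // Fails for negative numbers instead of wrapping around.
    let i = usize::try_from(index).ok()?;
    values.get(i).copied()
}

fn main() {
    let values = [10, 20, 30];
    let signed_index: i32 = 1;

    // `values[signed_index]` would not compile: the index must be usize.
    let by_cast = values[signed_index as usize];
    println!("by cast: {}", by_cast);

    // The checked conversion handles invalid indices gracefully.
    println!("checked: {:?}", element_at(&values, -1));
}
```

A plain `as usize` cast is fine when the value is known to be non-negative; the checked conversion is the more defensive choice when it is not.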

Floating-Point Numbers

Rust's floating-point types follow the IEEE 754 standard. In C, float and double use IEEE 754 formats on virtually all modern platforms, but the C standard does not require it; Rust makes the format part of the language. Rust also uses clear type names for its floating-point types, which specify the bit size:

  • f32 for a 32-bit floating-point number.
  • f64 for a 64-bit floating-point number (the default).

Rust defaults to f64 (64-bit) for floating-point literals, as it provides better precision and is about as fast as f32 for scalar arithmetic on most modern processors. The explicit naming of floating-point types helps avoid confusion and ensures consistent behavior across platforms.
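The following sketch illustrates the f64 default and a typical consequence of binary floating point: results are compared with a tolerance rather than with ==. The helper approx_eq and the tolerance value are assumptions chosen for this example, not part of the standard library.

```rust
/// Hypothetical helper: compares two floats within an absolute tolerance.
fn approx_eq(a: f64, b: f64, tolerance: f64) -> bool {
    (a - b).abs() <= tolerance
}

fn main() {
    let x = 0.1;      // no annotation: defaults to f64
    let y: f32 = 0.1; // explicitly single precision
    let sum = x + 0.2;

    // Printing many digits shows that neither type stores 0.1 exactly.
    println!("f64 sum:   {:.20}", sum);
    println!("f32 value: {:.20}", y);

    println!("sum == 0.3?        {}", sum == 0.3);
    println!("approximately 0.3? {}", approx_eq(sum, 0.3, 1e-12));
}
```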

Booleans and Characters

  • Boolean (bool): Rust’s boolean type (bool) is always 1 byte in size, even though it represents a value of true or false. While it might seem more efficient to represent a boolean as a single bit, modern CPUs generally operate more efficiently with byte-aligned memory. Using a full byte for a boolean simplifies memory access and allows for faster processing, particularly in situations where the boolean is stored in arrays or structs.

  • Character (char): The character type (char) in Rust represents a Unicode scalar value, differing from C's char, which holds a single byte (for example one ASCII character or one byte of a UTF-8 sequence). Rust's char is 4 bytes, allowing for full Unicode support. This means it can represent characters from virtually any language, including emoji.
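The sizes mentioned above can be checked with std::mem::size_of. The sketch also shows that a char's in-memory size (always 4 bytes) differs from the length of its UTF-8 encoding inside a string. The helper utf8_len is a hypothetical name used only here.

```rust
use std::mem::size_of;

/// Hypothetical helper: number of bytes `c` occupies when encoded as UTF-8.
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    println!("size of bool: {} byte", size_of::<bool>());
    println!("size of char: {} bytes", size_of::<char>());

    // A char is always 4 bytes in memory, but its UTF-8 encoding
    // inside a string takes between 1 and 4 bytes.
    for c in ['a', 'é', '€', '🦀'].iter() {
        println!("{} needs {} byte(s) in UTF-8", c, utf8_len(*c));
    }

    // Surrogate code points are not Unicode scalar values,
    // so they cannot be converted to a char.
    println!("0xD800 as char: {:?}", char::from_u32(0xD800));
}
```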

Scalar Types Table

| Rust Type | Size | Range | Equivalent C Type | Comment |
|-----------|------|-------|-------------------|---------|
| i8 | 8 bits | -128 to 127 | int8_t | Signed 8-bit integer |
| u8 | 8 bits | 0 to 255 | uint8_t | Unsigned 8-bit integer |
| i16 | 16 bits | -32,768 to 32,767 | int16_t | Signed 16-bit integer |
| u16 | 16 bits | 0 to 65,535 | uint16_t | Unsigned 16-bit integer |
| i32 | 32 bits | -2,147,483,648 to 2,147,483,647 | int32_t | Signed 32-bit integer (default integer type) |
| u32 | 32 bits | 0 to 4,294,967,295 | uint32_t | Unsigned 32-bit integer |
| i64 | 64 bits | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | int64_t | Signed 64-bit integer |
| u64 | 64 bits | 0 to 18,446,744,073,709,551,615 | uint64_t | Unsigned 64-bit integer |
| isize | Platform dependent | Varies based on architecture (32-bit or 64-bit) | intptr_t | Signed pointer-sized integer |
| usize | Platform dependent | Varies based on architecture (32-bit or 64-bit) | uintptr_t | Unsigned pointer-sized integer |
| f32 | 32 bits | ~1.4E-45 to ~3.4E+38 | float | 32-bit floating point, IEEE 754 |
| f64 | 64 bits | ~5E-324 to ~1.8E+308 | double | 64-bit floating point (default) |
| bool | 1 byte | true or false | _Bool | Boolean type, always 1 byte |
| char | 4 bytes | Unicode scalar value (0 to 0x10FFFF) | None (C's char is 1 byte) | Represents a Unicode character |

5.3.2 Primitive Compound Types: Tuple and Array

Rust also provides compound types, which allow you to group multiple values into a single type. The two most basic compound types are tuples and arrays.

Note that "tuple" and "array" are not Rust keywords, meaning they can be used as variable names.

Tuple

A tuple is a fixed-size collection of values of various types. In Rust, tuples are often used when you want to return multiple values from a function without using a struct. Since tuples may be unfamiliar to those coming from C or other languages that lack this data type, we will explore them in more detail.

Tuple Type Syntax

In Rust, a tuple's type is defined by listing the types of its elements within parentheses (), separated by commas. This defines the exact types and the number of elements the tuple will hold.

Example:

(i32, f64, char)

This tuple type consists of three elements:

  • An i32 (32-bit signed integer)
  • An f64 (64-bit floating-point number)
  • A char (Unicode scalar value)

Tuple Value Syntax

To create a tuple value, you use the same parentheses () and provide the actual values, again separated by commas.

Example:

(500, 6.4, 'x')

This creates a tuple value with:

  • The integer 500
  • The floating-point number 6.4
  • The character 'x'

Note on Single-Element Tuples and the Unit Type:

  • Singleton Tuples: To define a tuple with a single element, include a trailing comma to differentiate it from a value in parentheses.

    #![allow(unused)]
    fn main() {
    let single_element_tuple = (5,); // A tuple containing one element
    let not_a_tuple = (5);           // Just the value 5 in parentheses
    }
  • Unit Type (): The unit type is a special tuple with zero elements, represented by ().

    #![allow(unused)]
    fn main() {
    let unit: () = (); // The unit type
    }
    • Functions that don't return a value actually return the unit type ().

Combining Type Annotation and Value Assignment

When declaring a tuple variable with an explicit type and initializing it with values, you write:

#![allow(unused)]
fn main() {
let tuple: (i32, f64, char) = (500, 6.4, 'x');
}
  • let tuple: Declares a new variable named tuple.
  • (i32, f64, char) Specifies the tuple's type.
  • = Assigns the value to the variable.
  • (500, 6.4, 'x') Provides the tuple's initial values.

This line tells Rust to create a variable tuple that holds a tuple of type (i32, f64, char) initialized with the values (500, 6.4, 'x'). In this example, the tuple is initialized with constant values, but it is more common to use values evaluated at runtime.

Accessing Tuple Elements

Accessing individual elements of a tuple is done using dot notation followed by the index of the element, starting from zero. The index must be an integer literal written in the source code (tuple.0, tuple.1, and so on); it cannot be a variable or any other expression. You cannot loop over a tuple's components by index because each element may be of a different type.

Example:

#![allow(unused)]
fn main() {
let tuple: (i32, f64, char) = (500, 6.4, 'x');
let first_element = tuple.0; // Accesses the first element (500)
let second_element = tuple.1; // Accesses the second element (6.4)
let third_element = tuple.2; // Accesses the third element ('x')
}

Mutability and Assignment of Tuple Elements

By default, variables in Rust are immutable. If you want to modify the elements of a tuple after its creation, you need to declare it as mutable using the mut keyword.

Example:

#![allow(unused)]
fn main() {
let mut tuple = (500, 6.4, 'x');
tuple.0 = 600; // Changes the first element to 600
}

Important Notes:

  • Fixed Size and Types: Tuples have a fixed size, and their types are known at compile time. You cannot add or remove elements once the tuple is created.

  • Assignment at Creation: You must provide all the values for the tuple when you give it a value. You cannot declare a tuple without a value and then fill in its elements one at a time.

    This will NOT work:

    // Attempting to declare an uninitialized tuple (Not allowed)
    let mut tuple: (i32, f64, char);
    tuple.0 = 500; // Error: tuple is not initialized
  • Assignment Step by Step: You cannot build up an uninitialized tuple one element at a time. The tuple must first be assigned as a whole; after that, individual elements of a mutable tuple can be changed.
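What Rust does permit is declaring a tuple first and assigning it as a whole later, as long as every code path assigns it before it is read. The following is a minimal sketch of that pattern; the function name pick and its placeholder values are assumptions made for this example.

```rust
/// Hypothetical helper illustrating deferred initialization of a tuple.
fn pick(use_defaults: bool) -> (i32, f64, char) {
    // Declared without a value; the compiler tracks that it is uninitialized.
    let tuple: (i32, f64, char);

    if use_defaults {
        tuple = (0, 0.0, '-');
    } else {
        tuple = (500, 6.4, 'x');
    }

    // Every path has assigned the whole tuple, so reading it is allowed here.
    tuple
}

fn main() {
    let mut t = pick(false);
    t.1 = 7.5; // element-wise assignment works once the tuple is initialized
    println!("{:?}", t);
}
```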

Destructuring Tuples

It’s not possible to loop through a tuple’s elements by index, but you can unpack or "destructure" a tuple into individual variables for easier access.

Example:

#![allow(unused)]
fn main() {
let tuple: (i32, f64, char) = (500, 6.4, 'x');
let (x, y, z) = tuple;
println!("x = {}, y = {}, z = {}", x, y, z);
}

This binds:

  • x to the value of tuple.0 (500)
  • y to the value of tuple.1 (6.4)
  • z to the value of tuple.2 ('x')

Memory Layout of Tuples

  • Contiguous Memory: Tuples in Rust are stored contiguously in memory, meaning that all the elements of the tuple are laid out sequentially in a single block of memory.
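  • Element Order: Rust does not guarantee the order in which the elements are laid out in memory. The compiler is free to reorder them, for example to reduce padding, so the first element of the tuple is not necessarily stored at the lowest address.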
  • Alignment and Padding: Due to differing sizes and alignment requirements of the elements, there may be padding bytes inserted between elements to satisfy alignment constraints. This can lead to the tuple occupying more memory than the simple sum of the sizes of its elements.
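The effect of alignment and padding can be observed with std::mem::size_of and std::mem::align_of. The sketch below compares a tuple with a #[repr(C)] struct, which uses C's layout rules and therefore keeps its fields in declaration order. The struct InOrder is a hypothetical type for this example, and the tuple's exact size is deliberately not stated because Rust leaves the default layout unspecified.

```rust
use std::mem::{align_of, size_of};

// A C-compatible struct keeps its fields in declaration order.
#[allow(dead_code)]
#[repr(C)]
struct InOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Sum of the element sizes: 1 + 4 + 1 = 6 bytes.
    let payload = size_of::<u8>() + size_of::<u32>() + size_of::<u8>();

    println!("payload bytes:          {}", payload);
    println!("size of (u8, u32, u8):  {}", size_of::<(u8, u32, u8)>());
    println!("align of (u8, u32, u8): {}", align_of::<(u8, u32, u8)>());
    println!("size of repr(C) struct: {}", size_of::<InOrder>());
}
```

On common platforms the repr(C) struct needs padding after both u8 fields, while the compiler may place the tuple's elements more compactly.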

Tuples in Functions

Tuples are often used to return multiple values from a function.

Example:

#![allow(unused)]
fn main() {
fn calculate(x: i32, y: i32) -> (i32, i32) {
    (x + y, x * y)
}

let (sum, product) = calculate(5, 10);
println!("Sum = {}, Product = {}", sum, product);
}
  • The calculate function returns a tuple containing the sum and product of two numbers.
  • Destructuring is used to unpack the returned tuple.

Functions will be covered in full detail in a later chapter.

Comparison to C

In C, you might use structs to group different types together. However, structs in C require you to define a new type with named fields, whereas Rust's tuples are anonymous and access their elements by position.

C Struct Example:

struct Tuple {
    int a;
    double b;
    char c;
};

struct Tuple tuple = {500, 6.4, 'x'};

In C, you can also declare the struct variable without an initializer and assign its fields one by one afterwards.

Rust Equivalent with Structs:

Rust requires every field to have a value when the struct is created, so the closest equivalent is to create the struct with placeholder values and then assign the fields by name.

Rust Struct Example:

#![allow(unused)]
fn main() {
struct TupleStruct {
    a: i32,
    b: f64,
    c: char,
}

let mut tuple = TupleStruct { a: 0, b: 0.0, c: '\0' };
tuple.a = 500;
tuple.b = 6.4;
tuple.c = 'x';
}

We will cover the Rust struct type in greater detail in a later chapter.

When to Use Tuples vs. Structs

  • Tuples: Best when you have a small, fixed set of elements with different types and you don't need to refer to the elements by name.

  • Structs: Preferable when you need to:

    • Update individual fields by name after creation (tuple elements can also be updated, but only by position).
    • Access elements by names for clarity.
    • Have more complex data structures.

Traits Implemented by Tuples

Tuples implement several traits if their component types implement them. For example, if all elements implement the Copy trait, the tuple will also implement Copy. This is useful when you need to copy tuples without moving ownership.
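A brief sketch of what this means in practice: a tuple of Copy types can be passed by value and still be used afterwards, tuples compare element by element when their elements are comparable, and a tuple holding a non-Copy type such as String has to be cloned explicitly. The function shifted is a hypothetical helper for this example.

```rust
/// Hypothetical helper: takes a tuple by value and returns a shifted version.
fn shifted(point: (i32, i32)) -> (i32, i32) {
    (point.0 + 1, point.1 + 1)
}

fn main() {
    let origin = (0, 0);
    let moved = shifted(origin); // `origin` is copied, not moved

    // `origin` is still usable because (i32, i32) implements Copy.
    println!("{:?} -> {:?}", origin, moved);

    // Tuples compare element by element, left to right,
    // when all element types support comparison.
    println!("{}", (1, 'b') < (2, 'a'));

    // A tuple containing a String is not Copy; it must be cloned explicitly.
    let named = (String::from("x"), 1);
    let duplicate = named.clone();
    println!("{:?} {:?}", named, duplicate);
}
```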

The next chapter will cover ownership, references, borrowing, and move semantics, while Rust's traits will be discussed later.

Summary

  • Tuple Type Syntax: (Type1, Type2, Type3)

  • Tuple Value Syntax: (value1, value2, value3)

  • Declaration and Initialization: Must provide all elements at creation.

    #![allow(unused)]
    fn main() {
    let tuple: (i32, f64, char) = (500, 6.4, 'x');
    }
  • Mutability: Use mut to make the tuple mutable if you need to modify its elements.

    #![allow(unused)]
    fn main() {
    let mut tuple = (500, 6.4, 'x');
    tuple.0 = 600;
    }
  • Accessing Elements: Use dot notation with the index.

    #![allow(unused)]
    fn main() {
    let tuple = (500, 6.4, 'x');
    let value = tuple.1; // Accesses the second element
    }
  • Destructuring: Unpack the tuple into variables.

    #![allow(unused)]
    fn main() {
    let tuple = (500, 6.4, 'x');
    let (a, b, c) = tuple;
    }
  • Fixed Size and Types: Cannot change the size or types of a tuple after creation.

Conclusion

In Rust, tuples are a simple way to group a few values with different types.

Array

An array in Rust is a fixed-size collection of elements of the same type, much like an array in C. The length is part of the type and must be known at compile time, so an array declared as a local variable lives on the stack. Rust arrays are stricter than C arrays: indexing is bounds-checked at runtime, which prevents out-of-bounds memory access, a common source of bugs in C.

Array Type and Initialization Syntax

In Rust, you declare an array's type using square brackets [], specifying the element type and the array's length.

Syntax:

let array: [Type; Length] = [value1, value2, value3, ...];
  • [Type; Length]: Specifies an array of elements of Type with a fixed Length.
  • [value1, value2, value3, ...]: Provides the initial values for the array elements.

Example:

#![allow(unused)]
fn main() {
let array: [i32; 3] = [1, 2, 3];
}
  • [i32; 3]: An array of i32 integers with 3 elements.
  • [1, 2, 3]: Initializes the array with values 1, 2, and 3.

Arrays with Arbitrary Values

In Rust, arrays can be initialized with values that are the result of expressions, not just literals or constants.

Example:

#![allow(unused)]
fn main() {
let x = 5;
let y = x * 2;
let array: [i32; 3] = [x, y, x + y];
}

This demonstrates that you can use any valid expression to initialize the elements of an array, providing flexibility in how you construct arrays.

Initializing Arrays with Default Values

You can initialize an array where all elements have the same value using the following syntax:

let array = [initial_value; array_length];

Example:

#![allow(unused)]
fn main() {
let zeros = [0; 5]; // Creates an array [0, 0, 0, 0, 0]
}

This is particularly useful when you need an array filled with a default value.

Type Inference and Initialization

Rust often allows you to omit the type annotation if it can infer the type from the context.

Example with Type Inference:

#![allow(unused)]
fn main() {
let array = [1, 2, 3];
}
  • Rust infers that array is of type [i32; 3]: unsuffixed integer literals default to i32, and there are three of them.

Alternatively, you can give one element a type suffix and let inference apply that type to the rest; here the array has type [u8; 3]:

#![allow(unused)]
fn main() {
let array = [1u8, 2, 3];
}

Accessing Array Elements

To access elements of an array, you use indexing using square brackets [] with the index of the element, starting from zero. Arrays can be indexed by either compile-time constants or runtime-evaluated values, as long as the index is of type usize.

#![allow(unused)]
fn main() {
let array: [i32; 3] = [1, 2, 3];
let index = 1;
let second = array[index];
println!("Second element is {}", second);
}
  • Indexing starts at 0, as in C.
  • Indices must be of type usize.

Bounds Checking

Unlike C, Rust performs runtime bounds checking on array accesses. If you access an index outside the array's bounds, the program panics: it reports the error and terminates in a controlled way instead of reading or writing arbitrary memory, which would be undefined behavior in C.

Example of Out-of-Bounds Access:

#![allow(unused)]
fn main() {
let array = [1, 2, 3];
let i: usize = "3".parse().unwrap(); // Parsed at runtime, so the compiler cannot reject the index in advance
let invalid = array[i]; // Panics at runtime: index out of bounds
}

To safely handle potential out-of-bounds access, you can use the get method, which returns an Option<&T>:

#![allow(unused)]
fn main() {
let array = [1, 2, 3];
if let Some(value) = array.get(3) {
    println!("Value: {}", value);
} else {
    println!("Index out of bounds");
}
}

Iterating Over Arrays

You can iterate over arrays using loops.

Using a for Loop:

#![allow(unused)]
fn main() {
let array = [1, 2, 3];

for element in array.iter() {
    println!("Element: {}", element);
}
}
  • array.iter(): Returns an iterator over the array's elements.

Using Indices:

#![allow(unused)]
fn main() {
let array = [1, 2, 3];
for i in 0..array.len() {
    println!("Element {}: {}", i, array[i]);
}
}
  • 0..array.len(): Creates a range from 0 up to (but not including) array.len().
  • array.len(): Returns the number of elements in the array.
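When both the index and the element are needed, combining iter() with enumerate() avoids indexing altogether. The sketch below is one possible way to write this; the helper index_of_max is a hypothetical name used for illustration.

```rust
/// Hypothetical helper: position of the largest element, or `None` if empty.
/// On ties, the first occurrence wins.
fn index_of_max(values: &[i32]) -> Option<usize> {
    let mut best: Option<(usize, i32)> = None;
    for (i, &value) in values.iter().enumerate() {
        match best {
            // Keep the current best if this value is not larger.
            Some((_, max)) if value <= max => {}
            _ => best = Some((i, value)),
        }
    }
    best.map(|(i, _)| i)
}

fn main() {
    let array = [3, 9, 4];

    // enumerate() yields (index, element) pairs.
    for (i, element) in array.iter().enumerate() {
        println!("Element {}: {}", i, element);
    }

    println!("largest at index {:?}", index_of_max(&array));
}
```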

Memory Layout of Arrays

  • Homogeneous Elements: Arrays contain elements of the same type, size, and alignment.
  • Contiguous Memory: Stored in a single contiguous block without padding between elements (since all elements have the same alignment).
  • Predictable Layout: Memory layout is straightforward because each element follows the previous one without any padding.

Arrays in Functions

Arrays can be passed to functions, but since the size is part of the array's type, it's often more flexible to use slices.

Example Using a Slice:

#![allow(unused)]
fn main() {
fn sum(array: &[i32]) -> i32 {
    array.iter().sum()
}

let array = [1, 2, 3];
let total = sum(&array);
println!("Total sum is {}", total);
}
  • Slices allow functions to accept arrays of any size, as long as the element type matches.
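For comparison, here is a sketch of the by-value alternatives. A parameter of type [i32; 3] accepts only arrays of exactly three elements, while a const generic parameter lets one function accept arrays of any length that is known at compile time. The function names sum_three and sum_any are hypothetical.

```rust
/// Accepts only arrays of exactly three elements.
fn sum_three(array: [i32; 3]) -> i32 {
    array.iter().sum()
}

/// Generic over the length `N`, which is still fixed at compile time
/// for each individual call.
fn sum_any<const N: usize>(array: [i32; N]) -> i32 {
    array.iter().sum()
}

fn main() {
    println!("{}", sum_three([1, 2, 3]));
    // sum_three([1, 2, 3, 4]) would not compile: the lengths differ.
    println!("{}", sum_any([1, 2, 3, 4]));
    println!("{}", sum_any([5; 10]));
}
```

A slice parameter remains the more common choice, since it also accepts vectors and parts of arrays.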

Slices

A slice is a view into a block of memory represented as a pointer and a length. Slices can be used to reference a portion of an array or vector.

Example of Creating a Slice from an Array:

#![allow(unused)]
fn main() {
let array = [1, 2, 3, 4, 5];
let slice = &array[1..4]; // Slice containing elements [2, 3, 4]
}
  • Slices are similar to pointers in C but include length information, enhancing safety.
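A few more slice operations, sketched under the assumption that a helper named double_all (hypothetical) modifies its argument in place: open-ended ranges, a mutable slice passed to a function, and a checked range lookup with get.

```rust
/// Hypothetical helper: doubles every element of a mutable slice in place.
fn double_all(values: &mut [i32]) {
    for value in values.iter_mut() {
        *value *= 2;
    }
}

fn main() {
    let mut array = [1, 2, 3, 4, 5];

    let head = &array[..2]; // elements at indices 0 and 1
    let tail = &array[2..]; // elements from index 2 to the end
    println!("head = {:?}, tail = {:?}", head, tail);

    // A mutable slice lets a function modify part of the array.
    double_all(&mut array[1..4]);
    println!("array = {:?}", array);

    // Slicing is bounds-checked too; `get` returns None instead of panicking.
    println!("{:?}", array.get(1..10));
}
```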

Slices and the use of the ampersand (&) to denote references will be explored in greater detail in the next chapter of the book.

Mutable Arrays

By default, variables in Rust are immutable. To modify the contents of an array, declare it as mutable using the mut keyword.

Example:

#![allow(unused)]
fn main() {
let mut array = [1, 2, 3];
array[0] = 10; // Now array is [10, 2, 3]
}

Arrays with const Length

The length of an array in Rust must be a constant value known at compile time.

Example with Constant Length:

#![allow(unused)]
fn main() {
const SIZE: usize = 3;
let array: [i32; SIZE] = [1, 2, 3];
}

Multidimensional Arrays

Rust supports multidimensional arrays by nesting arrays within arrays.

Example of a 2D Array:

#![allow(unused)]
fn main() {
let matrix: [[i32; 3]; 2] = [
    [1, 2, 3],
    [4, 5, 6],
];
}
  • This creates a 2x3 matrix (2 rows, 3 columns).
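Elements of a nested array are accessed with one index per dimension, and each row is itself an ordinary array that can be iterated. The sketch below illustrates both; the helper row_sums is a hypothetical name chosen for this example.

```rust
/// Hypothetical helper: sums each row of a 2x3 matrix.
fn row_sums(matrix: &[[i32; 3]; 2]) -> [i32; 2] {
    let mut sums = [0; 2];
    for (r, row) in matrix.iter().enumerate() {
        sums[r] = row.iter().sum();
    }
    sums
}

fn main() {
    let matrix = [[1, 2, 3], [4, 5, 6]];

    // The first index selects the row, the second the column.
    println!("matrix[1][2] = {}", matrix[1][2]);
    println!("row sums = {:?}", row_sums(&matrix));

    // The rows are stored one after another in a single block.
    println!("bytes: {}", std::mem::size_of_val(&matrix));
}
```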

Traits Implemented by Arrays

Arrays implement several traits if their element types implement them. For example, if the elements implement Copy, the array will also implement Copy.

Example:

#![allow(unused)]
fn main() {
fn duplicate(array: [i32; 3]) -> [i32; 3] {
    array // Returned by value; the caller's original stays usable because [i32; 3] is Copy
}
}

This allows arrays to be copied by value, similar to how structs can derive the Copy trait.
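The practical consequence is that a function receiving an array by value works on its own copy. The following sketch demonstrates this with a hypothetical helper named with_first_zeroed, and contrasts it with an array of String values, which is not Copy and must be cloned.

```rust
/// Hypothetical helper: receives its own copy of the array and modifies it.
fn with_first_zeroed(mut array: [i32; 3]) -> [i32; 3] {
    array[0] = 0;
    array
}

fn main() {
    let original = [1, 2, 3];
    let changed = with_first_zeroed(original); // passes a copy

    // The caller's array is unaffected and still usable.
    println!("original = {:?}, changed = {:?}", original, changed);

    // An array of Strings is not Copy; duplicate it with clone().
    let names = [String::from("a"), String::from("b")];
    let names_copy = names.clone();
    println!("{:?} {:?}", names, names_copy);
}
```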

Comparing Rust and C Arrays

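  • Fixed Size: Both Rust and C arrays have a fixed size. In Rust the length is always known at compile time; C99 additionally allows variable-length arrays.
  • Type Safety: All elements must be of the same type in both languages, but a Rust array carries its length in its type and does not implicitly decay to a bare pointer as a C array does.
  • Bounds Checking: Rust performs bounds checking at runtime, preventing out-of-bounds memory access, a common issue in C.
  • Memory Location: A Rust array declared as a local variable is stored on the stack.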
  • Slices: Rust introduces slices for safe and flexible array access, which is not directly available in C.

Other Initialization Methods

Initializing Arrays Without Specifying All Elements:

In Rust, you must initialize all elements of an array. In C, a partial initializer sets the remaining elements to zero, and a local array without an initializer holds indeterminate values. Rust has no equivalent of either and requires every element to be given a value.

Example:

#![allow(unused)]
fn main() {
let array: [i32; 5] = [1, 2, 3, 0, 0];
}
  • Manually specify default values for unspecified elements.
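When writing out every element is tedious, the standard library offers a few shortcuts. The sketch below shows three of them: Default::default() for small arrays, overwriting a prefix of a repeated-value array with copy_from_slice, and std::array::from_fn (available in recent Rust versions) to compute each element from its index. The helper name squares is hypothetical.

```rust
/// Hypothetical helper: the first five square numbers, computed per index.
fn squares() -> [usize; 5] {
    std::array::from_fn(|i| i * i)
}

fn main() {
    // Every element gets the default value of its type (0 for i32).
    let zeros: [i32; 5] = Default::default();

    // Start from a repeated value, then overwrite the first three elements.
    let mut partial = [0; 5];
    partial[..3].copy_from_slice(&[1, 2, 3]);

    println!("{:?}", zeros);
    println!("{:?}", partial);
    println!("{:?}", squares());
}
```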

Summary

  • Declaration Syntax: let array: [Type; Length] = [values];
  • Type Inference: Rust can infer the type and length based on the values provided.
  • Initialization with Default Values: let array = [initial_value; array_length];
  • Accessing Elements: Use array[index], where index is of type usize.
  • Mutability: Declare with mut if you need to modify elements after creation.
  • Bounds Checking: Rust checks array bounds at runtime to prevent invalid access.
  • Iteration: Use loops to iterate over elements or indices.
  • Slices: Use slices for flexible and safe access to arrays.

When to Use Tuples, Arrays, and Vectors

Rust provides tuples, arrays, and vectors to group multiple values, each serving distinct purposes:

  • Tuples: Fixed-size collections that can hold elements of different types. Ideal for grouping a small, fixed number of related values where each position has a specific meaning.

  • Arrays: Fixed-size collections of elements of the same type. Suitable for handling a known number of homogeneous items, allowing efficient indexed access and iteration.

  • Vectors (Vec<T>): Growable arrays stored on the heap. Use when you need a collection of elements of the same type, but the size can change at runtime.

Key Differences:

  • Homogeneity:

    • Tuples: Heterogeneous elements (different types).
    • Arrays and Vectors: Homogeneous elements (same type).
  • Size:

    • Tuples and Arrays: Fixed size known at compile time.
    • Vectors: Dynamic size that can grow or shrink at runtime.
  • Usage Scenarios:

    • Tuples: Grouping related values with different types or meanings, like coordinates (x, y).
    • Arrays: Collections of fixed-size homogeneous data, like days in a week.
    • Vectors: Collections where the number of elements isn't known at compile time or can change, like lines read from a file.

Examples:

  • Tuple:

    #![allow(unused)]
    fn main() {
    let point = (10.0, 20.0); // x and y coordinates
    let (x, y) = point;       // Destructure into variables
    }
  • Array:

    #![allow(unused)]
    fn main() {
    let weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"];
    for day in weekdays.iter() {
        println!("{}", day);
    }
    }
  • Vector:

    #![allow(unused)]
    fn main() {
    let mut numbers = vec![1, 2, 3];
    numbers.push(4); // Now numbers is [1, 2, 3, 4]
    }

Choosing the Right Type:

  • Use tuples when you have a small, fixed set of values with possibly different types or meanings.
  • Use arrays when you have a fixed-size collection of the same type and need efficient access or iteration.
  • Use vectors when dealing with a collection that can change in size.
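To make the vector case more concrete, here is a sketch in which the number of elements depends on input that is only available at runtime. The function non_empty_lines is a hypothetical helper; an in-memory string stands in for the contents of a file.

```rust
/// Hypothetical helper: collects the non-empty lines of `text`.
/// The number of lines is only known at runtime, so a Vec is used.
fn non_empty_lines(text: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    for line in text.lines() {
        if !line.trim().is_empty() {
            lines.push(line);
        }
    }
    lines
}

fn main() {
    let lines = non_empty_lines("alpha\n\nbeta\ngamma\n");
    println!("{} lines: {:?}", lines.len(), lines);

    let mut numbers = vec![1, 2, 3];
    numbers.push(4);
    let last = numbers.pop(); // removes and returns the last element
    println!("{:?} {:?}", numbers, last);

    // A vector can be viewed as a slice, just like an array.
    let total: i32 = numbers[..].iter().sum();
    println!("total = {}", total);
}
```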

Tuples or Arrays as Function Return Types

When a function needs to return multiple values, the choice between tuples and arrays depends on the nature of the data:

  • Tuples are preferable when:

    • Returning a fixed number of values with distinct meanings.
    • Each value may represent a different concept, even if they're the same type.
    • You want to leverage destructuring for clarity.

    Example: Returning Coordinates

    #![allow(unused)]
    fn main() {
    fn get_coordinates() -> (f64, f64) {
        (10.0, 20.0)
    }
    
    let (x, y) = get_coordinates();
    println!("x = {}, y = {}", x, y);
    }
  • Arrays are suitable when:

    • Returning a fixed-size collection of homogeneous values.
    • The elements represent the same kind of data.
    • You might need to iterate over the elements.

    Example: Returning a Row of Data

    #![allow(unused)]
    fn main() {
    fn get_row() -> [i32; 3] {
        [1, 2, 3]
    }
    
    let row = get_row();
    for value in row.iter() {
        println!("{}", value);
    }
    }

Why Choose Tuples for Coordinates:

  • Semantic Clarity: Destructuring tuples into variables like x and y makes the code more readable and self-explanatory.
  • Distinct Meanings: Even if x and y are the same type, they represent different dimensions.

Alternative with Structs:

For enhanced clarity and scalability, especially with more complex data, consider using a struct:

#![allow(unused)]
fn main() {
struct Coordinates {
    x: f64,
    y: f64,
}

fn get_coordinates() -> Coordinates {
    Coordinates { x: 10.0, y: 20.0 }
}

let coords = get_coordinates();
println!("x = {}, y = {}", coords.x, coords.y);
}
  • Advantages:
    • Named Fields: Clearly indicate what each value represents.
    • Extensibility: Easy to add more fields (e.g., z for 3D coordinates).
    • Methods: Ability to implement associated functions or methods.

Summary:

  • Use Tuples when returning multiple values with different meanings or when you want to unpack values into variables with meaningful names.
  • Use Arrays when returning a collection of similar items that might be processed collectively.
  • Use Structs for even greater clarity and when you might need to expand functionality.

Final Recommendation:

  • For returning pairs like coordinates, tuples offer a good balance between simplicity and clarity.
  • For collections of homogeneous data where iteration is needed, arrays (or vectors if the size is dynamic) are more appropriate.

By choosing the most suitable data structure, you enhance code readability, maintainability, and safety, aligning with Rust's emphasis on clarity and reliability.

Note: Using very large arrays can cause a stack overflow, as the default stack size is usually limited to a few megabytes and varies depending on the operating system. For large collections, consider using a vector (Vec<T>) instead.
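A minimal sketch of the suggested alternative follows. The helper make_buffer and the 16 MiB size are assumptions chosen for illustration; whether an array of that size actually overflows the stack depends on the platform and its stack limit.

```rust
/// Hypothetical helper: allocates a zero-filled buffer on the heap.
fn make_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    const SIZE: usize = 16 * 1024 * 1024; // 16 MiB

    // let big = [0u8; SIZE]; // as a local array this may overflow the stack
    let big = make_buffer(SIZE); // same number of bytes, stored on the heap
    println!("allocated {} bytes", big.len());

    // The Vec value itself stays small regardless of how many elements it holds.
    println!("size of the Vec handle: {} bytes", std::mem::size_of::<Vec<u8>>());
}
```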

Stack vs. Heap Allocation

Rust's primitive types (scalars, tuples, and arrays) have a size known at compile time, so local variables of these types are typically stack-allocated. Stack access is fast because allocating space only requires adjusting the stack pointer and the data tends to stay in the CPU cache. Many of these types also map directly onto CPU registers and instructions.

On the other hand, dynamic types like vectors (Vec<T>) and dynamically sized strings (String) use heap allocation to store their data, allowing for flexible and dynamic resizing at runtime. This heap allocation introduces overhead but is necessary for collections whose size is not known at compile time.