22.8 Introduction to Rayon for Data Parallelism

Writing multi-threaded code manually can be error-prone and verbose, particularly when processing large datasets. The Rayon crate simplifies this by providing parallel iterators that enable data parallelism with minimal code changes.

Rayon is a lightweight and efficient data-parallelism library that transforms sequential computations into parallel ones. It ensures data-race-free execution and dynamically balances workloads based on runtime conditions. This makes it an ideal choice for introducing parallelism into existing Rust code.

Rayon provides parallel versions of Rust's standard iterators, allowing seamless integration with existing code. It includes methods like par_sort for parallel sorting of mutable slices and the join() function for efficient parallel execution of two tasks.

22.8.1 Using Rayon in Your Project

To use Rayon, add the crate to your Cargo.toml file and import the necessary traits. The traits are grouped under the rayon::prelude module, typically imported with use rayon::prelude::*.
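
For example, the dependency entry in Cargo.toml might look like the following (the version requirement shown is illustrative; use the latest 1.x release of Rayon):

[dependencies]
rayon = "1"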

The rayon::prelude module provides access to methods such as par_iter, par_iter_mut, and into_par_iter, which produce parallel iterators supporting familiar operations like map, for_each, filter, and fold.

Parallel iterators are used similarly to regular iterators. To convert a sequential iterator to a parallel one, replace .iter(), .iter_mut(), or .into_iter() with .par_iter(), .par_iter_mut(), or .into_par_iter(), respectively.

Example: Summing Squares in Parallel

use rayon::prelude::*;

fn main() {
    let numbers: Vec<u64> = (0..1_000_000).collect();
    // par_iter() splits the slice across Rayon's thread pool; map runs
    // in parallel and sum combines the partial results.
    let sum: u64 = numbers
        .par_iter()
        .map(|x| x.pow(2))
        .sum();
    println!("Sum of squares: {}", sum);
}

Example: Incrementing Values in Parallel

use rayon::prelude::*;

// Increment every element in place; par_iter_mut() hands out disjoint
// mutable references, so there is no possibility of a data race.
fn increment_all(input: &mut [i32]) {
    input.par_iter_mut().for_each(|p| *p += 1);
}
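
A short sketch of calling this function (the values are arbitrary); a Vec<i32> coerces to &mut [i32] automatically:

fn main() {
    let mut data = vec![1, 2, 3];
    increment_all(&mut data);
    assert_eq!(data, vec![2, 3, 4]);
}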

Rayon automatically divides the input data across a thread pool, reducing the overhead of thread creation and destruction. This makes it particularly effective for data-parallel tasks such as summations, mappings, and filters.

22.8.2 When Rayon's Parallel Iterators Might Be Slower

For small workloads, the overhead of setting up parallel execution may outweigh the performance gains. Always benchmark your code to verify whether parallelization improves performance in your specific case.
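
A rough sketch of such a comparison is shown below (the input size is arbitrary, and a benchmarking harness such as Criterion gives more reliable numbers than this simple timing):

use rayon::prelude::*;
use std::time::Instant;

fn main() {
    // A deliberately small input: distributing the work across threads
    // can cost more than the computation itself.
    let numbers: Vec<u64> = (0..1_000).collect();

    let start = Instant::now();
    let sequential: u64 = numbers.iter().map(|x| x.pow(2)).sum();
    println!("sequential: {:?}", start.elapsed());

    let start = Instant::now();
    let parallel: u64 = numbers.par_iter().map(|x| x.pow(2)).sum();
    println!("parallel:   {:?}", start.elapsed());

    assert_eq!(sequential, parallel);
}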

22.8.3 The join() Construct

The rayon::join function allows efficient parallel execution of two closures. It returns a pair of results from those closures and is particularly useful for divide-and-conquer algorithms like quicksort.

Conceptually, calling join() is similar to spawning two threads, each executing one of the closures. In practice, Rayon's implementation uses a technique called work stealing: it maintains a fixed pool of worker threads and runs the second closure on another thread only if an idle worker is available; otherwise both closures run sequentially on the current thread.
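
A minimal sketch of join() on its own (the two computations are arbitrary placeholders) shows the pair of results it returns:

fn main() {
    let data = vec![3, 1, 4, 1, 5, 9, 2, 6];

    // The two closures may run on different worker threads; join waits
    // for both and returns their results as a tuple.
    let (sum, max) = rayon::join(
        || data.iter().sum::<i32>(),
        || data.iter().max().copied(),
    );

    println!("sum = {}, max = {:?}", sum, max);
}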

Example: Parallel Quicksort Using join()

// No prelude import is needed here: rayon::join is referenced by its full path.

fn main() {
    let mut v = vec![5, 1, 8, 22, 0, 44];
    quick_sort(&mut v);
    assert_eq!(v, vec![0, 1, 5, 8, 22, 44]);
    println!("{:?}", v);
}

fn quick_sort<T: PartialOrd + Send>(v: &mut [T]) {
    if v.len() > 1 {
        let mid = partition(v);
        // Split into two disjoint mutable halves and sort them in
        // parallel; join returns once both closures have completed.
        let (lo, hi) = v.split_at_mut(mid);
        rayon::join(|| quick_sort(lo), || quick_sort(hi));
    }
}

// Lomuto partition: the last element is the pivot; it is moved to its
// final position and that index is returned.
fn partition<T: PartialOrd + Send>(v: &mut [T]) -> usize {
    let pivot = v.len() - 1;
    let mut i = 0;
    for j in 0..pivot {
        if v[j] <= v[pivot] {
            v.swap(i, j);
            i += 1;
        }
    }
    v.swap(i, pivot);
    i
}

Although this example shows how join() can parallelize quicksort, for real-world sorting Rayon's par_sort method is generally faster and should be preferred.
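
As a short sketch of that approach (using the same data as above), par_sort becomes available on mutable slices once the prelude is imported:

use rayon::prelude::*;

fn main() {
    let mut v = vec![5, 1, 8, 22, 0, 44];
    v.par_sort();
    assert_eq!(v, vec![0, 1, 5, 8, 22, 44]);
    println!("{:?}", v);
}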