Will Murphy's personal home page

It’s time to wrap up the series on calculating pi using a Monte Carlo simulation in Rust. Here’s what I think we’ve learned so far:

The intuition of the problem, namely choosing random points in a unit square and counting the ones that are also in the unit circle can yield pi, is really cool, but it’s a very inefficient way to calculate the value of pi.
The problem seems parallelizable, namely that whether each point is in the circle is an independent question, so we can decide separately for each of the points and make things faster.
Parallelizing it doesn’t provide enough speedup to make it worth while.

I’m making performance claims without benchmarks! Numbers or it didn’t happen!

So here are our 3 implementations of calculating pi in Rust: single-threaded Monte Carlo, multi-threaded Monte Carlo, and summing a convergin series:

Single Threaded Monte Carlo:

pub fn find_pi(iterations: u64) -> f64 {
    let b_count = (0..iterations)
        .map(|_| random_point())
        .filter(|point| !is_in_circle(&point))
        .count() as f64;

    let b = (b_count as f64) / iterations as f64;

    4.0 - (4.0 * b)
}

Multi-threaded Monte Carlo

pub fn find_pi_par_iter(iterations: u64) -> f64 {
    let b_count = (0..iterations)
        .into_par_iter() // let Rayon crate handle parallelism
        .map(|_| random_point())
        .filter(|point| !is_in_circle(&point))
        .count() as f64;

    let b = (b_count as f64) / iterations as f64;

    4.0 - (4.0 * b)
}

Sum of infinite series

pub fn find_pi_analytically(iterations: u64) -> f64 {
    // π=(4/1)-(4/3)+(4/5)-(4/7)+(4/9)-(4/11)+(4/13)-(4/15)...
    let mut pi: f64 = 4.0;
    let four: f64 = 4.0;
    let mut minus = true;
    let max = iterations * 2;
    for i in (3..max).step_by(2) {
        if minus {
            pi = pi - (four / (i as f64));
        } else {
            pi = pi + (four / (i as f64));
        }
        minus = !minus;
    }
    pi
}

The benchmarks

These are produced by criterion.rs with the following source code:

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("find_pi single threaded Monte Carlo 5k", |b| {
        b.iter(|| find_pi(black_box(5000)))
    });
    c.bench_function("find_pi multi threaded Monte Carlo 5k", |b| {
        b.iter(|| find_pi_par_iter(5000))
    });
    c.bench_function("find_pi sum of series 5k", |b| {
        b.iter(|| find_pi_analytically(5000))
    });
}

find_pi single threaded Monte Carlo 5k                                                                            
                        time:   [114.55 µs 115.95 µs 117.63 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

find_pi multi threaded Monte Carlo 5k                                                                            
                        time:   [95.813 µs 133.03 µs 192.49 µs]
Found 19 outliers among 100 measurements (19.00%)
  19 (19.00%) high severe

find_pi sum of series 5k                                                                             
                        time:   [8.6428 µs 8.7483 µs 8.8570 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

There’s one surprise here, namely that the multi-threaded iteration is slower. Let’s bump things to more iterations and see if we get a speedup, since it could be there’s a minimum number of iterations before the gains from multi-threading exceed its overhead.

This benchmark is produced by the same code as above, but running for 50K iterations instead of 5K:

find_pi single threaded Monte Carlo 50k                                                                             
                        time:   [1.1389 ms 1.1493 ms 1.1617 ms]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

find_pi multi threaded Monte Carlo 50k                                                                            
                        time:   [502.39 µs 658.09 µs 880.81 µs]
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

find_pi sum of series 50k                                                                            
                        time:   [87.382 µs 88.691 µs 90.145 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

These are more like the numbers I expected: The sum of series method takes ~10x longer, because there are ten times as many numbers to add up. The parellel Monte Carlo method is faster than the single-threaded version, but not so much so that it makes up for choosing the wrong algorithm.

The conclusions

Rust is fun
Math is fun
Parallelism in the wrong algorithm is a lot worse than the right algorithm.

Till next week, happy learning!
– Will

Rust Monte Carlo Pi 3

Single Threaded Monte Carlo:

Multi-threaded Monte Carlo

Sum of infinite series

The benchmarks

The conclusions

Discussion