Fisher–Yates Shuffle

Say you had a fresh pack of cards:

If you want to play a game of Texas Hold ‘em with friends, you should shuffle the deck first to randomize the order and insure a fair game. But how?

A simple but effective way of doing this is to pull a random card from the deck repeatedly and set it aside, incrementally building a new stack. As long as you pick each remaining card from the deck with equal probability, you’ll have a perfectly-unbiased random stack when you’re done:

But let’s say instead of a physical deck of cards, you wanted to write code to perform this same task with an in-memory array of n elements. Sounds straightforward (in part), but how would you pick a random remaining element from the original deck, exactly?

One slow option—gotta start somewhere: pick a random element from the array (in [0, n - 1]) and then check if you’ve shuffled that element already. This works, but it becomes increasingly slow as the remaining elements dwindle; you’ll keep randomly picking elements that have already been shuffled. Watch how those duplicate selections (in red) cause the shuffle to crawl to a halt:

Here’s what the implementation looks like in JavaScript, not that you should use it:

function shuffle(array) {
  var copy = [], n = array.length, i;

  // While there remain elements to shuffle…
  while (n) {

    // Pick a remaining element…
    i = Math.floor(Math.random() * array.length);

    // If not already shuffled, move it to the new array.
    if (i in array) {
      copy.push(array[i]);
      delete array[i];
      n--;
    }
  }

  return copy;
}

This is bad, and we can do better. You can avoid duplicate selection by picking only remaining elements: pick a random number in the range [0, m - 1], where m starts at n and decreases by one with each iteration. In other words, m represents the number of remaining cards to shuffle. Compact the remaining deck as you move cards so that you can easily pick out the next card for shuffling:

function shuffle(array) {
  var copy = [], n = array.length, i;

  // While there remain elements to shuffle…
  while (n) {

    // Pick a remaining element…
    i = Math.floor(Math.random() * n--);

    // And move it to the new array.
    copy.push(array.splice(i, 1)[0]);
  }

  return copy;
}

This works pretty well, but it still has relatively poor quadratic performance. The problem is that when you remove each element from the original array (array.splice), you have to shift all the subsequent elements down to compact the array. On average, that’s n / 2 elements to shift per element to shuffle, giving O(n2).

But here’s an interesting, if obvious, insight: the number of shuffled elements (n - m) plus the number of remaining elements (m) is always equal to n. This means we can do the entire shuffle in-place, without any extra space! We use the back of the array to store the shuffled elements, and the front of the array to store the remaining elements. We don’t care about the order of the remaining elements as long as we sample uniformly when picking!

To implement the in-place O(n) shuffle, then, pick a random remaining element (from the front) and place in its new location (in the back). The unshuffled element in the back is swapped to the front, where it waits for subsequent shuffling:

function shuffle(array) {
  var m = array.length, t, i;

  // While there remain elements to shuffle…
  while (m) {

    // Pick a remaining element…
    i = Math.floor(Math.random() * m--);

    // And swap it with the current element.
    t = array[m];
    array[m] = array[i];
    array[i] = t;
  }

  return array;
}

For more about the Fisher–Yates shuffle, see the Wikipedia article and Jeff Atwood’s post, “The Danger of Naïveté” (2007). The visualizations in this post were built with d3.js and inspired by sort algorithm visualizations in Robert Sedgewick’s Algorithms in C (1998). See as well these visualizations of merge sort and quicksort.

Comments? Discuss on HN!