In Part One of this series, I suggested that you imagine what happens when a wonky die is rolled over and over again. Cancer is kind of wonky this way: it has a low probability of happening, followed by an unpredictable course when it kicks in, possibly including death.
The mathematical principle illustrated here is called the Central Limit Theorem. To simplify the principle, we can say that even small probabilities, and even those with unusual random patterns, when added up over many occurrences, start to look “statistically normal” (approaching a bell-shaped curve) as the number of repetitions grows. To demonstrate this, we will use this odd die that has a “6” on one side, but is blank on the other five sides, something like this:
Now you will simulate rolling this die 100 times, adding six only when a six comes up; otherwise, you add nothing. This is a very “skewed” probability distribution, all or nothing, as compared to a standard die with sides numbered one through six. What do you predict will be the sum of 100 rolls of this unusual die?
Click here to bring up a browser tab that will simulate rolling this die 100 times and adding up the results. It will then repeat that 100-roll sequence 50 times, creating a histogram of your results. You can click the “Roll Again” button at the bottom of the screen to see if you can perceive a pattern with successive sets of rolls. Notice that each sum of 100 rolls is always a multiple of six. Also, each histogram comes out different because of the randomness, but some patterns are more common than others.
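If you would rather run the experiment yourself than in the browser, here is a minimal Python sketch of the same procedure. It is not the code behind the linked simulator; the function names (`roll_wonky_die`, `sum_of_rolls`, `run_repetitions`, `print_histogram`) and the fixed seed are my own assumptions for illustration.

```python
import random
from collections import Counter

def roll_wonky_die(rng):
    """Return 6 with probability 1/6, otherwise 0 (a blank face)."""
    return 6 if rng.randrange(6) == 0 else 0

def sum_of_rolls(rng, n_rolls=100):
    """Add up n_rolls rolls of the wonky die."""
    return sum(roll_wonky_die(rng) for _ in range(n_rolls))

def run_repetitions(n_reps=50, n_rolls=100, seed=None):
    """Repeat the n_rolls experiment n_reps times and return the list of sums."""
    rng = random.Random(seed)
    return [sum_of_rolls(rng, n_rolls) for _ in range(n_reps)]

def print_histogram(sums):
    """Print a text histogram: one row per distinct sum, one '#' per occurrence."""
    counts = Counter(sums)
    for total in sorted(counts):
        print(f"{total:4d} | {'#' * counts[total]}")

# 50 repetitions of 100 rolls; change or drop the seed to "roll again".
print_histogram(run_repetitions(n_reps=50, n_rolls=100, seed=1))
```

Every row label in the output is a multiple of six, and changing the seed plays the role of the “Roll Again” button.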
If the simulation let you run up to 500 repetitions, creating a histogram of the sums, it might start to look something like this (prettied up and rotated):
This histogram will look slightly different every time we run this exercise. But even though our original probability distribution was very skewed (five zeroes for every one “6”), the tally of sums starts to look statistically normal. In my sample, the average of all observations was 100.9. No one set of 100 rolls will ever total exactly 100, because 100 is not a multiple of six, but the average will gravitate toward that value the more repetitions we make.
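Why 100? Each roll is worth 6 one time in six, so it contributes an average of 1, and 100 rolls contribute an average of 100. The sketch below works that out exactly, along with the spread of the sums, and then simulates a few repetition counts so you can watch the sample average settle. The constant names and the seed are my own choices for illustration, and your averages will differ from mine.

```python
import random
from fractions import Fraction
from statistics import mean

P_SIX = Fraction(1, 6)   # chance of the one marked face
FACE = 6                 # value added when it comes up
N_ROLLS = 100            # rolls per repetition

# Exact mean and variance of a single roll, then of the 100-roll sum.
roll_mean = FACE * P_SIX                     # 6 * 1/6 = 1
roll_var = FACE**2 * P_SIX - roll_mean**2    # 36/6 - 1 = 5
sum_mean = N_ROLLS * roll_mean               # 100
sum_var = N_ROLLS * roll_var                 # 500 (rolls are independent)
sum_sd = float(sum_var) ** 0.5               # about 22.4

def simulate_sums(n_reps, rng):
    """Return n_reps sums, each the total of N_ROLLS rolls of the wonky die."""
    return [
        sum(FACE for _ in range(N_ROLLS) if rng.randrange(6) == 0)
        for _ in range(n_reps)
    ]

print(f"theory: mean {float(sum_mean):.0f}, standard deviation {sum_sd:.1f}")
rng = random.Random(2024)
for n_reps in (50, 500, 2000):
    sums = simulate_sums(n_reps, rng)
    print(f"{n_reps:5d} repetitions: average of sums = {mean(sums):.2f}")
```

The standard deviation of about 22 explains why individual sums in the histogram routinely land anywhere from the 70s to the 130s, even though the average of many sums hugs 100.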
If we keep doing this, the average of the summed dice will get ever closer to 100 and the curve will look even more bell-shaped. Here’s how my test looks at 2000 runs. The average is now 100.35:
What do you think this pattern might look like after two million runs? Highly skewed probability distributions are abundant in nature, and in the scheme of life, two million repetitions are peanuts, because nature has had billions of years to roll its dice.
Note that this 2000-sample set is still a bit skewed. It is theoretically possible, though very unlikely, to roll 100 blank faces in a row, for a total of zero. It is also possible, though far less likely, to roll 100 sixes in a row, for a total of 600. These are the very improbable, though very possible, “tails” on the curve. If I go for two million runs of the simulation, I will not necessarily hit either extreme (an all-blank run has odds of roughly one in 80 million, and 100 straight sixes is rarer by dozens of orders of magnitude), but I will see sums much farther out in the tails than I did in 2000 runs, simply because so many more runs give rare results more chances to turn up. I will have “lottery winners” among my dice rolls if I roll long enough.
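To put rough numbers on how rare the two extreme totals are, here is a small Python sketch that computes the chance of an all-blank run and an all-sixes run, and the chance of seeing each at least once in two million runs. The helper name `chance_at_least_once` is my own, and the calculation assumes every roll and every run is independent.

```python
import math
from fractions import Fraction

N_ROLLS = 100
N_RUNS = 2_000_000

# Exact probabilities of the two extreme totals in a single 100-roll run.
p_all_blank = Fraction(5, 6) ** N_ROLLS   # total of 0
p_all_six = Fraction(1, 6) ** N_ROLLS     # total of 600

def chance_at_least_once(p, runs):
    """Probability that an event of per-run probability p occurs in at least one of `runs` runs.

    Uses log1p/expm1 so that very small p does not round away to zero.
    """
    return -math.expm1(runs * math.log1p(-p))

for label, p in (("all blanks (sum 0)", p_all_blank), ("all sixes (sum 600)", p_all_six)):
    p_float = float(p)
    print(f"{label}: per-run chance {p_float:.3g}, "
          f"chance in {N_RUNS:,} runs {chance_at_least_once(p_float, N_RUNS):.3g}")
```

Both extremes stay unlikely even at two million runs; what the extra runs buy you is a steadily wider range of observed sums.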
There are many probabilistic factors that impact our daily lives, right down to how the individual cells in our body replicate, either normally or out of control (and thus, cancer). Most of these factors, taken individually, are not normal; they are often greatly skewed toward one end of the probability scale.
The point of the Central Limit Theorem, however, is that these individual factors start to appear statistically normal, and we can often use this characteristic to our advantage. Now you know that when something “appears normal,” there are likely factors in play that might be anything but. There are just lots of dice rolls and lots of time.
Watch for Part Three of this series of posts, where we see what happens when there are multiple genetic (or traffic) issues happening all at once.
Part Three of this series is now posted.