Even if you remember a few things from that college statistics class (and confess, you probably don’t), the barrage of election statistics around this time of year can get overwhelming and confusing. I will admit that I have fivethirtyeight.com and other election statistics sites in my news feed, but I have learned a few of the limits on what this obsession can actually tell us. This post suggests a few caveats as you try to interpret the statistical prognosticators.

**It’s not a lottery or a sack of billiard balls**

Most of the first college course in statistics (and likely the last) you may have taken focused on the inference of information about *known data sets*, like a sack of billiard balls from which we are drawing samples. In election terms, we often start out with the assumption that there is also a “sack” of “red billiard ball voters” and “blue billiard ball voters,” and that those many polls are sampling from these “voter sacks” to tell us how our favored “ball color” is doing.

However, in the real world of elections, we don’t know exactly how big this sack of balls really is, come election day. Not only that, but the balls might change color at the last minute, and there are some pink ones and green ones (third parties) mixed into our sack. Often our “statistical inferences” about this “sack” via the polls come out correct, but sometimes we elect Donald Trump.

In lottery statistics or casino slot machines, the *Law of Large Numbers* is our friend in the end. [1] The more participants in a state lottery, the more the inherent probabilities of the lottery system will prove true. Lottery managers depend on knowing the payout, and the longer we play, the more certain it gets, even if they can’t predict exactly *who* will win. The more coins that are inserted in a slot machine, the greater the probability that the casino will wind up with your money. In elections, however, more votes can easily add to the overall uncertainty of the outcome, because they are often coming from un-polled places, and because, as noted above, the nature of the voting population itself is uncertain and always changing.
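For the curious, here is a small Python sketch of the Law of Large Numbers at work on a slot machine. The machine is entirely hypothetical: the 5% win chance and the 18-unit prize are numbers I made up so that the machine returns 0.90 per unit wagered on average. The point is only that the average payout wanders a lot over a hundred plays and hugs the expected value over a couple hundred thousand.

```python
import random

def average_payout(n_plays, win_prob=0.05, prize=18.0, seed=0):
    """Average payout per one-unit play on a hypothetical slot machine.

    win_prob and prize are invented for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(prize for _ in range(n_plays) if rng.random() < win_prob)
    return total / n_plays

# Long-run payout per unit wagered; the house keeps the rest.
expected = 0.05 * 18.0

for n in (100, 10_000, 200_000):
    print(n, "plays: average payout", round(average_payout(n), 3),
          "vs expected", round(expected, 3))
```

Notice what makes this work: the machine's rules never change between plays. That is exactly the condition an electorate does not satisfy.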

**Why it is more like “a horse race”**

What the political polls are really measuring in elections is *uncertainty*. In effect, we are measuring the confidence level in our sampling methods. When Nate Silver at fivethirtyeight.com creates his “poll of polls” he is measuring the “tendency toward the mean” for imperfect sampling methods of an unknown and changing population. Statistically, the “mean of the means” can reduce the perception of uncertainty, and it *can* correlate better than any one poll with eventual outcomes, but it still likely falls well short of “statistical probability.”
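One way to see the gap between a tidy average and a true probability is a toy simulation. In this Python sketch every poll shares the same small systematic error (a modeling assumption of mine, not a claim about any real pollster), and the sample size, number of polls, and two-point bias are all invented. Averaging shrinks the random noise nicely, but it cannot remove an error that all of the polls have in common.

```python
import random

def simulate_poll(true_share, sample_size, bias, rng):
    """One poll: each respondent backs our candidate with probability true_share + bias."""
    p = true_share + bias
    hits = sum(1 for _ in range(sample_size) if rng.random() < p)
    return hits / sample_size

def poll_of_polls(true_share, n_polls, sample_size, bias, seed=0):
    """Simple unweighted average of n_polls simulated polls sharing one bias."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    polls = [simulate_poll(true_share, sample_size, bias, rng)
             for _ in range(n_polls)]
    return sum(polls) / len(polls)

# Hypothetical numbers: the race is truly tied, but every poll leans 2 points our way.
average = poll_of_polls(true_share=0.50, n_polls=40, sample_size=1000, bias=0.02)
print("poll-of-polls average:", round(average, 3), "(true share is 0.50)")
```

The average lands close to 52%, steady and confident, and it is wrong by the full amount of the shared bias.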

When we have enough “repeats” of uncertain forecasts over the long run, we can turn uncertainty into *measurable risk*. [2] The problem here is “the long run.” Biennial elections in a demographically changing population provide far too few samples of the electorate to move from “uncertainty” to “risk.”

A better comparison here than “billiard ball sampling” is *parimutuel betting* in horse racing. The “probabilities” are really just samples of the *bettors’ uncertainty*. Thrown together they make for the *appearance* of probability in determining “the odds,” but the outcome of the race is more *uncertain* than it is *probabilistic*.
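To make the parimutuel point concrete, here is a minimal Python sketch with a made-up betting pool. It ignores the track’s cut for simplicity (real pools deduct one before paying out), and the horse names and amounts are hypothetical. The “probability” for each horse is nothing more than its share of the money wagered.

```python
def implied_probabilities(pool):
    """Each horse's share of the total money wagered (track take ignored)."""
    total = sum(pool.values())
    return {horse: amount / total for horse, amount in pool.items()}

# Hypothetical win pool, in dollars wagered on each horse.
pool = {"Horse A": 500, "Horse B": 300, "Horse C": 200}

for horse, share in implied_probabilities(pool).items():
    print(horse, "implied probability", round(share, 2))
```

Nothing in that calculation knows anything about the horses. It only counts what the bettors believe, and if they move their money, the “odds” move with them.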

**The “double effect” of switched votes**

Say that our candidate is ahead by a 52% to 48% margin in a two-person race. The press will often report this as a “four percentage point spread,” which sounds pretty good. But it is not. It is really a “two-point spread.” To see this difference, imagine that we have polled a difference of 100 voters between the two candidates in coming up with this spread. If just one of those prospective voters on the winning side switches candidates, the spread drops not to 99 votes, but rather to *98* votes, with the top candidate losing one vote and the bottom candidate gaining one vote.

In that earlier example, only 2% of the voters need to flip (in a clean two-person race) and then the race will be tied. So, whenever you hear “the spread” announced on television, divide it by two, and you may find the race is, for good or bad, much closer than you think.
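The arithmetic is short enough to write out. This Python sketch assumes a clean two-person race where every flipped voter moves directly from the leader to the trailer; the function names are mine, chosen just for this illustration.

```python
def voters_to_flip(leader_pct, trailer_pct):
    """Percentage of all voters who must switch from leader to trailer to tie the race."""
    return (leader_pct - trailer_pct) / 2

def shares_after_flip(leader_pct, trailer_pct, flipped_pct):
    """New (leader, trailer) percentages after flipped_pct of all voters switch sides."""
    return leader_pct - flipped_pct, trailer_pct + flipped_pct

needed = voters_to_flip(52, 48)
print("Voters who must flip to tie:", needed, "percent")
print("Shares after the flip:", shares_after_flip(52, 48, needed))
```

Each switched voter is counted twice in the spread: once as a loss for the leader and once as a gain for the trailer.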

In a series of earlier posts, I used the statistical concept of *Markov chains* as a better way to see elections and changing political positions. [3] There are four different “rates of change” going on with every new bit of information about our candidate (Circle A below) and the opposing candidate (Circle B, assuming that there are only two). A certain percentage of voters may solidify their loyalty to their current position as new information emerges, but a certain percentage will “change horses,” defecting in either direction.

You can read those earlier posts to see how the math shakes out, but in short, each shift in these percentages will “settle down” into a new “steady state” level for each “circle” of candidates or political positions, but that new steady state level may not be intuitively obvious. This is, I suggest, how we got to a Republican Party that loves Russians, supports huge fiscal deficits, and celebrates unapologetic sexual indiscretions.
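As a rough illustration of that “settling down,” here is a minimal two-state Markov chain in Python. The defection rates are invented, and I am assuming they stay fixed from one round of news to the next, which real electorates do not promise. The four “rates of change” are the two defection rates below plus their complements, the two loyalty rates.

```python
def step(share_a, defect_a_to_b, defect_b_to_a):
    """Circle A's share after one round of loyalists staying and defectors switching."""
    share_b = 1.0 - share_a
    return share_a * (1.0 - defect_a_to_b) + share_b * defect_b_to_a

def steady_state(defect_a_to_b, defect_b_to_a):
    """Long-run share for Circle A in a two-state chain with fixed defection rates."""
    return defect_b_to_a / (defect_a_to_b + defect_b_to_a)

# Hypothetical rates: each round, 3% of A's voters defect to B and 2% of B's defect to A.
share = 0.52
for _ in range(500):
    share = step(share, 0.03, 0.02)

print("Share for A after 500 rounds:", round(share, 4))
print("Steady state from the formula:", round(steady_state(0.03, 0.02), 4))
```

With these made-up rates, a candidate who starts at 52% drifts down to about 40%, even though only a few percent of voters move in any single round. The starting share does not matter in the long run; only the defection rates do.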

**You may be more behind than you think you are**

This may sound like the opposite of the prior section on “double effect,” but this refers to what is happening *as the vote is counted*. Say that there is only one other candidate and that we will need 50% of the vote plus one to be “first past the post.” However, we have only 49% of the vote so far, and half of the vote has been counted. Can we still win?

This graph shows the math of what happens when we are behind, as more and more of the vote gets counted. If we have 49% of the vote so far, and only 50% of the vote is counted so far, we can still win if we garner in excess of 51% of the remaining vote. However, if we are still at 49% when 90% of the vote is in, we now need in excess of 59% of the remaining vote, and the odds quickly deteriorate from there, with the win eventually getting out of reach. And if we have only 48% of the vote to start with, this curve gets uglier even more quickly.
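The curve comes from one line of algebra, sketched here in Python. It assumes a two-person race with a 50% target, and the function name is mine for this illustration, not the name of any published tool. If we hold share s of the fraction c counted so far, we need share r of the rest such that s·c + r·(1 − c) exceeds the target.

```python
def required_remaining_share(current_share, fraction_counted, target=0.50):
    """Share of the uncounted vote we must exceed to finish above target.

    current_share: our share of the votes counted so far (0 to 1).
    fraction_counted: portion of the total vote already counted (0 to just under 1).
    """
    if not 0.0 <= fraction_counted < 1.0:
        raise ValueError("fraction_counted must be at least 0 and less than 1")
    counted_for_us = current_share * fraction_counted
    return (target - counted_for_us) / (1.0 - fraction_counted)

for counted in (0.50, 0.75, 0.90, 0.98, 0.99):
    needed = required_remaining_share(0.49, counted)
    status = "out of reach" if needed > 1.0 else "still possible"
    print(f"{counted:.0%} counted: need more than {needed:.1%} of the rest ({status})")
```

A result above 100% means the race is mathematically over. Passing a different `target` is one simple way to play with races where third-party candidates lower the share needed to win.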

You can also use the same math if you are ahead and you need to get comfortable with how good that lead is as the vote gets counted. On election day I will post a handy online tool for you to use as the votes come in on your favored candidates using this math. This tool will also let you tweak the variables to account for third party candidates, as well as play with the vote percentages needed at various points in the counting process.

Be forewarned, I used this tool during the vote counting of the 2016 presidential election and thus called it an early night and went home disappointed/distraught. We rolled the dice and lost badly.


Notes:

- I have written several posts on Siméon Denis Poisson’s *Law of Large Numbers*, including one about lotteries and “Why there is always a winner, but it’s probably not you.”
- To see what happens when you can generate thousands of repeats of “uncertain events” in order to determine “quantifiable risk,” see this post on “Visualizing 7% investment risk.”
- A series of posts about the political uses of Markov chains starts with one entitled “The math of changing your mind.”