Note: An update regarding the misplaced use of Benford’s Law in 2020 election conspiracies has been posted here.
The record-breaking Bernie Madoff Ponzi scheme, which collapsed in 2008, is in the news again. The trustee trying to recover assets has clawed back $76.5 million from a Bermuda/Austrian investment fund that had profited from the scheme. [1]
Most people don’t know that this fraud could have been taken down eight years earlier. An analyst named Harry Markopolos had tried to warn the Securities and Exchange Commission that something was amiss at Madoff’s hedge fund, and even bragged that one audit technique took just a few minutes to prove fraud. It is likely that Markopolos used, as part of his analysis, a little-known calculation called Benford’s Law, which is so counter-intuitive that even the SEC disregarded his “proof.” But Markopolos was right, and Benford’s Law is still a very interesting way to ferret out financial fraud.
Something odd in a Rand McNally atlas
I mentioned in an earlier post that nature itself does not usually count linearly in the base-ten addition that is deeply ingrained in human thinking. Most natural growth is instead exponential and logarithmic in form. To illustrate this, let’s take some numbers that one would think would be very random. We will take pages from the index of towns and populations from the back of any Rand McNally geographic atlas, something like this fragment.
To make things even more random, we will make a list of the first digit from each town’s population regardless of whether the total population is in the hundreds, thousands, ten-thousands or even larger. So, from the image above, my sample of random numbers from the first column will begin 1, 3, 4, 5, 2, 2, 7, etc.
Let’s gather several hundred of these town population numbers, taken from different pages of the index, and count the occurrences of each first digit. My sample set of 400 first digits, converted to percentages of the total towns counted, tallies like this:
Do you notice something odd? Why are there so many more ones than twos, and more twos than threes, with even smaller numbers sampled from the larger digits? My intuition tells me that by taking a bunch of random populations from random cities, the populations should at least roughly start with a balanced probability of first digits, somewhere near 1/9th each, or around 11%. Certainly, if I were to just “make up” a set of population values at random for my “fake Rand McNally Atlas” I would make sure the numbers were close to balanced, right?
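For readers who want to repeat the tally without a pencil, here is a minimal Python sketch of the counting procedure described above. The population figures are hypothetical, made up only so that their first digits begin 1, 3, 4, 5, 2, 2, 7 like the sample; they are not taken from the atlas, and the function names are illustrative.

```python
from collections import Counter


def first_digit(n: int) -> int:
    """Return the leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def first_digit_percentages(values):
    """Tally leading digits 1-9 and return each as a percentage of the total."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: 100.0 * counts.get(d, 0) / total for d in range(1, 10)}


# Hypothetical populations, invented for illustration (not atlas data).
sample_populations = [1250, 340, 4100, 56000, 2300, 21, 7800, 1900, 130, 980]

for digit, pct in first_digit_percentages(sample_populations).items():
    print(f"{digit}: {pct:.1f}%")
```

With only ten made-up values the percentages mean little; the pattern only becomes visible with a few hundred real entries.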
The math of Benford’s Law
It turns out that my intuition is dead wrong here. If I were to expand my data set, it would come closer and closer to a probability distribution called Benford’s Law, whose predicted counts look like this, compared to my Rand McNally numbers.
Notice that the first predicted interval, from one to two, is the biggest, with each subsequent interval getting smaller. If I graph it out, I get what is called a “negative exponential” curve that looks like the one below. [2]
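The predicted percentages can be reproduced in a few lines. This sketch simply evaluates the formula given in note [2], log10(1 + 1/d), for each first digit d from one to nine; the function name is my own.

```python
import math


def benford_probability(d: int) -> float:
    """Benford's Law: probability that the first digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


benford_percent = {d: 100 * benford_probability(d) for d in range(1, 10)}

for d, pct in benford_percent.items():
    print(f"{d}: {pct:.1f}%")
```

Run as written, this prints roughly 30.1% for one, 17.6% for two, 12.5% for three, then 9.7%, 7.9%, 6.7%, 5.8%, 5.1% and 4.6% for nine. The nine values add up to exactly 100%.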
My set of 400 numbers gets pretty close to the Benford predictions. If I sampled more pages from the atlas, I would likely get even closer. What is happening here? I have read multiple attempts to explain Benford’s Law, most not very convincing or clear. My explanation, perhaps equally unconvincing to you, is this:
In the Rand McNally case, recognize that city populations usually grow or decline in fairly constant percentages, at least in the short term. Part of this is natural birth and death. There is also some interesting “community switching” math that will have to wait for another day. So let’s say that your community has 1000 people and it is growing quickly and consistently at 10% per year. Next year, you would expect 1100 people in this town, but in year three that number would be 1210 (1.10 times 1100), rather than 1200. The next year it would be 1331 (1.10 times 1210). It will take a little more than seven years to cross the 2000 population threshold. [3]
However, what if the starting population were only 800? In this case, we would grow to 880 by the second year, but already to 968 by year three, and then to 1064 the next year. And so, we have cycled through three beginning digits in four years. And then, from that point crossing to “1”, it will take seven more years to get up to 2000.
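The growth argument can be checked numerically. The sketch below assumes steady growth at an effective 10% per year, compounding smoothly rather than in once-a-year jumps (a simplification of the town example), and computes how long the quantity keeps each leading digit on its way to growing tenfold. The function name is illustrative.

```python
import math


def years_in_digit(d: int, rate: float) -> float:
    """Years a steadily growing quantity spends with leading digit d.

    Assumes smooth compounding at an effective annual growth rate `rate`.
    """
    return math.log((d + 1) / d) / math.log(1 + rate)


rate = 0.10  # the 10% annual growth used in the town example
dwell = {d: years_in_digit(d, rate) for d in range(1, 10)}
total = sum(dwell.values())  # years needed to grow tenfold

for d, years in dwell.items():
    share = 100 * years / total
    print(f"digit {d}: {years:.2f} years, {share:.1f}% of the time")
```

Under these assumptions the quantity spends about 7.27 years starting with a one but only about 1.11 years starting with a nine. The share of time spent on each digit works out to log10(1 + 1/d) whatever the growth rate, which is the Benford percentage.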
So the problem here is our human counting system. We count digitally, having chosen base ten math, likely because of our fingers and toes, and we expect nice even-interval number sets. Nature, however, was here first, and often counts exponentially, whether it be the replication of cells in the human body, the replication of human populations, or the replication of human “economic choices” (i.e., money). In these natural growth processes, the difference between “one of a thing” and “two of a thing” is greater than the natural distance between “eight of a thing” and “nine of a thing.”
And it turns out that the “natural math” used by the lion to chase down a gazelle, as well as the “natural math” of the gazelle trying to escape from that lion, are both naturally exponential as well, calculating speed, acceleration and trajectory on the fly using “brain hardware” to accomplish math we can’t do on our fingers and toes. We require some tool that will help us with exponents and logarithms, like a calculator, or an Excel spreadsheet. Or for many years a slide rule, which is simply a set of movable logarithmic rulers.
And back to Bernie Madoff
What does this have to do with the Bernie Madoff fraud?
It turns out that Benford’s Law is a handy tool for rooting out fraudulent financial statements, because most of us, if we were to just “make up” some fake numbers in a financial statement in order to fool somebody, would not follow the patterns of Benford’s Law. We would more likely see the Benford pattern, with so many ones and twos, as suspicious, and so we would artificially “balance” our fake numbers with more sevens and eights. But invested money usually grows and declines in exponential terms like populations do.
Thanks to Excel and the SEC’s online database of corporation financial statements, I quickly downloaded several financial statements from various random companies and extracted the revenue, expense, asset, liability and equity dollar amounts only, yielding 1000 numbers of widely varying sizes and descriptions. I then just stripped off the first digit of each number, as I did for the Rand McNally exercise, yielding digits from one to nine.
Here are my results, the percentages of “hits” for each starting digit, and the prediction of Benford’s Law:
Well, that came out even closer than the Rand McNally exercise! Allegedly, Bernie Madoff and his assistant tried to “fudge” many of their reported numbers without knowing Benford’s Law. The variation from the predicted values was much larger than the normal statistical variation my samples showed, and that discrepancy was apparently what Harry Markopolos discovered.
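One common way to put a number on “much larger than normal statistical variation” is a chi-square goodness-of-fit statistic. The sketch below is an assumption on my part about how such a check could be done; nothing here says this is the test Markopolos ran. Both sets of digit counts are hypothetical, invented for illustration, and 15.507 is the standard 5% critical value for eight degrees of freedom.

```python
import math

# 5% critical value of the chi-square distribution, 8 degrees of freedom.
CHI2_CRITICAL_DF8_P05 = 15.507


def benford_chi_square(digit_counts) -> float:
    """Chi-square statistic comparing first-digit counts with Benford's Law."""
    total = sum(digit_counts.get(d, 0) for d in range(1, 10))
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        observed = digit_counts.get(d, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical first-digit counts for 1000 numbers each (not real filings).
benford_like = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
evenly_faked = {1: 112, 2: 111, 3: 111, 4: 111, 5: 111, 6: 111, 7: 111, 8: 111, 9: 111}

for label, counts in [("Benford-like", benford_like), ("evenly faked", evenly_faked)]:
    stat = benford_chi_square(counts)
    verdict = "suspicious" if stat > CHI2_CRITICAL_DF8_P05 else "consistent"
    print(f"{label}: chi-square = {stat:.1f} ({verdict} at the 5% level)")
```

A statistic above the critical value only says the digits are unlikely to follow Benford’s Law. It is a reason to look closer, not proof of fraud, since some honest data sets do not follow the law either.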
Oops, I think I just revealed a secret that only wonkish auditors know. Don’t tell anybody!
Notes:
[1] Smythe, Christie, and Eric Larson. “Madoff Trustee Gets $76.5 Million From Austrian Feeder Fund.” Bloomberg.com, 12 Feb. 2018.
[2] Here’s the math: when you have a negative exponential curve, you are seeing a logarithmic numeric relationship at work. In this case, the percentage probability of one digit occurring as the first gravitates toward a value that is the logarithm of one (the interval) plus one divided by the digit, or: log10(1 + 1/d)
[3] This is the “Rule of 72” at work, as described in an earlier post.