For a long time, we’ve been interested in using mathematics (and computers) to detect and deter fraud. It is related to our earlier work on identifying perpetrators of terrorist attacks. (Yeah, I know it’s not as cool, but it uses some similar math!)
Today, I want to talk about some approaches to detecting fraud that we discussed one beautiful summer day in the engineering room at Aiaioo Labs.
That afternoon, somebody rang the doorbell. A colleague answered it, then came over and handed me a sheet of paper, saying that a lady at the door was asking for donations.
The paper bore the letterhead of an organization in a script that I couldn’t read. However, the English text stated that the bearer was a student collecting money to feed a few thousand refugees living in a refugee camp in Hyderabad. According to the letter, the refugees’ homes had been destroyed by artillery shelling on the India-Pakistan border, and a few thousand families without shelter urgently needed food and medicines.
On the sheet were the names and signatures of about 20 donors who had each donated around 1000 rupees.
Now the problem before us was to figure out if the lady was a genuine student volunteer or a fraudster out to make some quick money.
There was one thing about the document that looked decidedly suspicious.
The amounts donated were all very similar: 1000, 1200, 1300, 1000, 1000, 1000, 1000.
All the numbers had unnaturally high values.
So I called a friend who came from the region she said the refugees (and the student volunteers) were from, and asked him to talk to her and tell me whether her story checked out.
He spoke to her over the phone for a few minutes and then told me that her story was not entirely true.
She was indeed from the place the refugees supposedly came from, but she was actually collecting money for her own family. They had come south because one of them needed a medical operation, and they were now raising money to travel back to their home town.
When we asked her why she had lied, she just shrugged.
We felt it would be fine to help a family in need, so we gave her some money.
However, the whole affair gave us an interesting problem to solve.
How do you tell if a set of numbers is ‘natural’ or if it has been made up by a person intent on making them look natural?
Well, it turns out that statistics can give you the tools to do that.
Many natural processes produce random numbers that follow a characteristic distribution, and a small number of standard distributions (such as the normal distribution and power-law distributions) describe a large share of the numbers we find in nature.
For example, on the sheet of paper that the lady had presented, the figures for the money donated should have followed a normal distribution. There should have been a few high values and a few low values and a lot of the values in the middle.
Since that wasn’t the case, I could easily tell that the numbers had been made up.
But you don’t need a human to tell you that. There are statistical tests that check whether a set of numbers plausibly comes from a given expected distribution.
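As a rough illustration of what such a test computes, here is a minimal Python sketch of the Kolmogorov-Smirnov distance: the largest gap between the empirical cumulative distribution of the data and the cumulative distribution of a normal curve fitted to it. The function names are our own, chosen for illustration. Because the mean and standard deviation are estimated from the same data, the resulting distance should strictly be compared with Lilliefors critical values rather than ordinary Kolmogorov-Smirnov tables.

```python
import math
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF of `values` and a normal CDF
    whose mean and standard deviation are estimated from the same data
    (strictly speaking, this is the Lilliefors variant of the KS test)."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF steps from i/n up to (i+1)/n at x,
        # so check the gap on both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

donations = [1000, 1200, 1300, 1000, 1000, 1000, 1000]
print(round(ks_statistic_normal(donations), 2))
```

On the seven donation figures from the sheet, the distance comes out at roughly 0.43, driven by the five identical 1000s: the empirical CDF jumps straight to 5/7 at 1000, while a fitted normal curve puts only about 28% of its mass below that value.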
I looked around online and found an article describing methods for checking whether a set of numbers comes from a normal distribution (a distribution that occurs very frequently in nature): http://mathforum.org/library/drmath/view/72065.html
Some of the methods it talks about are the Kolmogorov-Smirnov test, the Chi-square test, the D’Agostino-Pearson test and the Jarque-Bera test.
Details of each can be found at these links (taken from the article):
One common test for normality with which I am personally NOT familiar, is the Kolmogorov-Smirnov test. The math behind it is very involved, and I would suggest you refer to other resources such as this page:
Wikipedia: Kolmogorov-Smirnov Test http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test
You can read more about the D'Agostino-Pearson test and get a table that can be used in Excel here:
Wikipedia: Normality Test http://en.wikipedia.org/wiki/User:Xargque#Normality_Test
Wikipedia: Jarque-Bera Test http://en.wikipedia.org/wiki/Jarque-Bera_test
One item of note: depending on how your stats program calculates kurtosis, you may or may not need to subtract 3 from kurtosis. See:
Wikipedia Talk: Jarque-Bera Test http://en.wikipedia.org/wiki/Talk:Jarque-Bera_test
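The kurtosis caveat in that quote is easy to trip over, so here is a minimal Python sketch of the Jarque-Bera statistic (the function name is our own). It computes the sample skewness S and the plain, non-excess kurtosis K from biased moments and returns JB = n/6 * (S^2 + (K - 3)^2 / 4). For large samples, JB under normality is approximately chi-square distributed with 2 degrees of freedom, so values above about 5.99 suggest non-normality at the 5% level. With only a handful of values that approximation is unreliable.

```python
def jarque_bera(values):
    """Return (JB, skewness, kurtosis) for a sample.

    Uses biased (population) central moments. `kurt` is the plain fourth
    standardized moment, which is about 3 for normal data, so the formula
    subtracts 3 itself. If your stats package reports *excess* kurtosis
    (already minus 3), do not subtract 3 again."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, skew, kurt

# Approximate 5% critical value of a chi-square distribution with 2 degrees of freedom.
CHI2_2DF_CRITICAL_5PCT = 5.991

jb, skew, kurt = jarque_bera([1000, 1200, 1300, 1000, 1000, 1000, 1000])
print(f"JB = {jb:.2f}, skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

On the seven donation figures this gives a JB of about 1.5, below the 5.99 threshold. That is a reminder that with so few values the test has little power, even when the numbers look odd to a human.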
On to the next method:
Another property of many naturally occurring sets of numbers is that about 30% of the values (nearly a third) start with the digit 1! Surprising, isn’t it?
Well, it turns out that this applies to population numbers, electricity bills, stock prices and the lengths of rivers.
It tends to apply to numbers that come from power-law distributions spanning several orders of magnitude (power laws govern the distribution of wealth, connections on Facebook, the number of speakers of a language, and a lot of other numbers related to society).
This is called Benford’s law: http://en.wikipedia.org/wiki/Benford’s_law
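Benford’s law gives the expected share of each leading digit d as log10(1 + 1/d). A few lines of Python print the whole table. The function name is just for illustration.

```python
import math

def benford_first_digit_probs():
    """Expected share of each leading digit d = 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, p in benford_first_digit_probs().items():
    print(f"{digit}: {p:.3f}")
```

The share for digit 1 is log10(2), about 0.301, so a little over 30% of values should start with 1. The shares then fall steadily to about 0.046 for digit 9. Because the terms telescope to log10(10), the nine shares add up to exactly 1.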
(I believe that Benford’s law would have applied to the above case as well, since donations would follow a power-law distribution if you assumed that every donor gave an amount proportional to their wealth.)
When I read about Benford’s law on Wikipedia (while writing this article), I found that it is already being used for accounting fraud detection.
Wikipedia says:
Accounting fraud detection
In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s Law ought to show up any anomalous results. Following this idea, Mark Nigrini showed that Benford’s Law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud. In practice, applications of Benford’s Law for fraud detection routinely use more than the first digit.
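The first-digit comparison described in that passage can be sketched as a chi-square goodness-of-fit test. The Python below is a minimal illustration under our own assumptions: the function names are hypothetical, zero values are skipped, and inputs may be ints, floats, or decimal strings. It is not the specific procedure used by Nigrini or by any particular auditing tool.

```python
import math
from collections import Counter
from decimal import Decimal

def first_digit(x):
    """Leading nonzero digit of a nonzero number, e.g. 1300 -> 1, 0.042 -> 4.

    Going through Decimal(str(x)) avoids the rounding surprises of
    repeatedly dividing a float by 10."""
    digits = Decimal(str(x)).as_tuple().digits  # the sign is stored separately
    return next(d for d in digits if d != 0)

def benford_chi_square(values):
    """Chi-square statistic comparing observed leading-digit counts with the
    counts Benford's law predicts, P(d) = log10(1 + 1/d).
    Zero values have no leading digit and are skipped."""
    digits = [first_digit(v) for v in values if Decimal(str(v)) != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2

# 9 digit bins give 8 degrees of freedom; the 5% critical value is about 15.51.
CHI2_8DF_CRITICAL_5PCT = 15.507

donations = [1000, 1200, 1300, 1000, 1000, 1000, 1000]
print(round(benford_chi_square(donations), 2))  # prints 16.25
```

All seven donation figures quoted earlier start with 1, and the statistic lands just above the 15.51 critical value. With only seven values, though, most expected counts are far below 5, the usual rule-of-thumb minimum for a chi-square test. In practice you would want a much longer list of figures spanning several orders of magnitude. Extending the test to the first two digits, as the quote mentions, means replacing the nine bins with the 90 bins 10 to 99, whose Benford probabilities are log10(1 + 1/(10*d1 + d2)).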
There are also methods that can be used by governments and large organizations to prevent fraud in the issuing of tenders.
More about that in my next article.