A few weeks ago, I was invited to Mysore to give a talk. I wanted to leave the audience with a good idea of quality measurement in Machine Learning. After talking about Accuracy as a measure of quality and listing some of its drawbacks, I began to talk about Precision and Recall, two very important measures of quality of an AI system.
The slides were really complicated, so I had to simplify the explanation and translate it into layman’s terms:
The Life Lessons Explanation
The explanation went something like this:
You are here listening to me talk today. But there are so many other events taking place in the city of Mysore right now.
Some of them may be events worth going to (opportunities). Some may not.
If you attended 10 of the 200 events taking place, and 5 of those 10 were worth it, your precision would be 50%.
A precision score of 50% means that half of the events you chose to attend turned out to be worth it.
Let’s say there were 200 events around Mysore that you could have attended, and 20 of them were really worth attending.
Since you attended 5 of the 20 events worth attending, your recall was 5/20, that is, 25%.
There are many opportunities for positive experiences out there waiting for you right now.
How many of them did you actually go out and make use of?
That’s what Recall tells you.
Precision is a measure of how well you chose those experiences.
You can aim to experience as many things as you possibly can. Or you could be very careful about what you expose yourself to.
Recall is about quantity.
Precision is about quality.
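The arithmetic in the events example can be checked with a few lines of Python. This is only a sketch: the event IDs are made up for illustration (events numbered 0 to 199, with the attended and worthwhile sets chosen so that they overlap in exactly 5 events, as in the talk).

```python
# Hypothetical event IDs: the 200 events are numbered 0..199.
attended = set(range(0, 10))      # the 10 events you went to
worthwhile = set(range(5, 25))    # the 20 events that were worth attending

# Events you attended that were also worth attending.
good_choices = attended & worthwhile

# Precision: of the events you picked, what fraction was worth it?
precision = len(good_choices) / len(attended)

# Recall: of the events worth attending, what fraction did you pick?
recall = len(good_choices) / len(worthwhile)

print(f"precision = {precision:.0%}")
print(f"recall = {recall:.0%}")
```

Attending more events can only raise or hold the recall, but it may lower the precision if the extra events are not worth it.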
The Complicated Version
Now, I’ll try the formal version of the explanation on you to see if you like it.
The slide on Precision and Recall looked like this:
The terms True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) are the key concepts one needs to understand to get a good grasp of what precision and recall are all about.
To understand TP, TN, FP and FN, you need to pick a point of view – a category that you are interested in.
Then, TP is the number of times that someone (or something) picks that category when the data is really of that category.
TN is the number of times that someone says that the data does not fall into the category when the data really does not.
Note that TP + TN represents the number of times they got it right.
FP is the number of times that someone picks the category when the data is really not of that category.
FN is the number of times that someone does not pick the category when the data falls into the category of interest.
Note that FP + FN is the number of times they got it wrong.
Note moreover that TP + FP is the number of times the person thought the data fell into the category of interest.
Also, TN + FN is the number of times the person thought the data did not fall into the category of interest.
There are two more interesting combinations.
TP + FN is the number of data items in the category of interest.
TN + FP is the number of data items not in the category of interest.
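The four counts and their combinations can be sketched in code. The function name `confusion_counts` and the small data set below are assumptions made for illustration; `actual` marks whether each item really belongs to the category of interest, and `predicted` marks whether the person (or system) picked that category.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for one category of interest.

    actual[i] is True when item i really belongs to the category.
    predicted[i] is True when the category was picked for item i.
    """
    tp = tn = fp = fn = 0
    for is_member, was_picked in zip(actual, predicted):
        if was_picked and is_member:
            tp += 1   # picked the category, and rightly so
        elif was_picked and not is_member:
            fp += 1   # picked the category, but wrongly
        elif not was_picked and is_member:
            fn += 1   # missed an item that was in the category
        else:
            tn += 1   # rightly left the item out
    return tp, tn, fp, fn


# Made-up data: 8 items, 3 of which are really in the category.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, False, True, True, False, False, False, False]

tp, tn, fp, fn = confusion_counts(actual, predicted)

print("right:", tp + tn, "wrong:", fp + fn)
print("picked:", tp + fp, "not picked:", tn + fn)
print("in category:", tp + fn, "not in category:", tn + fp)
```

The last two sums depend only on the data, not on the guesses, which is why they make natural denominators.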
Precision is calculated as TP divided by TP + FP.
That is, precision is the fraction of the time someone is right when they claim that some data belongs to a certain category.
Recall is calculated as TP divided by TP + FN.
Recall is the fraction of the data items belonging to the category of interest that the system being measured is able to identify.
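As a minimal sketch, the two formulas translate directly into code. The function names are my own, and returning 0.0 when the denominator is zero is an assumed convention, not something the definitions prescribe. The counts used below come from the events example: 10 events attended, 5 of them worthwhile, and 20 worthwhile events in all, so TP = 5, FP = 5 and FN = 15.

```python
def precision(tp, fp):
    """TP / (TP + FP): how often a pick of the category was right."""
    picked = tp + fp
    # Assumed convention: report 0.0 when nothing was picked at all.
    return tp / picked if picked else 0.0


def recall(tp, fn):
    """TP / (TP + FN): how much of the category was actually found."""
    in_category = tp + fn
    # Assumed convention: report 0.0 when the category is empty.
    return tp / in_category if in_category else 0.0


# Counts from the events example.
tp, fp, fn = 5, 5, 15

print(precision(tp, fp))
print(recall(tp, fn))
```

Note that TN appears in neither formula: the 175 events that were neither attended nor worth attending do not affect either score.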
In this explanation, truth is not only split into perceived truth and real truth; each of those is in turn split into two, positive and negative.
People have thought deeply about truth since time immemorial. According to the Wikipedia article on Dialectics, “The Sophists taught arête (Greek: ἀρετή, quality, excellence) as the highest value, and the determinant of one’s actions in life.”
But there lived in Greece a man who disagreed with that notion: “Socrates favoured truth as the highest value, proposing that it could be discovered through reason and logic in discussion: ergo, dialectic.”
It is surprising that it took till the late 1900s for that conception of truth to be advanced to the point where truth itself is split. Once you begin to think of categorisation as a machine learning problem, you suddenly discover that truth has many sides to it.