Month: July 2012

How can you find out who wants to buy your products?

One way to understand a brand’s strengths is to see how often people say they want to buy that brand, on Twitter or other social media.

We have a demo of Twitter analysis for finding customers.

The following link takes you to a live demonstration of purchase intention detection being run on a Twitter stream:

http://www.aiaioo.com/demos/demo_use_case.html

What you will see on the page are tweets from people who want to purchase something.

It is a fully automatic way of learning how many people want to buy your products, and who they are.

How product firms can expand into new markets with Worldjumper.com

This blog post is about a client of ours in Japan called Worldjumper.com.

Japanese businesses need a way to communicate with customers from outside Japan, to learn about their problems, and to find out more about their needs.

I was invited to visit Worldjumper’s office in Tokyo in late winter to help with a tool that could deliver fast, easy localization at very low cost.

Localization is interesting because businesses need to adapt and change their message to suit different regions, cultures and languages.

For example, the message “WorldJumper can make customer queries in any language understandable to a service person who speaks Japanese” would be a great message to convey (in the Japanese language) to a client in Japan.

But in China, that message should say (in the Chinese language) “WorldJumper can make customer queries in any language understandable to a service person who only speaks Chinese.”

Worldjumper can help customers there. It can identify and prioritize messages that need localization and use crowdsourcing platforms to channel a huge volume of human effort to the task.

Worldjumper can also integrate very easily and quickly into a website.

If you are a product firm and have a website that needs to be readable in multiple languages, all you need to do is sign up for an account with Worldjumper and insert a snippet of HTML code into your website.

Now, your website will be readable in any language.

Moreover, you’ll be given a list of localization tasks, which you can manage directly from the Worldjumper console.

If you are a product firm, all you need to do to start selling your products to customers in countries where people speak languages that you do not understand is take 5 minutes to insert the Worldjumper code into your website.

Once you have inserted the HTML, your site becomes multilingual. There will also be a contact form for you that can convert customer inquiries into your language, and then convert your responses back into the customer’s own language. I believe there are lots of plugins in the works – like chat tools and Facebook page plugins.

So, when you get a chance, do check out Worldjumper.com.

What life lessons can you learn from ‘precision’ and ‘recall’?

A few weeks ago, I was invited to Mysore to give a talk. I wanted to leave the audience with a good idea of quality measurement in Machine Learning. After talking about Accuracy as a measure of quality and listing some of its drawbacks, I began to talk about Precision and Recall, two very important measures of quality of an AI system.

The slides were really complicated, so I had to simplify the explanation and translate it into layman’s terms:

The Life Lessons Explanation

The explanation went something like this:

You are here listening to me talk today. But there are so many other events taking place in the city of Mysore right now.

Some of them may be events worth going to (opportunities). Some may not.

If you attended 10 events out of a possible 200 events taking place, and 5 of them were worth it, your precision would be 50%.

A precision score of 50% indicates that half of the events you chose to attend were worth it. In other words, you were right 50% of the time.

Let’s say there were 200 events around Mysore that you could have attended, and 20 of them were really worth attending.

Since you attended 5 of the 20 events worth attending, your recall was 5/20, that is, 25%.
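The arithmetic above can be checked in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the numbers are the ones from the talk.

```python
# Numbers from the Mysore example (variable names are illustrative).
events_attended = 10          # events you chose to go to
attended_and_worthwhile = 5   # of those, the ones that were worth it
worthwhile_events = 20        # worthwhile events in the whole city

# Precision: of the events you picked, what fraction were worth it?
precision = attended_and_worthwhile / events_attended

# Recall: of the events worth attending, what fraction did you pick?
recall = attended_and_worthwhile / worthwhile_events

print(f"precision = {precision:.0%}")  # precision = 50%
print(f"recall = {recall:.0%}")        # recall = 25%
```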

There are many opportunities for positive experiences out there waiting for you right now.

How many of them did you actually go out and make use of?

That’s what Recall tells you.

Precision is a measure of how well you chose those experiences.

You can aim to experience as many things as you possibly can. Or you could be very careful about what you expose yourself to.

Recall is about quantity.

Precision is about quality.

The Complicated Version

Now, I’ll try the formal version of the explanation on you to see if you like it.

The slide on Precision and Recall looked like this:

[Image: slide on Precision and Recall]

The terms True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN), are the key concepts one needs to understand to get a good grasp of what precision and recall are all about.

To understand TP, TN, FP and FN, you need to pick a point of view – a category that you are interested in.

Then, TP is the number of times that someone (or something) picks that category when the data is really of that category.

TN is the number of times that someone says that the data does not fall into the category when the data really does not.

Note that TP + TN represents the number of times they got it right.

FP is the number of times that someone picks the category when the data is really not of that category.

FN is the number of times that someone does not pick the category when the data falls into the category of interest.

Note that FP + FN is the number of times they got it wrong.

Note moreover that TP + FP is the number of times the person thought the data fell into the category of interest.

Also, TN + FN is the number of times the person thought the data did not fall into the category of interest.

There are two more interesting combinations.

TP + FN is the number of data items in the category of interest.

TN + FP is the number of data items not in the category of interest.
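As a rough sketch, the four counts and the combinations listed above can be computed like this. The function name `confusion_counts` and the sample data are invented for illustration; they are not from the talk.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for one category of interest.

    actual[i] is True when item i really is in the category;
    predicted[i] is True when the person (or system) said it was.
    """
    tp = tn = fp = fn = 0
    for is_in, said_in in zip(actual, predicted):
        if said_in and is_in:
            tp += 1   # picked the category, and was right
        elif said_in and not is_in:
            fp += 1   # picked the category, but was wrong
        elif not said_in and is_in:
            fn += 1   # did not pick the category, but should have
        else:
            tn += 1   # did not pick the category, and was right
    return tp, tn, fp, fn

# Made-up data: six items, three of which are really in the category.
actual    = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 2 2 1 1

# The combinations described in the text:
print(tp + tn)  # times they got it right: 4
print(fp + fn)  # times they got it wrong: 2
print(tp + fn)  # items really in the category: 3
print(tn + fp)  # items really not in the category: 3
```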

Precision is calculated as TP divided by TP + FP.

That is, precision is the fraction of the time someone is right when they claim that some data belongs to a certain category.

Recall is calculated as TP divided by TP + FN.

Recall is the fraction of data items belonging to the category of interest that the system being measured is able to identify.
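Here is a minimal sketch of the two formulas in Python, applied to the Mysore numbers with "worth attending" as the category of interest. The function names are illustrative, and returning 0.0 when a denominator is zero is an assumption made here to avoid a division error, not something the formulas themselves specify.

```python
def precision(tp, fp):
    """Fraction of 'in the category' claims that were correct: TP / (TP + FP)."""
    claimed = tp + fp
    # Assumption: report 0.0 when nothing was claimed, instead of dividing by zero.
    return tp / claimed if claimed else 0.0

def recall(tp, fn):
    """Fraction of items really in the category that were found: TP / (TP + FN)."""
    in_category = tp + fn
    # Assumption: report 0.0 when the category is empty.
    return tp / in_category if in_category else 0.0

# The Mysore numbers: 200 events, 10 attended, 20 worth attending.
tp = 5    # attended and worth it
fp = 5    # attended but not worth it
fn = 15   # worth it but not attended
tn = 175  # not worth it and not attended (200 - 5 - 5 - 15)

print(precision(tp, fp))  # 0.5
print(recall(tp, fn))     # 0.25
```

Note that TN does not appear in either formula: the 175 events that were correctly skipped affect neither precision nor recall.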

In this explanation, you not only have truth being split into perceived truth and real truth, but also have perceived truth being split into two, and real truth also being split into two.

People have thought deeply about truth since time immemorial. According to the Wikipedia article on Dialectics, “The Sophists taught arête (Greek: ἀρετή, quality, excellence) as the highest value, and the determinant of one’s actions in life.”

But there lived in Greece a man who disagreed with that notion: “Socrates favoured truth as the highest value, proposing that it could be discovered through reason and logic in discussion: ergo, dialectic.”

It is surprising that it took until the late 1900s for that conception of truth to be advanced to the point where truth itself is split. Once you begin to think of categorisation as a machine learning problem, you suddenly discover that truth has many sides to it.