Month: August 2013

Can you find B2B prospects on Twitter?

This article was penned in response to a question that we came across on a LinkedIn group: “Is anyone in B2B getting twitter right?”

We posted the following reply:

“It’s something we’ve been grappling with, as a social media marketing product vendor.

Time and again, users have come back to us to say that they can find leads on Twitter (with the help of the filtering methods that we offer), but that those leads do not turn into profitable sales: they are low-value inquiries from people who don’t have a lot of money to spend.

After a long period of study, we believe we’ve found the reason for that. When buyers have the money to spend and believe in the ROI enough to put down a good sum of money on it, they do their research on a search engine and then reach out to vendors directly.

It is *possible* that it is only people who can’t afford the better solutions or are not sufficiently convinced of their benefits to be willing to put down a lot of money on them, who advertise their needs on Twitter in the hope of finding inexpensive alternatives.

That being said, we find that B2B customers are definitely able to use Twitter to spread awareness about their products or solutions.

Spreading awareness helps because it increases the probability that a person with the means will know of your solution when they consider a purchase. It is also possible that your awareness campaign on Twitter might lead to someone realizing that they have a need.

So, in our work with Twitter, we focus on helping B2B firms run highly targeted and inexpensive (not very time-consuming) campaigns to spread awareness about their products and solutions. Our customers do not have to spend much time on Twitter, precisely because we provide them with tools to target prospects very precisely and help them execute automated strategies for educating prospects. But they all do put in about one man-hour a day.

Can you find B2B prospects on Twitter? Yes, of course! You’ll be surprised at how many business owners are on Twitter, even in India. We usually run 24-hour studies for prospective customers, and I’ve never had trouble picking out a few hundred people who need to be included in an awareness campaign in that time.”

This article was also posted on the Selasdia product blog: http://www.selasdia.com/blog/?p=179

In trust we god

Can trust affect the outcome of political events (war), business transactions (pricing) and economic affairs (poverty)?

This is a problem that I’ve been very interested in for many years.

A few years ago I came across papers in economics and game theory that supplied the mathematical tools that we need to analyse such problems.

So, I’ll take each area of interest in turn, (1) politics, (2) business and (3) economics, and explain how trust matters in each case.

1. Politics

Can the outcome of something like war be determined by trust?

Let’s assume an army of 2 soldiers.

In a war, the benefits to each soldier can be modeled as a bi-matrix (normal-form game) as follows:

	soldier 2 fights	soldier 2 flees
soldier 1 fights	5, 5	-5, 0
soldier 1 flees	0, -5	0, 0
Normal form or payoff matrix of a 2-player, 2-strategy game

The first of the two numbers in the matrix represents the payoff to soldier 1.

The second of the two numbers in the matrix represents the payoff to soldier 2.

(Each soldier wins something, represented by 5 points, if the army wins, and wins nothing if the army loses. A soldier who stays to fight while his army loses forfeits his life, represented by -5 points. We assume that the army wins if both soldiers fight and loses if one or both flee.)

If soldier 1 trusts soldier 2 not to flee the battlefield, the best strategy for soldier 1 is to stay and fight as well (since he will then get more benefits than if he flees).

If soldier 1 does not trust soldier 2 to stay on the battlefield (if he suspects that soldier 2 will run away), then the best strategy for soldier 1 is to run away himself (so that he does not remain on the battlefield and get killed).

So, this model shows that if two equal two-man armies meet on a battlefield, the one whose soldiers trust each other more will win.
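The reasoning above can be made concrete by treating trust as a number. Here is a minimal sketch in Python, assuming (this is not part of the model described above) that soldier 1’s trust is the probability he assigns to soldier 2 staying to fight. The names PAYOFF_1, expected_payoff and best_response are made up for this illustration.

```python
# Payoffs to soldier 1 from the matrix above, indexed by
# (soldier 1's action, soldier 2's action).
PAYOFF_1 = {
    ("fight", "fight"): 5,
    ("fight", "flee"): -5,
    ("flee", "fight"): 0,
    ("flee", "flee"): 0,
}


def expected_payoff(action, trust):
    """Expected payoff to soldier 1, where `trust` is the probability
    (an assumption of this sketch) that soldier 2 fights."""
    return (trust * PAYOFF_1[(action, "fight")]
            + (1 - trust) * PAYOFF_1[(action, "flee")])


def best_response(trust):
    """Action with the higher expected payoff; ties go to fleeing."""
    fight = expected_payoff("fight", trust)
    flee = expected_payoff("flee", trust)
    return "fight" if fight > flee else "flee"


for trust in (0.2, 0.5, 0.8):
    print(trust, best_response(trust))
```

With these payoffs, fighting has the higher expected payoff only when the trust level is above one half, which matches the argument made in words above.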

2. Business (Pricing)

There is a very interesting paper by George A. Akerlof (‘The Market for “Lemons”: Quality Uncertainty and the Market Mechanism’).

It tries to explain why the price of a new car in a showroom is so much higher than the price of a nearly new car in the second-hand car market.

For example, a car costing $25,000 fresh out of the showroom, might fetch $18,000 if sold as a used car in the used car market.

Akerlof’s paper tries to explain why the price dropped so sharply.

Akerlof suggests that the price drop is a result of the uncertainty surrounding the quality of the car in the used-car market.

A certain percentage of cars in a used-car market will be defective (since anyone can sell a car in an unregulated market, and unscrupulous people would have put defective cars up for sale).

Let’s say 50% of the cars in the used car market are defective.

Now, a person buying a day-old used car will only be prepared to risk paying 50% of the showroom price for it (because of the 50% chance that the car is worth nothing).
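The arithmetic behind that figure is a simple expected value. Here is a small sketch in Python, assuming a buyer who pays exactly the expected value of the car (a risk-neutral buyer, which is an assumption of this sketch) and a defective car that is worth nothing. The function name willingness_to_pay is made up for this illustration.

```python
def willingness_to_pay(value_if_good, share_defective, value_if_defective=0.0):
    """Expected value of a used car when the buyer cannot tell
    good cars from defective ones."""
    share_good = 1.0 - share_defective
    return share_good * value_if_good + share_defective * value_if_defective


# The example above: a $25,000 car in a market where half the cars are defective.
print(willingness_to_pay(25_000, 0.5))  # 12500.0
```

Under the same assumptions, the $18,000 resale price in the earlier example would correspond to a market in which about 28% of the cars are believed to be defective.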

The Price of Trust

This result has the following unintended consequence:

The more a person trusts a seller, the higher the price he will be willing to offer for a car.

I’ll give you an example of that. (I’m sorry, but this is a bit racist).

When I was a student in North Carolina, and I was looking to buy a used car, I was given the following piece of advice by my fellow students.

They said, “Go for a car that an American is selling because they will tell you about any problems that it has. Don’t buy a car from an Asian or an Indian unless you know them well. They won’t tell you if there are any problems.”

I see the same effect even when doing business in India today – a lot of business happens through connections.

Price Sensitivity

It might also explain why Indians are so price sensitive.

Indians are said to be very price-sensitive, preferring the less expensive offerings over more expensive ones that promise better quality (I recall Richard Branson said that at one point while explaining why he didn’t want to enter India).

I think the price sensitivity is a result of Indians not being able to trust promises of higher quality from their countrymen.

Price becomes the only measure that Indian buyers are able to trust when making a purchasing decision, leading to extreme price-sensitivity in the Indian market.

Hiring and ‘Brain Drain’

Even in hiring, this can have the effect of driving down salaries.

When hiring someone, an Indian firm is likely to offer a lower salary than the market, because they don’t trust in the abilities of the person being hired.

In Akerlof’s paper, he talks about a side-effect of a lack of trust. He says that good quality cars will just stop being sold on the low-trust markets.

This applies to the job market in India as well: Indian firms tend to offer lower salaries, which might lead to the best engineers choosing MNCs over Indian firms or leaving Indian shores altogether.

3. Economics

I’ve described in an earlier blog post how man-in-the-middle systems of government can fail to work efficiently if the man-in-the-middle is corrupt.

I’ve described in that post how resources can be wrongly allocated in the presence of corruption.

https://aiaioo.wordpress.com/2013/08/15/who-betrayed-ekalavya-2/

The result of an inefficient allocation of our resources is poverty.

For example, the Indian government has tripled defence spending in the last 10 years – through heavy borrowing – when it is possible to show that we need to allocate whatever money we have to education (see our arguments for that https://aiaioo.wordpress.com/2012/06/04/algorithms-to-combat-poverty/).

World Bank studies (that you can get off a Reserve Bank of India website) show that corrupt governments spend more on arms than honest governments do (because of how easy it is to hide kickbacks from arms deals).

So, the economic prosperity of a country can be impacted by corruption.

Causes of Corruption

But we can ask a deeper question: “What causes corruption?”

I’ll try to show right here that it is a lack of trust.

Take for example two players in a bidding war (let’s say that they are bidding for a government contract).

Each has the choice to give a bribe or not to give a bribe.

Player 1 is more likely to give a bribe if player 1 does not trust player 2 to not offer a bribe to the government official.

It’s the same decision matrix that I have used for the case of the 2 soldier army.

So you get it?

Everything depends on trust.

Philosophy

I am probably way out of my depth on this, but the ancient Greeks seem to have had two views on the supreme ideal that man should strive for.

According to the Wikipedia article on Dialectics:

“The Sophists taught arête (Greek: ἀρετή, quality, excellence) as the highest value, and the determinant of one’s actions in life.”

But there lived in Greece a man who disagreed with that notion: “Socrates favoured truth as the highest value, proposing that it could be discovered through reason and logic in discussion: ergo, dialectic.”

But the above models seem to suggest that truth (honesty) results in trust (you know that the guy next to you is honest and won’t lie about the quality of a car or bribe a government official to get ahead of you).

And what the Akerlof paper shows is that trust rewards and promotes quality.

In other words, the two Greek ideals of quality and truth (the values mankind must uphold for its own good) are probably one and the same.

1. Framework for evaluating values

2. What traffic can reveal about society

3. Who betrayed Ekalavya?

4. Can economics change the world?

5. Is there an algorithm to combat poverty?

6. Why dance is undervalued

7. Is 5 very far from 4?

Who betrayed Ekalavya?

Sometimes, historical and literary narratives shed more light on the silent spectators to the events described than on the main actors.

One such narrative is that of Ekalavya in the Indian epic “Mahabharata”.

Ekalavya’s tale is a deeply distressing one.

It is the story of a young man and of his society’s inexplicable indifference to quality (inexplicable because their indifference is bad for them).

In the story, Ekalavya is sentenced to lose his thumb for the sole reason that he is a better archer (through his own efforts) than the royal princes of the land.

The story is scary because if you put a little thought into it, you realise that:

a) the people around Ekalavya valued the influence represented by the princes more than the quality represented by Ekalavya.

b) they failed to realize that should their country ever be invaded, there would be one less good archer to defend them.

c) they proved incapable of realizing how demotivating the sentence would have been to all the other archers in the land (whose help they’d need in times of trouble).

d) they did not stand up for one of their own – did not defend a vulnerable member of their team – did not protect a kid.

e) they condoned nepotism.

f) they approved of a teacher misusing a student’s trust.

Each time we retell the story of Ekalavya without realizing this, we become, in a sense, complicit in it.

Corruption in India leads to a situation where low quality is very likely to be rewarded.

We have written about a model of corruption involving three parties (https://aiaioo.wordpress.com/2012/11/18/tools-for-the-mind-and-how-you-can-change-the-world/) in which the person offering the lowest quality of service is the one who is most likely to be rewarded.

Here is a brief description of the same:

—

Man-In-The-Middle Corruption

This is corruption where someone is appointed a trustee over a common pool of resources. He is now a middle-man who must allocate those resources fairly.

In the realm of public services, like the construction of roads and schools, that middle-man is government.

In the presence of corruption, the middle-man ends up selecting the service-provider who pays the highest bribes, not the service-provider who does the best job.

This leads to a market where the lowest-quality service provider wins and the higher quality providers leave the market altogether – a result that follows from the work of George A. Akerlof (‘The Market for “Lemons”: Quality Uncertainty and the Market Mechanism’).

—

But we must remember that there is a price we must pay for choosing low quality. Bad roads, delays and poor infrastructure can all be traced back to a preference for low quality.

But another price we pay is poverty.

To illustrate that, I must point you to an article (http://boingboing.net/2008/08/08/california-supreme-c-1.html) on the California Supreme Court ruling that made non-compete clauses unenforceable in California.

The article shows that supporting and protecting quality helps the economy:

“I’m reminded of the study from the Duke Center for the Public Domain that concluded that the reason that the tech corridor on Route 128 near Boston had grown so much more slowly than Silicon Valley was that Massachusetts has enforceable non-competes, while California does not. The researcher concluded that in California, the best talent moved to the best companies, while on Route 128, crummy companies could lock up great people for years at a time through non-compete agreements.”

Each time someone undeserving is preferred for a job, each time kickbacks are given, we have – in a sense – betrayed Ekalavya.

POST EDIT:

(Here is an article on how one might compute the quality of value systems. It references some very interesting work by Daphne Koller on using graphical models and game theory to model multi-agent decision-making frameworks.)

Building Machine Learning Models that can help with Customer Service and Supply Chain Management

The Laptop that Stopped Working

One fine day, a couple of months ago, a laptop that we owned stopped working. We heard 4 beeps coming from the machine at intervals but nothing appeared on the screen.

Customer Service

The service person quickly looked up the symptoms in his knowledge base and informed us that 4 beeps meant a memory error.

I first replaced the two memory modules one by one, but the machine still wouldn’t start. I then tried two spare memory modules that I had in the cupboard, but the computer still wouldn’t start.

I had a brand new computer with me that used the same type and speed of memory as the one we were fixing. I pulled out its memory chips and inserted them into the faulty computer, but still no luck.

At that point, the service person told me that it must be the motherboard itself that was not working.

Second Attempt at Triage

So the next day, a motherboard and some memory arrived at my office. A little later, a field engineer showed up and replaced the motherboard. The computer still wouldn’t start up.

When the field engineer heard 4 beeps, the engineer said it MUST BE THE MEMORY.

Third Attempt at Triage

A few days later, a new set of memory modules arrived.

The engineer returned and tried inserting the new memory in. Still no luck. The computer would not start and you could still hear the 4 beeps.

A third set of brand new memory modules and a new motherboard were sent over.

Fourth Attempt at Triage

The engineer tried both motherboards and various combinations of memory modules, but still, all you could hear were 4 beeps and the computer would not start.

During one of his attempts to combine memory and motherboards, the engineer noticed that though the computer did not start, it did not beep either.

So, the engineer guessed that it was the screen that was not working. But just to be safe, he’d ask them to send another motherboard and another set of memory modules to go with it.

Fifth Attempt at Triage

The screen, the third motherboard and the fourth set of memory modules arrived in our office and an engineer spent the day trying various combinations of screens, motherboards and memory modules.

But the man on the phone said: “Sir, 4 beeps means there is something wrong with your memory. I will have them replaced.”

I had to take out my new laptop’s memory and pop it into the faulty machine to convince the engineer and support staff that replacing the memory would not fix the problem.

All the parts were now sent over – the memory, motherboard, processor, drive, and screen.

Sixth Attempt at Triage

Finally, the field engineer found that when he had replaced the processor, the computer was able to boot up with no problems.

Better Root Cause Analysis

The manufacturer could have spared themselves all that expense, time and effort had they used an expert system that relied on a probabilistic model of the symptoms and their causes.

Such a model would be able to tell, given the symptoms, which component was the most likely to have failed.

Such a model would be able to direct a field engineer to the component or components whose replacement would be most likely to fix the problem.

If the attempted fix did not work, the model would simply update its understanding of the problem and recommend a different course of action.

I will illustrate the process using what is known in the machine learning community as a directed probabilistic graphical model.

Run-Through of Root Cause Analysis

Let’s say a failure has occurred and there is only one symptom that can be observed: the laptop won’t start and emits 4 beeps.

The first step is to enter this information into the probabilistic graphical model. From a list of symptoms, we select the ones that we observe (all observed symptoms are represented as yellow circles in this document).

So the following diagram has only one circle (observed symptom).

Model 1: The symptom of 4 beeps is modeled in a probabilistic graphical model with a yellow circle as follows:

Now, let’s assume that this symptom can be caused by the failure of memory, the motherboard or the processor.

Model 2: I can add that information to the predictive model, so that the model now looks like this:

The model captures the belief that the causes of the symptom (processor, memory or motherboard failure) are, in the absence of any symptoms, independent of each other.

It also captures the belief that given a symptom like 4 beeps, evidence for one cause will explain away (or decrease the probability of) the other causes.

Once such a model is built, it can tell a field engineer the most probable cause of a symptom, the second most probable cause and so on.

So, the engineer will only have to look at the output of the model’s analysis to know whether he needs to replace one component, or two, and which ones.
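To make the ranking step concrete, here is a minimal sketch in Python that computes the probability of each cause given the 4-beep symptom by enumerating every combination of failed and working parts. The prior failure probabilities in PRIORS are made-up numbers for illustration, and the sketch assumes, for simplicity, that the beeps occur exactly when at least one of the three parts has failed.

```python
from itertools import product

# Hypothetical prior failure probabilities (illustrative, not measured).
PRIORS = {"memory": 0.05, "motherboard": 0.03, "processor": 0.01}


def joint(state):
    """Prior probability of one combination of failed/working parts,
    assuming the causes are independent of each other."""
    p = 1.0
    for part, failed in state.items():
        p *= PRIORS[part] if failed else 1.0 - PRIORS[part]
    return p


def posterior_given_beeps():
    """P(part failed | 4 beeps), assuming the beeps occur exactly
    when at least one part has failed."""
    parts = list(PRIORS)
    evidence = 0.0
    marginals = dict.fromkeys(parts, 0.0)
    for flags in product([False, True], repeat=len(parts)):
        if not any(flags):
            continue  # no failure means no beeps, which contradicts the symptom
        state = dict(zip(parts, flags))
        p = joint(state)
        evidence += p
        for part in parts:
            if state[part]:
                marginals[part] += p
    return {part: marginals[part] / evidence for part in parts}


# Most probable cause first, then the second most probable, and so on.
ranking = sorted(posterior_given_beeps().items(), key=lambda kv: -kv[1])
for part, prob in ranking:
    print(f"{part}: {prob:.2f}")
```

A real system would estimate the priors from repair records and would typically use a graphical-model library rather than brute-force enumeration, which only works for a handful of parts.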

When the field engineer goes out and replaces the components, his actions can also be fed into the model.

Model 3: Below is an extended model into which attempts to fix the problem by replacing the memory can be incorporated.

If a field engineer were to feed into the system the fact that the memory was replaced with a new module and it didn’t fix the problem, the system would be able to immediately figure out that the memory could not be the cause of the problem, and it would suggest the next most probable cause of failure.

Model 4

Finally, in case new memory modules being sent to customers for repairs frequently turned out to be defective, that information could also be added to the model as follows:

Now, if the error rate for new memory modules in the supply chain happens to be high for a particular type of memory, then if memory replacement failed to fix a 4-beep problem, the model would understand that faulty memory could still be the cause of the problem.
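The effect of Models 3 and 4 can be sketched in a few lines of Python. The sketch below assumes the same made-up priors as before, assumes that the beeps occur exactly when at least one installed part is faulty, and adds a hypothetical replacement_defect_rate for new memory modules. It computes how likely each currently installed part is to be faulty, given that the laptop beeped both before and after the memory was swapped.

```python
from itertools import product

# Hypothetical prior failure probabilities (illustrative, not measured).
PRIORS = {"memory": 0.05, "motherboard": 0.03, "processor": 0.01}


def posterior_after_memory_swap(replacement_defect_rate):
    """P(currently installed part is faulty | beeps before and after the swap).

    Assumes the beeps occur exactly when at least one installed part is
    faulty, and that a new module is defective with the given rate.
    """
    probs = dict(PRIORS, replacement=replacement_defect_rate)
    names = list(probs)
    evidence = 0.0
    faulty = {"memory": 0.0, "motherboard": 0.0, "processor": 0.0}
    for flags in product([False, True], repeat=len(names)):
        s = dict(zip(names, flags))
        beeps_before = s["memory"] or s["motherboard"] or s["processor"]
        beeps_after = s["replacement"] or s["motherboard"] or s["processor"]
        if not (beeps_before and beeps_after):
            continue  # inconsistent with what the engineer observed
        p = 1.0
        for name in names:
            p *= probs[name] if s[name] else 1.0 - probs[name]
        evidence += p
        # After the swap, the memory in the laptop is the replacement module.
        faulty["memory"] += p * s["replacement"]
        faulty["motherboard"] += p * s["motherboard"]
        faulty["processor"] += p * s["processor"]
    return {part: mass / evidence for part, mass in faulty.items()}


for rate in (0.0, 0.3):
    print(rate, posterior_after_memory_swap(rate))
```

With a defect rate of zero, the installed memory is ruled out and the suspicion shifts to the motherboard and processor. With a high defect rate, faulty memory stays a serious candidate even after the swap.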

Applications to Supply Chain Management

The probabilities of all the nodes adjust themselves all the time and this information can actually be used to detect if the error rates in new memory module deliveries suddenly go up.
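One simple way to picture this kind of monitoring is a running estimate of the defect rate that is updated after every confirmed repair outcome. The sketch below is a stand-in for what a full graphical model would do, not a description of any particular product. The prior, the outcome list and the alert threshold are all made-up values for illustration.

```python
def update_defect_estimate(alpha, beta, module_was_defective):
    """One Beta-Bernoulli update of the belief about the defect rate:
    alpha counts defective modules, beta counts good ones."""
    if module_was_defective:
        return alpha + 1, beta
    return alpha, beta + 1


def defect_rate(alpha, beta):
    """Mean of the Beta(alpha, beta) belief about the defect rate."""
    return alpha / (alpha + beta)


alpha, beta = 1, 19  # hypothetical prior: about 5% of new modules are defective
outcomes = [False, False, True, True, True, False, True, True]  # made-up repair log
for defective in outcomes:
    alpha, beta = update_defect_estimate(alpha, beta, defective)

rate = defect_rate(alpha, beta)
ALERT_THRESHOLD = 0.15  # hypothetical level at which to alert the supply chain team
print(round(rate, 3), rate > ALERT_THRESHOLD)
```

In practice the outcome of each repair is itself uncertain, which is why the estimate belongs inside the same probabilistic model as the symptoms and causes.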

Benefits to a Customer Service Process

1. Formal capture and storage of triage history

2. Suggestion of cause(s) given the effects (symptoms)

3. Suggestion of other causes given triage steps performed

What the system will seem to be doing (to the layman):

1. Recording symptoms

2. Recommending a course of action

3. Recording the outcome of the course of action

4. Recommending next steps

Should Cecilia have said “insecure” instead of “unsecure”?

In this funny PhD Comic, the main character – Cecilia (the girl in red) – says:

“Do you realize how unsecure your coffee distribution system is?”

That made me wonder – should she have said ‘insecure’?

Even the WordPress spell-checker has a problem with “unsecure”.

It thinks that “unsecure” is a spelling error.

However, the word “insecure” doesn’t sound as if it were the right term to use in the context of computer security.

That is because the word “insecure” is usually used in the context of a person to mean a person who is not confident and self-assured.

To call a computer “insecure” would be a bit like saying that the computer had self-image issues.

Others have written about this cognitive dissonance as well (see http://english.stackexchange.com/questions/19653/insecure-or-unsecure-when-dealing-with-security for a nice discussion).

Given the problem, the author of the cartoon seems to be justified in using a newly-minted word (one not found in any dictionary) in order to describe the lack of security.

This is also very interesting because it throws some light on how words are born.

Before I can explain what I mean, I’ll need you to take a look at the Oxford dictionary’s definitions of the word “insecure” (from the Oxford English Dictionary online search at http://oxforddictionaries.com/definition/english/insecure?q=insecure):

insecure

Pronunciation: /ˌɪnsɪˈkjʊə, ˌɪnsɪˈkjɔː/

adjective

1 uncertain or anxious about oneself; not confident: a rather gauche, insecure young man, a top model who is notoriously insecure about her looks

2 (of a thing) not firm or fixed; liable to give way or break: an insecure footbridge

not sufficiently protected; easily broken into: an insecure computer system

3 (of a job or situation) liable to change for the worse; not permanent or settled: badly paid and insecure jobs, a financially insecure period

There are three ways in which the word “insecure” can be used.

The second usage would have been perfect for the context of computer security.

But the first usage might be conflated with the second in that context.

And that is because (sorry, I no longer recall the references to support this claim) computers appear to the human mind to have human-like characteristics (we say things like “Google tells me that …” or “my computer has gone to sleep”).

So, the only word in the dictionary that can do the job – the word “insecure” – has a conflict of interest.

And therefore, a new word needs to be coined that is not susceptible to the same sort of ambiguity.

And if the new word “unsecure” catches on, then one day, the second sense of the word “insecure” could become extinct in the context of computers.

Oh well, “it’s only words!”

POST EDIT

A friend pointed out that the Google NGram Viewer shows a history of the use of the word “unsecure”: http://books.google.com/ngrams/graph?content=unsecure.

The word seems to have been in use between 1650 and 1850 (there is evidence of use in literature), and has in more recent times simply fallen out of circulation (having been eclipsed by “insecure” around 1750). Thanks, Prashant.

(You can also search for those early usages in books – http://books.google.com/books?id=WmpCAAAAcAAJ&pg=PA12&dq=%22unsecure%22&hl=en&sa=X&ei=aOcLUq7aA-3iyAHu8YGwAg&ved=0CDMQ6AEwAA#v=onepage&q=%22unsecure%22&f=false)

Analysing documents for non-obvious differences

The ease of classification of documents depends on the categories you are looking to classify documents into.

A few days ago, an engineer wrote about a problem where the analysis that needed to be performed on documents was not the most straightforward.

He described the problem in a forum as follows: “I am working on sub classification. We already crawled sites using focused crawling. So we know domain, broad category for the site. Sometimes site is also tagged with broad category. So I don’t require to predict broad class for individual site. I am interested in sub-classification. For example, I don’t want to find if post is related to sports, politics, cricket etc. I am interested in to find if post is related to Indian cricket, Australia cricket, given that I already know post is related to cricket. Since in cricket post may contains frequent words like runs, six, fours, out,score etc, which are common across all cricket related posts. So I also want to consider rare terms which can help me in sub-classification. I agree that I may also require frequent words for classification. But I don’t want to skip rare terms for classification.”

If you’re dealing with categories like sports, politics and finance, then using machine learning for classification is very easy. That’s because all the nouns and verbs in the document give you clues as to the category that the document belongs to.

But if you’re given a set of categories for which there are few indicators in the text, you end up with no easy way to categorize the documents.

After spending a few days thinking about it, I realized that something I had learnt in college could be applied to the problem. It’s a technique called Feature Selection.

I am going to share the reply I posted to the question, because it might be useful to others working on the classification of documents:

“You seem to have a data set that looks as follows (letters are categories and numbers are features):

A P 2 4
A Q 2 5
B P 3 4
B Q 3 5

Let’s say the 2s and the 3s are features that occur very frequently in your corpus while the 4s and the 5s are features that occur far less frequently in your corpus.

When you use the ‘bag of words’ model as your feature vector, your classifier will only learn to tell A apart from B (because the 4s and 5s will not matter much to the classifier, being overwhelmed as it is by the 2s and 3s which are far more frequent).

I think that is why you have come to the conclusion that you need to look for rare words to be able to accomplish your goal of distinguishing category P from category Q.

But in reality, perhaps what you need to do is identify all the features, like 4 and 5, that might help you distinguish P from Q. You might even find that some frequent features also have a fairly healthy ability to resolve these categories.

So, now the question just boils down to how you would go about finding the set of features that resolves any given categorization scheme.

The answer seems to be something that literature refers to as ‘Feature Selection’.

As the name says, you select features that help you break data points apart in the way you want.

Wikipedia has an article on Feature Selection:

http://en.wikipedia.org/wiki/Feature_selection

And Mark Hall’s thesis http://www.cs.waikato.ac.nz/~mhall/thesis.pdf seems to be highly referenced.

Mark Hall’s thesis – “A good feature subset is one that contains features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.”

To be honest with you, I’d heard about Feature Selection, but never connected it to the problem it solves until now, so I’m just looking up reading material as I write.

Best of luck with it.”
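To illustrate the idea in the reply, here is a minimal sketch in Python that scores each feature in the toy data set by its mutual information with a chosen label. Mutual information is one common filter criterion for feature selection. It is swapped in here for simplicity and is not the correlation-based method from the thesis quoted above. The names ROWS, mutual_information and score_features are made up for this illustration.

```python
from collections import Counter
from math import log2

# The toy data set from the reply: (coarse label, fine label, features).
ROWS = [
    ("A", "P", {2, 4}),
    ("A", "Q", {2, 5}),
    ("B", "P", {3, 4}),
    ("B", "Q", {3, 5}),
]


def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables,
    given as a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    xs = Counter(x for x, _ in pairs)
    ys = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((xs[x] / n) * (ys[y] / n)))
        for (x, y), c in joint.items()
    )


def score_features(label_index):
    """Score each feature by how much its presence tells us about the
    label stored at position `label_index` of each row."""
    features = sorted(set().union(*(row[2] for row in ROWS)))
    return {
        f: mutual_information([(f in row[2], row[label_index]) for row in ROWS])
        for f in features
    }


print(score_features(0))  # relevance of each feature to A vs B
print(score_features(1))  # relevance of each feature to P vs Q
```

On this toy data, features 4 and 5 score 1 bit for telling P from Q while features 2 and 3 score 0, and the scores are reversed for telling A from B. The features to keep depend on the categorization scheme, not on how frequent or rare they are.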

Wishful Thinking and Leprechauns

I recently came across a lovely cartoon on Leprechauns and social media.

Fortunately for us, we have a leprechaun in the office.

(So, now you know where we get our startup funding from).

Here’s a picture of the guy (that’s the cubicle he shares with Selasdia):

Just kidding!

One of our business partners brought the little pewter leprechaun in the picture back to India for us from Ireland.

It might have once been popularly believed in Ireland that leprechauns had the ability to grant Wishes.

And we find Wishes immensely interesting because some of the earliest work on Intention Analysis started out as an attempt to detect and classify Wishes.

In fact, one of the loveliest papers on the subject started out with an attempt to study what people wished for (wanted) on New Year’s Day.

You can read the paper here: http://pages.cs.wisc.edu/~jerryzhu/pub/wish.pdf

It has a very beautiful title: “May All Your Wishes Come True: A Study of Wishes and How to Recognize Them”

You also find the word Wishes in the title of one of the first attempts in research literature to find “buy” intentions:

http://www.aclweb.org/anthology-new/W/W10/W10-0207.pdf

It is a paper titled, again quite poetically (what’s with Wishes and beautiful titles!) “Wishful Thinking – Finding suggestions and ‘buy’ wishes from product reviews”.

This paper was written by a research team working at Cognizant (India) in 2010.