The Perplexed Bayes Classifier


We are proud to announce our latest research paper.

This paper describes an improved version of the grand old Naive Bayesian classification algorithm with rather interesting mathematical implications.

We have shown in this paper that it is possible to create a classifier that can classify data in exactly the same way as a Naive Bayes classifier, but without using the same “naive” independence assumptions from which the old classifier got its name.

Now smoke that for a second.  What we’re saying is that it was completely unnecessary to make those independence assumptions to get that accuracy after all.

So you take them out.

And what results is vastly better posterior probabilities. In other words, the classifier’s confidence in a prediction becomes vastly more believable.

But since it makes the same decisions as a Naive Bayes algorithm, its accuracy is provably the same!

We call the new classifier the Perplexed Bayes classifier because it uses the reciprocal of perplexity (the geometric mean) as the combination operator for individual feature probabilities.

It turns out, moreover, that one mathematical implication of using this operator is that the naive independence assumptions disappear.
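The sketch below shows why the decisions stay the same while the confidence scores soften. It makes one assumption that may not match the paper's exact formulation: the geometric mean is taken over all n + 1 factors of a class score, meaning the prior P(c) and the n feature likelihoods P(f_i | c), before normalising across classes. Taking the same root of every class score is a monotonic transformation, so the winning class does not change. The function names and toy probabilities are hypothetical.

```python
import math


def _normalize(log_scores):
    """Turn unnormalised log scores into probabilities that sum to 1."""
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def naive_bayes(priors, likelihoods):
    """Classic Naive Bayes: P(c) * prod_i P(f_i | c), normalised over classes."""
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
        for c in priors
    }
    return _normalize(log_scores)


def perplexed_bayes(priors, likelihoods):
    """Geometric mean of the same factors (assumption: the prior is included)."""
    log_scores = {}
    for c in priors:
        factors = [priors[c], *likelihoods[c]]
        # The mean of the logs is the log of the geometric mean.
        log_scores[c] = sum(math.log(p) for p in factors) / len(factors)
    return _normalize(log_scores)


priors = {"spam": 0.5, "ham": 0.5}
likelihoods = {"spam": [0.8, 0.7, 0.9], "ham": [0.2, 0.3, 0.1]}

nb = naive_bayes(priors, likelihoods)
pb = perplexed_bayes(priors, likelihoods)
print(max(nb, key=nb.get), round(nb["spam"], 3))  # spam 0.988
print(max(pb, key=pb.get), round(pb["spam"], 3))  # spam 0.752
```

Both versions pick "spam". The Naive Bayes posterior is 0.988, while the geometric-mean version gives a more moderate 0.752. The equivalence of decisions depends on every class being scored on the same number of factors.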

The Perplexed Bayes classifier and the math thereof are described in this draft research paper:

This new classifier is as easy to build as a Naive Bayes, has the same accuracy, learns fast, and returns confidence scores that are way better.

And the math is amenable to retrofitting into Hidden Markov Models and Probabilistic Graphical Models to dissolve invalid independence assumptions.

Yes, as I said, the mathematical implications are very interesting.

The paper will be presented at ICON in Trivandrum in December 2015:

Protecting against deadly stampedes during the Hajj and other religious festivals

More than 700 people were killed in a stampede during the Hajj pilgrimage of 2015, which took place just a few days ago.

There have also been stampedes during religious events in India that have cost us hundreds of lives.

So pinpointing the causes of death and injury in stampedes, and devising methods of prevention, are of great importance to a large number of people.

In our earlier posts on stampedes, we looked at possible causes of deaths on sloping paths:

  1. How to prevent death and injury in stampedes – Part 1
  2. How to prevent death and injury in stampedes – Part 2

However, we had not been able to explain how people could die on flat ground.

In this article, we present a model for how forces on people in a crowd on flat ground can increase to such a magnitude that people would be crushed to death.

We also present a number of mechanisms for preventing deaths due to excess pressure in crowds.

On flat ground, as long as everyone in the crowd remains standing and stationary, there would be no horizontal crushing force.

However, a person can generate a force by trying to move in any direction.

Let’s say one person can generate 10 Kilograms of lateral force.

Now, if ten people stood one behind the other, in contact with each other, and pushed in the same direction, they could be expected to generate approximately ten times the 10 Kg of lateral force.

In other words, they’d be exerting 100 Kg of force on anything ahead of them, as shown in the figure below.


When people in a crowd experience such accumulated forces, they are either injured physically or asphyxiated.

Autopsies of victims of asphyxiation in stampedes showed that they could have experienced pressures on their chests of around 6.4 psi.

If the area of the torso coming in contact with another person in a crush is 2 square feet (288 square inches), a pressure of 6.4 psi corresponds to a force of about 1,840 pounds, or roughly 840 Kg, which is close to a ton.

A tightly packed column of 100 people could generate that kind of cumulative lateral crushing force.
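The arithmetic above can be checked with a short sketch. Like the text, it assumes that each person pushes with about 10 Kg and that the pushes of people standing in a column simply add up. The function names are hypothetical, and the unit conversions are the standard definitions.

```python
import math

SQ_IN_PER_SQ_FT = 144
KGF_PER_LBF = 0.45359237  # one pound-force expressed in kilogram-force


def force_kgf(pressure_psi, area_sq_ft):
    """Force (in kgf) needed to produce a given pressure over a contact area."""
    force_lbf = pressure_psi * area_sq_ft * SQ_IN_PER_SQ_FT
    return force_lbf * KGF_PER_LBF


def people_needed(target_kgf, push_per_person_kgf=10):
    """People in a column needed to reach a force, assuming pushes add linearly."""
    return math.ceil(target_kgf / push_per_person_kgf)


crush = force_kgf(6.4, 2)
n = people_needed(crush)
print(round(crush), n)  # 836 84
```

At 10 Kg per person, this puts the column at roughly 84 people. The "100 people" figure in the text is therefore a round, slightly conservative estimate.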

So, if a tightly packed column of people, say 100 deep, were suddenly obstructed, for example by a barrier or by another group of people crossing their path, the forces experienced by those in front (or at the intersection) could be as high as 1 ton.

This seems to have been what happened in Mecca a few days ago.

How people were injured

According to eyewitness accounts of the stampede during the Hajj, the deaths occurred on a flat road, and there had been pushing and jostling at the start of the stampede:

“As our group started to head back, taking Road 204, another group, coming from Road 206, crossed our way,” said another worshipper, Ahmed Mohammed Amer.

“Heavy pushing ensued. I’m at a loss of words to describe what happened. This massive pushing is what caused the high number of casualties among the pilgrims.”

Something very similar seems to have been reported by a witness to the Hajj stampede of 2006, in which 350 people died:

On January 12, as we were returning to Mina for the last ritual of Haj, we saw the big stampede from a distance as waves of people collided.

Mathematical / Physical Models

I will now attempt to show that in a constrained space, even higher forces can be generated by a wedging effect.

The Wedge Effect

A wedge is a mechanical device that can amplify forces.

If a wedge that is four times as long as it is tall is used, and a force of 10 Kg applied along its longer edge, it can generate a force of 40 Kg in the direction of the shorter edge, as shown in the following diagram.


Restraints of any kind (railings, barriers, fences, chains) can act as wedges and increase the pressure within a crowd perpendicular to the direction of movement of the crowd.

So, a column of 20 people could generate a force of one ton if they were wedged between fences with an aspect ratio of 1:5 (the fences closing in by 1 meter for every 5 meters of road), as shown in the following diagram. To save space, the diagram shows a column of 5 people generating a lateral force of 250 Kg on account of the wedging.
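The wedge figures above follow from a simple rule. This minimal sketch treats the wedge as ideal: it is frictionless, the bodies in the crowd do not compress, and the output force equals the input force multiplied by the wedge's length-to-height ratio, as in the 4:1 example. All names are hypothetical.

```python
def wedge_output(input_force, length, height):
    """Ideal (frictionless) wedge: output force = input force * length / height."""
    return input_force * length / height


def wedged_column_force(n_people, ratio, push_per_person=10):
    """Lateral force from a column pushing into fences that close in
    1 unit for every `ratio` units of road (idealised, forces add linearly)."""
    column_force = n_people * push_per_person
    return wedge_output(column_force, ratio, 1)


print(wedge_output(10, 4, 1))        # 40.0 Kg from the 4:1 wedge
print(wedged_column_force(20, 5))    # 1000.0 Kg, about one ton
print(wedged_column_force(5, 5))     # 250.0 Kg, as in the diagram
```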


Wedging could also occur on a path with no constrictions if people in the crowd moved in opposite directions, as shown in the following figure.


The above kind of wedging is probably what caused the deaths at the Love Parade in Duisburg (2010), in a crowd that had been standing still.

So, the following need to be eliminated to prevent deadly crushes:

  1. Obstructions to the movement of a tightly packed column of people
  2. Any wedges that can amplify pressures

A Partial Solution

The organizers could therefore probably improve the safety of their events by doing the following:

Parallel Channel Movement

Organizers could close off all intersections and keep all movement going along completely parallel, non-intersecting channels.

This would ensure that there could be no obstructions to movement.

Prevention of Wedging

Organizers would need to ensure that routes never constrict.

So, gates and converging roads would need to be avoided.

Also all traffic would have to be one-way.

This would prevent the formation of wedges.





Manipulation of the Net Neutrality debate in India by Facebook?

For the past few days, I have been seeing the following at the top of my Facebook page:


Is this merely a harmless tool provided to Indians by Facebook to express their nationalism better?

Or could it be something more sinister?

I suspect the latter for the following reasons:

1.  False Choices!

The choices afforded to the Indian Facebook user are baffling:

You can either say “Yes, I’m in”.

Or you can say “Not now”.

You can’t say “No”.

So, Facebook have made it clear that they are not interested in hearing an objection.

And they’ve made it very clear what they want to hear.

2.  Leading Question

The question is “Do you support Digital India?”

You can’t really say no to that, can you?

But it has been argued in some recent news articles that what Facebook really means is: “Do you support Facebook’s initiative?”

And that’s entirely another thing.

They’ve just swapped one question for another (and you’ll see a list of recent news articles discussing this swap if you scroll down a bit).

What is Facebook’s initiative?

It’s an initiative that would let ISPs in India accept payment from content providers (websites) in return for better access to internet users.

This clashes with one of the guiding principles of the internet, which is that anyone should be free to throw up a website on the internet, and everyone should be able to access it.

What the initiative aims to do is permit ISPs in India to provide free access or high speed access to some websites (who pay for this access), and make it more difficult (slower, costlier or simply impossible) to access other websites (the ones who do not pay up).

That will affect smaller firms and help larger firms by making it more difficult for offerings from smaller vendors (who can’t pay comparable access charges) to be discovered by a potential buyer.

In other words, it will help the big get bigger and make it harder for the small fish to survive (propagate/magnify inequality).

In this respect, it is similar to the effect of the possible Google search engine bias discussed in a previous post.

Advocates of ‘net neutrality’ in India have been fighting a campaign to impress upon people that this goes against the spirit of the internet.

3.  Disguised Survey

The third and most serious problem I have with this device (the question that appeared at the top of my Facebook page) is that Facebook seem to be hiding the fact that they are doing a survey.

A few months ago, a similar box had appeared at the top of my Facebook feed.

It contained a question:

“Would you like to see poor people in India get free internet access?”

… or something like that.

A large number of people clicked the “Yes” button (the only other choice was “Not now”).

It appears that a few days ago, Facebook announced in a reply to the Telecom Regulatory Authority of India (TRAI) that 17 million Indians supported their initiative, and provided a list of comments from the aforesaid Indians to TRAI.

A news source believes that these were merely the Facebook users who had clicked “Yes”.

Here are the primary sources and a few derived news articles:


And here is how Facebook’s position seems to be evolving:


Is there anything wrong with offering free internet access to the poor?

Many Indians would love to see the provision of free internet access to the poor.

There is nothing wrong with that.

The free market allows anyone to offer free internet to anyone who will use it.

There is nothing wrong with that.

Website owners will have to pay the ISPs if they want their websites to be accessed by users for free.

There is nothing wrong with that.  There is no such thing as a free lunch.

It will hurt the small players, and propagate or magnify inequalities and the wealth gap between the rich and the poor.

Too bad, that is capitalism.  Nothing very wrong with that.

Facebook will probably make a load of money off of it through advertising and partnership deals.

Of course!  Get a life!  There is nothing wrong with that.

Facebook is currently running advertisements on Indian television announcing that their initiative will give poor rural Indians free access to the internet.


If what they were saying were true, it would be a wonderful thing.

But only if it were true.

However, Facebook’s initiative offers access to only about 50 websites (Facebook + Facebook partners + some utility websites) that Facebook has chosen for India.

That is not the kind of ‘internet’ that will teach poor villagers in India how to build windmills (the theme of one of the television advertisements being paid for by Facebook) now, is it?

So, the advertisements on television are possibly misrepresenting the facts, misleading voters and misinforming a largely uneducated or under-educated general public.

Yes, there is something wrong about that.

Finally, I suspect that Facebook are trying to get the Indian government to fund their scheme, or at least to endorse it.

If the Indian government spends taxpayers’ money on, or endorses, something that is arguably a philosophically undesirable precedent for the internet, there is something wrong about that.

The above survey is possibly being used to make it falsely appear as if users in India overwhelmingly support their initiative.

And there is something wrong with that.

In short, Facebook are:

  1. tricking users into participating in a survey (by disguising it as an exercise in patriotism)
  2. restricting their responses in a way that benefits Facebook financially, and
  3. claiming the support of users they have tricked in an attempt to influence a country’s telecom regulatory authority.

And there is something wrong with that.

UPDATE (added on 30th September)

A Facebook spokesman has issued a statement to the effect that those who clicked on the above survey would not be counted as supporters of the initiative.

The statement does not explain what happened on the earlier occasion, about a month ago, when Facebook counted those who responded “Yes” to a very similar survey as supporters of the initiative in their communications with the Telecom Regulatory Authority of India.

And here are a few more links that friends sent me in their comments:

  1. An against position:
  2. A balanced (or slightly for?) position:

SECOND UPDATE (added on 16th October)

  1. An article appeared in the BBC news service talking about pretty much what we’ve talked about above:

What might be the next big thing in telephony?

Have you also felt that Apple hasn’t released any really earth-shattering new features on the iPhone in a very long time?

I recently saw an advertisement for the Apple iPhone 6s with the marketing tagline:

“The only thing that’s changed is everything.”

And I couldn’t help thinking that “everything’s changed” meant “nothing’s changed much”.

Or required the addition of “not in any way that matters”!

What’s the next big thing?

I am a big fan of Apple’s innovation.

They ushered in a revolution in computing user interfaces (multi-touch) and telephony.

However, isn’t it time for video telephony?

Phones have been used to connect people over an audio channel over great distances.

But wouldn’t it be fabulous to have the option to see those we are speaking to as well?

I know, there are apps

I know there are many apps which let you see who you’re talking to.

One is Skype.  Another is WhatsApp.

But, if you had an open protocol governing video calling, then video calls could become a feature in landline phones as well.

You could, say, receive a phone call on a landline phone and project the incoming video onto a big screen.

And that might change the nature of telephony.


The big enabler of video telephony might be 4G data networks which are fast enough to carry high quality video feeds.

So, video telephony might help sell 4G plans, and that might make carriers keener to support phones with video telephony capabilities.

Still, why not apps?

Because video telephony is not common.

I speak to a lot of people on the phone every day, but I rarely use video, though I am sure I would have liked to in many of those cases.

Making video a natural part of the phone experience might be all it takes to get customers to adopt video calling.

Finally, using an open protocol would allow apps to be built over the video phone.

For example, gaming apps and shopping apps could be built to interact with users telephonically.

You might dial a shop and be able to view clothes right on the racks or talk to the lady at the counter.

Being transferred to another department would take on a very different and literal meaning.

Until Then

Someone I know suggested that this would be a great way to add video to a phone call.


The Importance of Perfection in Product Design

In some technology spaces, you don’t reach the minimum threshold of usability until your product gets very, very, very, very, very close to absolute perfection.

Take the touch-screen interface.

By the time Apple got it right with the multi-touch interface (operated with the fingers) so many products had failed to scale the usability mountain.

The change that took touch screens over the final hurdle was the switch from stylus to fingers, along with a host of small changes that mattered:

The absence of menus on the main screen.

The scrolling.

The pinch and gestures.

The application ecosystem.

Going from 90% to 99% of the way to absolute perfection made all the difference to usability.

So it is with designing AI products.

So it might be with a feature like video telephony.

Which is why I feel it needs to be a seamless part of the phone.

Older articles

We also touched upon the value of perfection in an earlier article on the disproportionate importance of pushing ratings as high as they can go in the range between 4 and 5.



Why Google is being investigated for rigging search results

I read in an article a few days ago that Google is being investigated by the Competition Commission of India on suspicion of rigging search results.

One of the complainants was none other than Flipkart which is, I believe, one of the largest e-commerce companies in India.

Flipkart seems to have complained that ‘it found search results to have a direct correlation with the amount of money it spent on advertising with Google through Google’s Adwords program’ (the quote is from the news article; I haven’t seen the actual complaint).

If Flipkart’s observations are indeed true, and if Flipkart can establish beyond a shadow of doubt the existence of a correlation between advertising expenditure and search ranking (for a random allocation of advertising spend, and we will see later why this is important), then it could have serious implications for digital marketers.

If the search rankings really correlate with advertising spend, it would mean that customers would be better off not spending any money at all on advertising with Google, rather than spending a small amount of money on the same.  In other words, it impacts the choice of SEO vs SEM for digital marketing.


SEO and SEM are two strategies available to firms to bring their offerings to the notice of customers seeking information using search engines.

SEO (Search Engine Optimization) involves optimizing the text and links of web pages so that they rank higher in search results.

SEM (Search Engine Marketing) is the term used to describe the strategy of paying search engines (like Google) to display ads about a firm’s offerings alongside search results.

If there is a causal relation between advertising spend and ranking, then a common strategy used by many Indian firms – that of spending a little on SEM in addition to SEO – might hurt rather than help them.

Is Flipkart’s complaint valid?

I don’t have a detailed study and can’t provide incontrovertible evidence for the validity or invalidity of Flipkart’s complaint.

However, I have some anecdotal evidence that suggests that Flipkart’s complaint might hold some water.

And I am going to present the evidence to you in the form of a search experiment that you can all perform yourselves.

Search Experiment

Here’s an exercise that you can all try yourselves.

There is a book called “Taming Text” that is a practical introduction to a text search platform called ‘Solr’.

When I search for the string “taming text” (the location is India, and I use a browser in which I am not signed in to Google), I get the following results:

Results for ‘taming text’ page 1 (above the fold)

As you can see, Flipkart is nowhere to be seen (though it is the largest online book retailer in India).

If you scroll down to the bottom and look ‘below the fold’, you see the following:

Results for ‘taming text’ page 1 (below the fold)

The Flipkart product page does not show up here either.  But you see an Amazon India advertisement right at the bottom.

Let’s look at the second page.

Results for ‘taming text’ page 2 (above the fold)

Again, no luck.  You get a link to Google books, but no Flipkart.

(We looked through 10 pages of results but found no link to Flipkart.  Did you?)

So, we try a different search.

We enter “taming text flipkart” into the search engine.

And here are the results!

Results for ‘taming text flipkart’ page 1 (above the fold)

This time, the Flipkart page shows up right at the top!

In addition, a Flipkart advertisement shows up right above it.

So it appears that Flipkart had bid on the ‘flipkart’ keyword or on ‘taming text’ or on some combination of ‘flipkart’ and ‘taming text’.

When we scroll to the bottom of this page of results, we see:

Results for ‘taming text flipkart’ page 1 (below the fold)

Amazon, it appears, had also bid for either ‘flipkart’ or ‘taming text’.

Moreover, we see that Google has obviously indexed the Flipkart product page for ‘Taming Text’.

Then why was the Flipkart page ranking so low compared to the Amazon India page?

Evil or Innocent

Could the difference in rankings be caused by the differences in advertisement expenditure on different keywords by different vendors?

If so, it would be evil.

Flipkart seems to think so, as evidenced by their complaint.

But, as it turns out, that does not have to be the case.

It is possible for the search rankings to be correlated with advertising spend without the latter causing the former (correlation without causation) in the following manner.

If Flipkart’s own SEM algorithms had bid higher on keywords that described their products better, that could in and of itself have resulted in a correlation between search ranking and ad-spend.

You probably saw a case of that in the example above.

The term ‘taming text flipkart’ would certainly have matched the link to the Flipkart product better than the link to the Amazon product.  This is because of the appearance of the word ‘flipkart’ in the Flipkart URL (the word ‘flipkart’ would not have appeared in the Amazon URL).

So, the fact that more words in the search string were matched would have caused the Flipkart URL to be ranked higher.  If Flipkart had bid on the keyword ‘flipkart’ but not on ‘taming text’, it would appear as if the rankings were correlated with the advertising spend.  But one (the expenditure) would not have caused the other (the rankings).
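To illustrate the "more matched words" argument, here is a deliberately crude ranking function. It only counts how many query words appear in a page's title. Google's actual ranking algorithm is not public and is far more complex, and the two page titles below are hypothetical stand-ins for the product pages.

```python
def match_score(query, document_text):
    """Count how many distinct query words appear in the document's text."""
    doc_words = set(document_text.lower().replace("-", " ").split())
    return sum(word in doc_words for word in set(query.lower().split()))


# Hypothetical stand-ins for the two product pages (title plus site name).
pages = {
    "flipkart": "Taming Text - Flipkart",
    "amazon": "Taming Text - Amazon.in",
}

for query in ("taming text", "taming text flipkart"):
    scores = {site: match_score(query, text) for site, text in pages.items()}
    print(query, scores)
```

For "taming text", both pages score 2. Word matching alone therefore cannot explain why the Amazon page ranked higher for that query. For "taming text flipkart", the Flipkart page scores 3 against Amazon's 2, whether or not any money was spent on ads.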

Similarly, for the string ‘taming text’, the Flipkart product URL could have ranked lower than the Amazon URL merely because there are fewer buyers for the book in India than in the USA.  This could have resulted in Google’s machine learning algorithms associating the name of this book with the keyword ‘Amazon’.

Thus, there could have been a correlation of ad spend and ranking without causation.

In other words, the perceived correlation could be on account of external factors that affected both variables.  The only way to eliminate those external factors would be to randomly allocate advertising spend and then see if there still was a correlation.  If a correlation could still be established, there would then be a strong case for saying that the relation was one of causation and not mere correlation.
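A small simulation makes the randomisation argument concrete. It is a sketch built on invented assumptions. A single hidden "relevance" value drives each page's ranking score. When spend is not randomised, sellers also spend more on relevant keywords, so relevance acts as the confounder. The `true_effect` parameter is the causal influence of spend on ranking that an investigation would want to detect. All names and numbers are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(randomize, true_effect, n=5000, seed=42):
    """Correlation between ad spend and ranking score for n hypothetical pages."""
    rng = random.Random(seed)
    spend, rank = [], []
    for _ in range(n):
        relevance = rng.gauss(0, 1)  # hidden factor that drives ranking
        if randomize:
            s = rng.gauss(0, 1)  # spend assigned at random, independent of relevance
        else:
            s = relevance + rng.gauss(0, 0.5)  # sellers spend more where relevant
        r = relevance + true_effect * s + rng.gauss(0, 0.5)
        spend.append(s)
        rank.append(r)
    return pearson(spend, rank)


confounded = simulate(randomize=False, true_effect=0.0)  # strong, but not causal
random_null = simulate(randomize=True, true_effect=0.0)  # close to zero
random_causal = simulate(randomize=True, true_effect=0.5)  # survives randomisation
print(round(confounded, 2), round(random_null, 2), round(random_causal, 2))
```

The first number shows a strong correlation even though spend has no effect at all. Once spend is randomised, that correlation disappears, and only a genuine causal effect leaves a correlation behind.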

But even if there were no evil intention (no causation), the fact remains that this ranking pattern is unfair to Flipkart, albeit unintentionally.

In other words, Amazon’s search rankings (if my theory as to why it ranked higher is right) might have received a boost from buyer behaviour in a geography where its competitor Flipkart does not operate.

So, Flipkart’s complaint of unfairness might indeed be warranted.

Bias Prevention

In any case, this illustrates the importance of a new area of study – the study of bias in algorithms.

The article linked to above says:

Venkatasubramanian’s research revealed that you can use a test to determine if the algorithm in question is possibly biased. If the test—which ironically uses another machine-learning algorithm—can accurately predict a person’s race or gender based on the data being analyzed, even though race or gender is hidden from the data, then there is a potential problem for bias based on the definition of disparate impact.
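The idea in the quote can be illustrated with a toy version. It is not the procedure from Venkatasubramanian's research. We stand in for "another machine-learning algorithm" with a simple lookup-table predictor that tries to recover a hidden group label from the remaining data. It is scored with balanced accuracy, so a score near 0.5 means the data reveals nothing about the group. The data, the region feature, and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_lookup(rows):
    """Learn, for each feature tuple, the group label seen most often with it."""
    counts = defaultdict(Counter)
    for features, group in rows:
        counts[features][group] += 1
    fallback = Counter(g for _, g in rows).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
    return lambda features: table.get(features, fallback)


def balanced_accuracy(predict, rows):
    """Average per-group accuracy, so a lopsided group mix cannot inflate it."""
    correct, total = Counter(), Counter()
    for features, group in rows:
        total[group] += 1
        correct[group] += int(predict(features) == group)
    return sum(correct[g] / total[g] for g in total) / len(total)


def proxy_test(rows, seed=0):
    """Train on half the rows and score on the other half."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    predict = fit_lookup(rows[:half])
    return balanced_accuracy(predict, rows[half:])


# Hypothetical data: 'region' is a strong proxy for the hidden group,
# while the coin flip is unrelated to it.
rng = random.Random(1)
leaky, clean = [], []
for _ in range(2000):
    group = rng.choice("AB")
    if rng.random() < 0.9:
        region = "north" if group == "A" else "south"
    else:
        region = rng.choice(["north", "south"])
    leaky.append(((region,), group))
    clean.append(((rng.choice(["heads", "tails"]),), group))

print(round(proxy_test(leaky), 2))  # high: region leaks the hidden group
print(round(proxy_test(clean), 2))  # about 0.5: nothing leaks
```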

Search bias was also described as a cause for concern by Brin and Page in their paper on Google written during their days at Stanford:

The paper says, and I quote none other than Sergey Brin and Lawrence Page:

The goals of the advertising business model do not always correspond to providing quality search to users.  … we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries.  This type of bias is much more insidious than advertising, because it is not clear who “deserves” to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from “friendly” companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market.

Other interesting articles on the subject of search result bias:


Using linguistic clues and feature engineering in investigating the death of Jagendra Singh


There was an article in the news recently about a journalist named Jagendra Singh who died of burns in North India.

The man had either been set on fire by policemen who had come to his house on behalf of a corrupt minister, or set himself on fire in their presence.

The policemen insist that the victim set himself on fire when they were at his house.

The victim, in his dying declaration, said that the cops had set him on fire.

The Facts

The undisputed facts of the case seem to be that:

a) the cops had gone to the journalist’s house to ask him to stop investigating a minister’s murky land deals (of which there seem to be many in this Northern state of India)

b) on the roof of his house, in the presence of the cops, the journalist was doused (with petrol?) and set on fire


What is disputed is whether the cops set the victim on fire, or if he did so himself.

Police Version

The police investigators seem to have arrived at the latter conclusion in their forensic report, dismissing the case as one of suicide, and letting the murder accused off the hook.


What I will attempt to show in this article is that the opposite conclusion could have been arrived at just as easily.

I will also argue that linguistic theory supports the victim’s claims and not the cops’.

Bad Assumptions

The forensic report prepared by the police investigators noted that:

there were “more wounds on the left side of Jagendra’s abdomen, just below his chest.”

“This indicated that he had poured kerosene over himself with his right hand. Besides, he sustained burn injuries on lower half of the body which usually is not the case of someone else pours the inflammable fuel on the burnt body,” added the source.

Sources in the forensic team which prepared the report told The Hindu on condition of anonymity that “the lower part of the body is affected only if the person himself pours the fuel on the body. But if a person pours fuel on a other person the upper part of the body gets affected“.

The cops’ assumptions above seem to be bad ones, because a paper in the “Journal of the Euro-Mediterranean Council for Burns and Fire Disasters” titled “Outcomes of patients who commit suicide by burning” says that there is involvement of the upper part of the body when people commit suicide by self-immolation:

The mechanism of the action together with the absence of the will to rescue oneself from the flames leads in most cases to involvement of the face, trunk, and upper extremities as also to frequent inhalation injury.

So, the investigators’ assumptions appear flawed.

Lack of Common Sense

And frankly, the police investigators’ report seems to lack common sense in many ways.

To someone with common sense, it would seem that the location of burns couldn’t really tell one anything much about whether the burns were self-inflicted or not.

In other words, it would seem that the police were looking at immaterial clues.

Which brings us to a concept from machine learning.

Feature Engineering

There is, in machine learning, a concept called ‘feature engineering’.

If you want to solve a decision problem correctly using machine learning, you have to point out the relevant facts to the machine learning algorithm.

Facts as Features

The relevant facts are called features.  Selecting the right set of facts to use in decision making is called ‘feature engineering’.

Example of Feature Engineering

For example, if I told you that a flag had three horizontal stripes and asked you to decide which country that flag belonged to, you would not be able to decide correctly.

You would need to know the colours of the stripes before you could decide which country the flag belonged to.
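The flag example can be written as a tiny feature-engineering exercise. The four flags below are real horizontal tricolours, listed with their stripes from top to bottom. The two feature functions are hypothetical choices of "which facts to show the decision maker".

```python
# A few real flags with three horizontal stripes (colours listed top to bottom).
FLAGS = {
    "Germany": ("horizontal", 3, ("black", "red", "gold")),
    "Netherlands": ("horizontal", 3, ("red", "white", "blue")),
    "Russia": ("horizontal", 3, ("white", "blue", "red")),
    "Hungary": ("horizontal", 3, ("red", "white", "green")),
}


def candidates(observed, feature):
    """Countries whose flag matches the observation on the chosen feature."""
    return sorted(c for c, flag in FLAGS.items() if feature(flag) == feature(observed))


def stripes_only(flag):
    return flag[:2]  # orientation and stripe count only


def stripes_and_colours(flag):
    return flag  # orientation, stripe count and colours


observed = ("horizontal", 3, ("black", "red", "gold"))
print(candidates(observed, stripes_only))         # all four countries
print(candidates(observed, stripes_and_colours))  # ['Germany']
```

With the weak feature set every country remains a candidate. Adding the colours makes the decision unambiguous.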

Features for Murder/Suicide

Returning to the murder/suicide, when deciding whether the death of the journalist was a case of murder or suicide, the police seem to have used a very weak set of features.

What features could they have used to make a better decision?

The following:

1)  The presence or absence of burn injuries on the hands of the cops

Had the cops merely been witnesses to the burning, and not perpetrators, they would have tried to put out the fire.

Since the victim was on the roof, they would have tried to smother the fire with their own shirts and their hands.

If the cops could demonstrate that they had burnt or singed their hands and shirts, it would add weight to their version of events.

On the other hand, if they could not, one might be more inclined to believe the victim’s.

2)  The use of petrol, if confirmed

The Wikipedia page on the burning says that petrol was used.

The most common flammable liquid in an Indian home is kerosene, not petrol.  Petrol is something that someone is more likely to come across on the road.

So, if it can be confirmed that the journalist was burnt with petrol, not kerosene, it would lend credence to the version of events in the journalist’s dying declaration.

3)  The burning having taken place on the roof

Had the journalist wanted to kill himself, he could have done so inside his house just as easily as on the roof.

In fact, if he had wanted to malign the cops, he would not have chosen a place where he would not have been seen setting himself on fire by his neighbours.

Since the victim was on his roof when he suffered burn injuries, it seems more likely that he was chased till he was cornered (on the roof).

So, if the location of burning can be proved to be the roof, it would lend credence to the victim’s version of the events.

4)  Whether the cops had purchased petrol on the way

If petrol had been used, and purchased on the way by the cops, it might be possible to get confirmation of the purchase of petrol in a bottle from one or other of the pumps on the way.

If any pump en route could confirm such a purchase, it would lend credence to the victim’s version.

5)  Fingerprints on the container

The flammable liquid would have had to be stored or carried in a container.  The container would have remained on the site, especially if the cops’ version of the story was true.

Finger-prints could easily be lifted off the container and used to identify the perpetrator.

6)  The journalist thought he had been attacked with kerosene

The journalist in his dying moments, reportedly said: “Why did they have to burn me? If the Minister and his people had something against me, they could have hit me and beaten me, instead of pouring kerosene over me and burning me.”

So, the victim in his dying statement seems to have thought that he was being doused with kerosene.

He would not have mistaken petrol for kerosene if he had purchased it himself (provided petrol was the flammable liquid used).

7)  The journalist asked why

It is also relevant that the victim asked “Why did they have to burn me”.

People ask questions when they are trying to make sense of the world (when they want to adjust their mental model to reality).

Had the victim wanted to make people believe in a falsehood, it seems more likely that he would have uttered a false statement instead of a question.

Had the victim been lying, I would linguistically have expected him to have said something to the tune of: “I promise you that these men set me on fire.  They poured petrol on me!”

8)  The motive

Had the journalist's motive been nothing more than to stick it to the cops, he chose a very poor way to do so.

He could never have known beforehand that he would survive long enough to talk to a magistrate.

The cops, however, having admitted to acting on behalf of the minister, certainly had a motive – to silence the journalist and send a message to others like him.


The cops investigating the murder of the journalist Jagendra seem to have dropped the charges against the accused on very flimsy grounds.

An impartial investigation by someone other than the local cops would be, in this case, more than desirable.

Fun With Text – Hacking Text Analytics


I’ve always wondered if there was a way to teach people to cobble together quick and dirty solutions to problems involving natural language, from duct tape, as it were.

Having worked in the field for donkey's years as of 2015, and having taught a number of text analytics courses along the way, I've seen students of text analysis stumble mostly on one of two hurdles:

1.  Inability to Reduce Text Analytics Problems to Machine Learning Problems

I’ve seen students, after hours of training, still revert to rule-based thinking when asked to solve new problems involving text.

You can spend hours teaching people about classification and feature sets, but when you ask them to apply their learning to a new task, say segmenting a resume, you’ll hear them very quickly falling back to thinking in terms of programming steps.

Umm, you could write a script to look for a horizontal line, followed by capitalized text in a big, bold font, with the words "Education" or "Experience" in it!!!

2.  Inability to Solve the Machine Learning (ML) Problems

Another hurdle I have seen teams get hung up on is solving ML problems and comparing different solutions.

My manager wants me to identify the ‘introduction’ sections.  So, I labelled 5 sentences as introductions.  Then, I trained a maximum entropy classifier with them.  Why isn’t it working?

One Machine Learning Algorithm to Rule Them All

One day, when I was about to give a lecture at Barcamp Bangalore, I had an idea.

Wouldn’t it be fun to try to use just one machine learning algorithm, show people how to code up that algorithm themselves, and then show them how a really large number of text analytics problem (almost every single problem related to the semantic web) could be solved using it.

So, I quickly wrote up a set of problems in order of increasing complexity, and went about trying to reduce them all to one ML problem, and surprised myself!  It could be done!

Just about every text analytics problem related to the semantic web (which is, by far, the most important commercial category) could be reduced to a classification problem.

Moreover, you could tackle just about any problem using just two steps:

a) Modeling the problem as a machine learning problem

Spot the machine learning problem underlying the text analytics problem (and, if it is a classification problem, the relevant categories), and you've reduced the text analytics problem to a machine learning problem.
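To make step (a) concrete for the resume example earlier in this post, here is a minimal Python sketch, assuming resume segmentation is modeled as classifying each line into one of a small set of section categories. The category names, the helpers `to_instances` and `to_segments`, and the sample resume are all hypothetical. Once some classifier has labelled every line, consecutive lines with the same label are regrouped into segments, so the segmentation problem has become a plain classification problem.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical label set for a resume-segmentation task (an assumption, not from the post).
CATEGORIES = ("contact", "education", "experience", "other")


def to_instances(lines: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Turn a hand-labelled resume into (line, category) training instances."""
    if len(lines) != len(labels):
        raise ValueError("every line needs exactly one label")
    unknown = [label for label in labels if label not in CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return list(zip(lines, labels))


def to_segments(lines: List[str], predicted: List[str]) -> List[Tuple[str, List[str]]]:
    """Regroup per-line predictions into contiguous (category, lines) segments."""
    segments = []
    # groupby merges runs of consecutive lines that share a predicted category.
    for label, group in groupby(zip(lines, predicted), key=lambda pair: pair[1]):
        segments.append((label, [line for line, _ in group]))
    return segments


# Hypothetical resume and per-line predictions from some trained classifier.
resume = ["Jane Doe", "jane@example.com", "EDUCATION", "B.Sc. Physics, 2006-2010",
          "EXPERIENCE", "Analyst, 2011-2015"]
predicted = ["contact", "contact", "education", "education", "experience", "experience"]

for category, block in to_segments(resume, predicted):
    print(category, block)
```

The design choice here is that the classifier never needs to know about "segments" at all; it only answers a per-line question, and the segmentation falls out of a trivial post-processing step.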

b) Solving the problem using feature engineering

To solve the machine learning problem, you need to come up with a set of features that allows the machine learning algorithm to separate the desired categories.
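A minimal sketch of step (b), assuming the resume task is narrowed to telling section headings apart from body lines. The four features, the keyword list, the toy training data, and the choice of a hand-coded Bernoulli Naive Bayes classifier are all assumptions for illustration; the post does not say which algorithm the talk used. The point it illustrates is that the learning algorithm only ever sees the features, so the real work goes into choosing features that separate the categories.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical section keywords used by one of the hand-crafted features.
KEYWORDS = {"education", "experience", "skills", "projects"}


def features(line: str) -> Dict[str, bool]:
    """Hand-crafted binary features for one line of a resume."""
    words = line.split()
    return {
        "all_caps": line.isupper(),
        "short": len(words) <= 3,
        "has_keyword": any(w.strip(":").lower() in KEYWORDS for w in words),
        "has_digit": any(ch.isdigit() for ch in line),
    }


class BernoulliNB:
    """A tiny Bernoulli Naive Bayes classifier over the binary features above."""

    def fit(self, data: List[Tuple[str, str]]) -> "BernoulliNB":
        self.class_counts = defaultdict(int)
        self.true_counts = defaultdict(lambda: defaultdict(int))
        self.names = set()
        for line, label in data:
            self.class_counts[label] += 1
            for name, value in features(line).items():
                self.names.add(name)
                if value:
                    self.true_counts[label][name] += 1
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, line: str) -> str:
        feats = features(line)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of log likelihoods of each feature value.
            score = math.log(count / self.total)
            for name in self.names:
                # Laplace smoothing: add one to both the "true" and "false" counts.
                p_true = (self.true_counts[label][name] + 1) / (count + 2)
                score += math.log(p_true if feats[name] else 1 - p_true)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: (line, category).
train = [
    ("EDUCATION", "heading"),
    ("Work Experience:", "heading"),
    ("SKILLS", "heading"),
    ("B.Sc. Physics, 2006-2010", "body"),
    ("Built a search engine for legal documents", "body"),
    ("Analyst at a consulting firm, 2011-2015", "body"),
]

model = BernoulliNB().fit(train)
print(model.predict("PROJECTS"))                                  # -> heading
print(model.predict("Team lead for a data platform, 2016-2018"))  # -> body
```

With six training lines this is only a toy. Adding or swapping features (a preceding horizontal line, font size, bold text) is where most of the effort goes, while the classifier itself stays unchanged.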

That’s it!

Check it out for yourself!

Here’s a set of slides.

It’s called “Fun with Text – Hacking Text Analytics”.