Using linguistic clues and feature engineering in investigating the death of Jagendra Singh

There was an article in the news recently about a journalist named Jagendra Singh who died of burns in North India.

The man had either been set on fire by policemen who had come to his house on behalf of a corrupt minister, or set himself on fire in their presence.

The policemen insist that the victim set himself on fire when they were at his house.

The victim, in his dying declaration, said that the cops had set him on fire.

The Facts

The undisputed facts of the case seem to be that:

a) the cops had gone to the journalist’s house to ask him to stop investigating a minister’s murky land deals (of which there seem to be many in this Northern state of India)

b) on the roof of his house, in the presence of the cops, the journalist was doused (with petrol?) and set on fire

Disputed

What is disputed is whether the cops set the victim on fire, or if he did so himself.

Police Version

The police investigators seem to have arrived at the latter conclusion in their forensic report, dismissing the case as one of suicide, and letting the murder accused off the hook.

Overview

What I will attempt to show in this article is that the opposite conclusion could have been arrived at just as easily.

I will also argue that linguistic theory supports the victim’s claims and not the cops’.

Bad Assumptions

The forensic report prepared by the police investigators noted that:

there were “more wounds on the left side of Jagendra’s abdomen, just below his chest.”

“This indicated that he had poured kerosene over himself with his right hand. Besides, he sustained burn injuries on lower half of the body which usually is not the case of someone else pours the inflammable fuel on the burnt body,” added the source.

Sources in the forensic team which prepared the report told The Hindu on condition of anonymity that “the lower part of the body is affected only if the person himself pours the fuel on the body. But if a person pours fuel on a other person the upper part of the body gets affected”.

The cops’ assumptions above seem to be bad ones, because a paper in the “Journal of the Euro-Mediterranean Council for Burns and Fire Disasters” titled “Outcomes of patients who commit suicide by burning” says that there is involvement of the upper part of the body when people commit suicide by self-immolation:

The mechanism of the action together with the absence of the will to rescue oneself from the flames leads in most cases to involvement of the face, trunk, and upper extremities as also to frequent inhalation injury.

So, the investigators’ assumptions appear flawed.

Lack of Common Sense

And frankly, the police investigators’ report seems to lack common sense in many ways.

To someone with common sense, it would seem that the location of burns couldn’t really tell one anything much about whether the burns were self-inflicted or not.

In other words, it would seem that the police were looking at immaterial clues.

Which brings us to a concept from machine learning.

Feature Engineering

There is, in machine learning, a concept called ‘feature engineering’.

If you want to solve a decision problem correctly using machine learning, you have to point out the relevant facts to the machine learning algorithm.

Facts as Features

The relevant facts are called features.  Selecting the right set of facts to use in decision making is called ‘feature engineering’.

Example of Feature Engineering

For example, if I told you that a flag had three horizontal stripes and asked you to decide which country that flag belonged to, you would not be able to decide correctly.

You would need to know the colours of the stripes before you could decide which country the flag belonged to.
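The flag example can be sketched in a few lines of Python. This is only an illustration: the three flags are real tricolours with horizontal stripes, but the `FLAGS` table and the `candidates` helper are names made up for this sketch.

```python
# Toy data: three real flags that all have three horizontal stripes.
FLAGS = {
    "Germany": {"stripes": 3, "colours": ("black", "red", "gold")},
    "Netherlands": {"stripes": 3, "colours": ("red", "white", "blue")},
    "Russia": {"stripes": 3, "colours": ("white", "blue", "red")},
}

def candidates(observed, feature_names):
    """Return the countries whose flags match the observation on the chosen features."""
    return sorted(
        country
        for country, flag in FLAGS.items()
        if all(flag[name] == observed[name] for name in feature_names)
    )

observed = {"stripes": 3, "colours": ("red", "white", "blue")}

weak = candidates(observed, ["stripes"])               # stripe count only
strong = candidates(observed, ["stripes", "colours"])  # stripe count plus colours

print(weak)    # all three countries still match
print(strong)  # only one country matches
```

With the stripe count alone, every flag in the table is still a candidate; adding the colours as a feature narrows the answer down to one.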

Features for Murder/Suicide

Returning to the murder/suicide, when deciding whether the death of the journalist was a case of murder or suicide, the police seem to have used a very weak set of features.

What features could they have used to make a better decision?

The following:

1)  The presence or absence of burn injuries on the hands of the cops

Had the cops merely been witnesses to the burning, and not perpetrators, they would have tried to put out the fire.

Since the victim was on the roof, they would have tried to smother the fire with their own shirts and their hands.

If the cops could demonstrate that they had burnt or singed their hands and shirts, it would add weight to their version of events.

On the other hand, if they could not, one might be more inclined to believe the victim’s.

2)  The use of petrol, if confirmed

The Wikipedia page on the burning says that petrol was used.

The most common flammable liquid in an Indian home is kerosene, not petrol.  Petrol is something that someone is more likely to come across on the road.

So, if it can be confirmed that the journalist was burnt with petrol, not kerosene, it would lend credence to the version of events in the journalist’s dying declaration.

3)  The burning having taken place on the roof

Had the journalist wanted to kill himself, he could have done so inside his house just as easily as on the roof.

In fact, if he had wanted to malign the cops, he would have chosen a place where his neighbours could not see him setting himself on fire, rather than an exposed roof.

Since the victim was on his roof when he suffered burn injuries, it seems more likely that he was chased till he was cornered (on the roof).

So, if the location of burning can be proved to be the roof, it would lend credence to the victim’s version of the events.

4)  Whether the cops had purchased petrol on the way

If petrol had been used, and purchased on the way by the cops, it might be possible to get confirmation of the purchase of petrol in a bottle from one or other of the pumps on the way.

If any pump en route could confirm such a purchase, it would lend credence to the victim’s version.

5)  Fingerprints on the container

The flammable liquid would have had to have been stored or carried in a container.  The container would have remained on the site, especially if the cops’ version of the story were true.

Fingerprints could be lifted off the container and used to identify the perpetrator.

6)  The journalist thought he had been attacked with kerosene

The journalist in his dying moments, reportedly said: “Why did they have to burn me? If the Minister and his people had something against me, they could have hit me and beaten me, instead of pouring kerosene over me and burning me.”

So, the victim in his dying statement seems to have thought that he was being doused with kerosene.

He would not have mistaken petrol for kerosene if he had purchased it himself (provided petrol was the flammable liquid used).

7)  The journalist asked why

It is also relevant that the victim asked “Why did they have to burn me”.

People ask questions when they are trying to make sense of the world (when they want to adjust their mental model to reality).

Had the victim wanted to make people believe in a falsehood, it seems more likely that he would have uttered a false statement instead of a question.

Had the victim been lying, I would linguistically have expected him to have said something to the tune of: “I promise you that these men set me on fire.  They poured petrol on me!”

8)  The motive

Had the motive of the journalist been nothing more than to stick it to the cops, he surely seems to have chosen a bad way to do so.

He could never have known beforehand that he would survive long enough to talk to a magistrate.

The cops, however, having admitted to acting on behalf of the minister, certainly had a motive – to silence the journalist and send a message to others like him.

Conclusion

The cops investigating the murder of the journalist Jagendra seem to have dropped the charges against the accused on very flimsy grounds.

An impartial investigation by someone other than the local cops would be, in this case, more than desirable.

Relevant Articles:

1.  http://www.thehindu.com/news/national/other-states/up-scribe-death-forensic-report-gives-clean-chit-to-accused-minister/article7350108.ece

2.  http://www.outlookindia.com/article/breaking-the-news-brutally/294697

3.  http://www.firstpost.com/politics/death-journalist-ensure-justice-jagendra-singh-akhilesh-yadav-must-sack-minister-2294366.html

4.  http://www.nytimes.com/2015/06/12/world/asia/indian-journalist-who-linked-official-to-graft-dies.html

5.  http://www.thehindu.com/opinion/op-ed/for-regional-journalists-its-a-fight-for-survival/article7364281.ece

Medical Journal and Other Articles:

1.  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3741006/

2.  http://medind.nic.in/jal/t13/i1/jalt13i1p44.pdf

Fun With Text – Hacking Text Analytics

I’ve always wondered if there was a way to teach people to cobble together quick and dirty solutions to problems involving natural language, from duct tape, as it were.

Having worked in the field for donkey’s years as of 2015, and having taught a number of text analytics courses along the way, I’ve seen students of text analysis stumble mostly on one of two hurdles:

1.  Inability to Reduce Text Analytics Problems to Machine Learning Problems

I’ve seen students, after hours of training, still revert to rule-based thinking when asked to solve new problems involving text.

You can spend hours teaching people about classification and feature sets, but when you ask them to apply their learning to a new task, say segmenting a resume, you’ll hear them very quickly falling back to thinking in terms of programming steps.

Umm, you could write a script to look for a horizontal line, followed by capitalized text in bold, big font, with the words “Education” or “Experience” in it !!!

2.  Inability to Solve the Machine Learning (ML) Problems

Another task that I have seen teams getting hung up on has been solving ML problems and comparing different solutions.

My manager wants me to identify the ‘introduction’ sections.  So, I labelled 5 sentences as introductions.  Then, I trained a maximum entropy classifier with them.  Why isn’t it working?

One Machine Learning Algorithm to Rule Them All

One day, when I was about to give a lecture at Barcamp Bangalore, I had an idea.

Wouldn’t it be fun to try to use just one machine learning algorithm, show people how to code up that algorithm themselves, and then show them how a really large number of text analytics problems (almost every single problem related to the semantic web) could be solved using it?

So, I quickly wrote up a set of problems in order of increasing complexity, and went about trying to reduce them all to one ML problem, and surprised myself!  It could be done!

Just about every text analytics problem related to the semantic web (which is, by far, the most important commercial category) could be reduced to a classification problem.

Moreover, you could tackle just about any problem using just two steps:

a) Modeling the problem as a machine learning problem

Spot the machine learning problem underlying the text analytics problem (and, if it is a classification problem, the relevant categories), and you’ve reduced the text analytics problem to a machine learning problem.

b) Solving the problem using feature engineering

To solve the machine learning problem, you need to come up with a set of features that allows the machine learning algorithm to separate the desired categories.

That’s it!
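Here is a minimal sketch of the two steps applied to the resume example from earlier. The post does not name the algorithm, so the naive Bayes classifier below is an assumption, and the `features` function, the labels "heading" and "body", and the training lines are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def features(line):
    """Step (b): a hypothetical feature set for one line of a resume."""
    words = line.split()
    return [
        "all_caps" if line.isupper() else "mixed_case",
        "short" if len(words) <= 3 else "long",
        "has_digit" if any(ch.isdigit() for ch in line) else "no_digit",
    ]

class NaiveBayes:
    """A tiny naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(text):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Step (a): segmenting a resume is modelled as classifying each line.
training = [
    ("EDUCATION", "heading"),
    ("WORK EXPERIENCE", "heading"),
    ("SKILLS", "heading"),
    ("B.Tech in Computer Science, 2009", "body"),
    ("Worked on search ranking for 3 years", "body"),
    ("Built text classifiers in Java and Python", "body"),
]

model = NaiveBayes()
model.train(training)
print(model.classify("PUBLICATIONS"))                            # heading
print(model.classify("Led a team of 5 engineers at a startup"))  # body
```

Nothing in the classifier knows about resumes. All the task-specific knowledge sits in `features`, which is where the feature engineering happens.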

Check it out for yourself!

Here’s a set of slides.

It’s called “Fun with Text – Hacking Text Analytics”.

Why the example of a Nash equilibrium in the movie “A Beautiful Mind” is incorrect

I was reading an article today about the mathematician John Nash, whose life the movie “A Beautiful Mind” was based on.

The article contained a link to a clipping from the movie that, it said, explained the game theoretic concept of a Nash equilibrium.

In the clip, Nash and his three friends are at a bar and have to make a choice.

They can go and speak to the four brunettes at the bar, or they can all go to talk to the lone blonde, whom they all like better.

Nash explains to his friends that if they all went to speak to the blonde, she would be put off by all the attention and turn them all down.

But once the blonde turned them down, the brunettes would too, since no one wants to be someone’s second choice (and so they would all lose).

So, Nash convinces his friends to ignore the blonde and speak to one of the brunettes each (so that they would all win).

The strategy of ignoring the blonde, the movie suggests, results in a Nash equilibrium.

However, that turns out to be incorrect.

The strategies adopted by the four men do not result in a Nash equilibrium.

A Nash equilibrium is only obtained when the players adopt strategies such that no single player can obtain a better outcome by changing his own strategy while the others keep theirs unchanged.

That is obviously not true in this case.

Any one of the four friends, by reneging on their deal, might get to go home with the blonde (a better deal).

So, the strategy of going after the second choice does not satisfy the conditions for a Nash equilibrium.
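The check can be written out as a small program. This is a sketch under assumed payoffs (2 for going home with the blonde, 1 for a brunette, 0 for nothing), with the blonde accepting only if exactly one man approaches her; the function names are made up for illustration.

```python
CHOICES = ("blonde", "brunette")

def payoff(player, profile):
    """Assumed payoffs: 2 for winning the blonde, 1 for a brunette, 0 for nothing."""
    if profile[player] == "brunette":
        return 1  # four brunettes, so everyone who picks a brunette is matched
    # The blonde accepts only if exactly one man approaches her.
    return 2 if profile.count("blonde") == 1 else 0

def profitable_deviations(profile):
    """List (player, new_choice) pairs that strictly improve that player's payoff."""
    found = []
    for player in range(len(profile)):
        for choice in CHOICES:
            if choice == profile[player]:
                continue
            changed = profile[:player] + (choice,) + profile[player + 1:]
            if payoff(player, changed) > payoff(player, profile):
                found.append((player, choice))
    return found

def is_nash(profile):
    """A profile is a Nash equilibrium if no single player gains by deviating."""
    return not profitable_deviations(profile)

movie_plan = ("brunette",) * 4
print(is_nash(movie_plan))                # False
print(profitable_deviations(movie_plan))  # each man gains by switching to the blonde
```

Under these assumptions, the plan from the movie fails the test because every one of the four men has a profitable deviation.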

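Under the movie’s own assumptions, a Nash equilibrium is obtained only when exactly one of the men goes after the blonde and the other three each speak to a brunette.  If all four went after the blonde (in vain), any one of them could do better by switching to a brunette.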

The mistake has also been pointed out by others: http://math.stackexchange.com/questions/853988/is-the-nash-equilibrium-example-in-a-beautiful-mind-accurate

There is a better explanation of the Nash equilibrium in the video I shared in an earlier blog post:  https://aiaioo.wordpress.com/2013/01/26/what-game-theory-says-about-why-gas-stations-are-built-next-to-each-other/

Professor Nash passed away a few days ago in a car crash.

Meet us at Barcamp Bangalore 2015: Fun With Text – Natural Language Processing For Hackers

We’re going to be at Barcamp Bangalore.  You can come and meet us at our session “Fun With Text” which is a workshop on text analytics for hackers.

We’re actually going to be trying something a bit crazy at this session.

We’ll start by going over an extremely simple machine learning algorithm.

And then we’re going to go about showing people how almost everything you could ever want to do with text can be done using only that algorithm.

All programmers welcome. Refresh your basic probability theory before you come!

Here’s the Barcamp link:  https://barcampbangalore.org/bcb/spring-2015/fun-with-text-natural-language-processing-for-hackers

The purchase of nuclear reactors and the price

On 11th December, 2014 India agreed to buy 12 nuclear reactors from Russia for $40 billion.  India is also about to buy 6 reactors from the USA for $50 billion. In both deals, the reactors are 1 GW reactors.

So, each Russian 1 GW reactor costs about $3.3 billion.  Each American 1 GW reactor costs about $8.3 billion.

In comparison, Indian nuclear reactors cost between $360 million and $500 million each.

The two 540 MW reactors at Tarapur cost about $500 million each.  The four 220 MW Indian reactors at Kaiga cost $360 million each.

So, the cost of adding 1 GW of capacity is roughly $0.9 billion to $1.6 billion if you use Indian reactor technology.

It is between $3 billion and $8 billion if you buy reactors from Russia or from the USA.
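The per-gigawatt figures can be recomputed from the numbers quoted above. The sketch below uses only those numbers; the `DEALS` table and the `cost_per_gw` helper are names made up for illustration.

```python
# (label, total cost in billions of USD, number of reactors, capacity per reactor in GW)
DEALS = [
    ("Russian import", 40.0, 12, 1.0),
    ("American import", 50.0, 6, 1.0),
    ("Tarapur (Indian)", 1.0, 2, 0.54),   # two reactors at about $500 million each
    ("Kaiga (Indian)", 1.44, 4, 0.22),    # four reactors at $360 million each
]

def cost_per_gw(total_cost_bn, reactors, gw_each):
    """Cost in billions of USD for each gigawatt of installed capacity."""
    return total_cost_bn / (reactors * gw_each)

for label, total, n, gw in DEALS:
    print(f"{label}: ${cost_per_gw(total, n, gw):.2f} billion per GW")
```

This works out to about $3.33 billion per GW for the Russian reactors, $8.33 billion for the American ones, $0.93 billion for Tarapur and $1.64 billion for Kaiga.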

If Indian reactors are of comparable price or cheaper, why then is India paying so much money to import reactors?

Are Indian Suppliers Unable to Build Reactors Fast Enough?

One possible answer is that the reactors (built by the Indian government’s nuclear agencies) are not being built fast enough.

However, it appears the Indian nuclear agency can build 540 MW reactors very fast indeed.  The one at Tarapur was apparently completed in 4 years and 10 months.

In comparison, the AP1000 reactors that Westinghouse designs seem to take up to 10 years to build.

Are Indian Reactors Unsafe?

Another possible reason for buying reactors from large external vendors could be that Indian reactors are not as safe as those from other suppliers.

However, the IAEA inspected the 220 MW reactors built in India (those costing $360 million each) and concluded that they were among the safest in the world and could withstand the type of natural disaster that caused the accident at Fukushima.

On the other hand, GE seems to have turned a blind eye to weaknesses in its containment structures at Fukushima even though the flaws had been identified 40 years ago.

If Indian reactors are cheaper, faster to build, and safer, then why exactly did India agree to purchase nuclear reactors from outside India at such a huge markup?

One possibility is that market forces are not the only factor driving the reactor purchases:

Possible Political Compulsions

Obama’s presidential campaign was possibly funded in part by energy firms.  So, it is possible that he is looking to help campaign donors.

But why would the Prime Minister of India play along?

It is possible that the same economic forces that come to bear on President Obama also play a part in Indian elections.  In the Indian election campaigns last year, the winning team spent twice what President Obama’s campaign spent and 75% of the money in the political parties’ war chests came from unknown sources.

So, there is a need for greater transparency and due diligence.

There is one more puzzling fact to consider.

Competitive bidding has not been used in the matter of nuclear reactor purchases.

The nuclear reactors have all been purchased in a manner reminiscent of the coal allocation scam – without any competitive bidding whatsoever.

So, who loses?

The Indian and American Taxpayers

I am going to go out on a limb and say that both Indian and American taxpayers stand to lose out in case this deal between the US and Indian governments has a corrupt angle to it.

How Indian Taxpayers Will Lose Out

Indian taxpayers will lose out because they will be paying close to $100 billion ($90 billion by the figures above) for the 18 reactors covered by the two deals.  $100 billion is about the size of the last bailout package for Greece.  It’s a large sum of money that the Indian government cannot afford.

An article on the Modi government’s purchase of 6 submarines last year for $12 billion hits the nail right on the head:

“According to the World Bank, India has the world’s largest share of people living on $1.25 a day or less. Currently, 400 million Indians live in extreme poverty, and that number will not decrease without prudent policy-making. Reducing poverty requires a degree of social spending and government intervention, and a government willing to spend billions on naval ships before addressing extreme poverty is telling of the government’s priorities.”

How American Taxpayers Might Lose Out

Trickle-down economics will have you believe that anything done to help large firms like GE and Westinghouse will also help the poorer sections of society in the USA.

But, if it is reasonable to suppose that wealth trickles down, it is also, I would argue, reasonable to suppose that poverty trickles up.

I am going to outline in the following paragraph one mechanism by which poverty might trickle up.

Can Poverty Trickle Up?

$100 billion is about half the size of the Indian central government’s income (tax revenue is about $180 billion annually).

If earnings in India drop because of decreased welfare spending, or a depreciating rupee, or lower salaries, more jobs could move to India and hurt American job seekers.

So, there is a way in which poverty can trickle across geopolitical boundaries.

Is Kejriwal’s proposal to install 1.5 million CCTV cameras feasible?

“Surveillance video cameras, Gdynia” by Paweł Zdziarski – Own work. Licensed under CC BY 2.5 via Wikimedia Commons

A young political party that is contesting the Delhi elections next week (the Aam Aadmi Party, headed by Arvind Kejriwal) has made a promise to install 1 million to 1.5 million CCTV cameras all over Delhi to promote women’s security.

In comparison, the number of government-owned security cameras in the United Kingdom is only 70,000.

So, is the proposal a feasible one?

Well, let’s see.

How many people would be required to monitor 1.5 million cameras around the clock?

Assuming that one person can monitor 100 cameras, 15,000 people would be needed to monitor the cameras at any given time.

But considering that a typical work day is 8 hours, 3 times that number would have to be employed, working in 3 shifts.

So, a total of 45,000 people would be needed to monitor the cameras.
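The same estimate as a small calculation, so that the assumptions (100 cameras per person, 8-hour shifts) can be varied. The function name and parameters are made up for this sketch.

```python
import math

def monitoring_staff(cameras, cameras_per_person=100, shift_hours=8):
    """Return (people needed at any given time, total people across all shifts)."""
    per_shift = math.ceil(cameras / cameras_per_person)
    shifts_per_day = math.ceil(24 / shift_hours)
    return per_shift, per_shift * shifts_per_day

per_shift, total = monitoring_staff(1_500_000)
print(per_shift, total)  # 15000 45000
```

For the lower end of the promise, `monitoring_staff(1_000_000)` gives 10,000 people per shift and 30,000 in total.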

In comparison, the Delhi police has a sanctioned strength of only 80,000 personnel.

Well, is there a better solution?

Alternative 1 – Surveillance of Hotspots

It appears that analytics can be used to identify crime hotspots so that the hotspots alone can be monitored with a much smaller number of security cameras.

In her report titled ‘predictive policing’, Dr. Jennifer Bachner of Johns Hopkins writes about the Santa Cruz Police Department’s (SCPD) crime prevention program as follows:

The core of the SCPD program is the continuous identification of areas that are expected to experience increased levels of crime in a specified time-frame.  A computer algorithm draws upon a database of past criminal incidents to assign probabilities of crime occurring to 150 by 150 meter square cells on a map of Santa Cruz.  The database includes the time, location and type of each crime committed.

In the calculations of probabilities, more recent crimes are given greater weight.  The program then generates a map that highlights the 15 cells with the highest probabilities.  Prior to their shifts, officers are briefed on the locations of these 15 cells and encouraged to devote extra time to monitoring these areas.  During their shifts, officers can log into the web-based system to obtain updated, real-time, hot-spot maps.

So, by using analytics to work out where surveillance cameras should be placed, it might be possible to reduce the number of cameras required to a more manageable number.
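A rough sketch of the kind of calculation the report describes might look like the following. It is not the SCPD algorithm: it only bins past incidents into 150 m cells and weights recent ones more heavily, the exponential decay with a 30-day half-life is an assumption, and the incident data is invented.

```python
from collections import defaultdict

CELL_SIZE_M = 150.0  # cell size mentioned in the report

def hotspot_cells(incidents, today, top_k=15, half_life_days=30.0):
    """incidents: iterable of (x_m, y_m, day) tuples on a local grid in metres.

    Returns the top_k (cell, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for x_m, y_m, day in incidents:
        cell = (int(x_m // CELL_SIZE_M), int(y_m // CELL_SIZE_M))
        age_days = today - day
        scores[cell] += 0.5 ** (age_days / half_life_days)  # recent crimes weigh more
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

incidents = [
    (20.0, 30.0, 100),    # recent incident in cell (0, 0)
    (100.0, 140.0, 95),   # another recent incident in cell (0, 0)
    (400.0, 10.0, 10),    # old incident in cell (2, 0)
    (410.0, 20.0, 5),     # old incident in cell (2, 0)
    (900.0, 900.0, 99),   # single recent incident in cell (6, 6)
]
ranked = hotspot_cells(incidents, today=100, top_k=2)
print([cell for cell, score in ranked])  # [(0, 0), (6, 6)]
```

The two old incidents in cell (2, 0) count for less than the single recent one in cell (6, 6), which is the effect of giving recent crimes greater weight.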

Alternative 2 – Self Surveillance

Phones with powerful cameras are available cheaply these days.  A watchdog app which lets a traveller at night register their source and destination addresses and upload photographs of their conveyance would be a great way to promote safety.

The app could monitor a traveller’s route and alert the traveller if there was a serious deviation, especially towards any crime hotspots.

The app could let the user alert someone if something bad seemed about to happen.

The app could periodically check on the traveller till they reached the destination, and alert authorities if the traveller did not respond within a specified number of minutes.
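Two of the checks described above, route deviation and missed check-ins, can be sketched as follows. Everything here is hypothetical: the waypoints, the 1 km threshold and the 10-minute timeout are placeholder values, and comparing against waypoints only (rather than the road segments between them) is a simplification.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def off_route(position, route, max_deviation_km=1.0):
    """True if the position is farther than max_deviation_km from every route waypoint."""
    return min(haversine_km(position, p) for p in route) > max_deviation_km

def needs_escalation(last_response_min, now_min, timeout_min=10):
    """True if the traveller has not responded to a check-in within timeout_min minutes."""
    return now_min - last_response_min > timeout_min

# Hypothetical planned route as a list of waypoints.
route = [(28.6139, 77.2090), (28.6200, 77.2150), (28.6300, 77.2200)]

print(off_route((28.6205, 77.2155), route))  # False: close to the second waypoint
print(off_route((28.7000, 77.1000), route))  # True: several kilometres away
print(needs_escalation(last_response_min=0, now_min=15))  # True
```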

Kashmir Floods 2014: Designs For Simple Homemade Boats

I was watching reports on the Kashmir floods on television today.  Reporters in Srinagar went around a few neighborhoods that had been partially submerged in the waters of the Jhelum river and talked to people living there.

The impression I got was that there was no longer any serious danger to life and limb in Srinagar, and that the problem there was now merely one of logistics.

Many families (I believe 700,000 people) had decided to remain in their partially submerged homes, on higher floors that the waters wouldn’t reach, and now had no way to procure food and water for themselves because the roads were impassable and because the phone lines were down.

It seemed to me that these people could procure food and water if they could put together makeshift boats to get around in.

So, I started thinking of ways to build boats with household materials.

After thinking about the problem a little, I hit upon a simple boat design that anybody with a bed and some waterproofing material can construct.

Bed Frame Boat

Step 1:  Get hold of a light bed (a good option would be a steel-framed folding bed) such as the one in the image below:

beds_1

Step 2:  Turn the bed upside down as shown in the image below and if it is a folding bed, connect the legs with supporting rods.  If the bed is a stiff wooden bed, it might be possible to merely link the legs of the bed together with rope.

beds_2

Step 3:  Cover the bed with a waterproof tarpaulin sheet (such as the sheets you’d cover your car with).

beds_3

Step 4:  Optionally attach floats to the ends of the bed (to prevent sinking if the tarp tears).

beds_4

Each boat should easily carry one or two people.

The boats can be roped together to make larger craft that can act as taxis.

beds_5

We shall have to try this design out and make sure it works in practice because the forces on the tarpaulin could tear the fabric.  So the raft would work only if the tarps were strong enough to withstand the forces on the sides and bottom of the craft.

Another very simple design, one that would make for a far more robust craft, would be a barrel raft, shown in the image below.  It’s just a bunch of barrels tied together.

Barrel Raft

barrel_raft

A barrel raft would be more resistant to debris and to sharp objects holing the bottom, but unless the barrels are broad enough to sit in, it would be less comfortable, and also less stable.

Barrels are going to be a bit unwieldy so the best design we came up with ultimately was the Jerry Can Raft.

Jerry Can Raft

The idea came from Dwiji Guru, a friend of mine who is a physicist and consults on design involving physics and on policy in Bangalore.  He figured out that a craft displacing 10 litres of water can carry 10 kilograms of weight on water.  So, ten 20 litre cans submerged to half their volume can carry a weight of 100 kg.

Cans like these http://shardacontainers.com/narrow-mouth-rectangular.htm are pretty tough and have a handle at the top.

We figured that you can tie 12 cans with a long rope in pairs with their handles together along the center.  Then tie one stiffening rod along the outside edges and two spacers at the front and back, and you’ll have a roughly 1 m by 2 m raft.

This raft would be slim and light and almost impossible to sink.
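The buoyancy estimate can be checked with a few lines of code. The sketch assumes fresh water at about 1 kg per litre; the function name is made up, and the result is the total weight supported, including the raft itself.

```python
WATER_DENSITY_KG_PER_LITRE = 1.0  # fresh water, approximately

def supported_mass_kg(num_cans, can_volume_litres, submerged_fraction):
    """Mass that floats when the cans are submerged to the given fraction of their volume."""
    displaced_litres = num_cans * can_volume_litres * submerged_fraction
    return displaced_litres * WATER_DENSITY_KG_PER_LITRE

print(supported_mass_kg(10, 20, 0.5))  # 100.0
print(supported_mass_kg(12, 20, 0.5))  # 120.0
```

So the 12-can raft, with its cans half submerged, supports about 120 kg in total.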
