
Mechanical Consciousness

Mankind has attempted for a long time to explain consciousness, one’s awareness  of one’s own existence, of the world we live in, and of the passage of time.  And mankind has further believed for a very long time that consciousness extends beyond death and the destruction of the body.

Most explanations of consciousness have tended to rely on religion, and on philosophical strains associated with religion.  Possibly as a result, there has been a tendency to explain consciousness as being caused by a “soul” which lives on after death and in most traditions gets judged for its actions and beliefs during its time of residence in the body.

In this article, it is proposed that consciousness can have a purely mechanical origin.

The proposal is merely a conjecture.  I provide observations that support it (though they do not prove it) and that, I hope, render it plausible.  I also explore, to some extent, the explanatory power of the model.

It is also proposed that the working of the human mind is similar to that of many machine learning models in that they share certain limitations.

 

Preliminaries

First, let me define consciousness.  Consciousness of something is the knowledge of the presence or existence of something (of time or of our selves or of the world around us).

I argue that consciousness requires at the very least what we call “awareness” (that is, being able to sense directly or indirectly what one is conscious of).

Claim:  If I were not aware of something, I wouldn’t be conscious of it.

Argument: If all humanity lived underground for all time and never saw the sky, we would not be aware of the existence of the sky either by direct experience or by hearsay.  So, we couldn’t be conscious of it.  So, it is only when we are aware of the existence of something that we are conscious of it.

So, we have established a minimum requirement for consciousness – and that is “awareness” (being able to sense it).

But does consciousness require anything more than awareness?

The ability to reason and to predict behavior are things the human mind is capable of.

But are they required for consciousness?

Claim:  Reasoning is not required for consciousness.

Argument:  I argue that reasoning is not required because one cannot reason about something that one is not aware of the existence or presence of.  So, anything that one reasons about is something that one has registered the presence of in some manner, in other words, that one is conscious of.

Claim:  Prediction of the behavior of something is not required for consciousness.

Argument:  Prediction of the future behaviour of a thing is not possible without observation over time of how that thing behaves.  So observation (and consciousness) precedes prediction.

Yann LeCun argues that “common sense” is the ability to predict how something might behave in the future (if its future state is not completely random).  If we accept that definition, we might say that common sense builds on consciousness, not the other way around.

So, it appears that consciousness (knowledge of the existence of something) requires the bare minimum of awareness through the senses, and does not require reasoning or the ability to predict.

 

Development

The next question to consider is whether awareness constitutes consciousness or if there is more to it.

Claim:  There is more to consciousness than the signals that our senses send to the brain (awareness).

Argument:  The signals sent to the brain are analogous to signals that are present in completely inanimate things.  A camera has a sensor that records images of the outside world.  Even a pin-hole camera senses the outside world upon the wall on which the image of the sensed world is cast.  Even a shadow can be considered to be a “sensing” of the object that casts the shadow.  That does not imply consciousness.  There must be something else in animate “living” things that produces consciousness.

What is that something extra that is over and above what our senses record?

I believe that the extra thing that constitutes consciousness is the ability to create a model of what we sense and remember it (keep it in memory).

By “create a model”, I mean store a representation of what is sensed in some kind of memory so that what is sensed can be reproduced in some medium possibly at a later stage.

The model cannot be reproduced if it is not stored and remembered, so memory is also key to consciousness.

So, consciousness is the creation of a model in memory of what is sensed.

In other words, anything that can sense something in the world and actively create a model of what it senses (be able to reproduce it exactly or inexactly) is conscious.

I will attempt to justify this claim later.

 

Elaboration

So, the claim is that anything – even if it is a machine – that can actively create a model of something that it senses (is aware of) and store it in memory in such a way as to permit retrieval of the model, is conscious of it.

I am not saying that conscious beings are conscious of every aspect of what they sense as soon as they sense it. It can be possible that they sense and temporarily store a lot of things (for humans, for example, that could be every pixel of what we see outside the blind spot) but only model in a more abstract form and store in memory as an abstraction (and in a retrievable form) those parts that they pay attention to.

So it is possible that a conscious being may be conscious of the pixels of a bird outside the window but not conscious of it as a bird (model it in a more abstract form) or of its colour (model its properties) unless the conscious being pays attention to it.

For example, let us say we’re talking of a human.  Let’s say further that the human sees a mountain.

The human senses (sees) the mountain when rays of light scattered by the surface of the mountain, or by things upon the mountain, enter her or his eye and impinge upon the retina, triggering a chain of chemical reactions that build up electrical potentials, which in turn act upon the nerve cells of the retina and the optic nerve.

Subsequently, the neurons in the optical pathway of the human’s brain fire in such a manner that eventually, various parameters of the mountain come to be represented in the pattern of neural activations in the human’s brain.

We know that the human has modeled the mountain because the human can be asked to draw the mountain on a sheet of paper and will be able to do so.

Now, the human can be conscious of various parameters of the mountain as well.  For example, if the predominant colour of the mountain is represented in those neural activations, then the human is conscious of the predominant colour of the mountain.  For instance, if the human can answer, accurately or inaccurately, a question about the colour of the mountain, the human can be said to have modeled the same.

If the height of the mountain is represented in the neural patterns, then the human is conscious of the height of the mountain.  This can be tested by asking the human to state the height of the mountain.

If the shape of the mountain is vaguely captured in the neural activations so that the human identifies it with that of a typical mountain, then the human is conscious of the mountain’s shape and of the fact that it is a mountain.

This ability to model is not present in what we typically consider an inanimate object.  A pin-hole camera would not actively create a model of what it senses (projects onto the wall) and is therefore not conscious.  Its projection is purely a result of physical phenomena external to it and it has no agency in the creation of the image within it.  So it has no consciousness.

Let’s say we use a digital camera which records the pixels of, say, a mountain before it.  It can reproduce the mountain pixel by pixel, and so can be said to have a model of the mountain in its memory.  In other words, such a camera is conscious of the pixels of the mountain and of everything else in its field of view.  It wouldn’t be conscious of the shapes or sizes or colours or even of the presence of a mountain in the sense that a human would.

Claim:  Consciousness requires the active acquisition and storage of information from what is sensed.

Argument:  If the “model” is just the result of physical phenomena, say a projected image in a pin-hole camera, then there is no information acquired and stored by the system from what is sensed, and hence no consciousness.

Now, supposing that we were to build a machine of sand that created a representation of the mountain in sand and of the height and colour of the mountain and of the shape of the mountain and of the association of this shape with typical mountain shapes and of every other parameter that the human brain models.

Now, I would argue that this sand machine could be said to be conscious of the mountain in the same way as we are, even though it uses a completely different mechanism to create a model of the mountain.

Claim:  The hypothetical sand machine and a human brain are equivalent.

Argument:  Consciousness of something depends only on what is modeled, and not on the method of modeling.  So, as long as the parameters of the mountain are modeled in exactly the same way in two systems, they can be said to be conscious of it in the same way.

 

Corollary

We are machines.

 

All right, so that’s a claim as well.

Here are two arguments in support of the claim.

a) Our behaviour in some sensory tasks is similar to that we would expect from machine learning tools called classifiers.

  1. The Himba colour experiment discovered that the Himba tribe of Africa were distinguishing colours differently from the rest of the world. They could not distinguish between blue and green but could distinguish between many shades of green which other humans typically had a hard time telling apart.
  2. People who speak languages that do not have vowel tones have trouble hearing differences in tone. Similarly, people who speak languages where the consonants ‘l’ and ‘r’ are conflated cannot easily tell them apart.

This is typically how a machine learning tool called a classifier behaves.  A classifier needs to be trained on labelled sounds or colours and will learn to recognize only those, and will have a hard time telling other sounds or colours apart.
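To make the analogy concrete, here is a minimal sketch of a nearest-centroid colour classifier.  The label sets and RGB values are made up by me for illustration (they are not data from the Himba experiment), and a real classifier would learn its prototypes from many labelled examples.  The point is only that such a tool can tell apart just the categories it was trained on.

```python
# Illustrative sketch: a nearest-centroid classifier over RGB colours.
# All label names and RGB prototypes below are assumptions made up for this example.

def nearest_label(rgb, prototypes):
    """Return the label whose prototype colour is closest to rgb."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(rgb, prototypes[label]))

# A classifier "trained" with separate blue and green categories.
blue_green = {"blue": (0, 0, 255), "green": (0, 255, 0)}

# A classifier "trained" only on two shades of green, with no blue category.
shades = {"dark green": (0, 100, 0), "light green": (144, 238, 144)}

sky = (30, 60, 220)     # a blue
leaf = (20, 110, 20)    # a dark green
mint = (150, 240, 150)  # a light green

for name, colour in (("sky", sky), ("leaf", leaf), ("mint", mint)):
    print(name, "->", nearest_label(colour, blue_green), "|", nearest_label(colour, shades))
```

The first classifier lumps the leaf and the mint together as "green", while the second separates them but has no way of setting the blue apart from a green.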

b) The limitations that our brains reveal when challenged to perform some generative tasks (tasks of imagination) are identical to the limitations that the machine learning tools called classifiers exhibit.

Let me try the experiment on you.   Here’s a test of your imagination.  Imagine a colour that you have never seen before.

Not a mixture of colours, mind you, but a colour that you have never ever seen before.

If you are like most people, you’ll draw a blank.

And that is what a classifier would do too.

So, I would say that the human brain models things like colours or phonemes using some kind of classification algorithm, because it displays the limitations that such algorithms do.

So it is possible that we shall be able to discover by similar experiments on different types of human cognitive functions, that humans are merely machines capable of consciousness (of modeling a certain set of parameters related to what we perceive) and other cognitive functions that define us as human.

 

Further Discussion

People with whom I’ve discussed this sometimes ask me whether considering consciousness as the process of building a model of something adequately explains feelings, emotions, likes and dislikes, and love and longing.

My answer is that it does, at least as far as likes and dislikes go.

A liking of something is a parameter associated with that thing and it is a single-value parameter that can be easily modeled by one or more numbers.

Neural networks can easily represent such numbers (regression models) and so can model likes and dislikes.

As for love and longing, these could result from biological processes and genetic inclinations, but as long as they are experienced, they would have had to be modeled in the human mind, possibly represented by a single number (a single point representation of intensity) or a distributed representation of intensity.  What is felt in these cases would also be modeled as an intensity (represented at a point or in a distributed manner).  One would be conscious of a feeling only when one could sense it and model it.  And the proof that one has modeled it lies in the fact that one can describe it.

So, when  the person becomes conscious of the longing, it is because it has been modeled in their brain.

 

Still Further Discussion

Again, someone asked if machines could ever possibly be capable of truth and kindness.

I suppose the assumption is that only humans are capable of noble qualities such as truth and kindness or that there is something innate in humans which gives rise to such qualities (perhaps gifted to humanity or instilled in them by the divine or the supernatural or earned by souls that attain humanity through the refinement of past lives).

However, there is no need to resort to such theories to explain altruistic qualities such as truthfulness, goodness and kindness.  It is possible to show game theoretically that noble qualities such as trustworthiness would emerge in groups competing in a typical modern economic environment involving a specialization of skills, interdependence and trading.

Essentially the groups that demonstrate less honesty and trustworthiness fail to be competitive against groups that demonstrate higher honesty and trustworthiness and therefore are either displaced by the latter or adopt the qualities that made the latter successful.  So, it is possible to show that the morals taught by religions and noble cultural norms can all be evolved by any group of competing agents.

So, truth and kindness are not necessarily qualities that machines would be incapable of (towards each other).  In fact, these would be qualities they would evolve if they were interdependent and had to trade with each other and organize and collaborate much as we do.

 

Related Work

This is a different definition than the definition used by Max Tegmark in his book “Life 3.0” but his definition of “consciousness” as “subjective experience” confuses it with “sentience” (the ability to feel).

Tegmark also talks about the work of the philosopher David Chalmers and the computer scientist Scott Aaronson, who seem to be approaching the question from the direction of physics: we are just particles from food and the atmosphere rearranged, so what arrangement of particles causes consciousness?

I think that is irrelevant.

All we need to ask is “What is the physical system, whatever it is made of, capable of modeling?”

Interestingly, in the book, Tegmark talks about a number of experiences that any theory of consciousness should explain.

Let’s look at some of those.

 

Explanatory Power of this Model

Explaining Abstraction

Tegmark talks about how tasks move from the conscious to the unconscious level as we practise them and get good at them.

He points out that when you read this, you do not read character by character but word by word.  Why is it that as you improve your reading skills, you are no longer conscious of the letters?

Actually, this can be explained by the theory we just put forth.

When we are learning to read (reading being the modeling of text), we read a passage of text like this one character by character, and so we learn to model characters.

But with practice, we learn to model words or phrases at a higher level from passages of text, and direct our attention to the words or phrases because that facilitates reading.

We can also choose to direct our attention to the letters and read letter by letter.

So, this model can explain attention too.

Attention

The brain is limited in its capacity to process and store information, so the human brain focuses its attention on the parts of the model it has built that are required for the performance of any task.

It can choose not to keep in memory the more granular parts of the model once it has built a larger model.  For instance, it can choose not to keep the characters in memory if it has already modeled the word.

This also explains phenomena such as “hemineglect” (patients with certain lesions in their brain miss half their field of vision but are not aware of it – so they may not eat food in the left half of their plate since they do not notice it).

We can explain it by saying that the brain has modeled a whole plate from the faulty sensory information provided to it and therefore the patient is conscious of a whole plate, but minus the missing information.

Blindsight

Tegmark also talks of the work of Christof Koch and Francis Crick on the “neural correlates of consciousness”.

Koch and Crick performed an experiment where they distracted one eye with flashing images and caused the other eye to miss registering a static image presented to it.

They inferred from this that the retina is not capable of consciousness.

I would counter that by saying that the retina is conscious of the pixels of the images it sees if it constructs models of them (as it does) and stores them.

But if the brain models more abstract properties more useful to the tasks we perform, we focus our attention on those and therefore do not store in the memory the images that are not relevant to the more critical task (the distracting task).

So, I would argue that our consciousness can include models that come from the retina (if some neural pathway from the retina creates models in memory at the pixel level).

But if our attention decides to focus on and consign to memory better things than what the retina models, it will, and then it will not necessarily model and be conscious of pixels from the retina.

 

Still Other Work

Tegmark also talks extensively about the work of Giulio Tononi and his collaborators on something called “integrated information” and the objections to it by Murray Shanahan, but I’ll leave those interested in those theories to refer to the work of their authors.


Fraud detection using computers

For a long time, we’ve been interested in using mathematics (and computers) to detect and deter fraud.  It is related to our earlier work on identifying perpetrators of terrorist attacks.  (Yeah, I know it’s not as cool, but it’s some similar math!)

Today, I want to talk about some approaches to detecting fraud that we talked about on a beautiful summer day, in the engineering room at Aiaioo Labs.

That day, in the afternoon, somebody had rung the bell.  A colleague had answered the bell and then come and handed me a sheet of paper, saying that a lady at the door was asking for donations.

The paper bore the letterhead of an organization in a script that I couldn’t read.  However, the text in English stated that the bearer was a student collecting money to feed refugees living in a refugee camp in Hyderabad (the refugees’ homes had been destroyed in artillery shelling on the India-Pakistan border, and a few thousand families without shelter needed food and medicines urgently).

On the sheet were the names and signatures of about 20 donors who had each donated around 1000 rupees.

Now the problem before us was to figure out if the lady was a genuine student volunteer or a fraudster out to make some quick money.

There was one thing about the document that looked decidedly suspicious.

It was that the amounts donated were all very similar – 1000, 1200, 1300, 1000, 1000, 1000, 1000.

All the numbers had unnaturally high values.

So, I called a friend of mine who came from the place she claimed the refugees (and the student volunteers) were from and asked him to talk to her and tell me if her story checked out.

He spoke to her over the phone for a few minutes and then told me that her story was not entirely true.

She was from the place that she claimed the refugees came from, but she was in fact collecting money for her own family (they had come south because one of them had needed a medical operation and were now collecting money to travel back to their home town).

When we asked her why she had lied, she just shrugged.

We felt it would be fine to help a family in need, so we gave her some money.

However, the whole affair gave us an interesting problem to solve.

How do you tell if a set of numbers is ‘natural’ or if it has been made up by a person intent on making them look natural?

Well, it turns out that statistics can give you the tools to do that.

Method 1

In nature, many processes result in random numbers that follow a certain distribution. And there are standard distributions that almost all numbers found in nature belong to.

For example, on the sheet of paper that the lady had presented, the figures for the money donated should have followed a normal distribution.  There should have been a few high values and a few low values and a lot of the values in the middle.

Since that wasn’t the case I could easily tell that the numbers had been made up.

But you don’t need a human to tell you that.  There are statistical tests that can be done to see if a set of numbers belongs to any expected distribution.

I looked around online and found an article that tells you about methods that can be used to check if a set of numbers belongs to a normal distribution (a distribution that occurs very frequently in nature): http://mathforum.org/library/drmath/view/72065.html

Some of the methods it talks about are the Kolmogorov-Smirnov test, the Chi-square test, the D’Agostino-Pearson test and the Jarque-Bera test.

Details of each can be found at these links (taken from the article):

One common test for normality with which I am personally NOT familiar, is the Kolmogorov-Smirnov test.  The math behind it is very involved, and I would suggest you refer to other resources such as this page

  Wikipedia: Kolmogorov-Smirnov Test
    http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test 

You can read more about the D'Agostino-Pearson test and get a table that can be used in Excel here:

  Wikipedia: Normality Test
     http://en.wikipedia.org/wiki/User:Xargque#Normality_Test 

 Wikipedia: Jarque-Bera Test
     http://en.wikipedia.org/wiki/Jarque-Bera_test 

One item of note: depending on how your stats program calculates kurtosis, you may or may not need to subtract 3 from kurtosis.

 See: Wikipedia Talk: Jarque-Bera Test
      http://en.wikipedia.org/wiki/Talk:Jarque-Bera_test
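To give a feel for what such a test involves, here is a minimal sketch of the Jarque-Bera statistic in plain Python.  It is my own illustration, not code from the article above.  The function name is made up, the moments are computed in the population form (dividing by n), and the p-value relies on the large-sample chi-square approximation with 2 degrees of freedom, which is poor for a handful of values like the seven amounts quoted earlier.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a list of numbers.

    Assumes the values are not all identical (otherwise the variance is zero).
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments, population form (dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # "raw" kurtosis; equals 3 for a normal distribution
    excess_kurtosis = kurtosis - 3   # this is the "subtract 3" step
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    # Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value

donations = [1000, 1200, 1300, 1000, 1000, 1000, 1000]  # the amounts quoted in the story
jb, p = jarque_bera(donations)
print(f"JB = {jb:.2f}, approximate p-value = {p:.3f}")
```

A large JB value (and a small p-value) suggests that the numbers are unlikely to have come from a normal distribution.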

On to the next method:

Method 2

Another property of many naturally occurring numbers is that about 30% of them start with the digit 1!  Surprising, isn’t it?

Well, it turns out that this applies to population numbers, electricity bills, stock prices and the lengths of rivers.

It applies to many numbers that come from power law distributions (power laws govern the distribution of wealth, connections on Facebook, the numbers of speakers of a language, and a lot of other numbers related to society).

This is called Benford’s law:  http://en.wikipedia.org/wiki/Benford’s_law

(I believe that Benford’s law would have applied to the above case as well – donations would have a power law distribution – if you assumed that all donors donated money proportional to their wealth).

When I read about Benford’s law on Wikipedia (while writing this article), I found that it is already being used for accounting fraud detection.

Wikipedia says:

Accounting fraud detection

In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s Law ought to show up any anomalous results. Following this idea, Mark Nigrini showed that Benford’s Law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud.[10] In practice, applications of Benford’s Law for fraud detection routinely use more than the first digit.[10]
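The first-digit comparison described in that passage can be sketched in a few lines of Python.  This is only an illustration: the function names are mine, the two data sets are toy examples rather than real accounts, and a real audit would go on to compare the statistic against a chi-square threshold.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected share of numbers whose first digit is d (1-9) under Benford's law."""
    return math.log10(1 + 1 / d)

def first_digit(x):
    """First significant digit of a positive integer."""
    return int(str(x)[0])

def benford_chi_square(values):
    """Chi-square statistic comparing observed first digits with Benford's law."""
    n = len(values)
    counts = Counter(first_digit(v) for v in values)
    stat = 0.0
    for d in range(1, 10):
        expected = n * benford_expected(d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

powers_of_two = [2 ** k for k in range(1, 101)]  # toy data that follows Benford's law closely
uniform_digits = list(range(100, 1000))          # toy data with every first digit equally likely

print(round(benford_expected(1), 3))             # 0.301
print(benford_chi_square(powers_of_two))
print(benford_chi_square(uniform_digits))
```

The uniformly spread first digits, which is what made-up figures are assumed to look like, give a far larger statistic than the powers of two.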

Method 3

There are also methods that can be used by governments and large organizations to prevent fraud in the issuing of tenders.

More about that in my next article.

In trust we god


Can trust affect the outcome of political events (war), business transactions (pricing) and economic affairs (poverty)?

This is a problem that I’ve been very interested in for many years.

A few years ago I came across papers in economics and game theory that supplied the mathematical tools that we need to analyse such problems.

So, I’ll take each area of interest 1) politics 2) business and 3) economics and explain how trust matters in each case.

1.  Politics

Can the outcome of something like war be determined by trust?

Let’s assume an army of 2 soldiers.

In a war, the benefits to each soldier can be modeled as a bi-matrix (normal-form game) as follows:

                    soldier 2 fights    soldier 2 flees
soldier 1 fights    5, 5                -5, 0
soldier 1 flees     0, -5               0, 0

Normal form or payoff matrix of a 2-player, 2-strategy game

The first of the two numbers in the matrix represents the payoff to soldier 1.

The second of the two numbers in the matrix represents the payoff to soldier 2.

(The soldiers win something (represented by 5 points) if their army wins; they win nothing if their army loses; and they lose their life (represented by -5 points) if they do not flee and their army loses; we assume the army wins if both soldiers do not flee and loses if one or both flee).

If soldier 1 trusts soldier 2 not to flee the battlefield, the best strategy for soldier 1 is to stay and fight as well (since he will then get more benefits than if he flees).

If soldier 1 does not trust soldier 2 to stay on the battlefield (if he suspects that soldier 2 will run away), then the best strategy for soldier 1 is to run away himself (so that he does not remain on the battlefield and get killed).

So, this model shows that if two equal 2-man armies meet on a battlefield, the one whose soldiers trust each other more will win.
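Here is a small sketch of that reasoning.  It treats "trust" as the probability that soldier 1 assigns to soldier 2 staying to fight, which is my own way of putting a number on it and is not part of the payoff matrix itself.

```python
# Payoffs to soldier 1 from the matrix above: (soldier 1's move, soldier 2's move) -> payoff.
PAYOFF_1 = {
    ("fight", "fight"): 5,
    ("fight", "flee"): -5,
    ("flee", "fight"): 0,
    ("flee", "flee"): 0,
}

def expected_payoff(move, trust):
    """Expected payoff to soldier 1 when soldier 2 fights with probability `trust`."""
    return trust * PAYOFF_1[(move, "fight")] + (1 - trust) * PAYOFF_1[(move, "flee")]

def best_response(trust):
    """Soldier 1's better move; a tie goes to fleeing (an arbitrary choice)."""
    if expected_payoff("fight", trust) > expected_payoff("flee", trust):
        return "fight"
    return "flee"

for trust in (0.2, 0.8):
    print(trust, best_response(trust))
```

With these payoffs, fighting is worth 10 × trust − 5 on average and fleeing is worth 0, so soldier 1 stays only if he thinks it more likely than not that soldier 2 will stay too.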

2.  Business (Pricing)

There is a very interesting paper by George A. Akerlof (‘The Market for “Lemons”: Quality Uncertainty and the Market Mechanism’).

It tries to explain why the price of a new car in a showroom is so much higher than the price of a nearly new car in the second-hand car market.

For example, a car costing $25,000 fresh out of the showroom, might fetch $18,000 if sold as a used car in the used car market.

Akerlof’s paper tries to explain why the price dropped so sharply.

Akerlof suggests that the price drop is a result of the uncertainty surrounding the quality of the car in the used-car market.

A certain percentage of cars in a used-car market will be defective (since anyone can sell a car in an unregulated market, and unscrupulous people would have put defective cars up for sale).

Let’s say 50% of the cars in the used car market are defective.

Now, a person buying a used car a day old will only be prepared to risk paying 50% of the showroom price for the car (because of the 50% chance that the car is worth nothing).
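The arithmetic behind that 50% figure can be written down as a tiny sketch.  It assumes a risk-neutral buyer who pays exactly the expected value of the car and a defective car that is worth nothing.  Both are simplifications, and the function names are mine.

```python
def willingness_to_pay(showroom_price, defect_probability, defective_value=0.0):
    """Expected value of a used car to a risk-neutral buyer."""
    return (1 - defect_probability) * showroom_price + defect_probability * defective_value

def implied_defect_probability(showroom_price, used_price):
    """Defect probability that would explain a given used price (defective car worth nothing)."""
    return 1 - used_price / showroom_price

print(willingness_to_pay(25_000, 0.5))                        # 12500.0
print(round(implied_defect_probability(25_000, 18_000), 2))   # 0.28
```

Read the other way round, the drop from $25,000 to $18,000 in the earlier example corresponds, under the same assumptions, to buyers fearing that about 28% of the cars on offer are worthless.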

The Price of Trust

This result has the following unintended consequence:

The more a person trusts a seller, the higher the price he will be willing to offer for a car.

I’ll give you an example of that.  (I’m sorry, but this is a bit racist).

When I was a student in North Carolina, and I was looking to buy a used car, I was given the following piece of advice by my fellow students.

They said, “Go for a car that an American is selling because they will tell you about any problems that it has.  Don’t buy a car from an Asian or an Indian unless you know them well.  They won’t tell you if there are any problems.”

I see the same effect even when doing business in India today – a lot of business happens through connections.

Price Sensitivity

It might also explain why Indians are so price sensitive.

Indians are said to be very price-sensitive, preferring the less expensive offerings over more expensive ones that promise better quality (I recall Richard Branson said that at one point while explaining why he didn’t want to enter India).

I think the price sensitivity is a result of Indians not being able to trust promises of higher quality from their countrymen.

Price becomes the only measure that Indian buyers are able to trust when making a purchasing decision, leading to extreme price-sensitivity in the Indian market.

Hiring and ‘Brain Drain’

Even in hiring, this can have the effect of driving down salaries.

When hiring someone, an Indian firm is likely to offer a lower salary than the market, because they don’t trust in the abilities of the person being hired.

In Akerlof’s paper, he talks about a side-effect of a lack of trust.  He says that good quality cars will just stop being sold on the low-trust markets.

This applies to the job market in India as well:  Indian firms tend to offer lower salaries, which might lead to the best engineers choosing MNCs over Indian firms or leaving Indian shores altogether.
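The way good quality gets driven out of a low-trust market can be sketched as a toy simulation.  This is a deliberately crude version of the idea and not Akerlof’s actual model: I assume that buyers cannot tell cars apart and so offer the average value of the cars on sale, that a seller withdraws a car worth more than the offer, and that the list of car values is hypothetical.

```python
def unravel(qualities, rounds=20):
    """Repeatedly let buyers offer the average value of the cars still on sale.

    Sellers whose car is worth more than the offer withdraw it.
    Returns the list of offers, one per round.
    """
    on_sale = list(qualities)
    offers = []
    for _ in range(rounds):
        if not on_sale:
            break
        offer = sum(on_sale) / len(on_sale)
        offers.append(offer)
        remaining = [q for q in on_sale if q <= offer]
        if len(remaining) == len(on_sale):
            break  # nobody else withdraws; the market has settled
        on_sale = remaining
    return offers

qualities = list(range(1000, 26000, 1000))  # hypothetical car values: 1000, 2000, ..., 25000
print(unravel(qualities))
```

Each round the offer falls, the best remaining cars leave, and the market ends up trading only the worst ones.  The same loop could be read with salaries in place of prices and engineers in place of cars.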

3.  Economics

I’ve described in an earlier blog how man-in-the-middle systems of government can fail to work efficiently if the man-in-the-middle is corrupt.

I’ve described in that post how resources can be wrongly allocated in the presence of corruption.

https://aiaioo.wordpress.com/2013/08/15/who-betrayed-ekalavya-2/

The result of an inefficient allocation of our resources is poverty.

For example, the Indian government has tripled defence spending in the last 10 years – through heavy borrowing – when it is possible to show that we need to allocate whatever money we have to education (see our arguments for that https://aiaioo.wordpress.com/2012/06/04/algorithms-to-combat-poverty/).

World Bank studies (that you can get off a Reserve Bank of India website) show that corrupt governments spend more on arms (because of how easy it is to hide kickbacks from arms deals) than honest governments.

So, the economic prosperity of a country can be impacted by corruption.

Causes of Corruption

But we can ask a deeper question:  “What causes corruption?”

I’ll try to show right here that it is a lack of trust.

Take for example two players in a bidding war (let’s say that they are bidding for a government contract).

Each has the choice to give a bribe or not to give a bribe.

Player 1 is more likely to give a bribe if player 1 does not trust player 2 to not offer a bribe to the government official.

It’s the same decision matrix that I have used for the case of the 2 soldier army.

So you get it?

Everything depends on trust.

Philosophy

I am probably way out of my depth on this, but the ancient Greeks seem to have had two views on the supreme ideal that man should strive for.

According to the Wikipedia article on Dialectics:

“The Sophists taught arête (Greek: ἀρετή, quality, excellence) as the highest value, and the determinant of one’s actions in life.”

But there lived in Greece a man who disagreed with that notion:  “Socrates favoured truth as the highest value, proposing that it could be discovered through reason and logic in discussion: ergo, dialectic.”

But the above models seem to suggest that truth (honesty) results in trust (you know that the guy next to you is honest and won’t lie about the quality of a car or bribe a government official to get ahead of you).

And what the Akerlof paper shows is that trust rewards and promotes quality.

In other words, the two Greek concepts of quality (of the values mankind must uphold for its own good) are probably one and the same.

Related Posts:

1.  Framework for evaluating values

2.  What traffic can reveal about society

3.  Who betrayed Ekalavya?

4.  Can economics change the world?

5.  Is there an algorithm to combat poverty?

6.  Why dance is undervalued

7.  Is 5 very far from 4?

Related Far-out Posts:

1.  Splitting the Truth into Precision and Recall

2.  Does AI have Buddha nature?

[The image in this picture was taken from a circulated Facebook post.  The copyright owner of the image is unknown at this time and if anyone knows him/her I’d like to make sure they’re ok with my using the image and acknowledge them].

Building Machine Learning Models that can help with Customer Service and Supply Chain Management

The Laptop that Stopped Working

One fine day, a couple of months ago, a laptop that we owned stopped working.  We heard 4 beeps coming from the machine at intervals but nothing appeared on the screen.

Customer Service

The service person quickly looked up the symptoms in his knowledge base and informed us that 4 beeps meant a memory error.

First I replaced the two memory modules one by one, but the machine still wouldn’t start.  Then I tried two spare memory modules that I had in the cupboard, but the computer still wouldn’t start.

I had a brand new computer with me that used the same type and speed of memory as the one we were fixing.  I pulled out its memory chips and inserted them into the faulty computer, but still no luck.

At that point, the service person told me that it must be the motherboard itself that was not working.

Second Attempt at Triage

So the next day, a motherboard and some memory arrived at my office.  A little later a field engineer showed up and replaced the motherboard.  The computer still wouldn’t start up.

When the field engineer heard 4 beeps, the engineer said it MUST BE THE MEMORY.

Third Attempt at Triage

A few days later, a new set of memory modules arrived.

The engineer returned and tried inserting the new memory in.  Still no luck.  The computer would not start and you could still hear the 4 beeps.

A third set of brand new memory modules and a new motherboard were sent over.

Fourth Attempt at Triage

The engineer tried both motherboards and various combinations of memory modules, but still, all you could hear were 4 beeps and the computer would not start.

During one of his attempts to combine memory and motherboards, the engineer noticed that though the computer did not start, it did not beep either.

So, the engineer guessed that it was the screen that was not working.  But just to be safe, he’d ask them to send another motherboard and another set of memory modules to go with it.

Fifth Attempt at Triage

The screen, the third motherboard and the fourth set of memory modules arrived in our office and an engineer spent the day trying various combinations of screens, motherboards and memory modules.

But the man on the phone said: “Sir, 4 beeps means there is something wrong with your memory.  I will have them replaced.”

I had to take out my new laptop’s memory and pop it into the faulty machine to convince the engineer and support staff that replacing the memory would not fix the problem.

All the parts were now sent over – the memory, motherboard, processor, drive, and screen.

Sixth Attempt at Triage

Finally, the field engineer found that when he had replaced the processor, the computer was able to boot up with no problems.

Better Root Cause Analysis

The manufacturer could have spared themselves all that expense, time and effort had they used an expert system that relied on a probabilistic model of the symptoms and their causes.

Such a model would be able to tell, given the symptoms, which component was the most likely to have failed.

Such a model would be able to direct a field engineer to the component or components whose replacement would be most likely to fix the problem.

If the attempted fix did not work, the model would simply update its understanding of the problem and recommend a different course of action.

I will illustrate the process using what is known in the machine learning community as a directed probabilistic graphical model.

Run-Through of Root Cause Analysis 

Let’s say a failure has occurred and there is only one symptom that can be observed: the laptop won’t start and emits 4 beeps.

The first step is to enter this information into the probabilistic graphical model.  From a list of symptoms, we select the ones that we observe (all observed symptoms are represented as yellow circles in this document).

So the following diagram has only one circle (observed symptom). 

Model 1:  The symptom of 4 beeps is modeled in a probabilistic graphical model with a yellow circle as follows:

pgm_1

Now, let’s assume that this symptom can be caused by the failure of memory, the motherboard or the processor.

Model 2:  I can add that information to the predictive model, so that the model now looks like this:

pgm_2

The model captures the belief that the causes of the symptom – processor, memory or motherboard failure – are (in the absence of any symptoms) independent of each other.

It also captures the belief that given a symptom like 4 beeps, evidence for one cause will explain away (or decrease the probability of) the other causes.

Once such a model is built, it can tell a field engineer the most probable cause of a symptom, the second most probable cause and so on.

So, the engineer will only have to look at the output of the model’s analysis to know whether he needs to replace one component, or two, and which ones.
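Here is a minimal sketch of how Model 2 could rank the causes. It assumes a noisy-OR link between the three causes and the 4-beep symptom, and every number in `PRIOR` and `TRIGGER` is a hypothetical value chosen for illustration, not a measured failure rate.

```python
from itertools import product

# Hypothetical prior failure probabilities (illustrative, not measured data).
PRIOR = {"memory": 0.10, "motherboard": 0.05, "processor": 0.02}
# Hypothetical chance that each failed component produces the 4-beep symptom.
TRIGGER = {"memory": 0.9, "motherboard": 0.6, "processor": 0.5}
CAUSES = list(PRIOR)

def p_beeps(state):
    """Noisy-OR: the symptom is absent only if every failed component stays silent."""
    p_silent = 1.0
    for cause in CAUSES:
        if state[cause]:
            p_silent *= 1 - TRIGGER[cause]
    return 1 - p_silent

def posterior(evidence=None):
    """P(cause failed | 4 beeps, evidence), by enumerating all 2^3 joint states."""
    evidence = evidence or {}
    total = 0.0
    marginal = dict.fromkeys(CAUSES, 0.0)
    for values in product([False, True], repeat=len(CAUSES)):
        state = dict(zip(CAUSES, values))
        if any(state[c] != v for c, v in evidence.items()):
            continue  # skip states that contradict what we already know
        weight = p_beeps(state)
        for c in CAUSES:
            weight *= PRIOR[c] if state[c] else 1 - PRIOR[c]
        total += weight
        for c in CAUSES:
            if state[c]:
                marginal[c] += weight
    return {c: marginal[c] / total for c in CAUSES}

# Most probable cause first, then the second most probable, and so on.
ranked = sorted(posterior().items(), key=lambda kv: -kv[1])
print(ranked)

# Explaining away: once the memory is known to be faulty,
# the processor becomes a less likely culprit.
print(posterior()["processor"], posterior({"memory": True})["processor"])
```

Enumerating every joint state is only practical for a handful of components. A real system would likely use a proper inference library, but the ranking and the explaining-away effect are the same idea.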

When the field engineer goes out and replaces the components, his actions can also be fed into the model.

Model 3:  Below is an extended model into which attempts to fix the problem by replacing the memory can be incorporated.

pgm_3

If a field engineer were to feed into the system the fact that the memory was replaced with a new module and it didn’t fix the problem, the system would be able to immediately figure out that the memory could not be the cause of the problem, and it would suggest the next most probable cause of failure.

Model 4:  Finally, in case new memory modules being sent to customers for repairs frequently turned out to be defective, that information could also be added to the model as follows:

pgm_4

Now, if the error rate for new memory modules in the supply chain happens to be high for a particular type of memory, then if memory replacement failed to fix a 4-beep problem, the model would understand that faulty memory could still be the cause of the problem.
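The difference between Model 3 and Model 4 can be sketched as a single Bayes update. This is a simplification that assumes exactly one component is at fault. The starting belief and the 30% defect rate are hypothetical numbers, and `update_after_failed_swap` is a name I made up for the illustration.

```python
def update_after_failed_swap(belief, swapped, defect_rate=0.0):
    """Bayes update after replacing `swapped` did not fix the problem.

    Assumes exactly one component is at fault (a simplification).
    If `swapped` was the culprit, the fix fails only when the replacement
    part is itself defective; if it was not, the fix always fails.
    """
    likelihood = {c: (defect_rate if c == swapped else 1.0) for c in belief}
    unnormalised = {c: belief[c] * likelihood[c] for c in belief}
    total = sum(unnormalised.values())
    return {c: p / total for c, p in unnormalised.items()}

# Hypothetical belief after hearing 4 beeps (illustrative numbers).
belief = {"memory": 0.70, "motherboard": 0.20, "processor": 0.10}

# Model 3: spare parts are assumed perfect, so memory is ruled out entirely.
model_3 = update_after_failed_swap(belief, "memory")

# Model 4: 30% of spare memory modules are assumed to be defective,
# so memory stays on the list of suspects.
model_4 = update_after_failed_swap(belief, "memory", defect_rate=0.30)

print(model_3)
print(model_4)
```

With perfect spares the memory drops to zero and the motherboard becomes the top suspect. With a high defect rate among spares, the memory can remain the most probable cause even after a failed swap.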

Applications to Supply Chain Management

The probabilities of all the nodes adjust themselves all the time and this information can actually be used to detect if the error rates in new memory module deliveries suddenly go up.
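As a rough sketch of that kind of monitoring, the snippet below keeps a running estimate of the defect rate over the most recent deliveries and flags a spike. It assumes each delivered module has already been labelled defective or not (in the full model that would be inferred). The class name, baseline, window size and threshold are all hypothetical.

```python
from collections import deque

class DefectRateMonitor:
    """Tracks the defect rate of recently delivered spare parts."""

    def __init__(self, baseline=0.02, window=20, factor=3.0):
        self.baseline = baseline            # hypothetical normal defect rate
        self.factor = factor                # how far above baseline counts as a spike
        self.recent = deque(maxlen=window)  # only the latest `window` deliveries

    def record(self, was_defective):
        self.recent.append(bool(was_defective))

    def estimate(self):
        # Laplace-smoothed so that a handful of samples cannot give exactly 0 or 1.
        return (sum(self.recent) + 1) / (len(self.recent) + 2)

    def spiking(self):
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and self.estimate() > self.factor * self.baseline

monitor = DefectRateMonitor()
for _ in range(20):
    monitor.record(False)   # a run of good modules
quiet = monitor.spiking()

for _ in range(10):
    monitor.record(True)    # then a bad batch arrives
alarm = monitor.spiking()

print(quiet, alarm, monitor.estimate())
```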

Benefits to a Customer Service Process

1.  Formal capture and storage of triage history

2.  Suggestion of cause(s) given the effects (symptoms)

3.  Suggestion of other causes given triage steps performed

What the system will seem to be doing (to the layman):

1.  Recording symptoms

2.  Recommending a course of action

3.  Recording the outcome of the course of action

4.  Recommending next steps

Analysing documents for non-obvious differences

The ease of classification of documents depends on the categories you are looking to classify documents into.

A few days ago, an engineer wrote about a problem where the analysis that needed to be performed on documents was not straightforward.

He described the problem in a forum as follows: “I am working on sub classification. We already crawled sites using focused crawling. So we know domain, broad category for the site. Sometimes site is also tagged with broad category. So I don’t require to predict broad class for individual site. I am interested in sub-classification. For example, I don’t want to find if post is related to sports, politics, cricket etc. I am interested in to find if post is related to Indian cricket, Australia cricket, given that I already know post is related to cricket. Since in cricket post may contains frequent words like runs, six, fours, out, score etc, which are common across all cricket related posts. So I also want to consider rare terms which can help me in sub-classification. I agree that I may also require frequent words for classification. But I don’t want to skip rare terms for classification.”

If you’re dealing with categories like sports, politics and finance, then using machine learning for classification is very easy.  That’s because all the nouns and verbs in the document give you clues as to the category that the document belongs to.

But if you’re given a set of categories for which there are few indicators in the text, you end up with no easy way to categorize it.

After spending a few days thinking about it, I realized that something I had learnt in college could be applied to the problem.  It’s a technique called Feature Selection.

I am going to share the reply I posted to the question, because it might be useful to others working on the classification of documents:

You seem to have a data set that looks as follows (letters are categories and numbers are features):

A P 2 4
A Q 2 5
B P 3 4
B Q 3 5

Let’s say the 2s and the 3s are features that occur very frequently in your corpus while the 4s and the 5s are features that occur far less frequently in your corpus.

When you use the ‘bag of words’ model as your feature vector, your classifier will only learn to tell A apart from B (because the 4s and 5s will not matter much to the classifier, being overwhelmed as it is by the 2s and 3s which are far more frequent).

I think that is why you have come to the conclusion that you need to look for rare words to be able to accomplish your goal of distinguishing category P from category Q.

But in reality, perhaps what you need to do is identify all the features like 4 and 5 that might be able to help you distinguish P from Q and you might even find some frequent features that could help you do that (it might turn out that some frequent features might also have a fairly healthy ability to resolve these categories).

So, now the question just boils down to how you would go about finding the set of features that resolves any given categorization scheme.

The answer seems to be something that the literature refers to as ‘Feature Selection’.

As the name says, you select features that help you break data points apart in the way you want.
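As a small illustration on the toy data set above, each feature can be scored by how much it tells you about a chosen label. The sketch uses information gain, which is just one common scoring criterion that I picked for the example (it is not the method from Mark Hall's thesis), and the function names are my own.

```python
from collections import Counter
from math import log2

# The toy data set from above: (first label, second label, features).
DATA = [
    ("A", "P", {2, 4}),
    ("A", "Q", {2, 5}),
    ("B", "P", {3, 4}),
    ("B", "Q", {3, 5}),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())

def information_gain(feature, label_index):
    """Reduction in label entropy from knowing whether `feature` is present."""
    labels = [row[label_index] for row in DATA]
    with_f = [row[label_index] for row in DATA if feature in row[2]]
    without_f = [row[label_index] for row in DATA if feature not in row[2]]
    remainder = sum(len(part) / len(DATA) * entropy(part)
                    for part in (with_f, without_f) if part)
    return entropy(labels) - remainder

features = sorted(set().union(*(row[2] for row in DATA)))
for f in features:
    print(f, "A/B:", information_gain(f, 0), "P/Q:", information_gain(f, 1))
```

Features 2 and 3 carry all the information about A versus B and none about P versus Q, while 4 and 5 do the opposite. So for the P/Q task you would keep 4 and 5, whether they are rare or not. The toy set does not model how frequent each feature is; it only shows the selection step.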

Wikipedia has an article on Feature Selection:

http://en.wikipedia.org/wiki/Feature_selection 

And Mark Hall’s thesis http://www.cs.waikato.ac.nz/~mhall/thesis.pdf seems to be highly referenced.

Mark Hall’s thesis – “A good feature subset is one that contains features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.”

To be honest with you, I’d heard about Feature Selection, but never connected it to the problem it solves until now, so I’m just looking up reading material as I write.

Best of luck with it.

Frameworks for evaluating values

I recently came across a very interesting 2001 paper by Daphne Koller that dealt with influence diagrams and how they could be applied to game theory.  I came across the paper while doing some background reading on a talk on decision making in accordance with our core values by a friend of mine, Somik Raha.

Influence diagrams are a formalism (very similar to probabilistic graphical models) that is used for making decisions.

What Somik Raha has attempted to do is come up with a framework for making decisions while also taking one’s values into account (either as constraints or as inputs into the decision model).  To do that he proposes extensions to influence diagrams.

What I found interesting when I thought about Daphne Koller’s work and Somik’s together, is that they could possibly give you a framework to evaluate your values.

Koller’s formalism reduces to a game theoretic model, which can be evaluated to determine the outcome of the decisions made by a group of people.

Plug in a formalism based on Somik’s ideas and you just might be able to create a way to quantify the benefits of values.

The Importance of Values?

I have been thinking a bit about values these days because there has been a horrific gang rape in Delhi, and there have recently been numerous incidents of bad driving where friends of mine have been injured in Bangalore.  Then there is corruption.  Our society seems to be quite happy with inequality and vast differences in the distribution of wealth.  It makes me wonder if our values are to blame.

I have often wondered whether some of our problems originate in our value systems and whether the value systems that we consider sacrosanct in India are really very good ones.

Let me take just a couple of values that most Indians would consider to be very good values

  1. Non-violence
  2. Obedience

and let’s discuss them in more detail.

1.  Non-violence

This value appeals not just to people in India.  You see variants of the value of non-violence appear in Tolstoy’s writings and in Semitic religions, as you can see from the Bible (“turn the other cheek”) and the Quran (“give alms to one who begs from you, even if he comes on the back of a horse”).

The issue with this sort of value is that it makes a person (and those around him/her) extremely vulnerable to injustice.

In India, we restrict the liberty of women – in their choice of clothing, company and lifestyle – for fear that they could be in danger if they violated societal norms.  This shows that none of us want to fight society or cross swords with someone who might make disparaging comments about personal choices.

Moreover, possibly as a result of the value of non-violence, very few Indians, if any, are taught fighting skills in school.  So, even if a person really wanted to act, say to protect a friend, he or she might not really have the skills to take down an aggressor.

So, instead of protecting and standing up for people who might be vulnerable, we become their tormentors and make their lives more miserable, just so we don’t have to get our hands dirty, or because we don’t have the skills and strength to do squat.

I’ve written about how bribes are openly collected by traffic policemen.  It should be very easy to put a stop to such behavior if you’re willing to fight.

If non-violence is not a core value, then how do we protect people from tearing each other to bits?

We could start with a question like:  non-violence for what purpose?  (turning it into an extrinsic value)

If the answer is something like, “so that the weak feel protected”, why not make protecting the weak our core value?

I’d prefer teaching kids values like “Don’t ever turn your back on a bully” rather than values like “Don’t fight anybody, and just come home safe, child!”

2.  Obedience

Indian parents love to boast that their child is “such an obedient child!”

Is that a good thing?

Obedience is different from politeness or respect.  The latter are mutual but the former is one way.

So, the politics of obedience creates a hierarchy of subservience.

In India, Parents expect complete obedience from Children.

The Police expect complete obedience from People.

The Politicians expect complete obedience from Police.

Teachers expect complete obedience from Students.

Managers expect complete obedience from Employees.

The creation of the hierarchy (through expectations of obedience) can be very dangerous in many ways.

1.  It can stifle creativity and problem-solving ability.  There is a bias against ideas flowing up a hierarchy because those higher up the hierarchy claim their place above those required to be obedient to them on the premise that they are somehow superior to those below them.  A good example is how parliament will not accept that people have a right to demand a bill against corruption (members of the Indian parliament claim that parliament is supreme in a parliamentary democracy – not the citizens that the parliamentarians represent).

2.  It can leave young people ill-equipped to defend their personal spaces.  I read in a paper on rape that many rapists approach victims by testing their boundaries.  They make comments and otherwise violate the intended victim’s personal boundaries.  If these are not strongly resisted, the probability of an assault becomes greater.  Another strategy used by rapists is to move their intended victim to a new location where they are more vulnerable. It is very important for people to be conditioned so that they do not obey an order by an attacker to relocate under any circumstances.

3.  The hierarchies perpetuate the power of stronger (bigger, older or richer) parties by providing social sanction to their dominant position, and so hinder social mobility.

4.  The obedience hierarchy could allow a few people at the top to amass too much power. It might have, for example, prevented cops from disobeying those in power during the Gujarat riots.

5.  Obedience means valuing rules above truth.  Obedience implies not challenging the rules or the status quo.  So there is little scope for discovering if the rules really are good ones for everybody.  People often defend something they assert with a “because I said so.” – that is, you are expected to believe them because of their authority, and not because they can substantiate their assertion.

Obedience as an absolute value is not entirely harmless.  It could be dangerous to us as a society because corrupt politicians can use the pliability and obedience of people around them to get away with evil (remember the activist who was hacked to death on the orders of a corporator from Bangalore, the journalist who was burnt to death in Uttar Pradesh, or the shutdown that the former Chief Minister of Karnataka State ordered when he was about to be investigated for corrupt dealings?).

I’d love to replace “obedience” with something else, perhaps “honesty” and “trustworthiness” and “pride”.

Summary

I understand that we as Indians are very proud of our values but I’ve tried to argue that our values need to be re-examined.

Personally, I’d love to see the day when we replace all our values with just the value of trustworthiness.

Trustworthiness as a value would mean we’d fight for each other.  It would mean we’d protect the weak.  It would mean we’d be on time.  It would mean we’d be honest.  It would mean we’d be capable and skilled and strong.  It would mean we’d be proud of each other.  It might mean we’d never lose another war.

Reading Koller’s and Somik’s work you get the feeling that one day you might be able to evaluate the comparative benefits of two sets of values, and pick the better one, using plain math.

And hopefully, by showing them mathematical proofs, you can convince people to change their values and pick better ones for themselves.

How to prevent death and injury in stampedes – Part 2

Image

We made an assumption in my previous post on a technique for preventing stampedes … that the force that can result from a crowd of people falling increases with the slope and with the number of people up above that point.

If this assumption is valid, there are simple techniques for preventing deaths from stampedes.

One of the simplest techniques (courtesy of Saravanan – Microsoft Research in Bangalore) is a path designed as shown in the above image.

There are horizontal sections to the path at regular intervals.  These sections ensure that the weight of people falling on anyone would be limited to the length of the path between two horizontal sections.

It’s a simple mechanism to prevent death from crushing during stampedes on sloping ground.
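A back-of-the-envelope sketch of why the flat sections help is given below. It assumes a very crude model in which each person above contributes the downslope component of their weight. The formula, the crowd density and the body mass are all my own assumptions for illustration, not measurements.

```python
import math

def max_downslope_force(path_length_m, slope_deg, people_per_m=2.0,
                        mass_kg=70.0, flat_every_m=None):
    """Rough upper bound (in newtons) on the force pressing on the lowest person.

    Crude assumption: everyone in the same sloped run leans or falls downhill,
    and each contributes mass * g * sin(slope). A flat section ends the run.
    """
    run = path_length_m if flat_every_m is None else min(flat_every_m, path_length_m)
    people_above = run * people_per_m
    return people_above * mass_kg * 9.81 * math.sin(math.radians(slope_deg))

unbroken = max_downslope_force(100, 10)
with_flats = max_downslope_force(100, 10, flat_every_m=10)
print(round(unbroken), round(with_flats))
```

Under this assumption, putting a horizontal section every 10 metres on a 100 metre slope cuts the worst-case force to a tenth, because only the people within one sloped run can pile onto each other.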

It turns out that the stampede at the Ganga last week happened on flat ground.  People seem to have died from being trampled rather than from the pressure of a mass of people falling or leaning together.

Well, I can’t think of a fool-proof solution for being trampled just yet, but I believe that some day someone will find a way.

What is interesting is that so many problems have solutions, but people need to care enough about each other to act on them.

Thinking up solutions is one thing.  Developing a society where people care is quite another.