
Mechanical Consciousness

Mankind has long attempted to explain consciousness: one's awareness of one's own existence, of the world we live in, and of the passage of time.  And mankind has long believed that consciousness extends beyond death and the destruction of the body.

Most explanations of consciousness have tended to rely on religion, and on philosophical strains associated with religion.  Possibly as a result, there has been a tendency to explain consciousness as being caused by a “soul” which lives on after death and in most traditions gets judged for its actions and beliefs during its time of residence in the body.

In this article, it is proposed that consciousness can have a purely mechanical origin.

The proposal is merely a conjecture, but I provide observations that support it (though they do not prove it) and, I hope, render it plausible.  The explanatory power of the model is also explored to some extent.

It is also proposed that the working of the human mind is similar to that of many machine learning models in that they share certain limitations.

 

Preliminaries

First, let me define consciousness.  Consciousness of something is the knowledge of the presence or existence of that thing (of time, or of ourselves, or of the world around us).

I argue that consciousness requires at the very least what we call “awareness” (that is, being able to sense directly or indirectly what one is conscious of).

Claim:  If I were not aware of something, I wouldn’t be conscious of it.

Argument: If all humanity lived underground for all time and never saw the sky, we would not be aware of the existence of the sky either by direct experience or by hearsay.  So, we couldn’t be conscious of it.  So, it is only when we are aware of the existence of something that we are conscious of it.

So, we have established a minimum requirement for consciousness – and that is “awareness” (being able to sense it).

But does consciousness require anything more than awareness?

The ability to reason and to predict behavior are things the human mind is capable of.

But are they required for consciousness?

Claim:  Reasoning is not required for consciousness.

Argument:  I argue that reasoning is not required because one cannot reason about something that one is not aware of the existence or presence of.  So, anything that one reasons about is something that one has registered the presence of in some manner, in other words, that one is conscious of.

Claim:  Prediction of the behavior of something is not required for consciousness.

Argument:  Prediction of the future behaviour of a thing is not possible without observation over time of how that thing behaves.  So observation (and consciousness) precedes prediction.

Yann LeCun argues that “common sense” is the ability to predict how something might behave in the future (if its future state is not completely random).  If we accept that definition, we might say that common sense builds on consciousness, not the other way around.

So, it appears that consciousness (knowledge of the existence of something) requires the bare minimum of awareness through the senses, and does not require reasoning or the ability to predict.

 

Development

The next question to consider is whether awareness constitutes consciousness or if there is more to it.

Claim:  There is more to consciousness than the signals that our senses send to the brain (awareness).

Argument:  The signals sent to the brain are analogous to signals that are present in completely inanimate things.  A camera has a sensor that records images of the outside world.  Even a pin-hole camera senses the outside world upon the wall on which the image of the sensed world is cast.  Even a shadow can be considered to be a “sensing” of the object that casts the shadow.  That does not imply consciousness.  There must be something else in animate “living” things that produces consciousness.

What is that something extra that is over and above what our senses record?

I believe that the extra thing that constitutes consciousness is the ability to create a model of what we sense and remember it (keep it in memory).

By “create a model”, I mean store a representation of what is sensed in some kind of memory so that what is sensed can be reproduced in some medium possibly at a later stage.

The model cannot be reproduced if it is not stored and remembered, so memory is also key to consciousness.

So, consciousness is the creation of a model in memory of what is sensed.

In other words, anything that can sense something in the world and actively create a model of what it senses (be able to reproduce it exactly or inexactly) is conscious.

I will attempt to justify this claim later.

 

Elaboration

So, the claim is that anything – even if it is a machine – that can actively create a model of something that it senses (is aware of) and store it in memory in such a way as to permit retrieval of the model, is conscious of it.

I am not saying that conscious beings are conscious of every aspect of what they sense as soon as they sense it. It can be possible that they sense and temporarily store a lot of things (for humans, for example, that could be every pixel of what we see outside the blind spot) but only model in a more abstract form and store in memory as an abstraction (and in a retrievable form) those parts that they pay attention to.

So it is possible that a conscious being may be conscious of the pixels of a bird outside the window but not conscious of it as a bird (model it in a more abstract form) or of its colour (model its properties) unless the conscious being pays attention to it.

For example, let us say we’re talking of a human.  Let’s say further that the human sees a mountain.

The human senses (sees) the mountain when rays of light scattered by the surface of the mountain, or by things upon the mountain, enter her or his eye and impinge upon the retina, triggering a chain of chemical reactions that lead to electrical potentials which act upon the nerves of the retina.

Subsequently, the neurons in the optical pathway of the human’s brain fire in such a manner that eventually, various parameters of the mountain come to be represented in the pattern of neural activations in the human’s brain.

We know that the human has modeled the mountain because the human can be asked to draw the mountain on a sheet of paper and will be able to do so.

Now, the human can be conscious of various parameters of the mountain as well.  For example, if the predominant colour of the mountain is represented in those neural activations, then the human is conscious of the predominant colour of the mountain.  For instance, if the human can answer, accurately or inaccurately, a question about the colour of the mountain, the human can be said to have modeled the same.

If the height of the mountain is represented in the neural patterns, then the human is conscious of the height of the mountain.  This can be tested by asking the human to state the height of the mountain.

If the shape of the mountain is vaguely captured in the neural activations, so that the human identifies it with the shape of a typical mountain, then the human is conscious of the mountain's shape and of the fact that it is a mountain.

This ability to model is not present in what we typically consider an inanimate object.  A pin-hole camera would not actively create a model of what it senses (projects onto the wall) and is therefore not conscious.  Its projection is purely a result of physical phenomena external to it and it has no agency in the creation of the image within it.  So it has no consciousness.

Let's say we use a digital camera which records the pixels of, say, a mountain before it.  It can reproduce the mountain pixel by pixel, and so can be said to have a model of the mountain in its memory.  In other words, such a camera is conscious of the pixels of the mountain and of everything else in the field of view.  It wouldn't be conscious of the shapes or sizes or colours, or even of the presence of a mountain, in the sense that a human would.

Claim:  Consciousness requires the active acquisition and storage of information from what is sensed.

Argument:  If the “model” is just the result of physical phenomena, say a projected image in a pin-hole camera, then there is no information acquired and stored by the system from what is sensed, and hence no consciousness.

Now, supposing that we were to build a machine of sand that created a representation of the mountain in sand and of the height and colour of the mountain and of the shape of the mountain and of the association of this shape with typical mountain shapes and of every other parameter that the human brain models.

Now, I would argue that this sand machine could be said to be conscious of the mountain in the same way as we are, even though it uses a completely different mechanism to create a model of the mountain.

Claim:  The hypothetical sand machine and a human brain are equivalent.

Argument:  Consciousness of something depends only on what is modeled, and not on the method of modeling.  So, as long as the parameters of the mountain are modeled in exactly the same way in two systems, those systems can be said to be conscious of it in the same way.

 

Corollary

We are machines.

 

All right, so that’s a claim as well.

Here are two arguments in support of the claim.

a) Our behaviour in some sensory tasks is similar to what we would expect from the machine learning tools called classifiers.

  1. The Himba colour experiment found that the Himba tribe of Africa distinguish colours differently from the rest of the world. They could not easily distinguish between blue and green, but could distinguish between many shades of green which other humans typically have a hard time telling apart.
  2. People who speak languages that do not have vowel tones have trouble hearing differences in tone. Similarly, people who speak languages where the consonants ‘l’ and ‘r’ are conflated cannot easily tell them apart.

This is typically how a machine learning tool called a classifier behaves.  A classifier needs to be trained on labelled sounds or colours and will learn to recognize only those, and will have a hard time telling other sounds or colours apart.
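To make the analogy concrete, here is a small sketch of my own (it assumes scikit-learn is installed, and the RGB values and labels are invented purely for illustration): a nearest-neighbour classifier trained on one hypothetical set of colour categories can only ever answer in those categories, much like the speakers in the experiments above.

```python
# A toy sketch (assuming scikit-learn; the RGB values and labels are invented):
# a classifier trained on one "language's" colour categories can only ever
# answer in those categories.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: several shades of green get distinct labels,
# while blue is lumped together with one of the greens.
X_train = [
    [0, 120, 0], [20, 160, 20], [40, 200, 40], [10, 140, 90],  # greens
    [0, 0, 200], [30, 60, 220],                                # blues
]
y_train = ["green-1", "green-2", "green-3", "green-4",
           "green-1", "green-1"]  # the blues share a label with a green

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Colours the classifier was never taught to name still come back as one of
# the trained categories -- it cannot report a distinction it never learnt.
print(clf.predict([[0, 0, 255], [0, 80, 160]]))
```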

b) The limitations that our brains reveal when challenged to perform some generative tasks (tasks of imagination) are identical to the limitations that the machine learning tools called classifiers exhibit.

Let me try the experiment on you.   Here’s a test of your imagination.  Imagine a colour that you have never seen before.

Not a mixture of colours, mind you, but a colour that you have never ever seen before.

If you are like most people, you’ll draw a blank.

And that is what a classifier would do too.

So, I would say that the human brain models things like colours or phonemes using some kind of classification algorithm, because it displays the limitations that such algorithms do.

So it is possible that, by similar experiments on different types of human cognitive functions, we shall be able to discover that humans are merely machines capable of consciousness (of modeling a certain set of parameters related to what we perceive) and of the other cognitive functions that define us as human.

 

Further Discussion

People with whom I've discussed this sometimes ask me whether considering consciousness as the process of building a model of something adequately explains feelings, emotions, likes and dislikes, and love and longing.

My answer is that it does, at least as far as likes and dislikes go.

A liking for something is a parameter associated with that thing, and such a parameter can easily be modeled by a number or a small set of numbers.

Neural networks can easily represent such numbers (regression models) and so can model likes and dislikes.

As for love and longing, these could result from biological processes and genetic inclinations, but as long as they are experienced, they would have to be modeled in the human mind, possibly represented by a single number (a single-point representation of intensity) or by a distributed representation of intensity.  What is felt in these cases would also be modeled as an intensity (represented at a point or in a distributed manner).  One would be conscious of a feeling only when one could sense it and model it.  And the proof that one has modeled it lies in the fact that one can describe it.

So, when  the person becomes conscious of the longing, it is because it has been modeled in their brain.

 

Still Further Discussion

Again, someone asked if machines could ever possibly be capable of truth and kindness.

I suppose the assumption is that only humans are capable of noble qualities such as truth and kindness or that there is something innate in humans which gives rise to such qualities (perhaps gifted to humanity or instilled in them by the divine or the supernatural or earned by souls that attain humanity through the refinement of past lives).

However, there is no need to resort to such theories to explain altruistic qualities such as truthfulness, goodness and kindness.  It is possible to show game theoretically that noble qualities such as trustworthiness would emerge in groups competing in a typical modern economic environment involving a specialization of skills, interdependence and trading.

Essentially the groups that demonstrate less honesty and trustworthiness fail to be competitive against groups that demonstrate higher honesty and trustworthiness and therefore are either displaced by the latter or adopt the qualities that made the latter successful.  So, it is possible to show that the morals taught by religions and noble cultural norms can all be evolved by any group of competing agents.
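As a toy illustration of that game-theoretic point (the payoff numbers and group sizes below are invented purely for illustration, and this is a sketch rather than a rigorous evolutionary model), a group whose members deal honestly with each other generates more total surplus from repeated trade than a group whose members cheat each other:

```python
# A toy sketch of the group-competition argument (payoffs and group sizes invented):
# a group of trustworthy traders out-produces a group of cheaters.

PAYOFF = {  # (my move, their move) -> my payoff; C = deal honestly, D = cheat
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Trustworthy trader: starts honest, then mirrors the partner's last move."""
    return "C" if not history else history[-1]

def cheater(history):
    """Untrustworthy trader: always cheats."""
    return "D"

def group_surplus(strategy, members=10, rounds=100):
    """Total payoff when every pair within the group trades repeatedly."""
    total = 0
    for i in range(members):
        for j in range(i + 1, members):
            hist_i, hist_j = [], []       # each side remembers the other's moves
            for _ in range(rounds):
                move_i, move_j = strategy(hist_i), strategy(hist_j)
                total += PAYOFF[(move_i, move_j)] + PAYOFF[(move_j, move_i)]
                hist_i.append(move_j)
                hist_j.append(move_i)
    return total

print("honest group:  ", group_surplus(tit_for_tat))  # 6 units of surplus per trade
print("cheating group:", group_surplus(cheater))      # only 2 units per trade
```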

So, truth and kindness are not necessarily qualities that machines would be incapable of (towards each other).  In fact, these would be qualities they would evolve if they were interdependent and had to trade with each other and organize and collaborate much as we do.

 

Related Work

This is a different definition from the one used by Max Tegmark in his book "Life 3.0"; his definition of "consciousness" as "subjective experience" conflates it with "sentience" (the ability to feel).

Tegmark also talks about the work of the philosopher David Chalmers and the computer scientist Scott Aaronson, who seem to be approaching the question from the direction of physics – as in, we are just particles from food and the atmosphere rearranged, so what arrangement of particles causes consciousness?

I think that is irrelevant.

All we need to ask is “What is the physical system, whatever it is made of, capable of modeling?”

Interestingly, in the book, Tegmark talks about a number of experiences that any theory of consciousness should explain.

Let’s look at some of those.

 

Explanatory Power of this Model

Explaining Abstraction

He talks about how tasks move from the conscious to the unconscious level as we practise them and get good at them.

He points out that when a human reads text like this, they do not read character by character but word by word.  Why is it that, as your reading skills improve, you are no longer conscious of the letters?

Actually, this can be explained by the theory we just put forth.

When we are learning to read (modeling the text is what reading is), we model individual characters when we see a passage of text like this one, and so we read character by character.

But with practice, we learn to model words or phrases at a higher level from passages of text, and direct our attention to the words or phrases because that facilitates reading.

We can still choose to direct our attention to the letters and read letter by letter, if we wish.

So, this model can explain attention too.

Attention

The brain is limited in its capacity to process and store information, so the human brain focuses its attention on the parts of the model it has built that are required for the performance of any task.

It can choose not to keep in memory the more granular parts of the model once it has built a larger model.  For instance, it can choose not to keep the characters in memory if it has already modeled the word.

This also explains phenomena such as “hemineglect” (patients with certain lesions in their brain miss half their field of vision but are not aware of it – so they may not eat food in the left half of their plate since they do not notice it).

We can explain it by saying that the brain has modeled a whole plate from the faulty sensory information provided to it, and therefore the patient is conscious of a whole plate, but minus the missing information.

Blindsight

Tegmark also talks of the work of Christof Koch and Francis Crick on the “neural correlates of consciousness”.

Koch and Crick performed an experiment in which they distracted one eye with flashing images and caused the other eye to miss registering a static image presented to it.

They inferred from this that the retina is not capable of consciousness.

I would counter that by saying that the retina is conscious of the pixels of the images it sees if it constructs models of them (as it does) and stores them.

But if the brain models more abstract properties more useful to the tasks we perform, we focus our attention on those and therefore do not store in the memory the images that are not relevant to the more critical task (the distracting task).

So, I would argue that our consciousness can include models that come from the retina (if some neural pathway from the retina creates models in memory at the pixel level).

But if our attention decides to focus on and consign to memory better things than what the retina models, it will, and then it will not necessarily model and be conscious of pixels from the retina.

 

Still Other work

Tegmark also talks extensively about the work of Giulio Tononi and his collaborators on something called “integrated information”, and about the objections to it by Murray Shanahan, but I’ll leave those interested in those theories to refer to the work of their authors.

I also examine Graziano’s Attention Schema Theory of consciousness in another post https://aiaioo.wordpress.com/2017/12/18/mechanical-consciousness-and-attention/


Deep Bayesian Learning for NLP

Deep learning is usually associated with neural networks.

In this article, we show that generative classifiers are also capable of deep learning.

What is deep learning?

Deep learning is a method of machine learning involving the use of multiple processing layers to learn non-linear functions or boundaries.

What are generative classifiers?

Generative classifiers use the Bayes rule to invert probabilities of the features F given a class c into a prediction of the class c given the features F.

The class predicted by the classifier is the one yielding the highest P(c|F).

A commonly used generative classifier is the Naive Bayes classifier.  It has two layers (one for the features F and one for the classes C).
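In symbols (writing the features as F = {f1, …, fn}; this notation is mine, added here for clarity), the Bayes rule inversion together with the Naive Bayes assumption that the features are conditionally independent given the class is:

```latex
P(c \mid F) \;=\; \frac{P(c)\, P(F \mid c)}{P(F)} \;\propto\; P(c) \prod_{i=1}^{n} P(f_i \mid c)
```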

Deep learning using generative classifiers

The first thing you need for deep learning is a hidden layer.  So you add one more layer H between the C and F layers to get a Hierarchical Bayesian classifier (HBC).

Now, you can compute P(c|F) in an HBC in two ways:

[Equation 1: Computing P(c|F) using a Product of Sums]
[Equation 2: Computing P(c|F) using a Sum of Products]

The first equation computes P(c|F) using a product of sums (POS).  The second equation computes P(c|F) using a sum of products (SOP).
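The equation images are not reproduced here, but under the usual HBC factorization (this is my reconstruction from the description above and below, so treat it as a sketch rather than the exact figures) the two computations would be:

```latex
\text{POS:}\quad P(c \mid F) \;\propto\; P(c) \prod_{f \in F} \sum_{h} P(h \mid c)\, P(f \mid h) \\
\text{SOP:}\quad P(c \mid F) \;\propto\; P(c) \sum_{h} P(h \mid c) \prod_{f \in F} P(f \mid h)
```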

POS Equation

We discovered something very interesting about these two equations.

It turns out that if you use the first equation, the HBC reduces to a Naive Bayes classifier. Such an HBC can only learn linear (or quadratic) decision boundaries.
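One way to see why (a one-line sketch in the notation above): in the HBC the features depend on the class only through the hidden layer, so marginalizing over the hidden nodes inside each per-feature factor collapses it back to an ordinary class-conditional feature probability,

```latex
\prod_{f \in F} \sum_{h} P(h \mid c)\, P(f \mid h) \;=\; \prod_{f \in F} P(f \mid c)
```

which is exactly the Naive Bayes likelihood, so the POS form draws the same boundaries that Naive Bayes does.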

Consider the discrete XOR-like function shown in Figure 1.

[Figure 1: a discrete XOR-like arrangement of data points]

There is no way to separate the black dots from the white dots using one straight line.

Such a pattern can only be classified 100% correctly by a non-linear classifier.

If you train a multinomial Naive Bayes classifier on the data in Figure 1, you get the decision boundary seen in Figure 2a.

Note that the dotted area represents the class 1 and the clear area represents the class 0.

Figure 2a: The decision boundary of a multinomial NB classifier (or a POS HBC).

It can be seen that no matter what the angle of the line is, at least one point of the four will be misclassified.

In this instance, it is the point at {5, 1} that is misclassified as 0 (since the clear area represents the class 0).

You get the same result if you use a POS HBC.
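If you would like to reproduce the failure, here is a toy sketch (it assumes scikit-learn; exactly which point gets misclassified can vary with smoothing and tie-breaking, but the training accuracy never reaches 100%):

```python
# A toy reproduction of the XOR-like data from Figure 1 (assuming scikit-learn).
# A multinomial Naive Bayes classifier -- the equivalent of the POS HBC -- cannot
# get all four points right; which point it gets wrong depends on smoothing and
# tie-breaking.
from sklearn.naive_bayes import MultinomialNB

X = [[1, 1], [5, 5],   # class 0
     [5, 1], [1, 5]]   # class 1
y = [0, 0, 1, 1]

clf = MultinomialNB().fit(X, y)
print(clf.predict(X))   # at least one label differs from y
print(clf.score(X, y))  # training accuracy stays below 1.0
```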

SOP Equation

Our research showed us that something amazing happens if you use the second equation.

With the “sum of products” equation, the HBC becomes capable of deep learning.

SOP + Multinomial Distribution

The decision boundary learnt by a multinomial non-linear HBC (one that computes the posterior using a sum of products of the hidden-node conditional feature probabilities) is shown in Figure 2b.

Figure 2b: Decision boundary learnt by a multinomial SOP HBC.

The boundary consists of two straight lines passing through the origin. They are angled in such a way that they separate the data points into the two required categories.

All four points are classified correctly since the points at {1, 1} and {5, 5} fall in the clear conical region which represents a classification of 0 whereas the other two points fall in the dotted region representing class 1.

Therefore, the multinomial non-linear hierarchical Bayes classifier can learn the non-linear function of Figure 1.
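As a hand-built illustration (the mixture parameters below are chosen by me to make the point, not learnt with EM, so this is a sketch of the SOP computation rather than a trained model), a classifier that mixes two multinomial components per class does separate the four XOR-like points:

```python
# A hand-parameterized sketch of an SOP HBC on the Figure-1 data (the component
# parameters are chosen by hand for illustration, not learnt with EM).
import math  # math.prod requires Python 3.8+

def multinomial_likelihood(x, theta):
    """P(x | theta) for a multinomial, ignoring the count-permutation constant,
    which is the same for every class and so cancels in the comparison."""
    return math.prod(t ** n for t, n in zip(theta, x))

# Two hidden components per class; each component is a distribution over
# the two features, stored as (theta, mixture weight).
components = {
    0: [((0.5, 0.5), 0.5), ((0.5, 0.5), 0.5)],   # class 0: balanced counts
    1: [((5/6, 1/6), 0.5), ((1/6, 5/6), 0.5)],   # class 1: skewed either way
}
priors = {0: 0.5, 1: 0.5}

def sop_argmax(x):
    """Pick the class with the highest SOP posterior P(c) * sum_h P(h|c) prod_f P(f|h)."""
    scores = {
        c: priors[c] * sum(w * multinomial_likelihood(x, theta)
                           for theta, w in comps)
        for c, comps in components.items()
    }
    return max(scores, key=scores.get)

for x, label in [((1, 1), 0), ((5, 5), 0), ((5, 1), 1), ((1, 5), 1)]:
    print(x, "->", sop_argmax(x), "(true:", label, ")")
# All four XOR-like points come out correctly, unlike the Naive Bayes / POS case.
```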

Gaussian Distribution

The decision boundary learnt by a Gaussian nonlinear HBC is shown in Figure 2c.

Figure 2c: Decision boundary learnt by a SOP HBC based on the Gaussian probability distribution.

The boundary consists of two quadratic curves separating the data points into the required categories.

Therefore, the Gaussian non-linear HBC can also learn the non-linear function depicted in Figure 1.

Conclusion

Since SOP HBCs are multilayered (with a layer of hidden nodes), and can learn non-linear decision boundaries, they can therefore be said to be capable of deep learning.

Applications to NLP

It turns out that the multinomial SOP HBC can outperform a number of linear classifiers at certain tasks.  For more information, read our paper.


Fun with Text – Managing Text Analytics

The year is 2016.

I’m a year older than when I designed the text analytics lecture titled “Fun with Text – Hacking Text Analytics“.

Yesterday, I found myself giving a follow on lecture titled “Fun with Text – Managing Text Analytics”.

Here are the slides:

“Hacking Text Analytics” was meant to help students understand a range of text analytics problems by reducing them to simpler problems.

But it was designed with the understanding that they would hack their own text analytics tools.

However, in project after project, I was seeing that engineers tended not to build their own text analytics tools, but instead rely on handy and widely available open source products, and that the main thing they needed to learn was how to use them.

So, when I was asked to lecture to an audience at the NASSCOM Big Data and Analytics Summit in Hyderabad, and was advised that a large part of the audience might be non-technical and that I should base the talk on use-cases, I tried a different tack.

So I designed another lecture “Fun with Text – Managing Text Analytics” about:

  • 3 types of opportunities for text analytics that typically exist in every vertical
  • 3 use cases dealing with each of these types of opportunities
  • 3 mistakes to avoid and 3 things to embrace

And the takeaway from it is how to go about solving a typical business problem (involving text) using text analytics.

Enjoy the slides!


Ruminations on Consciousness

[Illustration: Descartes’ illustration of the mind-body problem]

There is an interesting unanswered question that humanity still hasn’t managed to put to rest, and it is:

“What is consciousness?”

Is human consciousness magical or mechanical?

Is there some magical thing called a soul in all animals that makes us who we are, drives our actions and makes us conscious of the world around us?

Various religious traditions have different explanations for consciousness.

Traditional Hypothesis

Semitic traditions – Judaism, Islam and Christianity – don’t talk much about consciousness but they have an implicit position on the subject.  On the other hand, Hindu philosophical traditions talk explicitly about it, and the following are some of the more well-known philosophical positions:

  a) Advaita – the consciousness of creatures on earth is essentially the same as that of the divine.
  b) Dvaita – there are two kinds of consciousness – that of earthly creatures and that of the divine.
  c) Vishishtadvaita – there are two kinds of consciousness – earthly and divine – and the former can become one with the latter.

So, Hindu philosophies take one of the following positions:

Dvaita:

[Diagram: Dvaita]

The earthly consciousness is singular from the point of view of the individual.  Each person (and in some traditions each animal) is believed to have one.

Advaita:

[Diagram: Advaita]

The view of all Semitic religions, though not explicitly stated, seems to be closer to that of Dvaita philosophy, in that souls are considered distinct entities from the gods (after death, these souls end up in a good place or a bad place for a long time).

I say gods in plural because all Semitic religions seem to believe in the existence of a good divine being and an evil divine being (satan/shaitan/iblees) who is different from the good divine being and not fully subject to him (which is different from Hindu philosophies where the concept of an absolutely bad/evil divine being doesn’t seem to exist).

[Diagram: the Semitic view]

So, there are two or more (if you count the lesser divine beings called angels, jinns, etc) divine consciousnesses in Judaism, Islam or Christianity.

Judeo-Christian Beliefs

In Judeo-Christian literature, for example, a text about a man called Job deals with this concept of a bad divine being.  The Wikipedia says: “Between Job 1:9–10 and 2:4–5, Satan points out that God has given Job everything that a man could want, so of course Job would be loyal to God; Satan suggests that Job’s faith would collapse if all he has been given (even his health) were to be taken away from him. God therefore gives Satan permission to test Job.”  There are also in this belief system conceptions of other divine beings: a holy trinity, a pantheon of angels, etc.

Islamic Beliefs

In an account by al-Tabari, a 9th-century scholar, the prophet Muhammad is described as having at one point endorsed in verse three deities of the Kaaba other than Al-Lah (they were called Al-Lat, Al-Uzza, and Manat), and as having later withdrawn the endorsement with the explanation that Satan (who in Islamic theology is believed to have only the power to put ideas into people’s minds) had made him do it.  The verses endorsing these other deities (later withdrawn) are referred to in some places as the Satanic Verses (https://en.wikipedia.org/wiki/Satanic_Verses).

All the above religious belief systems imagine one single soul as resident in each living thing on earth.

However, in the Baha’i faith, it appears that there is a concept of a good and an evil side in each living thing, though possibly not as two consciousnesses.  Abdu’l-Bahá is supposed to have said: “This lower nature in man is symbolized as Satan — the evil ego within us, not an evil personality outside.”

In fiction, there have been imaginings of more than one conscious ‘soul’ being resident in a human.

Take the tale of Dr. Jekyll and Mr. Hyde.  In it, the character of Jekyll/Hyde is described as having two consciousnesses, one good and one bad – each akin to one of the principal divine beings in Semitic religions.

[Diagram: Jekyll and Hyde]

So, in Jekyll/Hyde’s world, there is a multiplicity of consciousnesses, not just in the divine plane, but also on earth.

Extrapolation

So, it appears that we can imagine multiple divine beings in existence, and multiple consciousnesses existing in each of us.  We can also imagine a single divine being in existence, and a single conscious soul.  We can even imagine the earthly soul/consciousness being the same as the divine soul/consciousness.

The obvious question is:  can we imagine the absence of the magical soul since we can imagine the absence of a divine being (atheistic belief systems have existed since times immemorial)?

One of the reasons for postulating the existence of divine beings is that they give us a way of explaining inexplicable phenomena.  In antiquity, when humans needed to explain thunder and tides, they imagined thunder gods and sea gods.  Later, when they became better able to explain nature (or at least to recognize its unchanging patterns and fend against them), they began to adopt more abstract conceptions of deities that reflected human consciousness, with religious traditions serving to provide an explanation for the phenomenon of life and an ethical framework for reasoning during the period of life.

Once the creation of living things could be explained, and social contracts became things one could reason about, humans seem to have found it easier to surmise that no gods were needed to explain creation, life and ethical values.

Similarly, as we become better able to explain how our minds work, and to understand perception, memory, cognition and language, and also phenomena such as hallucination and mental illnesses, beliefs in magical phenomena such as spirits taking possession of individuals have begun to diminish.

By extrapolation, one might suppose that with time, the belief in an immortal soul will also diminish.

Alternative Hypothesis

What is likely to replace the concept of a magical soul in all living things?

One of the modern theories of consciousness (according to David Chalmers) seems to be that the mind and the brain are one (see the mind-brain identity theory of the 1950s mentioned in http://consc.net/papers/five.pdf).

However, since the brain itself is little understood, we’d merely be explaining something we don’t understand using something that we don’t understand (though we’d be seeing it as something physical).

It seems to me that it might be better to explain our consciousness to ourselves in terms of how computers do the things that we are conscious of doing, since computers are well-understood.

It appears that a definition of consciousness as:  “the ability to perceive the world, form a model of the world (including imagined worlds), retain a memory of the perceptions and models, reason independently about those models, and optionally, to act on the perceptions” would be accurate.

So any machine could be considered conscious if it is able to perceive the world, form a model of the world (including imagined worlds), retain a memory of the perceptions and models, reason independently about those models, and optionally act on the perceptions.

There are already theories that seem to come close to the above definition.  They are called Representationalist Theories.  And many of them, interestingly, seem to have been developed only in the last 20 years: http://plato.stanford.edu/entries/consciousness/#RepThe

Here are some other discussions of the aforesaid theories:

  1. http://plato.stanford.edu/entries/consciousness-representational/
  2. http://plato.stanford.edu/entries/qualia/

A concept such as qualia (see preceding link), which seems so troubling to a representationalist philosopher, would appear trivial to someone well-versed in machine learning, because in machine learning, we already have names for concepts that go beyond these, such as features and models.

So it seems to me at first glance that representationalist theories of consciousness with the addition of concepts from machine learning can adequately explain consciousness in all animals.

Buddhist Philosophy

Now, it turns out that this alternative hypothesis that there is no soul is not such a recent supposition.  It seems to have appeared a long time ago.

In fact, it formed a central tenet of another Indian religion – Buddhism.

In Buddhism the concept of the non-existence of the soul is called Anatta from an (not, without) and attā (soul).

So, it appears that in one Indian philosophical tradition, there is no place for the supernatural consciousness.

[Illustration: Anatta]

Consequences and More Questions

The consequences of reducing the being/soul/consciousness to a mechanical process would be very interesting.

If we accept the above definition, we would have to think of humans as computers, because if human perceptions of the physical world are nothing more than a mental model of the same, then an electronic model of the world in a computer, or a physical model in a mechanical device, would also qualify as consciousness of the world.

Can we say that a computer or a mechanical model is conscious of the world in the same way that we are?

If we go with the theory that we have no magical soul, then the only alternative that remains is to accept that if a physical representation of the world can be created of its own volition by a machine that can also reason about it, then the machine is also conscious of the world.  In other words, our consciousness would have to be accepted as consisting of nothing more than our memories of the world we perceive and the models we create in our minds, and our ability to think about and reason over them.  Anything that can similarly perceive, model and remember things would have to be considered as possessing consciousness, leading to other interesting questions such as:  are humans machines, is ‘consciousness’ = ‘life’, can we have consciousness without life, and finally,  what is life?


Building Machine Learning Models that can help with Customer Service and Supply Chain Management

The Laptop that Stopped Working

One fine day, a couple of months ago, a laptop that we owned stopped working.  We heard 4 beeps coming from the machine at intervals but nothing appeared on the screen.

Customer Service

The service person quickly looked up the symptoms in his knowledge base and informed us that 4 beeps meant a memory error.

I first replaced the two memory modules one by one, but the machine still wouldn't start.  Then I tried two spare memory modules that I had in the cupboard, but the computer still wouldn't start.

I had a brand new computer with me that used the same type and speed of memory as the one we were fixing.  I pulled out its memory chips and inserted them into the faulty computer, but still no luck.

At that point, the service person told me that it must be the mother board itself that was not working.

Second Attempt at Triage

So the next day, a mother board and some memory arrived at my office.  A little later a field engineer showed up and replaced the mother board.   The computer still wouldn’t start up.

When the field engineer heard 4 beeps, the engineer said it MUST BE THE MEMORY.

Third Attempt at Triage

A few days later, a new set of memory modules arrived.

The engineer returned and tried inserting the new memory in.  Still no luck.  The computer would not start and you could still hear the 4 beeps.

A third set of brand new memory modules and a new mother board were sent over.

Fourth Attempt at Triage

The engineer tried both motherboards and various combinations of memory modules, but still, all you could hear were 4 beeps and the computer would not start.

During one of his attempts to combine memory and motherboards, the engineer noticed that though the computer did not start, it did not beep either.

So, the engineer guessed that it was the screen that was not working.  But just to be safe, he’d ask them to send another motherboard and another set of memory modules to go with it.

Fifth Attempt at Triage

The screen, the third motherboard and the fourth set of memory modules arrived in our office and an engineer spent the day trying various combinations of screens, motherboards and memory modules.

But the man on the phone said: “Sir, 4 beeps means there is something wrong with your memory.  I will have them replaced.”

I had to take out my new laptop’s memory and pop it into the faulty machine to convince the engineer and support staff that replacing the memory would not fix the problem.

All the parts were now sent over – the memory, motherboard, processor, drive, and screen.

Sixth Attempt at Triage

Finally, the field engineer found that when he had replaced the processor, the computer was able to boot up with no problems.

Better Root Cause Analysis

The manufacturer could have spared themselves all that expense, time and effort had they used an expert system that relied on a probabilistic model of the symptoms and their causes.

Such a model would be able to tell, given the symptoms, which component was the most likely to have failed.

Such a model would be able to direct a field engineer to the component or components whose replacement would be most likely to fix the problem.

If the attempted fix did not work, the model would simply update its understanding of the problem and recommend a different course of action.

I will illustrate the process using what is known in the machine learning community as a directed probabilistic graphical model.

Run-Through of Root Cause Analysis 

Let’s say a failure has occurred and there is only one symptom that can be observed: the laptop won’t start and emits 4 beeps.

The first step is to enter this information into the probabilistic graphical model.  From a list of symptoms, we select the ones that we observe (all observed symptoms are represented as yellow circles in this document).

So the following diagram has only one circle (observed symptom). 

Model 1:  The symptom of 4 beeps is modeled in a probabilistic graphical model with a yellow circle as follows:

[Diagram: Model 1]

Now, let’s assume that this symptom can be caused by the failure of memory, the motherboard or the processor.

Model 2:  I can add that information to the predictive model, so that the model now looks like this:

[Diagram: Model 2]

The model captures the belief that the causes of the symptom – processor / memory / motherboard failure are (in the absence of any symptoms) independent of each other.

It also captures the belief that given a symptom like 4 beeps, evidence for one cause will explain away (or decrease the probability of) the other causes.

Once such a model is built, it can tell a field engineer the most probable cause of a symptom, the second most probable cause and so on.

So, the engineer will only have to look at the output of the model’s analysis to know whether he needs to replace one component, or two, and which ones.

When the field engineer goes out and replaces the components, his actions can also be fed into the model.

Model 3:  Below is an extended model into which attempts to fix the problem by replacing the memory can be incorporated.

[Diagram: Model 3]

If a field engineer were to feed into the system the fact that the memory was replaced with a new module and it didn’t fix the problem, the system would be able to immediately figure out that the memory could not be the cause of the problem, and it would suggest the next most probable cause of failure.
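Here is a toy sketch of the kind of calculation such a model performs, written as plain enumeration in Python rather than with a graphical-model library; the prior and conditional probabilities are invented purely for illustration (a real system would learn them from repair records):

```python
# A toy sketch of the root-cause model; all probabilities are made up for illustration.
from itertools import product

priors = {"memory": 0.05, "motherboard": 0.02, "processor": 0.01}

def p_beeps(memory_bad, motherboard_bad, processor_bad):
    """Hypothetical noisy-OR: any faulty component can produce the 4-beep symptom."""
    p_not = 1 - 0.001                     # small chance of beeps with nothing faulty
    if memory_bad:      p_not *= 1 - 0.9  # faulty memory beeps 90% of the time
    if motherboard_bad: p_not *= 1 - 0.7
    if processor_bad:   p_not *= 1 - 0.6
    return 1 - p_not

def posteriors(memory_replaced_and_still_beeping=False):
    """P(each original component faulty | evidence), by brute-force enumeration."""
    weights = {}
    for mem, mobo, cpu in product([True, False], repeat=3):
        w = (priors["memory"] if mem else 1 - priors["memory"])
        w *= (priors["motherboard"] if mobo else 1 - priors["motherboard"])
        w *= (priors["processor"] if cpu else 1 - priors["processor"])
        w *= p_beeps(mem, mobo, cpu)      # evidence 1: the laptop beeped
        if memory_replaced_and_still_beeping:
            # Evidence 2: with new (presumed good) memory fitted it still beeped,
            # which can only be explained by the motherboard or the processor.
            w *= p_beeps(False, mobo, cpu)
        weights[(mem, mobo, cpu)] = w
    total = sum(weights.values())
    return {
        "memory":      sum(w for (m, _, _), w in weights.items() if m) / total,
        "motherboard": sum(w for (_, b, _), w in weights.items() if b) / total,
        "processor":   sum(w for (_, _, c), w in weights.items() if c) / total,
    }

print(posteriors())      # memory comes out as the most probable cause
print(posteriors(True))  # the blame shifts to the motherboard, then the processor
```

With these made-up numbers, memory starts out as the most probable cause, and the failed replacement shifts the suspicion to the motherboard and then the processor – exactly the “explaining away” behaviour described above.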

Model 4

Finally, in case new memory modules being sent to customers for repairs frequently turned out to be defective, that information could also be added to the model as follows:

[Diagram: Model 4]

Now, if the error rate for new memory modules in the supply chain happens to be high for a particular type of memory, then if memory replacement failed to fix a 4-beep problem, the model would understand that faulty memory could still be the cause of the problem.

Applications to Supply Chain Management

The probabilities of all the nodes adjust themselves all the time and this information can actually be used to detect if the error rates in new memory module deliveries suddenly go up.

Benefits to a Customer Service Process

1.  Formal capture and storage of triage history

2.  Suggestion of cause(s) given the effects (symptoms)

3.  Suggestion of other causes given triage steps performed

What the system will seem to be doing (to the layman):

1.  Recording symptoms

2.  Recommending a course of action

3.  Recording the outcome of the course of action

4.  Recommending next steps

Analysing documents for non-obvious differences

The ease of classification of documents depends on the categories you are looking to classify documents into.

A few days ago, an engineer wrote about a problem where the analysis that needed to be performed on documents was not the most straight-forward.

He described the problem in a forum as follows: “I am working on sub classification. We already crawled sites using focused crawling. So we know domain, broad category for the site. Sometimes site is also tagged with broad category. So I don’t require to predict broad class for individual site. I am interested in sub-classification. For example, I don’t want to find if post is related to sports, politics, cricket etc. I am interested in to find if post is related to Indian cricket, Australia cricket, given that I already know post is related to cricket. Since in cricket post may contains frequent words like runs, six, fours, out,score etc, which are common across all cricket related posts. So I also want to consider rare terms which can help me in sub-classification. I agree that I may also require frequent words for classification. But I don’t want to skip rare terms for classification.”

If you’re dealing with categories like sports, politics and finance, then using machine learning for classification is very easy.  That’s because all the nouns and verbs in the document give you clues as to the category that the document belongs to.

But if you’re given a set of categories for which there are few indicators in the text, you end up with no easy way to categorize it.

After spending a few days thinking about it, I realized that something I had learnt in college could be applied to the problem.  It’s a technique called Feature Selection.

I am going to share the reply I posted to the question, because it might be useful to others working on the classification of documents:

You seem to have a data set that looks as follows (letters are categories and numbers are features):

A P 2 4
A Q 2 5
B P 3 4
B Q 3 5

Let’s say the 2s and the 3s are features that occur very frequently in your corpus while the 4s and the 5s are features that occur far less frequently in your corpus.

When you use the ‘bag of words’ model as your feature vector, your classifier will only learn to tell A apart from B (because the 4s and 5s will not matter much to the classifier, being overwhelmed as it is by the 2s and 3s which are far more frequent).

I think that is why you have come to the conclusion that you need to look for rare words to be able to accomplish your goal of distinguishing category P from category Q.

But in reality, perhaps what you need to do is identify all the features like 4 and 5 that might be able to help you distinguish P from Q and you might even find some frequent features that could help you do that (it might turn out that some frequent features might also have a fairly healthy ability to resolve these categories).

So, now the question just boils down to how you would go about finding the set of features that resolves any given categorization scheme.

The answer seems to be something that literature refers to as ‘Feature Selection’.

As the name says, you select features that help you break data points apart in the way you want.
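For instance, here is a small sketch on the toy data set above (it assumes scikit-learn; the counts are invented so that features 2 and 3 are frequent while 4 and 5 are rare):

```python
# A toy sketch (assuming scikit-learn): chi-squared feature selection picks out
# the features that separate the sub-categories P and Q, even though they are rare.
import numpy as np
from sklearn.feature_selection import chi2

# Columns correspond to the features "2", "3", "4", "5" from the toy data set above;
# values are term counts per document. "2" and "3" are frequent, "4" and "5" rare.
X = np.array([
    [10, 0, 1, 0],   # A P
    [10, 0, 0, 1],   # A Q
    [0, 10, 1, 0],   # B P
    [0, 10, 0, 1],   # B Q
])
y_subclass = ["P", "Q", "P", "Q"]

scores, p_values = chi2(X, y_subclass)
for name, score in zip(["2", "3", "4", "5"], scores):
    print(name, round(score, 2))
# The frequent features "2" and "3" score 0 for the P-vs-Q split;
# only "4" and "5" carry any signal, so feature selection keeps them.
```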

Wikipedia has an article on Feature Selection:

http://en.wikipedia.org/wiki/Feature_selection 

And Mark Hall’s thesis http://www.cs.waikato.ac.nz/~mhall/thesis.pdf seems to be highly referenced.

Mark Hall’s thesis – “A good feature subset is one that contains features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.”

To be honest to you, I’d heard about Feature Selection, but never connected it to the problem it solves until now, so I’m just looking up reading material as I write.

Best of luck with it.

Mental Models and Art Aesthetics

The Mind of a Dance
An Exploration of the Mental Model that a Spectator might Form when Watching a Dance.

This is a post about a painting that I completed only last week.  But it was a painting that I’d been trying to figure out how to paint for all of 7 years.  It all started with a motorbike ride.

One day in 2006, I was riding down a mountainous road in North Carolina, when I saw a hill of striking beauty (I believe it was somewhere near a town called Marion).

It was evening and the colours were beginning to fade, but I could clearly make out on one side of the hill striking trees that were reddish-orange in hue and I found them beautiful against the light green that surrounded them and the blue skies above.

I continued to ride down the road.  I rounded the hill and came to the other side. I saw again, waves of green and yellow and touches of red and I was overcome with a sense of being in the presence of the highest beauty that the eye could behold.

So, I stopped and removed the camera from the knapsack behind me. From various points along the road, I tried to photograph the hill.

However, I could not frame a shot that could evoke the beauty of the hill that stood before my eyes.

That evening, I was troubled by a sense of failure, of bafflement at having failed to record the soul of the scenery that I had left behind. The hills around Marion were all extremely beautiful, but nothing else on the road had the perfection of the hill that I had not been able to photograph.

So, that evening, I sat and pondered a very interesting question that had to do with aesthetics. The question was this: Does the mind experience beauty according to images that it has seen, or does the mind experience beauty by a mental model it has built of an object and the associations it has made between that model and models of other objects?

Let me clarify what I mean by a mental model with an example. When we approach a person whom we have never met before, we don’t immediately see all of that person. We perhaps at first only see the front of the person, at a distance. Later we see more of the person, their face, their profile, and how they look from behind. As we see more and more of a person, we fill in more and more of the missing information about how the person looks.

However, from the very beginning, we know that we are looking at a person, even though we haven’t seen the person’s teeth or sides or back. So we must have decided that the frontal image that we saw fit best into a mental model of a person.  The mental representation may not just be a 3-dimensional representation. It may be a set of associations and memories over time, all woven together in complex ways.

So, the question that came to my mind was:  Is a human mind’s evaluation of the aesthetics of an object based on a) any of the individual images of the object that we have seen, or on b) the entire mental model that we have constructed in our minds of the object?

The more I thought about it, the more I began to feel that my experience with the hill would be best explained by the conclusion that our aesthetic experience of an object is determined by the mental model we have formed of it, and not by any image of it, and here were the reasons for my leaning towards the latter conjecture:

a) If it is just a single view or image of an object that drives us to experience it as beautiful or attractive, then our feelings about its attractiveness should change all the time, based on small things like the angle at which we are viewing it. That does not seem to be how we experience things in real life.

b) The experience of the hill led me to think that the latter explanation was the more likely because if it was one of the views of the hill that made me think it beautiful, I should have been able to capture the view with my camera. Since I couldn’t, perhaps it was something I saw on the other side of the hill, combined with the fading light and the landscape along the way (perhaps combined with the beauty of some of the trees I saw upon the hill) that came together to contribute to the sense of beauty that I experienced when looking at the hill.

That evening, I thought of a new kind of painting technique that would be possible if the above conjecture proved true. It seemed that it might be worthwhile to paint a picture from the point of view of the mental model a viewer might form if they saw the actual subject of the painting in context, possibly from different angles, or at different distances, and over an extended period of time.

But it was only last week, a good 7 years after I passed through Marion, NC, that I tried to create a painting of the sort. Since I have recently watched many Indian classical dance performances and some contemporary dance performances, I attempted to paint all the things that might be associated with a dance performance in the mind of a spectator. I constructed the painting like a fractal, starting with a dancer and then representing the dancer at different levels of granularity and detail.

One of the things on the mind of the spectator would be the dancer’s body (the whole form, from head to toe). Emoting is a key aspect of dancing in Indian classical dance.  So, in order to appreciate the emotion, a spectator would have to pay close attention to (and so form a model of) the dancer’s face, and of his or her eyes. So, I wove an image of a forehead, and of an eye, into the painting. Another part of the body that Indian classical dancers use a lot is the hands and the fingers. They use them to denote various objects, characters and actions. I associated the movements of the arms and fingers with flowing water, and plants with branches. So, I painted in some flowing washes and let some negative white spaces take the form of trees in the painting. Indian dance costumes are colourful, so I made everything in the painting as bright as a classical dance costume.

The process of painting was a lot of fun. It was very different from the process of scientific experimentation of course. There was none of the same rigour and I wasn’t really looking to prove anything.

I liked the painting that came about and I have inserted a photograph of the same into this blog entry (it’s at the top).

But we don’t know yet whether mental models lie at the root of aesthetic experiences. And you know what? As an AI researcher, I am just dying to find out! It seems possible to design an experiment to determine the validity of the conjecture.

We could take an object whose facade is beautiful but whose side view is not. The subjects of the experiment would be shown the front and side views of the building and asked to rate the appearance of the front of the building. The control group would be shown just the front of the building and asked to rate its appearance.

A significant difference in ratings (poorer ratings by subjects shown the front and side view) would support the conjecture that a mental model lay at the root of the aesthetic experience.

I hope that one fine day, we might be able to conduct the experiment, and if we do, I shall share the results here with you!

As a footnote, I’d like to share this blog post http://aimeeknight.com/2012/04/10/the-new-aestheticperhaps/ by a blogger from (it appears to be – hmm, what a coincidence) North Carolina.  The article is about art that attempts to portray the world from the point of view of a machine – using a machine’s aesthetic – the phrase they’ve used is ‘New Aesthetic’.  I had not heard of the concept before, and it isn’t the same thing that I am working on, but since we’re discussing aesthetics and perception, it appeared relevant and interesting.

As a second footnote, I’d like to put in a word of explanation about why a firm of computer scientists and AI researchers should be interested in painting.  Well, to many early AI researchers, AI was as much a quest to understand the human mind as it was a quest to solve difficult problems.  There were many attempts to comprehend the human mind by constructing logical and mathematical models of various cognitive processes (things the mind did) and seeing if the models could mimic the real thing.  So, AI research had a lot of common ground with Psychology.  And there is still a cross-disciplinary research area known as Cognitive Science that blends Psychology and AI research.  So, I just want to defend myself by saying that talking about aesthetics is not off-topic for this blog.