Hate News and Social Media

 [Image Credit:  Wikipedia]

We all know about “fake news” and understand that we need to be on our guard against it.  But have we thought hard enough about “hate news” and are we concerned enough about it?

Let me start with a quick look back at fake news.  Fake news is misinformation (often deliberate) that is passed off as real news.  It contains claims that are factually incorrect.  Two classic examples of fake news were the “birther” hypothesis (the incorrect claim that Obama was not born in the USA) and the Obama religion conspiracy theory (the claim that Obama was not of the religion he professed).

The Indian equivalent of that is what I call the Rahul Gandhi birther hypothesis: the claim that Rahul Gandhi’s grandfather’s name was not Feroze Ghandy but Feroze Khan (and that he was a Pathan from Pakistan), thereby suggesting to the majority of Indians not only that Rahul Gandhi was unpatriotic, but also that he was sympathetic to a certain religious dispensation.  It had the effect of the Obama birther hypothesis and the Obama religion hypothesis rolled into one.

Now that is fake news, but the reason the fake news even matters to Indians and Americans is that it has been preceded by a campaign of hate news.

For many years, in both the USA and India, certain far-right channels have disseminated information that incites hatred towards people of certain groups.  In the case of the USA, there has been a one-sided portrayal of immigrants.  In the case of India, there has been a sustained campaign against certain religious groups.

These campaigns stir up hatred towards a religious or racial “other” and make people susceptible to having their attitudes and decisions manipulated by fake news.

I must mention here that some of the most successful groups taking on state actors (such as ISIS in the Middle East and Jaish-e-Mohammed in South Asia) also use hatred in their propaganda as a means of obtaining recruits and maintaining the loyalty of supporters around the world.  Their deliberate campaigns of hatred make their target audiences susceptible to having their attitudes and decisions manipulated.

And it works!  During the final three months of the election campaign in India, this incitement to hatred reached such a fever pitch that after the elections, people were attacked in the streets of Delhi for such innocuous acts as wearing prayer caps.

It might also be said that in the last elections, both in India and in the USA, a key strategy employed to win the elections was to:

  1. Incite people to a fever pitch of hatred towards an “othered” group
  2. Paint the opposition as sympathetic towards that “othered” group
  3. Claim to represent and sympathise with the “in” group.

The “othering” takes place through hate news.  The association of the political opponent with the othered group makes use of fake news.

In the USA, the othered group was people whose origins might be traced to beyond the Southern border and people of a certain religion.  In India, the othered group was people of a certain religion.

How do we define hate news?

I think hate news might be defined as any information (real or fake) that results in othering.

Hate news typically works by presenting a one-sided view of an issue.  For instance, in India, hate news highlights those historical events where rulers belonging to the othered group acted against the interests of the in group, while remaining completely silent about events where the same rulers acted in the interests of the in group, and about events where rulers belonging to the in group acted against the interests of the in group or of other groups.

So we could define hate news as news that is one-sided (against an othered group).  A hate news source could be defined as a source that only presents one-sided perspectives against an othered group.

However, the mention of the othered group is important in such a definition.  Without it, all one-sided argument might be considered bad.  But activists (such as climate activists) take one-sided positions against things that could pose a danger to society.

So the one-sidedness of the argument alone is insufficient to mark it as hate news.  It would have to be hate inducing and aimed at othering a religious or ethnic group of people.

How does hate news help a majoritarian politician?

Once a politician has successfully othered a group of people, they are in charge of a political project that energizes their support base and consequently helps them win elections, provided the “in” group is in a dominant majority.

I believe that Modi in India, Trump in the USA, Hitler in Germany and Boris Johnson in the UK all used the same trick.

Othering need not always be the result of the use of hate news.  Politicians might not even be the agents of that hatred; they might merely be opportunistic exploiters of it.  For instance, racial prejudice existed in the United States before the development of modern 24×7 television news channels, and religious hatred existed in India long before literacy levels were high enough for mass news consumption.

However, it is possible that deliberate use of hate news was made to harden the pre-existing otherings.  Hate news may have also helped propagate otherings in places where they did not always exist.  For instance, a political party that uses the strategy of hatred has made considerable inroads into the State of West Bengal in India which was once known for its liberal and left-leaning disposition.

The existence of social media echo chambers makes it easy to concentrate hate news upon people with pre-existing biases who are highly susceptible to it, and these echo chambers make it difficult to combat its insidious effects.

Groups of people who keep sharing hateful news articles with each other can keep the flames of hatred alive and even raise its intensity for a very long time, possibly in perpetuity.

Many groups (organized or otherwise) exist that have the sole aim of helping a politician who claims to sympathise with the in-group come to power.

Combating hate news

The only thing that can counteract hate news is news that presents alternative but trusted viewpoints, news that carries information that helps people see the situation from a different perspective.

It is not often possible to reason with a person who believes in the message contained in hate news, because it has been found that facts alone rarely change anyone’s mind.  However, strategies involving looking at things from the othered person’s perspective seem to have had some success (though the findings in that respect are not straightforward).

However, since the lack of opposing perspectives supplies some of the power of hate news in echo chambers, combating its impact should probably involve a) the identification of hate news, and b) the communication of alternative perspectives to the user (possibly as recommendations displayed near the hate news).

On social media, this might entail the use of algorithms that do more than recommend articles similar to those a user has liked in the past.  Better algorithms might pursue additional aims, such as increasing the diversity of perspectives users are exposed to, without a loss in user engagement (a business necessity).
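One way to picture such an algorithm is a greedy re-ranker in the style of maximal marginal relevance, trading predicted engagement against perspective diversity.  The sketch below is purely illustrative: the engagement scores, the perspective vectors and the lambda_ trade-off parameter are hypothetical stand-ins for quantities a real platform would have to learn from data.

```python
# Illustrative sketch (not a production recommender): re-rank candidate
# articles so that high-engagement items are balanced against items
# that add new perspectives.  All inputs here are made-up stand-ins.
import numpy as np

def rerank(engagement, perspectives, k, lambda_=0.7):
    """Greedy MMR-style re-ranking.

    engagement:   (n,) predicted engagement score per article
    perspectives: (n, d) unit vectors describing each article's viewpoint
    k:            number of articles to recommend
    lambda_:      1.0 = engagement only, 0.0 = diversity only
    """
    chosen = []
    remaining = list(range(len(engagement)))
    while remaining and len(chosen) < k:
        def score(i):
            # Similarity to the most similar already-chosen article.
            sim = max((perspectives[i] @ perspectives[j] for j in chosen),
                      default=0.0)
            # High engagement is rewarded; redundancy is penalized.
            return lambda_ * engagement[i] - (1 - lambda_) * sim
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy example: articles 0-2 share a viewpoint, article 3 differs.
engagement = np.array([0.9, 0.85, 0.8, 0.6])
perspectives = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
print(rerank(engagement, perspectives, k=2))  # picks 0, then the diverse 3
```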

This might be an interesting challenge to the AI and language understanding communities.

Apart from technology strategies, it might be necessary to reach out across divides, with each side bearing in mind that engagement with different opinions, no matter how unpalatable, might be necessary if one is to prevent the political projects that finance hate news from winning.

 

Here’s a related article: https://theconversation.com/how-mainstream-media-helps-weaponize-far-right-conspiracy-theories-106223

Learn Deep Learning through Pytorch Exercises

I assume that you, like us, don’t enjoy staring at equations on a blackboard, and would rather work through exercises that help you understand a subject.

We created a curriculum for teaching deep learning centered around Pytorch exercises (solving toy problems mostly).

Hopefully, those of us who hate sitting in classrooms will find it satisfying to learn the subject by doing experiments and observing how various algorithms fare on the toy problems.

The slides are linked to here:

The code that goes with these slides is available from https://github.com/aiaioo/DeepLearningBasicsTutorial/
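To give a flavour of the exercises, here is a minimal sketch of the kind of toy problem the curriculum builds on (this particular network and dataset are illustrative and are not taken from the slides): a tiny two-layer network learning the XOR function.

```python
# A minimal PyTorch toy exercise (illustrative; not from the linked slides):
# fit a two-layer network to the XOR function.
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).detach().round())  # should approximate [[0], [1], [1], [0]]
```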

Mechanical Consciousness and Attention

I’d written an article a few weeks ago touching on the subject of what makes us conscious.

I wrote that consciousness was merely models of the world stored in memory.  In other words, “anything – even if it is a machine – that can actively create a model of something that it senses (is aware of) and store it in memory in such a way as to permit retrieval of the model, is conscious of it.”

It then came to my notice that very similar thoughts had been expressed in a 2013 book by Michael A. Graziano, a Princeton neuroscientist, titled “Consciousness and the Social Brain”.

So, I purchased the book and read it.  It set out a theory of consciousness called the “Attention Schema Theory” that Graziano had discovered through his work on social neuroscience (in particular this paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223025/ that he’d written in 2011).

Graziano describes consciousness (awareness) as “the brain’s simplified, schematic model of the complicated, data-handling process of attention”.

I liked Graziano’s arguments on attention being required for consciousness and felt they deserved to be discussed in some depth here.

So, in this article, I go over Graziano’s Attention Schema Theory of consciousness and then share some further thoughts that I had about it.

Graziano’s Attention Schema Theory

Graziano gives a very concise and pithy description of what he means by “attention” and by “schema” and links them both very simply to awareness.  He says, and I quote:

Attention is not data encoded in the brain; it is a data handling method.  It is an act.  It is something the brain does, a procedure, an emergent process.  Signals compete with each other and a winner emerges – like bubbles rising up out of the water.  As circumstances shift, a new winner emerges.  There is no reason for the brain to have any explicit knowledge about the process or dynamics of attention.  Water boils but has no knowledge of how it does it.

I am suggesting, however, that in addition to doing attention, the brain also constructs a description of attention, a quick sketch of it so to speak, and awareness is that description.

A schema is a coherent set of information that, in a simplified but useful way, represents something more complex.  In the present theory, awareness is an attention schema.  It is not attention, but rather a simplified, useful description of attention.

Then, Graziano goes on to provide certain reasons for thinking that awareness (what I called consciousness in my own post on “Mechanical Consciousness”) is an attention schema.

He identifies eight similarities between attention and awareness and argues that consciousness is the process of creating a representation of attention in the brain.

Eight key similarities

If the hypothesis is correct, if awareness is a schema that describes attention, then we should be able to find similarities between awareness and attention.  These similarities have been noted before by many scientists.  Here I am suggesting a specific reason why awareness and attention are so similar to each other:  the one is the brain’s schematic description of the other.  Awareness is a sketch of attention.  Below I list eight key similarities.

  1. Both involve a target.  You attend to something.  You are aware of something.

  2. Both involve an agent.  Attention is performed by the brain.  Awareness is performed by the “I” who is aware.

  3. Both are selective.  Only a small fraction of available information is attended to at any one time.  Awareness is selective in the same way.  You are aware of only a tiny amount of the information impinging on your senses at any one time.

  4. Both are graded.  Attention typically has a single focus but while attending mostly to A, the brain spares some attention for B.  Awareness also has a focus and is graded in the same manner.  One can be mostly intently aware of A and a little aware of B.

  5. Both operate on similar domains of information.  Although most studies of attention focus on vision, it is certainly not limited to vision.  The same signal enhancement can be applied to any of the five senses, to a thought, to an emotion, to a recalled memory, or to a plan to make a movement, for example.  Likewise one can be aware of the same range of items.  If you can attend to it, then you can be aware of it.

  6. Both imply an effect on behaviour.  When the brain attends to something, the neural signals are enhanced … and have a greater impact on behaviour.  Likewise, when you are aware of something, you can choose to act on it.  Both, therefore, imply an ability to drive behaviour.

  7. Both imply deep processing.  Attention is when an information processor devotes computing resources to an information set.  Awareness implies an intelligence seizing on, being occupied by, experiencing, or knowing something.

  8. Finally, and particularly tellingly, awareness almost always tracks attention.  Awareness is like a needle on a dial pointing more or less to the state of one’s attention.  At any moment in time, the information that is attended usually matches the information that reaches awareness … Awareness is undoubtedly a close but imperfect indicator of attention.

I believe that Graziano also linked attention to consciousness because of his earlier work on the neuroscience of social behaviour where he postulates that one being identifies another as conscious by watching it direct its attention to something.  So, he seems to suggest at one point in the book that since we identify those things as conscious that appear to pay attention to something, attention must be central to consciousness.

Graziano also says that attention enhances awareness and awareness enhances attention.  He says:

If the theory is correct, then awareness is a description, a representation, constructed in the brain.  The thing being represented is attention.  But attention is the process of enhancing representations in the brain.

I must say that I agree with all of the above, with all that Graziano has said, except for this:

The thing being represented is attention. 

What is being represented is the world that is being perceived.

I agree with everything else.

So, I would write the above as: “awareness is a description, a representation, constructed in the brain.  The thing being represented is the world perceived by the senses.  But attention is the process of enhancing representations in the brain.”

So I agree that attention is important for choosing which parts of the information that the senses collect get enhanced into more abstract representations.

I agree that when a viewer looks at a street full of people, the viewer’s brain may pay attention to one person on that street and be more aware of that person than the rest.  I think that what happens is that when the brain pays attention to that one person, it constructs and stores in memory a more detailed model of that person and less detailed models of all the rest of the scene of the street and associates them more strongly with things in the present.

However, explaining points 1 through 8 in the “eight key similarities” list is a bit more difficult.

The question here is what makes the one person one pays attention to appear more salient.

Graziano takes the view that explaining that salience requires the assumption that attention is required for consciousness.

I wonder if that is really required.

Graziano described attention as the process of enhancing a model of the world (creating a more detailed model of a part of a less detailed model of the world and associating those details more strongly with other salient things at the present time in memory).

But Graziano doesn’t explain why that which we attend to is salient in our awareness while other things are not.

He assumes that there is something that makes that happen and that is part of consciousness.

I think we can explain the salience of what we attend to if we make two assumptions:

  1. That attending to something allows us to make more memories of it as time passes.
  2. More recent memories are more salient.

So, let’s say there is a sequence of moments in time t1, t2, t3 and so on.

Let’s say at t1, a conscious viewer has looked out at a street without attending to anything.

The viewer’s brain constructs rudimentary models of everything in view, perhaps allowing the viewer to discern humans and objects in the scene.

Let’s say one of the humans draws the viewer’s attention, so at time t2 the viewer transfers his/her attention to that human.

Now, according to Graziano’s views on attention, the viewer starts to form a more detailed model of the human who is the subject of the viewer’s attention at time t2.

If the viewer’s attention remains on the same human, then the viewer continues to store in memory models of the human that (s)he is paying attention to at t3.

This makes that human more salient in the viewer’s mind.

So, if we assume that more recent memories are more salient, we can explain consciousness without treating attention as something special, but merely as a process of creating and storing models of a part of what the senses are bringing to the viewer’s brain in memory.
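To make the two assumptions concrete, here is a small sketch (my own illustration, not Graziano’s) of a memory store in which the salience of a thing is just the recency-weighted sum of the models stored of it.  The scene, the detail levels and the decay rate are all made-up stand-ins.

```python
# Illustrative sketch: salience as recency of stored models.
# The scene, detail levels and decay rate are made-up stand-ins.
class Memory:
    def __init__(self, half_life=2.0):
        self.records = []  # (time, thing, detail)
        self.half_life = half_life

    def store(self, t, thing, detail):
        self.records.append((t, thing, detail))

    def salience(self, thing, now):
        # More recent and more detailed models contribute more.
        return sum(detail * 0.5 ** ((now - t) / self.half_life)
                   for t, x, detail in self.records if x == thing)

m = Memory()
# t1: rudimentary models of everything in view.
for thing in ["person_A", "person_B", "lamp_post"]:
    m.store(1, thing, detail=1)
# t2, t3: attention stays on person_A, storing richer models.
m.store(2, "person_A", detail=3)
m.store(3, "person_A", detail=3)

for thing in ["person_A", "person_B"]:
    print(thing, round(m.salience(thing, now=3), 2))
# person_A ends up far more salient than person_B.
```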

We can explain all 8 similarities by using the above explanation/assumption of salience.

1, 3, 4, 6 and 8 all follow from the idea that our newest models in memory are those of what we attended to and therefore what we attended to is what is most salient in our awareness.

2 happens because the brain that is attending is also the brain that is consigning models to memory.

5 and 7 follow from our definition of attention as a process of enhancing information.

So, it seems that we can explain all the above similarities by just speaking of attention as a process of creating models.

Our consciousness can still be explained merely in terms of those models that are stored in memory in a retrievable manner, without involving attention in the explanation.

I think there are advantages to involving attention in an explanation of consciousness.

Advantages of Involving Attention in an Explanation of Consciousness

a)  It immediately excludes devices like cameras from any suggestion of consciousness.

b)  It intuitively explains why some things in our awareness are more salient than others.

Disadvantages of Involving Attention in an Explanation of Consciousness

a)  It excludes other living things like trees that are not known to have an attention mechanism from being counted as conscious.  We don’t see trees paying attention to things.  However, trees are aware of and respond to sunlight, the seasons and other conditions in their environment.  So, I would argue that trees are conscious.  A view of consciousness as emanating from the creation of models of the world (with or without the use of attention to enhance models) might be better able to accommodate the plant kingdom.

b)  If we say that we are conscious of only those parts of the retrievable models in our brain/memory that we pay attention to (those that make it into the attention schema, in Graziano’s words), then we have to answer the question of what it is in our attention that makes it so special that it can give a part of a model of the world the kiss of consciousness.  We could avoid the difficulty by saying instead that some parts of the models in our brain are salient because they were stored/refreshed/enhanced in memory more recently, and are therefore more easily retrievable and have a greater effect on anything we think or do.

So, I would prefer it if Graziano were to define consciousness as: “a description, a representation, constructed in the brain.  The thing being represented is the world perceived by the senses.  But attention is the process of enhancing representations in the brain.”

That would result in the explanatory power he seeks and would also avoid the problem of assigning to attention the magical ability to create consciousness.

Mechanical Chef


On November 17th, 2017 at the Maker Faire in Bangalore, on stage before an audience of a few hundred people, a machine cooked a simple Indian side-dish – a potato fry.

The next day it prepared a pot of rice.

In so doing, it became the world’s first machine to cook an Indian meal autonomously.


The Mechanical Chef is the coming to fruition of an idea that germinated in the year 2012, when a colleague and I were walking down Millers Road, on the way to a cup of coffee at a roadside store, and I asked the colleague if there was any way in which we could really change the world and ‘make a dent in the universe’ as it were, in the present millennium.

The colleague was a woman, and she said that if Indian women didn’t have to do any cooking, it would save a large fraction of India’s half a billion women three hours a day, every day of their lives.

That day in 2012, my colleague, Princiya and I started dissecting Indian cooking.

We came up with the simple operations that underlay Indian recipes not requiring shaping (we weren’t aiming to prepare rotis, idlis or dosas, for which machines are already available).

And they were:

  1. Adding ingredients
  2. Heating
  3. Manipulations such as stirring, sieving, grinding, whisking and powdering
  4. Transferring from one container to another

And we came up with a machine that could do all the above.

At the time, there was a Chinese cooking robot which could cook stir-fried (wok-fried) dishes, and there were several experimental machines like Moley which used expensive robotic arms for cooking.

There is even another team in Bangalore that is also working at present on a cooking robot for Indian food!

 

Our First Design

In 2012 we came up with a much simpler architecture which could support all the above cooking functions (the ones that would be needed for Indian cooking).  It was a type of sensor network for cooking.

It was an improvement over the Chinese automated wok in that it could cook multiple dishes at the same time, and it was an improvement over the robotic arms because it was much simpler and therefore more cost-effective.

We took the drawings to Dr. Raju of Systemantics (a maker of industrial robots in Bangalore) one day.  He listened to us patiently and observed that our machine would need very powerful motors in order to lift heavy vessels up against the force of gravity.

This is a common mistake that firms developing cooking robots make.  Going against gravity is never a good idea when you have to carry kilograms of ingredients in a vessel.  The motors end up having to be big and powerful and friction becomes a very big hindrance to smooth movement.

This was the mistake that Moley (and the Chinese automatic wok) had made.  Moley’s motors and construction make it so expensive that it costs around $100,000 a piece.  We figured we’d need to build something that cost around $300 to be able to sell it to home users.

So, we went back to the drawing board.

 

Our Second Design

Last year (in 2016), we developed a machine that looked a bit like one built at MIT.

In our second design, the cooking vessels moved on a linear rail and ingredients dropped into them from above, much as they do in machines of that kind.

We did not have a grinder and could not transfer ingredients from one container to another at will in our design.

But as we analysed Indian cooking recipes further, we realized that the vast majority of Indian dishes did not need any transferring of ingredients across vessels, and even if they did, we could stop the cooking at a certain point and ask a human to do the transfer without greatly impacting the automatic cooking experience.

We could also get by without grinding for many dishes if we used specially made powders in our cooking machine.

It was with this design that I went to my old friend Prashanth G. B. N., a mechanical engineer and designer of hardware systems, for his advice.

He took a look at the drawings and felt we would have to make the machine even simpler if it had to be used in Indian homes.

 

Our Third Design

He took a sheet of paper and sketched on it a design that had only rotary movements.

“Rotary motion is far simpler to build out than linear motion,” he explained.

After developing a detailed drawing of the machine, Prashanth and I took out a patent on the same.

 

From Design to Reality

The Mechanical Chef, however, only turned into reality when Arpit Sharma, an aerospace engineer from Rajasthan, joined me and solved all the mechanical problems that remained.

We had to solve the problem of dispensing powders from a removable and easy-to-wash and easy-to-assemble container.

We had to find ways to completely dissociate the parts that came into contact with the food from the parts that came into contact with the electricals so that the former would be easy to clean.

We had to minimize the number of parts that needed to be cleaned after every cook.

We needed to finesse the electronics and electricals – a job which was often summarily dropped into the lap of a young engineer from Mysore – Avinash Bharadwaj.

To support ourselves as we worked on all these problems and to pay for the hardware, we applied for a grant from the Department of Science and Technology of the Government of India through IIT-Bombay.  In October of this year, we received the grant.

The first prototype that Arpit Sharma built looked like this.

[Rendered CAD model of the first prototype]

And it worked!

 

The Proof of the Pudding is in the Eating

Here’s a video of the machine at work.  And you’re welcome to stop by and be one of the first humans on earth to eat Indian food cooked by a robot.

Here’s how it stirs.

Here’s our mini funding pitch!

Here’s our team at the office with the Mechanical Chef.


There’s a little website for it:  http://www.mechanicalchef.com

 

POST PUBLICATION ADDITION

The prototype that we demonstrated in November 2017 suffered from a few flaws which made it unreliable and unsuitable for user trials.  One of the problems was that the quantities of ingredients it would add to a dish could not be precisely controlled.  So you could never be sure that the dishes would taste good when they were done.  Another problem was that the steam from the cooking could make the spices congeal and jam the works.  It took us till April 25th to solve all the sticky issues and get the machine ready for user taste trials.  On April 25th, 26th and 27th, the Mechanical Chef cooked entire dishes completely unassisted and with repeatable (and delicious) results.

Here’s a full end-to-end recording of one of the cooking runs (the one of April 26th).

 

Write to us if you’d like to drop by and see us.

Mechanical Consciousness

Mankind has attempted for a long time to explain consciousness, one’s awareness  of one’s own existence, of the world we live in, and of the passage of time.  And mankind has further believed for a very long time that consciousness extends beyond death and the destruction of the body.

Most explanations of consciousness have tended to rely on religion, and on philosophical strains associated with religion.  Possibly as a result, there has been a tendency to explain consciousness as being caused by a “soul” which lives on after death and in most traditions gets judged for its actions and beliefs during its time of residence in the body.

In this article, it is proposed that consciousness can have a purely mechanical origin.

The proposal is merely conjecture, but observations that support the conjecture (though they do not prove it) and, I hope, render it plausible are provided.  The explanatory power of the model is also somewhat explored.

It is also proposed that the working of the human mind is similar to that of many machine learning models in that they share certain limitations.

 

Preliminaries

First, let me define consciousness.  Consciousness of something is the knowledge of the presence or existence of that thing (of time, or of ourselves, or of the world around us).

I argue that consciousness requires at the very least what we call “awareness” (that is, being able to sense directly or indirectly what one is conscious of).

Claim:  If I were not aware of something, I wouldn’t be conscious of it.

Argument: If all humanity lived underground for all time and never saw the sky, we would not be aware of the existence of the sky either by direct experience or by hearsay.  So, we couldn’t be conscious of it.  So, it is only when we are aware of the existence of something that we are conscious of it.

So, we have established a minimum requirement for consciousness – and that is “awareness” (being able to sense it).

But does consciousness require anything more than awareness?

The ability to reason and to predict behavior are things the human mind is capable of.

But are they required for consciousness?

Claim:  Reasoning is not required for consciousness.

Argument:  I argue that reasoning is not required because one cannot reason about something that one is not aware of the existence or presence of.  So, anything that one reasons about is something that one has registered the presence of in some manner, in other words, that one is conscious of.

Claim:  Prediction of the behavior of something is not required for consciousness.

Argument:  Prediction of the future behaviour of a thing is not possible without observation over time of how that thing behaves.  So observation (and consciousness) precedes prediction.

Yann LeCun argues that “common sense” is the ability to predict how something might behave in the future (if its future state is not completely random).  If we accept that definition, we might say that common sense builds on consciousness, not the other way around.

So, it appears that consciousness (knowledge of the existence of something) requires the bare minimum of awareness through the senses, and does not require reasoning or the ability to predict.

 

Development

The next question to consider is whether awareness constitutes consciousness or if there is more to it.

Claim:  There is more to consciousness than the signals that our senses send to the brain (awareness).

Argument:  The signals sent to the brain are analogous to signals that are present in completely inanimate things.  A camera has a sensor that records images of the outside world.  Even a pin-hole camera senses the outside world upon the wall on which the image of the sensed world is cast.  Even a shadow can be considered to be a “sensing” of the object that casts the shadow.  That does not imply consciousness.  There must be something else in animate “living” things that produces consciousness.

What is that something extra that is over and above what our senses record?

I believe that the extra thing that constitutes consciousness is the ability to create a model of what we sense and remember it (keep it in memory).

By “create a model”, I mean store a representation of what is sensed in some kind of memory so that what is sensed can be reproduced in some medium possibly at a later stage.

The model cannot be reproduced if it is not stored and remembered, so memory is also key to consciousness.

So, consciousness is the creation of a model in memory of what is sensed.

In other words, anything that can sense something in the world and actively create a model of what it senses (be able to reproduce it exactly or inexactly) is conscious.

I will attempt to justify this claim later.

 

Elaboration

So, the claim is that anything – even if it is a machine – that can actively create a model of something that it senses (is aware of) and store it in memory in such a way as to permit retrieval of the model, is conscious of it.

I am not saying that conscious beings are conscious of every aspect of what they sense as soon as they sense it.  It is possible that they sense and temporarily store a lot of things (for humans, for example, that could be every pixel of what we see outside the blind spot) but only model in a more abstract form, and store in memory as an abstraction (in a retrievable form), those parts that they pay attention to.

So it is possible that a conscious being may be conscious of the pixels of a bird outside the window but not conscious of it as a bird (model it in a more abstract form) or of its colour (model its properties) unless the conscious being pays attention to it.

For example, let us say we’re talking of a human.  Let’s say further that the human sees a mountain.

The human senses (sees) the mountain when rays of light scattered by the surface of the mountain or by things upon the mountain enter her or his eye and impinge upon the retina, triggering a chain of chemical reactions that lead to electrical potentials building up that act upon the nerves of the retina.

Subsequently, the neurons in the optical pathway of the human’s brain fire in such a manner that eventually, various parameters of the mountain come to be represented in the pattern of neural activations in the human’s brain.

We know that the human has modeled the mountain because the human can be asked to draw the mountain on a sheet of paper and will be able to do so.

Now, the human can be conscious of various parameters of the mountain as well.  For example, if the predominant colour of the mountain is represented in those neural activations, then the human is conscious of the predominant colour of the mountain.  For instance, if the human can answer, accurately or inaccurately, a question about the colour of the mountain, the human can be said to have modeled the same.

If the height of the mountain is represented in the neural patterns, then the human is conscious of the height of the mountain.  This can be tested by asking the human to state the height of the mountain.

If the shape of the mountain is vaguely captured in the neural activations, so that the human identifies it with the shape of a typical mountain, then the human is conscious of the mountain’s shape and of the fact that it is a mountain.

This ability to model is not present in what we typically consider an inanimate object.  A pin-hole camera would not actively create a model of what it senses (projects onto the wall) and is therefore not conscious.  Its projection is purely a result of physical phenomena external to it and it has no agency in the creation of the image within it.  So it has no consciousness.

Let’s say we use a digital camera which records the pixels of, say, a mountain before it.  It can reproduce the mountain pixel by pixel, and so can be said to have a model of the mountain in its memory.  In other words, such a camera is conscious of the pixels of the mountain and of everything else in the field of view.  It wouldn’t be conscious of the shapes or sizes or colours, or even of the presence of a mountain, in the sense that a human would be.

Claim:  Consciousness requires the active acquisition and storage of information from what is sensed.

Argument:  If the “model” is just the result of physical phenomena, say a projected image in a pin-hole camera, then there is no information acquired and stored by the system from what is sensed, and hence no consciousness.

Now, supposing that we were to build a machine of sand that created a representation of the mountain in sand and of the height and colour of the mountain and of the shape of the mountain and of the association of this shape with typical mountain shapes and of every other parameter that the human brain models.

Now, I would argue that this sand machine could be said to be conscious of the mountain in the same way as we are, even though it uses a completely different mechanism to create a model of the mountain.

Claim:  The hypothetical sand machine and a human brain are equivalent.

Argument:  Consciousness of something is only dependent on what is modeled, and not on the method of modeling.  So, as long as the parameters of the mountain are modeled in exactly the same way in two systems, they can be said to be conscious of it in the same way.

 

Corollary

We are machines.

 

All right, so that’s a claim as well.

Here are two arguments in support of the claim.

a) Our behaviour in some sensory tasks is similar to what we would expect from the machine learning tools called classifiers.

  1. The Himba colour experiment discovered that the Himba tribe of Africa were distinguishing colours differently from the rest of the world. They could not distinguish between blue and green but could distinguish between many shades of green which other humans typically had a hard time telling apart.
  2. People who speak languages that do not have vowel tones have trouble hearing differences in tone. Similarly, people who speak languages where the consonants ‘l’ and ‘r’ are conflated cannot easily tell them apart.

This is typically how a machine learning tool called a classifier behaves.  A classifier needs to be trained on labelled sounds or colours and will learn to recognize only those, and will have a hard time telling other sounds or colours apart.
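A toy sketch makes the point (my own illustration; the colour values and labels are invented and are not data from the Himba experiment).  A classifier trained with blue and green conflated under a single label simply cannot report a blue/green distinction:

```python
# Illustrative sketch: a classifier can only make the distinctions
# present in its training labels.  Colours and labels are invented.
from sklearn.neighbors import KNeighborsClassifier

# RGB-ish training colours, with blue and green given the SAME label
# (as in a language that conflates them).
X = [(0, 0, 255), (0, 30, 220), (0, 255, 0), (30, 220, 0),   # "grue"
     (255, 0, 0), (220, 30, 0)]                              # "red"
y = ["grue", "grue", "grue", "grue", "red", "red"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# A clear blue and a clear green both come back as "grue":
print(clf.predict([(0, 10, 240), (10, 240, 10)]))  # ['grue' 'grue']
```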

b) The limitations that our brains reveal when challenged to perform some generative tasks (tasks of imagination) are identical to the limitations that the machine learning tools called classifiers exhibit.

Let me try the experiment on you.   Here’s a test of your imagination.  Imagine a colour that you have never seen before.

Not a mixture of colours, mind you, but a colour that you have never ever seen before.

If you are like most people, you’ll draw a blank.

And that is what a classifier would do too.

So, I would say that the human brain models things like colours or phonemes using some kind of classification algorithm, because it displays the limitations that such algorithms do.

So it is possible that, through similar experiments on different types of human cognitive functions, we shall discover that humans are merely machines capable of consciousness (of modeling a certain set of parameters related to what we perceive) and of the other cognitive functions that define us as human.

 

Further Discussion

People with whom I’ve discussed this sometimes ask me if considering consciousness as the process of building a model of something adequately explains feelings, emotions, likes and dislikes, and love and longing.

My answer is that it does, at least as far as likes and dislikes go.

A liking for something is a parameter associated with that thing, and it can easily be modeled by one or more numbers.

Neural networks can easily represent such numbers (regression models) and so can model likes and dislikes.

As for love and longing, these could result from biological processes and genetic inclinations, but as long as they are experienced, they would have had to be modeled in the human mind, possibly represented by a single number (a single point representation of intensity) or a distributed representation of intensity.  What is felt in these cases would also be modeled as an intensity (represented at a point or in a distributed manner).  One would be conscious of a feeling only when one could sense it and model it.  And the proof that one has modeled it lies in the fact that one can describe it.

So, when  the person becomes conscious of the longing, it is because it has been modeled in their brain.

 

Still Further Discussion

Again, someone asked if machines could ever possibly be capable of truth and kindness.

I suppose the assumption is that only humans are capable of noble qualities such as truth and kindness or that there is something innate in humans which gives rise to such qualities (perhaps gifted to humanity or instilled in them by the divine or the supernatural or earned by souls that attain humanity through the refinement of past lives).

However, there is no need to resort to such theories to explain altruistic qualities such as truthfulness, goodness and kindness.  It is possible to show game-theoretically that noble qualities such as trustworthiness would emerge in groups competing in a typical modern economic environment involving a specialization of skills, interdependence and trading.

Essentially the groups that demonstrate less honesty and trustworthiness fail to be competitive against groups that demonstrate higher honesty and trustworthiness and therefore are either displaced by the latter or adopt the qualities that made the latter successful.  So, it is possible to show that the morals taught by religions and noble cultural norms can all be evolved by any group of competing agents.

So, truth and kindness are not necessarily qualities that machines would be incapable of (towards each other).  In fact, these would be qualities they would evolve if they were interdependent and had to trade with each other and organize and collaborate much as we do.
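Here is a crude simulation sketch of that argument (entirely a toy model of my own, with made-up payoffs): trades pay off only when both partners are honest, so groups with more honest members grow faster.

```python
# Toy evolutionary sketch: honesty pays in repeated, interdependent trade.
# Payoff numbers and the update rule are invented for illustration.
honesty = [0.2, 0.5, 0.8]        # fraction of honest members in each group
shares = [1 / 3, 1 / 3, 1 / 3]   # each group's share of the population

for generation in range(50):
    # Expected trade payoff: a trade succeeds only when both (randomly
    # drawn) partners are honest, i.e. with probability h * h.
    payoffs = [1.0 + 3.0 * h * h for h in honesty]
    # Groups grow in proportion to their payoff, then renormalize.
    shares = [s * p for s, p in zip(shares, payoffs)]
    total = sum(shares)
    shares = [s / total for s in shares]

print([round(s, 3) for s in shares])  # the most honest group dominates
```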

 

Related Work

This is a different definition from the one used by Max Tegmark in his book “Life 3.0”, whose definition of “consciousness” as “subjective experience” confuses it with “sentience” (the ability to feel).

Tegmark also talks about the work of the philosopher David Chalmers and the computer scientist Scott Aaronson, who seem to approach the question from the direction of physics – as in, we are just particles from food and the atmosphere rearranged, so what arrangement of particles causes consciousness?

I think that is irrelevant.

All we need to ask is “What is the physical system, whatever it is made of, capable of modeling?”

Interestingly, in the book, Tegmark talks about a number of experiences that any theory of consciousness should explain.

Let’s look at some of those.

 

Explanatory Power of this Model

Explaining Abstraction

He talks about how tasks move from the conscious to the unconscious level as we practise them and get good at them.

He points out that when a human reads text like this, they do not read character by character but word by word.  Why is it that as you improve your reading skills, you are no longer conscious of the letters?

Actually, this can be explained by the theory we just put forth.

When we are learning to read (reading being a form of modeling the text), we learn to model characters when we see a passage of text like this one, and we read character by character.

But with practice, we learn to model words or phrases at a higher level from passages of text, and direct our attention to the words or phrases because that facilitates reading.

We can choose to direct our attention to the letters and read letter by letter as well, if we so wish.

So, this model can explain attention too.

Attention

The brain is limited in its capacity to process and store information, so the human brain focuses its attention on the parts of the model it has built that are required for the performance of any task.

It can choose not to keep in memory the more granular parts of the model once it has built a larger model.  For instance, it can choose not to keep the characters in memory if it has already modeled the word.

This also explains phenomena such as “hemineglect” (patients with certain lesions in their brain miss half their field of vision but are not aware of it – so they may not eat food in the left half of their plate since they do not notice it).

We can explain it by saying that the brain has modeled a whole plate from the faulty sensory information provided to it, and the patient is therefore conscious of a whole plate, minus the missing information.

Blindsight

Tegmark also talks of the work of Christof Koch and Francis Crick on the “neural correlates of consciousness”.

Koch and Crick performed an experiment where they distracted one eye with flashing images and caused the other eye to miss registering a static image presented to it.

They inferred from this that the retina is not capable of consciousness.

I would counter that by saying that the retina is conscious of the pixels of the images it sees if it constructs models of them (as it does) and stores them.

But if the brain models more abstract properties more useful to the tasks we perform, we focus our attention on those and therefore do not store in the memory the images that are not relevant to the more critical task (the distracting task).

So, I would argue that our consciousness can include models that come from the retina (if some neural pathway from the retina creates models in memory at the pixel level).

But if our attention decides to focus on and consign to memory better things than what the retina models, it will, and then it will not necessarily model and be conscious of pixels from the retina.

 

Still Other Work

Tegmark also talks extensively about the work of Giulio Tononi and his collaborators on something called “integrated information” and the objections to it by Murray Shanahan, but I’ll leave those interested in those theories to refer to the work of their authors.

I also examine Graziano’s Attention Schema Theory of consciousness in another post https://aiaioo.wordpress.com/2017/12/18/mechanical-consciousness-and-attention/

The Vanishing Information Problem – Why we switched to deep learning with neural networks

It’s been a year since my last post.  My last post was about deep (multi-layer) Bayesian classifiers capable of learning non-linear decision boundaries.

Since then, I’ve put on hold the work I was doing on deep (multi-layer) Bayesian classifiers and instead been working on deep learning using neural networks.

The reason for this was simple: our last paper revealed a limitation of deep directed graphical models that deep neural networks did not share, which allowed the latter to be of much greater depth (or to remember way more information) than the former.

The limitation turned out to be in the very equation that allowed us (read our last paper on deep (multi-layer) Bayesian classifiers for an explanation of the mathematics) to introduce non-linearity into deep Bayesian networks:

$$P(c|F) \propto P(c) \sum_{h} \left[\, P(h|c) \prod_{f \in F} P(f|h,c) \,\right]$$

The equation contains a product of feature probabilities P(f|h,c) [the part inside the big brackets in the above equation].

This product yields extreme (uncalibrated) probabilities and we had observed that those extreme probabilities were essential to the formation of non-linear decision boundaries in the deep Bayesian classifiers we’d explored in the paper.  The extremeness allowed the nearest cluster to a data point to have a greater say in the classification than all the other clusters.

We had found that when using this equation, there was no need to explicitly add non-linearities between the layers, because the above product itself gave rise to non-linear decision boundaries.

However, because of the extremeness of the product of P(f|h,c), the probability P(h|F) (the probability of a hidden node given the features) becomes a one-hot vector.

Thus a dense input vector (f) is transformed into a one-hot vector (h) in just one layer.
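The collapse is easy to demonstrate numerically (a sketch with made-up probabilities, not our actual trained models): with even a moderate number of features, the products differ across hidden nodes by orders of magnitude, and the normalized posterior P(h|F) is essentially one-hot.

```python
# Sketch of the vanishing information problem (made-up probabilities):
# products of many per-feature likelihoods make P(h|F) nearly one-hot.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_features = 5, 50

# Random per-feature probabilities P(f|h,c) for each hidden node.
log_p = np.log(rng.uniform(0.01, 0.99, size=(n_hidden, n_features)))

# Log of the product over features, per hidden node.
log_prod = log_p.sum(axis=1)

# Normalize to get P(h|F) (a softmax over hidden nodes).
posterior = np.exp(log_prod - log_prod.max())
posterior /= posterior.sum()

print(np.round(posterior, 6))  # typically one entry ~1, the rest ~0
```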

Once we have a one-hot vector, we don’t gain much from the addition of more layers of neurons (which is also why you shouldn’t use the softmax activation function in intermediate layers of deep neural networks).

This is because one-hot encodings encode very little information.

There’s an explanation of this weakness of one-hot encodings in the following lecture by Hinton comparing RNNs and HMMs.

Hinton points out there that an RNN with its dense representation can encode exponentially more information than a finite state automaton (that is, an HMM) with its one-hot representation of information.

I call this tendency of deep Bayesian models to reduce dense representations of information to one-hot representations the vanishing information problem.

Since the one-hot representation is a result of overconfidence (a kind of poor calibration), it can be said that the vanishing information problem exists in any system that suffers from overconfidence.

Since Bayesian systems suffer from the overconfidence problem, they don’t scale up to lots of layers.

(We are not sure whether the overconfidence problem is an artifact of the training method that we used, namely expectation maximization, or of the formalism of directed graphical models themselves).

What our equations told us, though, was that the vanishing information problem was inescapable for deep Bayesian classification models trained using EM.

As a result, they would never be able to grow as deep as deep neural networks.

And that is the main reason why we switched to using deep neural networks in both our research and our consulting work at Aiaioo Labs.

Deep Bayesian Learning for NLP

Deep learning is usually associated with neural networks.

In this article, we show that generative classifiers are also capable of deep learning.

What is deep learning?

Deep learning is a method of machine learning involving the use of multiple processing layers to learn non-linear functions or boundaries.

What are generative classifiers?

Generative classifiers use the Bayes rule to invert probabilities of the features F given a class c into a prediction of the class c given the features F.

The class predicted by the classifier is the one yielding the highest P(c|F).

A commonly used generative classifier is the Naive Bayes classifier.  It has two layers (one for the features F and one for the classes C).
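For reference, the standard Naive Bayes factorization (the usual textbook form) is:

$$P(c|F) \propto P(c) \prod_{f \in F} P(f|c)$$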

Deep learning using generative classifiers

The first thing you need for deep learning is a hidden layer.  So you add one more layer H between the C and F layers to get a Hierarchical Bayesian classifier (HBC).

Now, you can compute P(c|F) in an HBC in two ways:

Product of Sums:

$$P(c|F) \propto P(c) \prod_{f \in F} \left[\, \sum_{h} P(h|c)\, P(f|h,c) \,\right]$$

Sum of Products:

$$P(c|F) \propto P(c) \sum_{h} \left[\, P(h|c) \prod_{f \in F} P(f|h,c) \,\right]$$

The first equation computes P(c|F) using a product of sums (POS).  The second equation computes P(c|F) using a sum of products (SOP).

POS Equation

We discovered something very interesting about these two equations.

It turns out that if you use the first equation, the HBC reduces to a Naive Bayes classifier. Such an HBC can only learn linear (or quadratic) decision boundaries.

Consider the discrete XOR-like function shown in Figure 1.

[Figure 1: the discrete XOR-like function]

There is no way to separate the black dots from the white dots using one straight line.

Such a pattern can only be classified 100% correctly by a non-linear classifier.

If you train a multinomial Naive Bayes classifier on the data in Figure 1, you get the decision boundary seen in Figure 2a.

Note that the dotted area represents the class 1 and the clear area represents the class 0.

Figure 2a: The decision boundary of a multinomial NB classifier (or a POS HBC).

It can be seen that no matter what the angle of the line is, at least one point of the four will be misclassified.

In this instance, it is the point at {5, 1} that is misclassified as 0 (since the clear area represents the class 0).

You get the same result if you use a POS HBC.
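You can reproduce this failure with any off-the-shelf multinomial Naive Bayes implementation.  Here is a sketch using scikit-learn (which point ends up misclassified depends on smoothing and tie-breaking, but the four points can never all be classified correctly):

```python
# Sketch: multinomial Naive Bayes cannot fit the XOR-like data of Figure 1.
from sklearn.naive_bayes import MultinomialNB

X = [[1, 1], [5, 5], [1, 5], [5, 1]]   # feature counts
y = [0, 0, 1, 1]                       # XOR-like labels

clf = MultinomialNB().fit(X, y)
print(clf.predict(X))  # at least one of the four points comes out wrong
```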

SOP Equation

Our research showed us that something amazing happens if you use the second equation.

With the “sum of products” equation, the HBC becomes capable of deep learning.

SOP + Multinomial Distribution

The decision boundary learnt by a multinomial non-linear HBC (one that computes the posterior using a sum of products of the hidden-node conditional feature probabilities) is shown in Figure 2b.

Figure 2b: Decision boundary learnt by a multinomial SOP HBC.

The boundary consists of two straight lines passing through the origin. They are angled in such a way that they separate the data points into the two required categories.

All four points are classified correctly since the points at {1, 1} and {5, 5} fall in the clear conical region which represents a classification of 0 whereas the other two points fall in the dotted region representing class 1.

Therefore, the multinomial non-linear hierarchical Bayes classifier can learn the non-linear function of Figure 1.

Gaussian Distribution

The decision boundary learnt by a Gaussian nonlinear HBC is shown in Figure 2c.

Figure 2c: Decision boundary learnt by a SOP HBC based on the Gaussian probability distribution.

The boundary consists of two quadratic curves separating the data points into the required categories.

Therefore, the Gaussian non-linear HBC can also learn the non-linear function depicted in Figure 1.

Conclusion

Since SOP HBCs are multilayered (with a layer of hidden nodes), and can learn non-linear decision boundaries, they can therefore be said to be capable of deep learning.

Applications to NLP

It turns out that the multinomial SOP HBC can outperform a number of linear classifiers at certain tasks.  For more information, read our paper.

Visit Aiaioo Labs

Decongesting Bangalore’s Roads: An Analysis of the BDA’s Detailed Project Report on the Proposed Flyover

In response to public demand, the Bangalore Development Authority (BDA) has finally released the Detailed Project Report (DPR) for the proposed steel flyover in Bangalore.

In this article we’ll attempt to address (using points from the DPR), the key question of whether the flyover will be of any use to commuters.

The DPR contains a study of traffic volumes at each of the intersections, and summarizes its findings in the following three diagrams (page 58) which are very easy to understand.

[Three traffic-volume diagrams from page 58 of the DPR]

The Numbers

The numbers in the diagrams are peak hour traffic numbers at all junctions affected by the project in terms of PCUs/hr.  PCU stands for Passenger Car Unit.

So, the first diagram says, for example, that at peak hour there is the equivalent of 6175 cars entering from Hebbal and 6949 cars exiting at Rajbhavan Road in one hour.

The second diagram shows the impact of the flyover on ground-level traffic.  It shows that the number of cars entering at ground level from Hebbal will drop to 3088 while the number of cars exiting at Rajbhavan Road will drop to 4122.

The excess traffic (3087 incoming at Hebbal and 1393 outgoing at Rajbhavan Road) will be carried on the flyover.

The incoming numbers add up (3087 + 3088 = 6175) as expected.

The outgoing numbers don’t add up (1393 + 4122 < 6949)!

The outgoing traffic numbers don’t add up because there will no longer be a right turn at Basaveshwara Circle.  So, a part of the traffic volume decrease has nothing to do with the flyover!

Anyway, these calculations, if assumed correct, point to a reduction in traffic of 20% to 50% at ground level.

Travel Time Calculations

Will the reduction in traffic lead to a corresponding decrease in travel time?

Not necessarily.

The DPR contains no estimate of reductions in travel time.

There are two reasons for doubting there will be huge reductions in overall travel time:

  1. The impact of the constriction of the road leading to and from the flyover by the flyover’s ramps needs to be taken into account.  The bottlenecks at the ramps could lead to traffic piling up at the entrances and exits of the flyover.
  2. If the total capacity of the roads carrying traffic away from the flyover is too low, it could lead to traffic queueing up on the flyover itself.

Those who prepared the DPR should have run a simulation of the traffic on the flyover, below it and on the roads leading into and out of the flyover to determine if any savings in travel time would result or if serious backups on the flyover and around it could cancel any benefits.

A case in point is the flyover from the Electronic City software technology park (STP) to the Silk Board.  It might allow traffic to move fast on it, but it might be slowing down traffic inside the STP and on the road below it at its exit.

Flyover Effectiveness Conclusions

We don’t know if the travel time will decrease significantly unless the required simulations are done.

Public Transportation Conclusions

However, it is possible from the DPR to draw conclusions about the effectiveness of public transportation.

We see from the study that only 2% to 3.5% of the vehicles on the roads are city buses (pages 42-46).

So, if we doubled Bangalore’s bus fleet (which one could do for the cost of the flyover), it would not add much traffic to these roads, but it could replace almost all the private vehicle traffic, not just on this stretch but all over the city (assuming people make the switch from private to public transport, and that each bus carries 100 passengers).

Explanation & Calculations

Here’s how we can calculate that.

The proposed flyover will cost approximately 1800 crore rupees.

A TATA bus costs about 20 lakh rupees.

So, you can buy about 7000 buses (and hire drivers/conductors for a year and build facilities for them) for 1800 crore rupees, or about 9000 buses without drivers/conductors or facilities.  I’m assuming that a quarter of the price of a bus will cover facilities and pay for a driver and conductor.

Now the BMTC runs around 7000 buses today.  So, for the price of the flyover, one could double the bus fleet.

We can show that doubling the fleet can drastically decrease the volume of traffic on these roads.

Let’s say that 10,000 vehicles use that stretch of road.  We know that 2% of those vehicles are buses, i.e. 200 buses.  Assume that all the other vehicles carry on average 2 passengers each: the remaining 9,800 vehicles then carry 19,600 passengers.  If we double the buses, we add 200 more, and each bus can easily carry about 100 passengers (50 seated and 50 standing).  So the 200 extra buses can carry 20,000 passengers, which is the entire carrying capacity of all the other vehicles on the road!

So with a negligible increase in traffic (from 2% of current traffic to 4%), we can accommodate all the passengers of the remainder of the traffic using that stretch of Bellary Road today.
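
Here is the arithmetic as a short sketch.  The 25% per-bus overhead, the 2 passengers per private vehicle, and the 100 passengers per bus are the assumptions stated above:

CRORE = 10_000_000
LAKH = 100_000

flyover_cost = 1800 * CRORE             # Rs. 1800 crore
bus_price = 20 * LAKH                   # Rs. 20 lakh per bus
bus_with_overheads = bus_price * 1.25   # assumed: +25% for facilities and crew

print(flyover_cost // bus_price)               # 9000 buses, bare
print(int(flyover_cost / bus_with_overheads))  # 7200 buses with overheads

vehicles = 10_000
buses = int(0.02 * vehicles)    # 2% of the traffic = 200 buses
private = vehicles - buses      # 9800 private vehicles
print(private * 2)              # 19600 passengers in private vehicles
print(buses * 100)              # 20000 extra seats from doubling the fleet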

More Benefits of Public Transport

But it’s not just that!

For the cost of the flyover, we’d have doubled the number of buses all over Bangalore!

So, for the cost of this flyover, we’d have added a carrying capacity equivalent to that of all the private vehicles on all the roads in Bangalore!

That’s what this BDA DPR tells us!

Environment Benefits of Public Transport

But that’s not all!

There are still more benefits!  Think of the reduction in pollution: replacing all those private vehicles with the equivalent buses would take roughly 95% of the vehicles off the road, with a corresponding reduction in pollution.

Assumption

In the above calculations, I’ve assumed that everyone will give up private transport for public transport.

That won’t happen in real life unless you get the same convenience from public transport.

It could happen if, like with the metro, bus passengers are:

  1. assured of getting a bus from a known key location to a desired key destination every ten minutes or at a known precise time (with bus tracking) and
  2. assured the buses are not overcrowded (the pleasantness of the travel is comparable to the pleasantness of private transport).

If you can get that sort of predictability and comfort, then for those travelling on those routes to work, it would make more sense to use public transport than private transport.

So, it may take a lot more than doubling the bus fleet (mere capacity matching) to assure convenience and ensure that people prefer public to private transport.  It would also need route planning, bus tracking and highly predictable service on key routes.

Other Proposals

There are many other proposals for reducing congestion along the North South Bellary Road.

Here’s one:  http://www.deccanherald.com/content/561722/rail-link-kia-less-rs.html

This article says that there is an operational railway line between Yelahanka and Channasandra, and that it can easily be extended to the airport in the north and to Baiyappanahalli in the east, taking airport traffic off Bellary Road.

Estimated cost: Rs. 150 crores.  And it’s a public transport proposal, so it takes a lot of cars off the roads.


Of Barriers and Free Trade

Many economists insist that free trade benefits everyone.  The argument goes like this:

Model 1

[Image: FreeMarketScenario2]

Let’s say there are two countries, Country 1 and Country 2, each with a bread factory and two citizens who buy bread from their factory.

There is a trade barrier in place preventing the factories from selling to the other country.

Each factory makes a profit of $2 on each loaf of bread.

The factory in Country 1 earns $4 from its sales.  The people spend $40 on bread.

The factory in Country 2 also earns $4 from its sales of bread.  Its citizens spend only $20.

So, the net expenditure of money in Country 1 is $36, and in Country 2 $16.

Now, let’s see what happens if the trade barrier is removed.

Everybody now buys from the factory in Country 2.

So the factory in Country 1 earns nothing (it sells nothing) but its citizens spend only $20.

The factory in Country 2 now earns $8 from its sales of bread to all four citizens.  The citizens of Country 2 spend the usual $20.

So, the net expenditure of money in Country 1 is now only $20 and in Country 2 only $12.
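
Here is Model 1 as a short sketch (the helper function is just for illustration), tabulating net expenditure, i.e. the citizens’ spending minus the factory’s profit, before and after the barrier is removed:

def net_expenditure(spent_by_citizens, loaves_sold, profit_per_loaf=2):
    # a country's net outflow: what its citizens spend, less its factory's profit
    return spent_by_citizens - loaves_sold * profit_per_loaf

# With the barrier: each factory sells 2 loaves at home.
print(net_expenditure(40, 2))   # Country 1: 40 - 4 = 36
print(net_expenditure(20, 2))   # Country 2: 20 - 4 = 16

# Without the barrier: everyone buys Country 2's $10 bread.
print(net_expenditure(20, 0))   # Country 1: 20 - 0 = 20
print(net_expenditure(20, 4))   # Country 2: 20 - 8 = 12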

Economists argue that through this mechanism, the free market benefits all trading parties.

But it turns out that not everybody wins if you consider a different model – one that includes businesses shutting down and unemployment.

Model 2

When there are trade barriers, everyone works for the factories in their own countries.

[Image: FreeMarketScenario3]

With the barriers in place, the factory in Country 1 pays each of its two employees a salary of $20.

The factory in Country 2, being more efficient, pays each of its employees a salary of $40.

Now the net earnings in Country 1 (earnings minus expenditure) are $4: the citizens’ salaries ($40) exactly cancel their expenditure on bread ($40), and the factory earns $4.  The net earnings in Country 2 are $64: the citizens each spend only $10 while earning $40, and the factory earns $4.

Now let’s say free trade is introduced.

The bread factory in Country 2 starts selling its bread at $10 in Country 1.

The bread factory in Country 1 shuts down (because no one will buy its bread at $20).

The citizens of Country 1 who worked for the bread factory no longer have an income.  They still have to buy bread to survive (using their savings), but their expenditure is now lower, at $10 each instead of $20.

So now Country 1 ends up with net earnings of -$20 (a drop of $24).

The net earnings in Country 2 go up to $68 (an increase of $4).

So in this model, not only do the earnings of Country 1 go down, but so do the earnings of the world as a whole (because fewer people are gainfully employed).

However, things don’t stop there.

Model 3

The bread factory in Country 2 realizes that by hiring the citizens of Country 1 instead of the citizens of Country 2, it can halve its cost of production (because salaries in Country 1 are half those in Country 2).  If the $8 cost of each loaf is essentially all salaries, halving it raises the profit from $2 to $6 per loaf.

So now you have the citizens of Country 2 facing unemployment (no earnings).

So the net earnings of Country 2 drop to $4 (a drop of $64).

The earnings of Country 1 on the other hand go up to $20 (a gain of $40).

But there is again a drop in the earnings of the world, because although the same number of people are employed, it is the higher-paid people who are now out of a job.
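
Here is a sketch tabulating Models 2 and 3, where a country’s net earnings are its citizens’ salaries minus their spending, plus its factory’s profit:

def net_earnings(salaries, spending, factory_profit):
    # a country's net earnings: wage income, less spending, plus factory profit
    return salaries - spending + factory_profit

# Model 2, with barriers: each factory sells 2 loaves at $2 profit.
print(net_earnings(2 * 20, 2 * 20, 4))   # Country 1:  40 - 40 + 4  =  4
print(net_earnings(2 * 40, 2 * 10, 4))   # Country 2:  80 - 20 + 4  = 64

# Model 2, free trade: Country 1's factory shuts; Country 2 sells 4 loaves.
print(net_earnings(0, 2 * 10, 0))        # Country 1:   0 - 20 + 0  = -20
print(net_earnings(2 * 40, 2 * 10, 8))   # Country 2:  80 - 20 + 8  = 68

# Model 3: the factory hires Country 1's cheaper workers; profit rises to
# $6 per loaf (4 loaves = $24) and Country 2's workers lose their income.
print(net_earnings(2 * 20, 2 * 10, 0))   # Country 1:  40 - 20 + 0  = 20
print(net_earnings(0, 2 * 10, 24))       # Country 2:   0 - 20 + 24 =  4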

This is not compatible with a model of free trade that continually improves things for everybody.

Of course, I’ve used a very simple model, but it reflects the truth that when free trade is introduced, there are winners and losers and that the weaker parties are the losers.

There seems to be a way to predict who the winners and losers in any attempt to introduce free trade will be.

Force Model

In models of military engagements, the principle of force is used to predict the outcome.  Given roughly equal equipment and training, the stronger force almost always wins.

The reason for that is shown in the following diagram.  Let’s say there is a red force with 4 soldiers and a green force with 8 soldiers.  Each side fires a shot every minute.  Each soldier aims at an enemy soldier and fires, with a 50% probability of hitting his target.  After the first volley, the red force would have fired 4 shots and hit 2 green soldiers.  But the green force would have fired 8 shots and hit 4 red soldiers.

So the losses would be disproportionately higher for the weaker party, and the green force would win the battle with the loss of two of their own.

[Image: ForceModel1]
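
The volley arithmetic can be written as a tiny expected-value simulation:

red, green = 4.0, 8.0    # soldiers on each side
hit_probability = 0.5    # each aimed shot hits with 50% probability

minute = 0
while red >= 1 and green >= 1:
    red_hits = red * hit_probability      # expected green casualties this volley
    green_hits = green * hit_probability  # expected red casualties this volley
    red, green = max(red - green_hits, 0), max(green - red_hits, 0)
    minute += 1
    print(f"minute {minute}: red={red:.1f}, green={green:.1f}")

# minute 1: red=0.0, green=6.0 -- the larger force wins, losing only two.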

It appears the same model can be used to predict the winners in economics too.

In economic models, the contending forces would be firms or employees.

Larger and better-equipped firms (which can produce cheaper products) can be expected to win in any market where free trade is introduced, pushing smaller firms, or more expensive employees in direct competition with them, out of business.

For example, before India traded freely with the USA, there were many local soft drink manufacturers (Torino, Thums Up and Limca) in India.  When Coca-Cola and Pepsi entered the market, all the Indian soft drink brands had to sell out.

Similarly cheaper workforces (which cost less to hire) can be expected to win.

An example would be the use of immigrant labour in many countries (especially the USA) for agriculture.

So, how can a weaker economic segment (be they workers or industry) be protected?

Through barriers, specifically barriers allowed by trade agreements.  A barrier (something that a force can shelter behind) may be used to protect weaker forces of any kind.

Here are a few barriers used in economics.  As you can see, barriers to trade are already in heavy use though people talk as if free trade is widely practised.

Barrier 1

The US workforce in the computer industry, being more expensive than the Indian workforce, has been protected by visa and fee barriers.

There are visa quotas, and a fee is levied on Indian firms, to the tune of $5000 per visa for H-1B and L-1 visas.  The fee costs the Indian software industry $1 billion to $1.5 billion annually.

Not palatable to Indian industry of course, but effective in protecting high-end jobs in the USA.

Now what I don’t understand is why the USA uses barriers to protect jobs for skilled people who are highly qualified and quite capable of acquiring more skills and competing on quality.

I would suppose that if the USA uses barriers, it should probably do so to protect jobs at the low end of the economy (helping poorly skilled or unskilled workers, who would find it harder to acquire more skills or compete on quality), as India does through a minimum salary requirement for work visas, which is aimed squarely at keeping cheaper Chinese labour out of infrastructure (say, road building) projects.

Barrier 2

During the colonial era, barriers to the import of Indian products were used to protect a nascent textile industry and jobs in England.

I quote from an 1840 English parliamentary inquiry about India (taken from the above article):

“[Before a British Parliamentary Committee in 1840] Montgomery Martin stated that he . . . was convinced that an outrage had been committed ‘by reason of the outcry for free trade on the part of England without permitting India a free trade herself.’ After supplying statistical data of Indian textile exports to Great Britain, he pointed out that between 1815–1832 prohibitive duties ranging from 10 to 20, 30, 50, 100 and 1,000 per cent were levied on articles from India. … ‘Had this not been the case,’ wrote Horace Wilson in his 1826 History of British India, ‘the mills of Paisley and Manchester would have been stopped in their outset, and could scarcely have been again set in motion, even by the power of steam. They were created by the sacrifice of Indian manufacture. Had India been independent, she could have retaliated, would have imposed prohibitive duties on British goods and thus have preserved her own productive industry from annihilation. This act of self-defence was not permitted her’” (Clairmonte 1960: 86-87).

Barrier 3

Another example can be seen in the policy of subsidizing the renewable energy industry in the USA and buying renewable energy locally.

The use of models to determine winners, as described above, can empower governments to decide which sectors of the economy are most vulnerable to competition (so action may be taken to protect them).

Many protective actions can be taken without violating free trade obligations.

Fair Barrier 1

The best example of a fair protection policy is progressive liberalization.

For example, when the Chinese government introduced foreign direct investment (FDI) into the retail sector in China, it did so gradually, increasing the percentage of ownership permitted to foreign owners over 15 years.

In the course of the 15 years, local firms learnt the tricks of the retail trade and were able to compete effectively against the new entrants.

In contrast, the Indian government went from discouraging the participation of Indian private firms in the railways and defense industries to allowing 100% FDI in one shot.  This failed to give Indian firms the time to develop the capabilities or technical know-how to compete in these markets, with the result that bullet train and metro rail equipment needs to be imported in its entirety (or imported with a ‘made in India’ veneer: assembled in Indian factories entirely owned and operated by French firms).

Fair Barrier 2

Another policy that could protect and benefit businesses in certain sectors is grants for nascent industries with vast future potential.  In the USA, there are small business grants, research grants and subsidies for small firms.

In contrast, the government of India seems to have failed to create a level playing field for local startups.  I recall that many years ago (at the height of the internet boom), when the government of India ran a small business grants program, it had a clause specifically excluding Indian software startups.  Somehow the government of India had decided that it would not make research grants available to local startups in the highest-growth industry in India at the time.

Here’s more on how the Indian government seems not to be as accessible to local startups as it is to the larger global startups/firms competing with them.  This is a problem that needs to be fixed.

It’s easier for a foreign startup or MNC to get forgiveness for an inadvertent violation than it is for a domestic startup (today, foreign startups enjoy better access to political leaders and therefore easier forgiveness).

Uber got 1.5 years of forgiveness on payments that Ola did not get, and Amazon got forgiveness for running its own supply centres (rather than following the marketplace model) that Flipkart did not get.  Flipkart had to change its model to comply with the marketplace rules.  Both Ola and Flipkart were hurt by this favoritism towards Uber and Amazon.

Fair Barrier 3

Another method is to strengthen the weakest parts of the workforce in the face of competition from migrant/immigrant labour.

One policy that could help protect and improve the lot of the weaker parts of the US workforce might be higher taxation, with the tax money used to:

  1. Sponsor education programs (financial-need-based free college/training) to facilitate reemployment in growing industry segments.
  2. Revive industries that could employ that part of the workforce in larger numbers (such as manufacturing).  One way to revive manufacturing might be to create more outlets for US-made goods and to fund the automation of manufacturing.

A complementary measure would be visa barriers: Indian work visas, for instance, require a minimum salary, which effectively protects less skilled workers.

Fair Barrier 4

Another method is to use regulatory and tax barriers to FDI (as is done in China) to generate scalable revenue for local businesses.

Conclusion

In conclusion, I wanted to point out that if governments could start using models such as the one above to calculate who might win in various scenarios involving free trade (instead of assuming that everyone wins), they might be able to formulate better and more equitable economic policies for their citizens.

The Fine Balance of FDI

[Image: Algodones_sand-dune-fence.  Courtesy of Wikipedia]

Some governments, such as those of New Zealand and China, have opposed FDI (Foreign Direct Investment) in various forms.

We examine in this article what their reasons might be.

  1. Do some countries reject FDI?

It appears that some do.

The Case of the USA

In the USA, there are no serious legal barriers to FDI, but there can be disapproval of and concerted attempts to block FDI.

In this article on factories set up by foreign investors in the USA, there is a discussion of opposition to Japanese manufacturers in the automobile sector decades ago.  I quote:

After Honda Motors opened the first Japanese auto plant in the U.S., in Marysville, Ohio, in the early nineteen-eighties, followed by an engine factory in nearby Anna, Ohio, the company faced an onslaught of vicious anti-Japanese ads on TV and in print, often supported by American manufacturing trade and labor groups.

The comparably subdued response to Chinese manufacturers speaks, on one hand, to changing circumstances, especially the broad acceptance of globalization in the United States and the desire, on the part of some politicians and business leaders, to create manufacturing jobs by whatever means necessary. But it also follows from a conclusion that American companies have reached about their Chinese counterparts: namely, that they are, thus far, relatively inconsequential rivals.


The Case of New Zealand

In October 2015, the government of New Zealand blocked the Chinese firm Pengxin’s attempts to buy (through a local subsidiary) a farm called Lochinver.

The government explained their decision as follows:

Land information minister Louise Upston told the BBC that its decision in September to block Pengxin’s purchase of Lochinver farm does not mean the country is not interested in attracting foreign investment.

“It’s [foreign investment’s] an important part of our economic strategy, but equally when we do have an application for sensitive land and assets – we will put it across the 21 criteria that we need to assess and make a decision based on that,” she says.

“We weren’t convinced that this particular application met that threshold, which is substantial and identifiable benefits for New Zealanders.”

Dr William Rolleston, president of the Federated Farmers of New Zealand, a group that lobbies on behalf of its farmer members, says some farmers are concerned about the scale of the purchases.

“New Zealanders don’t have an issue on ownership at a low level. No one would be concerned if 5% of farmland was owned by overseas buyers,” he says.

“But if 95% of the land in New Zealand was owned by overseas buyers, I think we would have an issue – it would reduce our strategic options in the future.”

Dr Rolleston’s sentiment echoes public concern from 2012 when Pengxin bought 16 dairy farms and sparked a debate about national identity.

During the last elections in 2014, opposition politicians stoked those fears by saying that New Zealanders risked becoming “tenants in their own land”.

So, it appears that in New Zealand, the government wishes to keep resources such as land in local hands.


The Case of China

China seems to have laws that make it difficult for foreign firms to compete with local firms, making it a logical choice for them to outsource the manufacture of their goods to Chinese firms.

This is explained on the China Law Blog in “Building and Operating a China Factory.  Why Even Bother?”

For example, if you are a Wholly Foreign Owned Enterprise (WFOE), you need to pay the government 20% of the profits of your China operations.

I quote:

First off, in your first year, you are going to essentially waste around $50,000 in just forming your WFOE, securing various government approvals, paying someone to figure out your taxes, and making up for all the mistakes you will make because you will be in, what is for you, a very strange land.  On top of that will be your taxes, which you are going to need to pay on just about everything.  Figure 20% on profits and even if you do not make profits, figure on them being imputed to you. And figure on having to pay around 40% to various of the Chinese governments as taxes on the salaries you pay your employees.

And all of this is going to mean that your costs are going to be considerably higher than whatever Chinese factory you are currently using to make your product. In Buying A Chinese Company? Why China Deals DON’T Get Done, I wrote of the way this domestic-foreign price differential works in the context of a client looking to buy its Chinese manufacturer:

I said that there is a good chance the Chinese manufacturer is paying half of its employees completely under the table and reporting to the government only half of what it was paying the other half. I then talked of how there is also a good chance the Chinese manufacturer is underpaying its taxes and of how its rent also may be paid under the table. I then said that this sort of thing may be all well and good for Chinese companies, but that if the US manufacturer were to buy this Chinese manufacturer, it would need to do so as a WFOE and it would then immediately be on a “whole ‘nother level” with respect to China’s various tax authorities.
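
To get a feel for the size of this cost wedge, here is a rough sketch.  The $50,000 setup cost and the 20% and 40% tax rates come from the quoted post; the profit and wage figures are purely hypothetical:

setup_cost = 50_000          # first-year formation and approvals (quoted figure)
profit = 200_000             # hypothetical annual profit of the factory
wages = 300_000              # hypothetical annual wage bill

profit_tax = 0.20 * profit   # ~20% tax on (possibly imputed) profits
wage_taxes = 0.40 * wages    # ~40% of salaries paid to various governments

print(setup_cost + profit_tax + wage_taxes)  # 210,000: overhead a local
                                             # factory paying informally
                                             # may largely avoid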

Joint ventures with Chinese firms, on the other hand, are apparently difficult to retain control of.

This explains why Apple, instead of running its own factories in China, outsources the manufacturing to a Chinese firm – Foxconn.

So it appears that China, through its taxation mechanisms and through selective enforcement of its laws, deliberately blocks FDI into China.

Numbers show that in the past few decades, China received considerable inflows of FDI.

However, much of that FDI is attributable to Hong Kong (Hong Kong’s share of China’s FDI was 55% – ten times that of the USA).

The imbalance in FDI numbers can be seen in the following table, taken from an out-of-print National Bureau of Economic Research volume on China.

[Table: china_fdi_table]

So, it is evident from the above that most of China’s 1990s FDI came from Hong Kong, and was therefore not really foreign direct investment.

So, what benefits does blocking FDI give to a local area?


  2. Might barriers to FDI help economies?

In the case of China, the advantage seems obvious.

If foreign investors could open their own factories, they would hire Chinese labour and be able to own the scalable income from the proceeds of manufacturing, passing on only the salaries (which are low in China) to their employees in China.

It is only the inability of foreign investors to purchase and operate Chinese factories as cheaply as local firms can that forces them to outsource to Chinese firms.

So, the barriers to FDI in China help drive business to Chinese firms.

These policies can also force the transfer of technology to local firms.

Contrast this with the case of India.

India had a similar boom driven by cheap labour in the form of software services.

There were hundreds of small, medium and large firms in Bangalore offering software services to the financial sector in the USA.

However, a trend I have seen in recent years is for US firms to buy up medium-sized software firms in India and get their software-related work executed by these now-captive teams, rather than outsourcing the work to Indian firms.

So, the inability to get work drives many local IT services firms to sell their operations to US-based firms.

This effectively reduces the total revenue earned locally to just the salaries paid instead of salaries+profits.

I am not certain how the numbers compare with the boom years, but I think the absence of a road up the IT services value chain results in the middle being cut out, leaving only the very large firms and a host of relatively tiny shops in the local IT services market.

So, had FDI barriers existed in the software sector, they would have forced foreign consumers of IT services to access Indian programmers through Indian corporate entities, increasing the share of the pie that accrued to the local economy and increasing the business opportunities available to local IT firms.


Take the case of New Zealand again.

The argument put forth by the government there is that it needs to keep control of land (a scarce resource in New Zealand).

In other words, the restrictions on FDI are meant to allow local firms to retain control of revenue-generating resources.

Contrast this with the software industry in India again.

In software the key resources are human resources (the engineers).  And around the world, software engineers are a scarce resource.

There was no barrier to FDI that would allow local firms to compete with foreign firms (which could pay more because of their stronger currencies) for those resources.

The result seems to be a stratification of software engineering resources by capability.

The engineers with the best skills (who could effectively develop products) mostly ended up in American software firms, where they can be paid salaries in the range of Rs. 60 lakhs per annum.

Engineers with less valuable skill-sets (who can at most configure, install, test or maintain products) seem to gravitate to Indian IT services firms, where salaries seem to stagnate at about a quarter of that amount.

This flight of resources might have contributed to preventing forays into product development by software services firms in India.

So, in a sense, by providing outside firms direct access to scarce resources, the opportunity to make a lot more money might have been allowed to slip away.

 

Conclusion

In conclusion, it may be said that sometimes, a clever use of barriers to FDI seems to help extract more revenue (by blocking access to valuable resources) and thus create larger surpluses for local businesses.  Those surpluses in turn help build more infrastructure and capacity and improve the economy.  The case of China, contrasted with that of India, seems to illustrate the case where such barriers might have been beneficial to the economy.