Kabir and Language

Kabir
Image from Wikipedia

Yesterday, I went to a concert of songs belonging to the tradition of a 15th century saint-poet called Kabir, and came across a very interesting song that he is said to have composed.

It went something like this.

The cow was milked

Before the calf was born

But after I sold the curd in the market

and this:

The ant went to its wedding

Carrying a gallon of oil

And an elephant and a camel under its arms

From the perspective of natural language processing and machine learning, the incongruous situations depicted in these poems turn out to have an interesting pattern, as I will explain below.

I found more examples of Kabir’s “inverted verses” online.

The poems at http://www.sriviliveshere.com/mapping-ulat-bansi.html come with beautiful illustrations as well.

Here are a few more lines from Kabir’s inverted verse:

A tree stands without roots

A tree bears fruit without flowers

Someone dances without feet

Someone plays music without hands

Someone sings without a tongue

Water catches fire

Someone sees with blind eyes

A cow eats a lion

A deer eats a cheetah

A crow pounces on a falcon

A quail pounces on a hawk

A mouse eats a cat

A dog eats a jackal

A frog eats snakes

What’s interesting about all of these is that they’re examples of entity-relationships that are false.

Let me first explain what entities and relationships are.

Entities are the real or conceptual objects that we perceive as existing in the world we live in.  They are usually described using a noun phrase and qualified using an adjective.

Relationships are the functions that apply to an ordered list of entities and return a true or false value.

For example, if you take the sentence “The hunter hunts the fox,” there are two entities (1. the hunter, 2. the fox).  The relationship is “hunts”; it returns true for the two entities presented in that order.

The relationship “hunts” would return false if the entities were inverted (as in 1. the fox and 2. the hunter … as in the sentence “The fox hunts the hunter”).

The relationships and the entities can be stored in a database, and so they can be considered the structured form of an unstructured plain-language utterance.

In fact, it is entities and relationships such as these that, it was once speculated, would some day make up the semantic web.

Most of Kabir’s inverted verse seems to be based on false entity relationships of dual arity (involving two entities), and often it is a violation of entity order that causes the relationship function to return the value false.

In the “cow was milked” song, the relationship that is violated is the temporal relationship: “takes place before”.

In the “ant’s wedding” song, the relationship that is violated is that of capability: “can do”.

In the rest of the examples, relationships like “eats”, “hunts”, “plays”, “dances”, “bears fruit”, etc., are violated.
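
To make this concrete, here is a minimal sketch in Python (the relation names and the tiny knowledge base are invented purely for illustration) of how such two-entity relationships could be stored and checked; under this representation, Kabir’s inversions are simply ordered tuples that are absent from the knowledge base:

# A toy knowledge base of true (relation, subject, object) triples.
# The triples are illustrative, not drawn from any real dataset.
KNOWLEDGE_BASE = {
    ("hunts", "hunter", "fox"),
    ("eats", "lion", "cow"),
    ("eats", "cat", "mouse"),
    ("pounces_on", "falcon", "crow"),
}

def holds(relation, subject, obj):
    """Return True only if the ordered triple is known to be true."""
    return (relation, subject, obj) in KNOWLEDGE_BASE

print(holds("eats", "lion", "cow"))  # True: the ordinary statement
print(holds("eats", "cow", "lion"))  # False: "A cow eats a lion"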

Other Commentary

In Osho’s “The Revolution”, he talks about Kabir’s interest in and distrust of language, quoting the poet as saying:

I HAVE BEEN THINKING OF THE DIFFERENCE BETWEEN WATER

AND THE WAVES ON IT. RISING,

WATER’S STILL WATER, FALLING BACK,

IT IS WATER. WILL YOU GIVE ME A HINT

HOW TO TELL THEM APART?

BECAUSE SOMEONE HAS MADE UP THE WORD ‘WAVE’,

DO I HAVE TO DISTINGUISH IT FROM ‘WATER’?

And Osho concludes with:

Kabir is not interested in giving you any answers — because he knows perfectly well there is no answer. The game of question and answers is just a game — not that Kabir was not answering his disciples’ questions; he was answering, but answering playfully. That quality you have to remember. He is not a serious man; no wise man can ever be serious. Seriousness is part of ignorance, seriousness is a shadow of the ego. The wise is always non-serious. There can be no serious answers to questions, not at least with Kabir — because he does not believe that there is any meaning in life, and he does not believe that you have to stand aloof from life to observe and to find the meaning. He believes in participation. He does not want you to become a spectator, a speculator, a philosopher.

Notes

This genre of verse seems to have been a tradition in folk religious movements in North India.  In “The Tenth Rasa: An Anthology of Indian Nonsense”, Michael Heyman, Sumanyu Satpathy and Anushka Ravishankar note that Namdev, a 13th-century saint-poet, authored such verses as well.

Ruminations on Consciousness

descartes_mind_and_body
Descartes’ illustration of the mind-body problem

There is an interesting unanswered question that humanity still hasn’t managed to put to rest, and it is:

“What is consciousness?”

Is human consciousness magical or mechanical?

Is there some magical thing called a soul in all animals that makes us who we are, drives our actions and makes us conscious of the world around us?

Various religious traditions have different explanations for consciousness.

Traditional Hypothesis

Semitic traditions – Judaism, Islam and Christianity – don’t talk much about consciousness but they have an implicit position on the subject.  On the other hand, Hindu philosophical traditions talk explicitly about it, and the following are some of the more well-known philosophical positions:

  1. Advaita – the consciousness of creatures on earth is essentially the same as that of the divine.
  2. Dvaita – there are two kinds of consciousness – that of earthly creatures and that of the divine.
  3. Vishishtadvaita – there are two kinds of consciousness – earthly and divine – and the former can become one with the latter.

So, Hindu philosophies take one of the following positions:

Dvaita:

phil_dvaita

Advaita:

phil_advaita

The view of all Semitic religions, though not explicitly stated, seems to be closer to that of Dvaita philosophy, in that souls are considered distinct entities from the gods (after death, these souls end up in a good place or a bad place for a long time).

I say gods in plural because all Semitic religions seem to believe in the existence of a good divine being and an evil divine being (satan/shaitan/iblees) who is different from the good divine being and not fully subject to him (which is different from Hindu philosophies where the concept of an absolutely bad/evil divine being doesn’t seem to exist).

phil_semitic

So, there are two or more divine consciousnesses (if you count the lesser divine beings called angels, jinns, etc.) in Judaism, Islam or Christianity.

Judeo-Christian Beliefs

In Judeo-Christian literature, for example, a text about a man called Job deals with this concept of a bad divine being.  Wikipedia says: “Between Job 1:9–10 and 2:4–5, Satan points out that God has given Job everything that a man could want, so of course Job would be loyal to God; Satan suggests that Job’s faith would collapse if all he has been given (even his health) were to be taken away from him. God therefore gives Satan permission to test Job.”  This belief system also has conceptions of other divine beings: a holy trinity, a pantheon of angels, etc.

Islamic Beliefs

In an account by al-Tabari, a 9th-century scholar, the prophet Muhammad is described as having at one point endorsed in verse three deities of the Kaaba other than Al-Lah (they were called Al-Lat, Al-Uzza, and Manat), and as having later withdrawn the endorsement with the explanation that Satan (who in Islamic theology is believed to have only the power to put ideas into people’s minds) had made him do it.  The verses endorsing these other deities (later withdrawn) are referred to in some places as the Satanic Verses (https://en.wikipedia.org/wiki/Satanic_Verses).

All the above religious belief systems imagine one single soul as resident in each living thing on earth.

However, in the Baha’i faith, it appears that there is a concept of a good and an evil side in each living thing, though possibly not as two consciousnesses.  Abdu’l-Bahá is supposed to have said: “This lower nature in man is symbolized as Satan — the evil ego within us, not an evil personality outside.”

In fiction, there have been imaginings of more than one conscious ‘soul’ being resident in a human.

Take the tale of Dr. Jekyll and Mr. Hyde.  In it, the character of Jekyll/Hyde is described as having two consciousnesses, one good and one bad – each akin to one of the principal divine beings in Semitic religions.

phil_jekyll_hyde.png

So, in Jekyll/Hyde’s world, there is a multiplicity of consciousnesses, not just in the divine plane, but also on earth.

Extrapolation

So, it appears that we can imagine multiple divine beings in existence, and multiple consciousnesses existing in each of us.  We can also imagine a single divine being in existence, and a single conscious soul.  We can even imagine the earthly soul/consciousness being the same as the divine soul/consciousness.

The obvious question is:  can we imagine the absence of the magical soul, since we can imagine the absence of a divine being (atheistic belief systems have existed since time immemorial)?

One of the reasons for postulating the existence of divine beings is that they give us a way of explaining inexplicable phenomena.  In antiquity, when humans needed to explain thunder and tides, they imagined thunder gods and sea gods.  Later, as they became better able to explain nature (or at least to recognize its unchanging patterns and guard against them), they adopted more abstract conceptions of deities that reflected human consciousness, and religious traditions came to provide an explanation for the phenomenon of life and an ethical framework for reasoning during one’s lifetime.

Once the creation of living things could be explained, and social contracts became things one could reason about, humans seem to have found it easier to surmise that no gods were needed to explain creation, life and ethical values.

Similarly, as we become better able to explain how our minds work, and to understand perception, memory, cognition and language, and also phenomena such as hallucination and mental illnesses, beliefs in magical phenomena such as spirits taking possession of individuals have begun to diminish.

By extrapolation, one might suppose that with time, the belief in an immortal soul will also diminish.

Alternative Hypothesis

What is likely to replace the concept of a magical soul in all living things?

One of the modern theories of consciousness (according to David Chalmers) seems to be that the mind and the brain are one (see the mind-brain identity theory of the 1950s mentioned in http://consc.net/papers/five.pdf).

However, since the brain itself is little understood, we would merely be explaining something we don’t understand in terms of something else we don’t understand (though we would at least be treating it as something physical).

It seems to me that it might be better to explain our consciousness to ourselves in terms of how computers do the things that we are conscious of doing, since computers are well-understood.

It appears that consciousness could be defined as: “the ability to perceive the world, form a model of the world (including imagined worlds), retain a memory of the perceptions and models, reason independently about those models, and optionally, act on the perceptions”.

So any machine that can do all of these things could also be considered conscious.

There are already theories that seem to come close to the above definition.  They are called Representationalist Theories.  And many of them, interestingly, seem to have been developed only in the last 20 years: http://plato.stanford.edu/entries/consciousness/#RepThe

Here are some other discussions of the aforesaid theories:

  1. http://plato.stanford.edu/entries/consciousness-representational/
  2. http://plato.stanford.edu/entries/qualia/

A concept such as qualia (see preceding link), which seems so troubling to a representationalist philosopher, would appear trivial to someone well-versed in machine learning, because in machine learning, we already have names for concepts that go beyond these, such as features and models.

So it seems to me at first glance that representationalist theories of consciousness with the addition of concepts from machine learning can adequately explain consciousness in all animals.

Consequences and More Questions

The consequences of reducing the being/soul/consciousness to a mechanical process would be very interesting.

If we accept the above definition, we would have to think of humans as computers: if human perception of the physical world is nothing more than a mental model of it, then an electronic model of the world in a computer, or a physical model in a mechanical device, would also qualify as consciousness of the world.

Can we say that a computer or a mechanical model is conscious of the world in the same way that we are?

If we go with the theory that we have no magical soul, then the only alternative that remains is to accept that if a machine can, of its own volition, create a physical representation of the world and reason about it, then that machine is also conscious of the world.  In other words, our consciousness would have to be accepted as consisting of nothing more than our memories of the world we perceive, the models we create in our minds, and our ability to think about and reason over them.  Anything that can similarly perceive, model and remember things would have to be considered as possessing consciousness, which leads to other interesting questions: are humans machines?  Is consciousness the same as life?  Can we have consciousness without life?  And finally, what is life?

Sample Programs for Teaching Programming in Kannada

Yesterday, I was asked to explain Arduino concepts to a group of teachers from rural schools in Karnataka at a workshop.

So, I created a set of slides and a set of illustrative computer programs in Kannada.

I was really keen to hear what the teachers had to say because I had been extremely apprehensive about whether anyone would be able to type programs in Kannada (the standard keyboards available in India are ASCII keyboards labelled with Roman letters).

So, at the beginning of the class, I asked the teachers whether they could type Kannada using ASCII keyboards.

They said that they could: they were used to typing with Nudi or Baraha, software that allows one to type Kannada using a Roman-alphabet keyboard.

Since I didn’t have Nudi or Baraha installed, I showed them how Google’s Input Tools worked, and they liked them very much (those with laptops insisted that I install Google Input Tools for them after the lecture).

Apparently, all the teachers could type using a Roman keyboard.  They could also all speak some English.

But their level of comfort with English was low when it came to reading and comprehension.

This group of teachers said they found it much easier to read Kannada than English, even though they typed Kannada on a Latin keyboard.

And they said that for that reason (ease of reading and comprehension), programming tools in the Kannada language would be useful to them.

Acknowledgements: The workshop yesterday was organized by Workbench Projects.  There had been a similar workshop at ArtScienceBLR on March 29th.  So, anyone wishing to learn to program Arduino boards in Kannada can contact either of these organizations.

You can download and explore the Indian language tools from http://www.aiaioo.com/arduino_in_local_languages/download, and the commands are listed at http://www.aiaioo.com/arduino_in_local_languages/index.php.

Below are screenshots of some of the programs:

1.  Storing an Integer in Memory and Reading it Back

kannada_program_storing

2.  Adding Two Integers

kannada_program_adding.png

3.  Dividing Two Real Numbers

kannada_program_diving.png

4.  Logical Operations

kannada_program_comparing.png

5.  Conditional Transfer of Control

kannada_program_if.png

6.  Repetition

For Loop

kannada_program_for.png

While Loop

kannada_program_while.png

7.  Electronics

kannada_program_electronics.png

Programming in Hindi and Tamil in addition to Kannada

For children who are taught computer science in Indian languages, conventional software programming can pose a challenge because most programming languages use the Roman alphabet.

We’ve created a way to allow students and teachers to write Arduino programs in Indian languages (at the request of Arduino India).

Arduino boards can now be programmed in Hindi and Tamil in addition to Kannada (as already described in an earlier post).

The Indian language extensions can be downloaded from our website.

These are the words used as replacements for English words: http://www.aiaioo.com/arduino_in_local_languages/

The website allows one to comment on and discuss the keywords picked as replacements for English words.

There’s still a lot of work to be done on the choices.

Very specifically, we’re looking for ways to simplify the words so that school children will find them easier to remember and type.

Any ideas for facilitating the learning process for children would be very welcome.

Illustrative Example

For example, we used to translate analogWrite as “ಅನಂಕೀಯವಾಗಿ_ಬರೆ” in Kannada, (“अनंकीय_लिखो” in Hindi and “அனலாக்_எழுது” in Tamil) using the coined word ‘anankiiya’ (negation of ‘ankiiya’ meaning digital) or the transliteration of ‘analog’.

However, during a discussion, a physicist who helped with the Kannada (thank you, Padmalekha) suggested that we use the phrase “write without spaces” for analogWrite.

And then it hit us that we could just use the phrase “write value” for analogWrite and “write number” for digitalWrite.

The following translations for analogWrite are much more intuitive: “ಮೌಲ್ಯವನ್ನು_ಬರೆ”, “मूल्य_लिखो” and “மதிப்பை_எழுது”.

The new translations for digitalWrite are also just as easy to comprehend: “ಅಂಕೆಯನ್ನು_ಬರೆ”, “अंक_लिखो” and “எண்ணை_எழுது”.
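
As a rough sketch of the general idea (this is not the actual implementation of the Arduino IDE extension; the table below lists only the translations discussed in this post), a keyword-substitution pass over localized source code might look like this:

# Hypothetical keyword table: localized identifiers -> Arduino C++ keywords.
KANNADA_TO_ARDUINO = {
    "ಮೌಲ್ಯವನ್ನು_ಬರೆ": "analogWrite",   # "write value"
    "ಅಂಕೆಯನ್ನು_ಬರೆ": "digitalWrite",   # "write number"
}

def to_arduino_cpp(source):
    """Replace localized keywords with their Arduino C++ equivalents."""
    for local_word, english_word in KANNADA_TO_ARDUINO.items():
        source = source.replace(local_word, english_word)
    return source

print(to_arduino_cpp("ಮೌಲ್ಯವನ್ನು_ಬರೆ(9, 128);"))  # analogWrite(9, 128);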

The process of simplification is an ongoing one, and we hope in a few months’ time to have a generally agreed-upon set of translations, after taking everyone’s inputs into consideration.

Syntax

The Arduino IDE with extensions now supports syntax highlighting in Indian languages. This makes it easier to program in the local language.

This is how Kannada code looks:

Kannada Code

programming_kannada

And here is how it looks in Hindi and in Tamil.

Hindi Code

programming_hindi

Tamil Code

programming_tamil


A Naive Bayes classifier that outperforms NLTK’s

We found that by changing the smoothing parameters of a Naive Bayes classifier, we could get far better accuracy numbers for certain tasks.  By changing the Lidstone smoothing parameter from 0.05 to 0.5 or greater, we could go from an accuracy of about 50% to almost 70% on the task of question classification for question answering.

This is not at all surprising because, as described in an earlier post, the smoothing method used in the estimation of probabilities affects Naive Bayes classifiers greatly.

Below, we have provided an implementation of a Naive Bayes classifier which outperforms the Naive Bayes classifier supplied with NLTK 3.0 by almost 10% on the task of classifying questions from the questions-train.txt file supplied with the textbook “Taming Text”.

Our Naive Bayes classifier (with a Lidstone smoothing parameter of 0.5) exhibits about 65% accuracy on the task of question classification, whereas the NLTK classifier has an accuracy of about 40% as shown below.

smoothing_graph

Finally, I’d like to say a few words about the import of this work.

Theoretically, by increasing the Lidstone smoothing parameter, we are merely compensating more strongly for absent features: we are negating the absence of a feature more vigorously, and reducing the penalty for the absence of a feature in a specific category.
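
To put rough numbers on this (the counts below are invented purely for illustration), the Lidstone estimate of a feature’s probability in a class is (count + alpha) / (total + alpha * vocabulary_size), which is what the smooth() method in the classifier below computes; the sketch shows how the log-penalty for an absent feature shrinks as alpha grows:

import math

def lidstone(count, total, alpha, vocab_size):
    # Lidstone-smoothed probability estimate (same formula as smooth() below).
    return (count + alpha) / (total + alpha * vocab_size)

# A hypothetical class with 1000 feature tokens and a 5000-word vocabulary.
total, vocab_size = 1000, 5000

for alpha in (0.05, 0.5):
    penalty = math.log(lidstone(0, total, alpha, vocab_size))
    print("alpha = %.2f, log-probability of an absent feature = %.2f" % (alpha, penalty))

# alpha = 0.05 gives about -10.1; alpha = 0.50 gives about -8.9.
# The larger alpha imposes a smaller per-feature penalty for absence.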

Because increased smoothing lowers the penalty for feature absence, it could help increase accuracy when a data set has many low-volume features that do not contribute to predicting a category, but whose chance presence or absence may be construed in the learning phase as correlated with a category.

Further investigation is required before we can say whether the aforesaid hypothesis would explain the effect of smoothing on the accuracy of classification in regard to the question classification data-set that we used.

However, this exercise shows that algorithm implementations would do well to leave the choice of Lidstone smoothing parameters to the discretion of the end user of a Naive Bayes classifier.

The source code of our Naive Bayes classifier (using Lidstone smoothing) is provided below:

This implementation of the Naive Bayes classifier was created by Geetanjali Rakshit, an intern at Aiaioo Labs.


import numpy as np
import random
import sys, math

class Classifier:
	def __init__(self, featureGenerator):
		self.featureGenerator = featureGenerator
		self._C_SIZE = 0
		self._V_SIZE = 0
		self._classes_list = []
		self._classes_dict = {}
		self._vocab = {}

	def setClasses(self, trainingData):
		for(label, line) in trainingData:
			if label not in self._classes_dict.keys():
				self._classes_dict[label] = len(self._classes_list)
				self._classes_list.append(label)
		self._C_SIZE = len(self._classes_list)
		return
		
	def getClasses(self):
		return self._classes_list

	def setVocab(self, trainingData):
		index = 0;
		for (label, line) in trainingData:
			line = self.featureGenerator.getFeatures(line)
			for item in line:
				if(item not in self._vocab.keys()):
					self._vocab[item] = index
					index += 1
		self._V_SIZE = len(self._vocab)
		return

	def getVocab(self):
		return self._vocab

	def train(self, trainingData):
		pass

	def classify(self, testData, params):
		pass

	def getFeatures(self, data):
		return self.featureGenerator.getFeatures(data)
		

class FeatureGenerator:
	def getFeatures(self, text):
		text = text.lower()
		return text.split()


class NaiveBayesClassifier(Classifier):
	def __init__(self, fg, alpha = 0.05):
		Classifier.__init__(self, fg)
		self.__classParams = []
		self.__params = [[]]
		self.__alpha = alpha

	def getParameters(self):
		return (self.__classParams, self.__params)

	def train(self, trainingData):
		self.setClasses(trainingData)
		self.setVocab(trainingData)
		self.initParameters()

		for (cat, document) in trainingData:
			for feature in self.getFeatures(document):
				self.countFeature(feature, self._classes_dict[cat])

	def countFeature(self, feature, class_index):
		counts = 1
		self._counts_in_class[class_index][self._vocab[feature]] = self._counts_in_class[class_index][self._vocab[feature]] + counts
		self._total_counts[class_index] = self._total_counts[class_index] + counts
		self._norm = self._norm + counts

	def classify(self, testData):
		post_prob = self.getPosteriorProbabilities(testData)
		return self._classes_list[self.getMaxIndex(post_prob)]

	def getPosteriorProbabilities(self, testData):
		post_prob = np.zeros(self._C_SIZE)
		for i in range(0, self._C_SIZE):
			for feature in self.getFeatures(testData):
				post_prob[i] += self.getLogProbability(feature, i)
			post_prob[i] += self.getClassLogProbability(i)
		return post_prob

	def getFeatures(self, testData):
		return self.featureGenerator.getFeatures(testData)

	def initParameters(self):
		self._total_counts = np.zeros(self._C_SIZE)
		self._counts_in_class = np.zeros((self._C_SIZE, self._V_SIZE))
		self._norm = 0.0

	def getLogProbability(self, feature, class_index):
		return math.log(self.smooth(self.getCount(feature, class_index),self._total_counts[class_index]))

	def getCount(self, feature, class_index):
		if feature not in self._vocab.keys():
			return 0
		else:
			return self._counts_in_class[class_index][self._vocab[feature]]

	def smooth(self, numerator, denominator):
		return (numerator + self.__alpha) / (denominator + (self.__alpha * len(self._vocab)))

	def getClassLogProbability(self, class_index):
		return math.log(self._total_counts[class_index]/self._norm)

	def getMaxIndex(self, posteriorProbabilities):
		maxi = 0
		maxProb = posteriorProbabilities[maxi]
		for i in range(0, self._C_SIZE):
			if(posteriorProbabilities[i] >= maxProb):
				maxProb = posteriorProbabilities[i]
				maxi = i
		return maxi


class Dataset:
	def __init__(self, filename):
		fp = open(filename, "r")
		i = 0
		self.__dataset = []
		for line in fp:
			if(line != "\n"):
				line = line.split()
				cat = line[0]
				sent = ""
				for word in range(1, len(line)):
					sent = sent+line[word]+" "
				sent = sent.strip()
				self.__dataset.append([cat, str(sent)])
				i = i+1
		random.shuffle(self.__dataset)	
		self.__D_SIZE = i
		self.__trainSIZE = int(0.6*self.__D_SIZE)
		self.__testSIZE = int(0.3*self.__D_SIZE)
		self.__devSIZE = self.__D_SIZE - (self.__trainSIZE + self.__testSIZE)

	def setTrainSize(self, value):
		self.__trainSIZE = int(value*0.01*self.__D_SIZE)
		return self.__trainSIZE

	def setTestSize(self, value):
		self.__testSIZE = int(value*0.01*self.__D_SIZE)
		return self.__testSIZE

	def setDevelopmentSize(self):
		self.__devSIZE = self.__D_SIZE - (self.__trainSIZE + self.__testSIZE)
		return self.__devSIZE

	def getDataSize(self):
		return self.__D_SIZE
	
	def getTrainingData(self):
		return self.__dataset[0:self.__trainSIZE]

	def getTestData(self):
		return self.__dataset[self.__trainSIZE:(self.__trainSIZE+self.__testSIZE)]

	def getDevData(self):
		return self.__dataset[(self.__trainSIZE + self.__testSIZE):]



#============================================================================================

if __name__ == "__main__":
	
	# This Naive Bayes classifier implementation gives almost 10% better accuracy than the NLTK 3.0 Naive Bayes classifier implementation
	# at the task of classifying questions in the question corpus distributed with the book "Taming Text".

	# The "questions-train.txt" file can be found in the source code distributed with the book at https://www.manning.com/books/taming-text.
	
	# To the best of our knowledge, the improvement in accuracy is owed to the smoothing methods described in our blog:
	# https://aiaioo.wordpress.com/2016/01/29/in-a-naive-bayes-classifier-why-bother-with-smoothing-when-we-have-unknown-words-in-the-test-set/
	
	filename = "questions-train.txt"
	
	if len(sys.argv) > 1:
		filename = sys.argv[1]
	
	data = Dataset(filename)
	
	data.setTrainSize(50)
	data.setTestSize(50)
	
	train_set = data.getTrainingData()
	test_set = data.getTestData()
	
	test_data = [test_set[i][1] for i in range(len(test_set))]
	actual_labels = [test_set[i][0] for i in range(len(test_set))]
	
	fg = FeatureGenerator()
	alpha = 0.5 #smoothing parameter
	
	nbClassifier = NaiveBayesClassifier(fg, alpha)
	nbClassifier.train(train_set)
	
	correct = 0;
	total = 0;
	for line in test_data:
		best_label = nbClassifier.classify(line)
		if best_label == actual_labels[total]:
			correct += 1
		total += 1
	
	acc = 1.0*correct/total
	print("Accuracy of this Naive Bayes Classifier: "+str(acc))


Teaching programming in the Kannada language

An Arduino board is a tiny bare-bones computer.

When Arduino programming is taught in rural India, a problem that is often encountered is that students can only read and write their local language, and the Arduino programming language is English.

(This problem was first described to us by Prakash of Simple Labs in Chennai, a startup that has made it their mission to teach electronics and robotics to children in rural Tamil Nadu using Arduino).

So, we thought we’d try to solve the problem, and with help from the Arduino India office, we’ve managed to develop an extension to the Arduino programming environment that allows Arduino boards to be programmed in Kannada.

Here is a screenshot of a program to make an LED attached to Arduino pin number 13 blink on and off every couple of seconds.

programming_kannada

To do this, we had to translate various commands and functions into Kannada.

For example, we translated the return type “void” as “ಖಾಲಿ” (pronounced ‘khaali’).  The word “ಖಾಲಿ” is short, in common use and unambiguous, so we chose it over “ಶೂನ್ಯ” (‘shoonya’), which could also mean zero.

The most difficult translation for the South Indian languages turned out to be the two English words at the core of all programming and logic: ‘if’ and ‘else’.

There is no word for ‘if’ or ‘else’ in Kannada or Tamil.

For the moment, we’ve approximately translated ‘if’ and ‘else’ as “ಆದರೆ” and “ತಪ್ಪಿದರೆ” in Kannada.

Acknowledgement:  Many thanks to Padmalekha Kydala Ganesha for helping us translate mathematics and physics terms used in Arduino into Kannada (in particular ‘analog’ and ‘to the power of’).

In a Naive Bayes classifier, why bother with smoothing when we have unknown words in the test set?

I came across this question on Stack Overflow a few days ago, and found it very interesting, so I thought I’d post the answer to the question here on the blog as well.

Why do we bother with smoothing at all in a Naive Bayes classifier (when we could simply throw away the unknown features instead)?

The answer to the question is: words are often unknown in some but not all classes.

Say there are two classes M and N with features A, B and C, as follows:

M: A=3, B=1, C=0

(In the class M, A appears 3 times and B only once)

N: A=0, B=1, C=3

(In the class N, C appears 3 times and B only once)

Let’s see what happens when we throw away features that appear zero times.

A) Throw Away Features That Appear Zero Times In Any Class

If we throw away features A and C because each of them appears zero times in at least one of the classes, then we are left with only feature B to classify documents with.

And losing that information is a bad thing as we will see below!

Say we’re presented with a test document as follows:

B=1, C=3

(It contains B once and C three times)

Since we’ve discarded the features A and C, we won’t be able to tell whether the above document belongs to class M or class N.

So, losing any feature information is a bad thing!

B) Throw Away Features That Appear Zero Times In All Classes

Is it possible to get around this problem by discarding only those features that appear zero times in all of the classes?

No, because that would create its own problems!

The following test document illustrates what would happen if we did that:

A=3, B=1, C=1

The probabilities of M and N would both become zero for this document (because we did not throw away feature A, which has a zero count in class N, or feature C, which has a zero count in class M).

C) Don’t Throw Anything Away – Use Smoothing Instead

Smoothing allows us to classify both the above documents correctly because:

  1. We do not lose count information in classes where such information is available and
  2. We do not have to contend with zero counts.
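
Here is a small hand-calculated sketch (using the made-up counts from the example above, a Lidstone parameter of 0.5, and equal class priors) showing that with smoothing both test documents come out as expected; it is a simplification for illustration, not the classifier from the earlier post:

import math

# Feature counts per class, as in the toy example above.
counts = {
    "M": {"A": 3, "B": 1, "C": 0},
    "N": {"A": 0, "B": 1, "C": 3},
}
ALPHA, VOCAB = 0.5, 3  # Lidstone parameter and vocabulary size

def log_score(doc, cls):
    """Smoothed log-likelihood of a document (feature -> count) under a class."""
    total = sum(counts[cls].values())
    score = 0.0
    for feature, n in doc.items():
        p = (counts[cls].get(feature, 0) + ALPHA) / (total + ALPHA * VOCAB)
        score += n * math.log(p)
    return score

for doc in ({"B": 1, "C": 3}, {"A": 3, "B": 1, "C": 1}):
    best = max(counts, key=lambda cls: log_score(doc, cls))
    print(doc, "->", best)

# Output: {'B': 1, 'C': 3} -> N  and  {'A': 3, 'B': 1, 'C': 1} -> M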

Naive Bayes Classifiers In Practice

The Naive Bayes classifier in NLTK throws away features that have zero counts in all of the classes.

This makes it perform poorly when trained using a hard EM procedure (where the classifier is bootstrapped up from very little training data) on data sets with a high occurrence of unknown words (like Twitter feeds).

Here is the relevant code from NLTK 3.0:

def prob_classify(self, featureset):
  # Discard any feature names that we've never seen before.
  # Otherwise, we'll just assign a probability of 0 to
  # everything.
  featureset = featureset.copy()
  for fname in list(featureset.keys()):
    for label in self._labels:
      if (label, fname) in self._feature_probdist:
        break
    else:
      #print 'Ignoring unseen feature %s' % fname
      del featureset[fname]

Related Blog Posts:

  1. Analysing documents for non-obvious differences

  2. Naive Bayes Classifier in OpenNLP