Programming in Hindi and Tamil in addition to Kannada

For children who are taught computer science in Indian languages, conventional software programming can pose a challenge because most programming languages use the Roman alphabet.

We’ve created a way to allow students and teachers to write Arduino programs in Indian languages (at the request of Arduino India).

Arduino boards can now be programmed in Hindi and Tamil in addition to Kannada (as already described in an earlier post).

The Indian language extensions can be downloaded from our website.

These are the words used as replacements for English words: http://www.aiaioo.com/arduino_in_local_languages/

The website allows one to comment on and discuss the keywords picked as replacements for English words.

There’s still a lot of work to be done on the choices.

Very specifically, we’re looking for ways to simplify the words so that school children will find them easier to remember and type.

Any ideas for facilitating the learning process for children would be very welcome.

Illustrative Example

For example, we used to translate analogWrite as “ಅನಂಕೀಯವಾಗಿ_ಬರೆ” in Kannada (“अनंकीय_लिखो” in Hindi and “அனலாக்_எழுது” in Tamil), using the coined word ‘anankiiya’ (negation of ‘ankiiya’ meaning digital) or the transliteration of ‘analog’.

However, during a discussion, a physicist who helped with the Kannada (thank you, Padmalekha) suggested that we use the phrase “write without spaces” for analogWrite.

And then it hit us that we could just use the phrase “write value” for analogWrite and “write number” for digitalWrite.

The following translations for analogWrite, “ಮೌಲ್ಯವನ್ನು_ಬರೆ”, “मूल्य_लिखो” and “மதிப்பை_எழுது”, are much more intuitive.

The new translations for digitalWrite are just as easy to comprehend: “ಅಂಕೆಯನ್ನು_ಬರೆ”, “अंक_लिखो” and “எண்ணை_எழுது”.

The process of simplification is an ongoing one, and we hope in a few months’ time to have a generally agreed-upon set of translations, after taking everyone’s inputs into consideration.
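To make the idea of keyword replacement concrete, here is a minimal sketch in Python. It is not how the actual Arduino extensions work (the post does not describe their internals); it only assumes that each translated keyword stands for exactly one English name, and it uses the six translations quoted above. The table `KEYWORDS` and the function `to_english` are names made up for this illustration.

```python
# Hypothetical keyword table, built only from the translations quoted above.
KEYWORDS = {
    "ಮೌಲ್ಯವನ್ನು_ಬರೆ": "analogWrite",    # Kannada: "write value"
    "मूल्य_लिखो": "analogWrite",         # Hindi
    "மதிப்பை_எழுது": "analogWrite",     # Tamil
    "ಅಂಕೆಯನ್ನು_ಬರೆ": "digitalWrite",   # Kannada: "write number"
    "अंक_लिखो": "digitalWrite",          # Hindi
    "எண்ணை_எழுது": "digitalWrite",      # Tamil
}


def to_english(source: str) -> str:
    """Replace each translated keyword with its English Arduino name."""
    # Longest keywords first, so a short keyword can never
    # overwrite part of a longer one.
    for word in sorted(KEYWORDS, key=len, reverse=True):
        source = source.replace(word, KEYWORDS[word])
    return source


print(to_english("अंक_लिखो(13, HIGH);"))
```

Plain string replacement is used here instead of a regular expression with `\w`, because the vowel signs of Indic scripts are combining marks and Python's `\w` does not match them.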


The Arduino IDE with extensions now supports syntax highlighting in Indian languages. This makes it easier to program in the local language.

This is how Kannada code looks:

Kannada Code


And here is how it looks in Hindi and in Tamil.

Hindi Code


Tamil Code


Languages and Numbers and Ways of Counting to 8!

This article is about how small numbers are represented in various languages.
Acknowledgement: much of this article is taken from the Wikipedia page about positional notation.
The base is the mathematical term for the number of digits you would use to count in a language.
For example, if you used the fingers of both hands to count, you would be using a base of 10.
If you used the fingers of one hand to count, you would be using a base of 5.
If you used the fingers of both hands and the toes of both feet, you would be using a base of 20.
Some languages have names for numbers that lead you to suspect that their users might have thought in terms of groups of 20.
French has an interesting way of describing numbers above 60.  In French, the word for 60 is “soixante”, the word for 75 is “soixante quinze” (sixty and fifteen) while 80 is “quatre-vingt” (four-twenties) and 95 is “quatre-vingt quinze” (four-twenties and fifteen).
And it is not just French.  English uses the word ‘score’ to describe a group of 20 things.  So, when we talk of “two score” we mean forty, and when we say “four score and seven” we mean 87.
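The arithmetic behind these number names can be sketched in a few lines of Python. The function names `to_base` and `in_scores` are made up for this illustration; the numbers are the ones mentioned above.

```python
def to_base(n: int, base: int) -> list[int]:
    """Digits of a non-negative integer n in the given base, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, digit = divmod(n, base)
        digits.append(digit)
    return digits[::-1]


def in_scores(n: int) -> tuple[int, int]:
    """Split n into (groups of twenty, remainder)."""
    return divmod(n, 20)


print(to_base(87, 10))  # [8, 7]
print(to_base(87, 20))  # [4, 7]   "four score and seven"
print(in_scores(95))    # (4, 15)  "quatre-vingt quinze"
```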
The article also talks about Welsh and Irish and Maori:
The Irish language also used base-20 in the past, twenty being fichid, forty dhá fhichid, sixty trí fhichid and eighty ceithre fhichid. A remnant of this system may be seen in the modern word for 40, daoichead.
The Welsh language continues to use a base-20 counting system, particularly for the age of people, dates and in common phrases. 15 is also important, with 16–19 being “one on 15”, “two on 15” etc. 18 is normally “two nines”. A decimal system is commonly used.
Danish numerals display a similar base-20 structure.
The Maori language of New Zealand also has evidence of an underlying base-20 system as seen in the terms Te Hokowhitu a Tu referring to a war party (literally “the seven 20s of Tu”) and Tama-hokotahi, referring to a great warrior (“the one man equal to 20”).
Another interesting system is the base-12 system.
The Wikipedia article says:
Twelve is a useful base because it has many factors. It is the smallest common multiple of one, two, three, four and six. There is still a special word for “dozen” in English, and by analogy with the word for 10², hundred, commerce developed a word for 12², gross. The standard 12-hour clock and common use of 12 in English units emphasize the utility of the base. In addition, prior to its conversion to decimal, the old British currency Pound Sterling (GBP) partially used base-12; there were 12 pence (d) in a shilling (s), 20 shillings in a pound (£), and therefore 240 pence in a pound. Hence the term LSD or, more properly, £sd.
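The figures in that passage are easy to check. The sketch below uses Python's `math.lcm` (available from Python 3.9) and a helper, `to_lsd`, that is made up for this illustration; it splits a number of pence into pounds, shillings and pence using the 12 and 20 from the quote.

```python
from math import lcm  # needs Python 3.9 or later

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


def to_lsd(pence: int) -> tuple[int, int, int]:
    """Convert a number of old pence into (pounds, shillings, pence)."""
    shillings, d = divmod(pence, PENCE_PER_SHILLING)
    pounds, s = divmod(shillings, SHILLINGS_PER_POUND)
    return pounds, s, d


print(lcm(1, 2, 3, 4, 6))                        # 12
print(PENCE_PER_SHILLING * SHILLINGS_PER_POUND)  # 240
print(to_lsd(391))                               # (1, 12, 7)
```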
There was even a language that made use of a base-2 (binary) system for counting.  Base-2 (binary) is mainly used in computers today (because switches can represent binary numbers – a switch that is off represents the 0 digit and a switch that is on represents the 1 digit).  But apparently, native Australian languages use binary too.
A number of Australian Aboriginal languages employ binary or binary-like counting systems. For example, in Kala Lagaw Ya, the numbers one through six are urapon, ukasar, ukasar-urapon, ukasar-ukasar, ukasar-ukasar-urapon, ukasar-ukasar-ukasar.
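The six Kala Lagaw Ya words follow a simple pattern: as many “ukasar” (two) as fit into the number, followed by one “urapon” (one) if the number is odd. Here is a small sketch of that pattern in Python. The function name `count_word` is made up, and whether the pattern continues beyond six is an assumption; the quote only covers one to six. Note that this is counting in pairs, not positional binary of the kind computers use.

```python
def count_word(n: int) -> str:
    """Build the counting word for n from 'ukasar' (two) and 'urapon' (one)."""
    twos, ones = divmod(n, 2)
    return "-".join(["ukasar"] * twos + ["urapon"] * ones)


for n in range(1, 7):
    print(n, count_word(n))
```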
The article also says that there is some evidence of the use of base-8 in language:
A base-8 system (octal) was devised by the Yuki tribe of Northern California, who used the spaces between the fingers to count, corresponding to the digits one through eight.[6] There is also linguistic evidence which suggests that the Bronze Age Proto-Indo Europeans (from whom most European and Indic languages descend) might have replaced a base-8 system (or a system which could only count up to 8) with a base-10 system. The evidence is that the word for 9, newm, is suggested by some to derive from the word for “new”, newo-, suggesting that the number 9 had been recently invented and called the “new number”.[7]
So much for bases.
Some languages have two sets of names for numerals!
Two Sets of Names for Numbers in Japanese and Korean
Japanese and Korean use two sets of names for numbers while counting.
In Japanese, there is a set of names that are typically used when small quantities are involved:
“hitotsu”, “futatsu”, “mittsu”, “yottsu”, “itsutsu”, “muttsu”, “nanatsu”, “yattsu”, “kokonotsu”, “to” (1 to 10).
But for larger numbers and for zero, the names used are ones derived from Chinese.
“ichi”, “ni”, “san”, “shi”, “go”, “roku”, “shichi”, “hachi”, “kyu”, “ju”.
These numbers correspond to the Chinese digits:
“yī”, “èr”, “sān”, “sì”, “wǔ”, “liù”, “qī”, “bā”, “jiǔ”, “shí”.
And similarly in Korean, you would use one set of names for small quantities (for example, hours in the day):
“hana”, “dul”, “seth”, “neth”, “thasoth”, “yosoth”, “ilgop”, “yodolp”, “ahop”, “yol”.
But to describe larger quantities, like minutes or the days in a month, you’d go with names based on Chinese:
“il”, “i”, “sam”, “sa”, “o”, “yug”, “chhil”, “phal”, “ku”, “ship”.
Finally, we come to some interesting irregularities in south Indian languages.
Irregular Numbering
In Tamil (a language spoken in south India), the word for 90 is “pre-hundred”.
The first ten numbers in Tamil go:
“ondru”, “irendu”, “muundru”, “naangu”, “aindhu”, “aaru”, “eelu”, “ettu”, “ombadhu”, “patthu”
But the word “ombadhu” which means 9 is not used in 90.
In Tamil, the name for 80 is derived from the name for 8 by adding a suffix like in English.  Just as “eight” becomes “eight-y”, in Tamil, “ettu” becomes “embathu”.
But the name for 90 is not derived from the number for 9.  Instead, it is “pre-hundred”.  (In Tamil, 90 is “thonnuuru” – hundred being “nuuru”).  So, when counting from 90 to 99, you use the suffix one would normally associate with the hundred’s position.
So 91 is “pre-hundred and one”.  It is pronounced “thonnuutri-ondru” in Tamil.  92 is “pre-hundred and two”.  It is pronounced “thonnuutri-rendu” in Tamil.
I’ve not come across many languages in which 90 is described as pre-hundred.  But Hindi (a language from the north of India) has a similar feature.
In many Indian languages spoken in the north of India, the names of the first ten numbers are similar to their names in Latin.  For example, Hindi has:
“ek”, “dho”, “thiin”, “chaar”, “paanch”, “che”, “saath”, “aaT”, “nov”, “dhas”
The Hindi names for various numbers are similar to the Sanskrit names of those numbers:
“ekam”, “dve”, “thriini”, “chathvaari”, “pancha”, “shath”, “saptha”, “ashta”, “nava”, “dhasha”
But when you get to 29 in Hindi, you say “pre-30”.  The word in Hindi is “unthees” (“thees” means 30 in Hindi).
Similarly, 39 is “pre-40” (“unthaaliis”, built on “chaaliis”, the word for 40).
This is different from how you count in Sanskrit.
In Sanskrit, 39 is “navatrimshat” (nine and thirty) and 29 is “navavimshatihi” (nine and twenty).
Now the absence of a regular name for numbers with 9 in them supports a theory that Indic languages might once have used base-8 for counting.
I quote from the Wikipedia article again:
There is also linguistic evidence which suggests that the Bronze Age Proto-Indo Europeans (from whom most European and Indic languages descend) might have replaced a base-8 system (or a system which could only count up to 8) with a base-10 system. The evidence is that the word for 9, newm, is suggested by some to derive from the word for “new”, newo-, suggesting that the number 9 had been recently invented and called the “new number”.[7]
The assertion seems to have been made in an article titled ‘The Indo-European system of numerals from ‘1’ to ‘10’’ by Eugenio Ramón Luján Martínez.
Eugenio argues that each of the numerals in Indo-European languages gradually came into use when required by necessity, starting with the numbers 2 and 3 (which started as deictics – like in the words ‘duo’ and ‘trio’).
There’s an overview of his arguments in this article: http://smallislandnotesan.blogspot.in/2008/01/indo-european-numbers-1-10.html
Counting on the Fingers
To a twenty-first century human, a base-10 system of counting seems like the natural way to count.
But to early humans, a base-8 system could have felt more natural than a base-10 system.
This is because it is only possible to count to ten on the fingers of one’s hands if one has developed the technique of bending them to mark the number up to which one has counted.
If a person uses the technique of touching the thumb to a finger to mark a count, then one can only count up to 4 on each hand (and therefore only up to 8 on both hands).
Indian musicians still keep count of the rhythmic patterns in music (the thaalas) by touching the tips of their fingers with the thumb (counting in multiples of 3 or 4).
So it is indeed possible that at some point in the distant past, speakers of Indo-European languages did indeed count in groups of 8.

Funky language features – the mystery of the missing possessive verb

The verb ‘have’ is used to indicate possession.  When a speaker of the English language says, “I have a car”, the listener can infer that the speaker possesses a car.

“Have” is a word that we use a lot.  I doubt anyone can imagine English without the word “have” in it.

So, it will come as a surprise to many to know that many Indian languages have no such verb.

Yes, you heard it right.  Many Indian languages have no verb like “have”.

Speakers of those languages say “There is a car near me” instead.

Below is “I have a vehicle” in three Indian languages:

Tamil:  En kitta vandi irukku  (translation into English: there is a vehicle near me)

Kannada:  Nanna hatthira gaadi idhe   (translation into English: there is a vehicle near me)

Hindi:  Mere paas gaadi hai    (translation into English: there is a vehicle near me)

Expressing Possession in Asian Languages

Some other Asian languages lack a word for “have”.

Japanese does not have a word for “have”.  Neither does Korean.

In Malay, the word for “is” is “ada”.

But “ada” can be used to mean “have” as well, as you can see from the examples below.

In the following examples, “saya” means “mine/my” (the meanings of the other Malay words are obvious).

Malay: Guru saya ada motokar baru.   (translation:  My teacher has a new car)

Malay: Bapa saya ada di rumah.      (translation:  My father is in the house)

Mandarin Chinese is an exception to this pattern.  It has a verb meaning “have”.  It is 有 (yǒu).  有 (yǒu) can also mean “to exist”, but the word commonly used for “is” is different.  It is 是 (shì) meaning “to be”.

So, a good number of widely spoken languages in South Asia don’t use a possessive verb.

But this does not mean that these Asian languages lack a mechanism to express possession.

It only means that the expression of possession and ownership uses alternative mechanisms such as idiomatic expressions (“is near” in the case of Indic languages) and context (word order and semantics in the case of Malay) in large parts of South and South-East Asia.

Expressing Possession in European Languages

In Europe, the possessive verb seems to be the preferred tool to denote possession.

We’ve already encountered the verb “have” in English, and we know that it is distinct from the verb “is”.

Below are examples from a few other European languages:

French:

I am = Je suis

I have = J’ai

Polish:

I am = Jestem

I have = mam

Modern Greek:

I am = Είμαι (Eímai)

I have = έχω (écho̱)

Latin:

I am = sum

I have = habeo

Expressing Possession in Sanskrit

Sanskrit, unlike ancient Greek and Latin, does not have a possessive verb.

I asked a Sanskrit scholar if possessive verbs like “have” appear anywhere in the Vedas.

He answered in the negative.

There is no evidence for the existence of possessive verbs in Vedic Sanskrit.

Some Interpretations and Flights of Fantasy

Some economists surmise that early human societies (hunter-gatherer societies) did not know the concept of ownership.

In early human societies, food from a hunt was shared, because it could not be hoarded (there was only so much food that one could eat, and what was not eaten would spoil).

So, early languages would not have had a verb like “have”.

The most important conversations in those languages would have been sort of like:

Person 1:  “Is there food?”

Person 2:  “Nope.  There is no food today.”

Another type of conversation that would have been critical to self-preservation would have gone like this:

Person 1:  “There is a tiger behind you!  Run!”

Person 2:  “There is an antelope to your right!”

In societies centered around herding, the herds could have been common property.

Daily conversations would have gone:

Person 1:  “How many cows are there?”

Person 2:  “There are 200 cows.”

Sentences like “I have thirty cows” weren’t yet needed.

Economists surmise that it was farming that gave rise to concepts like ownership and property.

Farming for the first time allowed people to have a surplus of food.

This excess food could be stored, divided and traded.

Trade might have motivated the invention of language tools for talking about ownership.

It seems that in Europe languages converged on one such tool – the possessive verb.

It seems that in India languages chose another such tool – the idiomatic usage of the verb “is near”.

Historical Linguistics Questions

There is no evidence for the use of possessive verbs in Sanskrit.

However, I do not know if ancient (Vedic) Sanskrit used the idiomatic “is near” mechanism found in modern Indian languages for expressing ownership.

If it didn’t, it would suggest that the Indian vernacular mechanism for expressing ownership evolved after the period of time when the Vedas were composed or in a different geographical area.

If it did, it would suggest that the Vedas were composed after the Indian mechanisms for expressing possession were developed and in the same geographical area (assuming accurate oral transmission that preserved ancient language features).

I’d be very grateful if someone with a better knowledge of Vedic Sanskrit would be able to tell me whether such an idiomatic usage of “is near” to indicate ownership is attested in Vedic Sanskrit texts.

I’d also love to find out what mechanisms for expressing the idea of ownership existed in Old and Avestan Persian.

(Modern Persian – Farsi – has a verb “daestaen” meaning “have”, but Farsi is very different from Old Persian).

I’ve made a lot of assumptions in proposing those historical implications.  But this article was written merely to discuss possibilities.


I’ll add examples from other languages below as and when I get them from readers (with their permission to post them here).


Omar Khayyam (http://www.linkedin.com/profile/view?id=97267188) in a comment on LinkedIn (http://www.linkedin.com/groups/Funky-language-features-mystery-missing-1356867.S.5838734689329766403) said:

Arabic has no “have”. You don’t need a verb to say “I have a car” = “عِــنْــدِي سَــيَّـــارَةٌ” (By me a car). Nevertheless, there are the verbs “مَــلَــكَ” and “امْتَلَكَ” (to possess/own), which are used to stress that something belongs to someone, like, for example, in juridical documents. In a newspaper article you’d write “الأمير الوليد يمتلك طائرة خاصّة من نوع بوينغ ٧٤٧” (Prince Al-Walid owns a Boeing 747) rather than “عِنْدَ الأمير الوليد طائرة خاصّة من نوع بوينغ ٧٤٧”, even if it is grammatically correct.
As to the verb “to be”, Arabic has no need of it in the present tense. For example, “مَلِكُ الـمَـغْرِبِ غَـنِــيٌّ جِدَّا ” (word for word = The King of Morocco very rich). But in the past you need the verb “كَـانَ ” (to be/to exist). For example, “كَـانَ الملك الحسن الثّاني غنيّا جدّا ” (King Hassan II was very rich).

Funky language features – the third spatial deictic reference in Japanese, Korean and Tamil

The words ‘here’ and ‘there’ are spatial deictic references that are familiar to all English speakers.

‘Here’ means ‘near the speaker’.

‘There’ means ‘not near the speaker’.

Two words related to ‘here’ and ‘there’ are ‘this’ and ‘that’ which work much like ‘the’ but refer to things that are ‘near the speaker’ or ‘not near the speaker’.

So, in English, all spatial deictic references are relative to the speaker.

Here is an illustration of spatial deixis taken from the Wikipedia article on deixis.

But there are languages in which there are more than two spatial deictic references.

Japanese, Korean and Tamil have three each.

In Japanese, they are koko, soko and asoko.

In Korean, they are yogi, kugi and chogi.  (Here is a very nice lesson on deixis in Korean http://www.talktomeinkorean.com/lessons/l1l7).

In Tamil, they are inge, unge and ange.

The reason for the additional deictic reference is that in these languages, distances are perceived not just with respect to the speaker, but also with respect to the listener.

So,  in Japanese, Korean and Tamil respectively, koko, yogi and inge mean ‘near the speaker’.

Then, soko, kugi and unge mean ‘near the listener’.

Finally, asoko, chogi and ange mean ‘far from both the speaker and the listener’.
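The three-way choice can be written down as a small decision rule. The sketch below is in Python, and the names `DEICTICS` and `place_word` are made up for this illustration. The rule for something that is near both people (use the speaker's word) is my assumption; the lists above do not settle that case.

```python
# (near the speaker, near the listener, far from both), as listed above.
DEICTICS = {
    "Japanese": ("koko", "soko", "asoko"),
    "Korean": ("yogi", "kugi", "chogi"),
    "Tamil": ("inge", "unge", "ange"),
}


def place_word(language: str, near_speaker: bool, near_listener: bool) -> str:
    """Pick the deictic word for a place, given who it is near."""
    near_me, near_you, far_away = DEICTICS[language]
    if near_speaker:
        # Assumption: a place near both people is described from the speaker's side.
        return near_me
    if near_listener:
        return near_you
    return far_away


print(place_word("Japanese", near_speaker=False, near_listener=True))  # soko
print(place_word("Tamil", near_speaker=False, near_listener=False))    # ange
```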

The “near the listener” deixis seems like a rather useless feature to have in a language (it is disappearing from modern Tamil).

In the modern world, when you talk to someone face to face (not on the phone), you are usually standing just a few feet from them.

So, anything “near the speaker” is also “near the listener”.  One of those spatial references is therefore redundant.

But then, if one of the spatial references was so useless, why did it appear in Korean and Japanese in addition to Tamil?

Perhaps it has something to do with the fact that Korea and South India are peninsulas, and Japan is an island.

All three countries have long coastlines.

So, some ancestors of the inhabitants of Korea, Japan and South India might have lived off of deep-water fishing.

On the ocean there is an immediate use for the “near the listener” deictic.

Imagine a fleet of boats spread out on the ocean looking for fish to spear or net.

The boatmen would have no features to use to communicate directions.

The only features they’d have had to identify positions would have been their own boats.

So, they’d probably have had conversations with each other that went as follows:

Boat 1:  Are there any fish near you (the listener)?

Boat 2:  No, there are no fish near me (the speaker).  Are there any fish near you (the listener)?

Boat 1:  No, there are no fish near me (the speaker).  We should look for fish away from both of us (pointing).

In such conversations, all three deictics would have been used.

The sentence “Are there any fish near you (the listener)?” would have used the word soko (in Japanese), kugi (in Korean) and unge (in Tamil).

The sentence “No, there are no fish near me (the speaker)” would have used the word koko (in Japanese), yogi (in Korean) and inge (in Tamil).

The sentence “We should look for fish away from both of us (pointing)” would have used the word asoko (in Japanese), chogi (in Korean) and ange (in Tamil).

I am just guessing at all this, of course.  Part of the fun of working in linguistics is that you can extrapolate from tenuous linguistic clues, and indulge in wild flights of fantasy.

But what I am proposing is not entirely unimaginable.

In 2011, in a small cave (called the Jerimalai cave) in East Timor, archaeologists found bones from 2843 individual fish, some of which were caught 42000 years ago.  50% of the bones were those of deep-water tuna fish. The finds also included fish hooks dating from between 23000 and 16000 years ago.

More details on the Jerimalai find here: http://news.discovery.com/history/archaeology/ancient-human-fishermen-111128.htm

Japanese and Tamil – The Work of Susumu Ohno


My father recently pointed me to the research work of Dr. Susumu Ohno, a Japanese linguist who studied ancient Japanese as well as ancient Tamil (a language spoken in South India).

Dr. Ohno (in a paper titled “The Genealogy of the Japanese Language”) made a number of interesting observations about phonological similarities and the existence of cognates (similar-sounding words) in some forms of both languages.

For example, he noted that in some dialects of Japanese, the words for “father”, “mother”, “elder brother” and “elder sister” are similar to the words used in Tamil.

In some Honshu and Ryukyu dialects of Japanese, the words for father, mother, elder brother and elder sister are “accha”, “aaya”, “annyaa” and “anne”.  Ohno argues that these words resemble the words “acchan”, “aaya”, “anna” and “annai” in Tamil.

I found that his observations supported some arguments that I had made in a blog entry in 2010 (I’d attempted to draw a 3-way comparison between Japanese, Tamil and Australian aboriginal languages).

He proposes a theory that in early Japanese, there were no e and o sounds – that these sounds were replacements for ai or ia and ua.

I quote:

The vowels in group B are believed to have resulted from the merging of two vowels, as follows:  ia>e, ai>e, ui>i, oi>i, ua>o

Though I don’t have a reference, I am told that T. P. Meenakshi Sundaram made an almost identical assertion in the case of Tamil.

You also see some evidence of such a transformation in the Tulu word “yan-ku” (to me). The corresponding word in Tamil is “en-akku”.  The correspondence makes you think that sometime in the past, they used to say “yan-akku” in Tamil instead of “en-akku”.

You see a similar correspondence in Kannada.  The Kannada word for why can be written and pronounced as “yaake” or as “eke”.  So “ia” seems to be replaceable with “e” there.

Similarly in Tamil, the word “evan” (who) can also be pronounced (colloquially) as “yaveng”.
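The merger rules that Ohno lists (ia>e, ai>e, ui>i, oi>i, ua>o) can be tried out on romanized strings with a short Python sketch. This is only an illustration of the rules as string rewriting, not Ohno's method: the function name `merge_vowels` is made up, the input spellings are hypothetical romanizations, and the rules are applied in a single left-to-right pass, which is my assumption.

```python
import re

# The five mergers quoted from Ohno above.
MERGES = {"ia": "e", "ai": "e", "ui": "i", "oi": "i", "ua": "o"}
PATTERN = re.compile("|".join(MERGES))


def merge_vowels(word: str) -> str:
    """Replace each vowel pair in MERGES by its merged vowel, scanning left to right."""
    return PATTERN.sub(lambda match: MERGES[match.group()], word)


# Hypothetical romanized inputs, chosen only to exercise each rule.
for older_form in ["piachi", "kai", "tui", "koi", "kua"]:
    print(older_form, "->", merge_vowels(older_form))
```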

So, if both ancient Tamil and Japanese used just a, i and u sounds, their phonetics begins to resemble that of Australian languages like Dyirbal.

Regarding consonants, Ohno notes the following correspondences:

Japanese:

Consonants at head of word
k-, s-, t-, n-, F-, m-, y-, w-

Consonants mid-word
-k-, -s-, -t-, -n-, -F-, -m-, -y-, -w-,
-r-, -ng-, -nz-, -nd-, -nb-

Tamil:

Consonants at head of word
k-, c-, t-, n-, n-, p-, m-, y-, v-

Consonants mid-word
-k-, -c-, -t-, -n-, -p-, -m-, -y-, -v-,
-t-, -n-, -r-, -l-, -r-, -l-, -r-,
-nt-, -nc-, -nt-, -mp-

Unfortunately, I don’t know Dyirbal or any Australian language.  So, I can’t check if these rules apply to them as well.  I can’t wait to get hold of a linguistic analysis of Dyirbal by an Indian or Japanese linguist.

Article on a possible migration: to Australia from India

When we started writing the Aiaioo Labs blog, I’d written an article “An echo of voices” on how Tamil, a South Indian language, is phonetically similar to Dyirbal, a native Australian language.

Now, a genetic study seems to have found some evidence of migration from India to Australia some 4000 years ago.

I still haven’t had the opportunity to learn any Australian languages.  I can’t wait to do so.

An Echo of Voices

A long time ago, on a different blog, I’d written about the grammatical and semantic similarities between Tamil and Japanese (and Korean).

Recently, I read that Tamil bears a striking resemblance to the aboriginal/native languages of Australia.

What I found was (thanks dad for some valuable assistance) that Tamil has or is thought to have had sound patterns that are considered distinguishing features of the languages of Australia.

Before I list the semblances, let me give you a quick overview of some characteristics of Australian languages (most of this information has been gleaned from Wikipedia):

Feature 1

Their languages have four to six ‘n’ sounds, and these sounds are associated with places of articulation (where the tongue touches the roof of the mouth).  So, in the language called Dyirbal, we have the following consonants (the nasal sounds are in the row marked Nasal):

                       Bilabial  Alveolar  Alveolo-Palatal  Retroflex  Velar
Plosive                p         t         c                           k
Nasal                  m         n         ɲ                           ŋ
Trill                            r
Flap                                                        ɽ
Approximant (central)                      j                           w
Approximant (lateral)            l

In the languages of the Pama-Nyungan family, we have the following consonant sounds (again, the nasal sounds are in the row marked Nasal):

           Bilabial  Apico       Laminal  Dorso
Stop       p         t     rt    c, cʸ    k
Nasal      m         n     rn    ñ        ng
Lateral              l     rl    λ
Rhotic               rr    r
Semivowel  w                     y

Feature 2

Australian languages are characterised by an absence of fricatives (hissing/rubbing sounds like ‘s’, ‘h’ and ‘sh’) as you can see from the tables above.

Feature 3

Australian languages have only three vowel sounds: ‘a’, ‘i’, and ‘u’.

Now, you will notice that the above three characteristics of Australian languages are pretty distinctive.  They’re extraordinary, and distinguishing features.

You would probably agree that if any other language had the above features, it might be said to resemble Australian languages in how it sounds.

Now, let me list the characteristics of Tamil that I think can help one make the argument that at the phonetic level, Tamil resembles languages spoken by native Australians:

Feature 1 in Tamil

There are six nasal sounds in Tamil:

Plosives p (b) t̪ (d̪)   ʈ (ɖ) tʃ (dʒ) k (ɡ)
Nasals m n ɳ ɲ ŋ

This feature is also found in Malayalam but not in the other languages of South India.

Now for those of you who are surprised by the number of nasals, don’t be.  English has four nasals.  It’s just that the language does not use them to distinguish between different words.

Don’t believe me?  Oh well, here goes!  The first nasal sound in English is ‘m’.  The second is ‘n’ as in ‘bang’.  The third is ‘n’ as in ‘hand’.  There is a fourth (very rare) nasal.  This is the ‘n’ in ‘London’ (when the word is pronounced in a pompous manner, the ‘n’ gets to be more plosive/hard than otherwise).

Ok, I made a mistake.  English does distinguish between ‘m’ and ‘n’.  Notice how the script gives it away.

Feature 2 in Tamil

The Tamil script does not have letters for ‘h’, ‘s’ and ‘sh’.  The lack of the corresponding consonants in the script does evoke suspicions that the sounds were not present and therefore the corresponding characters not needed at the time the early Tamil scripts came into being.

Another interesting observation that supports this hypothesis is that some dialects of Tamil prefer the use of ‘ch’ sounds to the use of the standard Tamil ‘s’ and ‘sh’ sounds.  In these dialects, ‘seri’ becomes ‘cheri’, ‘sAppAdu’ becomes ‘chAppAdu’, and ‘sonnAn’ becomes ‘chonnAn’.

Feature 3 in Tamil

Establishing the third feature in Tamil is a bit difficult.  Modern Tamil has five simple vowel sounds ‘a’, ‘i’, ‘u’, ‘e’, ‘o’ (taught in that order to kids, just like in Japanese -notice how ‘a’, ‘i’ and ‘u’ come before ‘e’ and ‘o’).  However, there is another tentative link.

In a 1960s book, one Dr. T P Meenakshi Sundaram performed a comparative historical linguistic study of Tamil, and he surmised that early forms of Tamil had only three vowel sounds!

According to Dr. Sundaram, those three sounds were … surprise, surprise … ‘a’, ‘i’ and ‘u’!  He said that the sound for ‘e’ was originally composed of ‘i’ and ‘a’ sounds.

This I have personally observed.  In some rural dialects of Tamil/Malayalam, ‘Enna pEchi pEsurAn’ is still pronounced as ‘Yanna PiAchi PiAsurAn’ (come and talk to my grandma!)


All the similarities I have listed are at a purely phonological level.

However, I did look at whole words (nouns and verbs) in Australian languages and they did not resemble corresponding Tamil words at all.  But there is another level of similarity – semantic.  Semantics is the way word distinctions are used to convey meaning.

One interesting pattern is the use of words to convey distinctions of importance to prevalent kinship systems.  Let me explain.

Kinship Terms

The Australian languages of the Western Desert have the following words for parents and uncles and aunts (from a post on the Australian Anthropology forum by someone called Laurent Dousset):

I’ll give you an Australian example (Western Desert):

Mother: ngunytju
Mother’s sister: ngunytju
Mother’s brother: kamuru
Father: mama
Father’s brother: mama
Father’s sister: kurntili.

A mama is married to a ngunytju and a kamuru is married to a kurntili. These do not have to be actual kamuru(s) and kurntili(s), but are usually classificatory ones.
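The pattern in that list is that a parent's same-sex sibling shares the parent's term, while a parent's opposite-sex sibling gets a separate term. A minimal Python sketch of the rule, using only the six Western Desert words quoted above (the table and function names are made up for this illustration):

```python
PARENT_TERM = {"mother": "ngunytju", "father": "mama"}
PARENT_SEX = {"mother": "female", "father": "male"}
# Terms for a parent's opposite-sex sibling: mother's brother, father's sister.
CROSS_SIBLING_TERM = {"mother": "kamuru", "father": "kurntili"}


def parents_sibling_term(parent: str, sibling_sex: str) -> str:
    """Term for a parent's sibling, following the list quoted above."""
    if sibling_sex == PARENT_SEX[parent]:
        # A parent's same-sex sibling is called by the parent's own term.
        return PARENT_TERM[parent]
    return CROSS_SIBLING_TERM[parent]


print(parents_sibling_term("mother", "female"))  # ngunytju
print(parents_sibling_term("mother", "male"))    # kamuru
```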

Now you will agree that this is very similar to the use of words for parents and their brothers and sisters in Tamil, Malayalam and Kannada.

Now, back to Japanese.  I have a test that I wish to perform to help me determine if Australian languages might really be related to Tamil, and I’m going to turn to Japanese for help.

Deictic References

One feature of Japanese that I found incredibly fascinating was the way words were used to refer to distances.

In Japanese, there are three types of distances [I believe these terms are also called deictic references, so I’m going to call them such, though I’m not really sure] and they are (koko – near the speaker, soko – near the listener, and asoko – far from both).

Such deictic references, it turns out, also used to exist in Tamil.  Sri Lankan Tamil still uses the third kind of deictic reference (ivan – he who is near the speaker, uvan – he who is near the listener, and avan – he who is far from both).

You also notice this distinction in the old saying: ‘ikkara ukkara pachcha’ which means ‘from the shore near me, the shore near you looks green’, and you can also argue that you see a bit of it in ‘unnai’ (you-accusative) and ‘avanai’ (him-accusative).

What I would love to do is find someone from Australia who can tell me if these triple deictic references are also features of Australian languages.


Well, I am not going to comment on the interesting question of what this means/implies.  These similarities could simply mean nothing.  The similarities could have been the result of random language mutations.

But then again, maybe, just maybe, the ancestors of the native inhabitants of Australia stood on these very shores a hundred thousand years ago.  And just maybe, as I listen to my grandmother, I am hearing an echo of voices long gone from this world.


Thanks to dad for telling me about the work of T P Meenakshi Sundaram.  Thanks to mom for helping me with the thoughts on deictic references.


One of my friends wrote to me with excellent counterarguments, so I’m adding them to this post, just so you have a complete picture.

The problem he discovered with my logic is as follows.

My main claim is that the three features (which I’ll refer to as F1, F2 and F3) occurring together is a very very rare event, making their occurring together in two unrelated languages even rarer.  However, for this claim to hold, the joint probability p(F1, F2, F3) would have to be very very low.

My friend pointed out that p(F1,F2,F3) need not be a very low number if the features are strongly interdependent, that is, when you see one such feature, you’re bound to see the others as well.

Now my friend also mentioned that F3 is a universal feature – all languages initially started with only three vowels, so if you take any language and drill back in time far enough, you’ll be left with just ‘a’, ‘i’ and ‘u’.  This also implies that F3 is independent of F1 and F2 and p(F3) is 1.

Now, because of the independence of F3, p(F1, F2, F3) can be written as p(F2|F1)p(F1)p(F3).  Since p(F3) == 1, we can take it out of the picture and think of p(F1, F2, F3) as p(F2|F1)p(F1).

Now my friend pointed out that phonological features occur in clusters.  So, a large number of alveolar articulation points in a language would be a good indicator that the language has a paucity of fricatives.  So, p(F2|F1) is also close to 1.  So we’re left with p(F1, F2, F3) = p(F1).  p(F1) is not likely to be low enough to establish beyond reasonable doubt that the two languages are interrelated.
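To see how much the dependence between features matters, here is a small numeric sketch in Python. All the probabilities in it are hypothetical, picked only to show the effect; nobody has measured them. The function `joint_probability` is a made-up name that implements the factorisation p(F1, F2, F3) = p(F2|F1) p(F1) p(F3) used above.

```python
import math


def joint_probability(p_f1: float, p_f2_given_f1: float, p_f3: float = 1.0) -> float:
    """p(F1, F2, F3) = p(F2|F1) * p(F1) * p(F3), assuming F3 is independent of F1 and F2."""
    return p_f2_given_f1 * p_f1 * p_f3


# Hypothetical numbers, chosen only for illustration.
p_f1 = 0.05

# My original picture: three rare features that are unrelated to each other.
rare_and_unrelated = joint_probability(p_f1, p_f2_given_f1=0.05, p_f3=0.05)

# My friend's picture: F3 is universal and F2 almost always accompanies F1.
universal_and_clustered = joint_probability(p_f1, p_f2_given_f1=0.95, p_f3=1.0)

print(rare_and_unrelated)       # about 0.000125
print(universal_and_clustered)  # about 0.0475, close to p(F1)
```

With these made-up numbers, the joint probability under my friend's assumptions is several hundred times larger than under mine, which is why the argument needs more features and evidence that they are independent.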

In order to complete my case, I’d still have to do all of the following:

a)  find more such features
b)  show that p(F) is low
c)  show that the conditional probabilities are low (high feature independence)

Thanks Dr. M___ C___ for pointing this out!

Now the traditional methods of comparative historical linguistics use features of languages called cognates (similar sounding words).  In doing so, they are biased in how they assign languages to language families.  Using cognates alone, Japanese would be assigned to the same language family as Chinese, but not if we looked at the syntactic, semantic and phonological features of Japanese.  So, I feel that the comparative methodology is incomplete and would need to be supplemented by some other features at the semantic/syntactic levels maybe wrapped into some kind of probabilistic framework.