
Funky language features – some things that you can never say in English and what that might tell us about human languages

Inexpressibility in English

There is a common expression that is widely used in South Indian languages that can’t be translated into English no matter how hard you try.  This post is about things that can’t be expressed in certain languages.  There are some things that cannot be expressed in even the most eclectic of languages though they can in others.

Now I have the unenviable task of trying to tell you in English what cannot be said in English!

Here goes.

Imagine two grown-up people A and B who meet on the street in South India.  B is with her son.  When A meets B, A feels that it would be impolite to not inquire about B’s son.

So, A asks B an open question about B’s son.

B replies, with a big smile and slow polite nods:  “This is my 2nd son.”

What is the question that A would have asked B, to elicit that response from B?

It is impossible to frame an open question in English that would elicit the answer that B gave.

But this exchange is something that South Indian parents have all the time.

When two South Indian parents run into each other, it is highly likely that one might ask the other (in their language) something like, “Oh, what a cute little boy/girl/child!  Whichth son of yours is this?”

The other parent would then reply very proudly: “This is my eldest son/daughter/child” or “This is my 2nd son/daughter/child”.

There is no way to ask someone that question in English because the word, or even the concept, of “whichth” doesn’t exist in English (and possibly doesn’t exist in any European language).

Here’s how you would say that in Kannada (a language used in South India).

A:  Ivanu nimma yeshtaneya maga?  (This boy your whichth son?)

B:  Ivanu nanna eradaneya maga.  (This boy my 2nd son)

Acknowledgement:  This phrase was something I overheard someone discussing when I was a child.  I think it was someone working on translation theory.  I have no recollection of who it was.

Conditional Inexpressibility in South Indian languages

In South Indian languages, there are two ways of saying “and” / “or”.  One way is through a word meaning “and” or “or”.  In Kannada, the words would be “matthu” (means “and”) and “athava” (means “or”).

Another way is using a suffix.  In Kannada you can say something and add the suffix “aa” to indicate “or”.  You can add the suffix “uu” to indicate “and”.

You will find that in South Indian languages you can only express ORs of ANDs using the suffixes.  You cannot express ANDs of ORs.

So, using the suffix forms, we can say “A and B or C and D” but not “A or B and C or D”.

In Kannada, that would be “A-uu B-uu -aa, C-uu D-uu -aa”.  You cannot say “A-aa B-aa -uu, C-aa D-aa -uu”.

You will find a similar restriction in Japanese (though Japanese does not have a suffix form for AND).

Implications for Practical Linguistics

Years ago, we worked on a research project related to natural language programming.  We designed a programming language that would allow humans to program computers by saying things to them.  So, you could say things like: “x égale 2. Si x multiplié par 3 est moins que 5, dis ‘Salut’ sinon dis ‘Ciao’!”  (That is French for: “x equals 2. If x multiplied by 3 is less than 5, say ‘Salut’, otherwise say ‘Ciao’!”)

The natural language programming system was designed to help students in rural India learn programming (they often don’t know English and so can’t use an English-based programming language).

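It works only in the domain of numbers.  A Fibonacci number generator would look like this in bad German: “z ist gleich 1. y ist gleich 1. x ist gleich 0. während x ist weniger als 13, z wird y plus x. Danach x wird y und y wird z. Danach schreib z.”  (Roughly: “z equals 1. y equals 1. x equals 0. While x is less than 13, z becomes y plus x. Then x becomes y and y becomes z. Then write z.”)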

(We didn’t put much work into it.  It’s just a research prototype.  But you can play with the technology yourself at http://www.aiaioo.com/cms).
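For readers who would rather see the German Fibonacci program in a conventional language, here is a rough Python sketch of it.  It assumes that all three “Danach” steps belong inside the loop, which the German sentence leaves ambiguous.  The function name and the `limit` parameter are made up for this illustration and are not part of the prototype.

```python
def fibonacci_from_german_program(limit=13):
    """Mirror the German-language program one statement at a time."""
    z = 1  # z ist gleich 1
    y = 1  # y ist gleich 1
    x = 0  # x ist gleich 0
    written = []
    while x < limit:       # während x ist weniger als 13
        z = y + x          # z wird y plus x
        x = y              # Danach x wird y
        y = z              # und y wird z
        written.append(z)  # Danach schreib z (collected instead of printed)
    return written


print(fibonacci_from_german_program())  # [1, 2, 3, 5, 8, 13, 21]
```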

Anyway, since South Indian languages and Japanese favour AND over OR, in this programming language, we specified that AND gets precedence over OR.
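To make the precedence rule concrete, here is a minimal Python sketch of what “AND gets precedence over OR” means for a flat expression.  This is not the prototype’s actual code; the `evaluate` function and its input format are assumptions made for illustration.  An expression is read as an OR of groups, where each group is an AND of names (an “OR of ANDs”).

```python
def evaluate(expression, values):
    """Evaluate a flat 'and'/'or' expression with AND binding tighter than OR.

    `expression` is a string such as "A and B or C and D"; `values` maps
    each name to a bool. Parentheses are not supported in this sketch.
    """
    groups, current = [], []
    for token in expression.split():
        if token == "or":
            # An "or" closes the current AND-group and starts a new one.
            groups.append(current)
            current = []
        elif token != "and":
            current.append(token)
    groups.append(current)
    # The expression is true if every name in at least one group is true.
    return any(all(values[name] for name in group) for group in groups)


values = {"A": True, "B": False, "C": False}
# Read as A or (B and C), not as (A or B) and C.
print(evaluate("A or B and C", values))  # True
```

Python’s own `and` and `or` operators follow the same convention, so `True or False and False` is also `True`.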

Implications for Universal Grammar

I recently read a small book on the latest efforts by Chomsky’s research group to find common grammatical frameworks that can be applied to all languages.

Personally, I do not much like the approach of using grammar to try to explain language.

People can speak a language even if they have only ever heard a few sentences in that language.

They would of course have to limit their use of the language to those few sentences and the variants thereof, but they are still generating language.

It is impossible to construct a grammar of a language from a few sentences.

So it is unlikely that the human language comprehension/generation system uses grammar as we formally understand the concept.

Chomsky believes that there is some language faculty that has a grammar of sorts that generates language and that the output of this faculty is transformed into Chinese or English as the case may be through the use of some simple transformation tools.

If this were true, then one could argue that what is expressible in one language must be expressible in another language.

This must be true at least for commonly used expressions.

But we find that it is not true.

The fact that obvious concepts can’t be expressed in a language with as large a vocabulary as English makes me wonder if there is a common universal grammar, and if languages are as comprehensive as we’d like to believe.

If all languages are derivable from a common grammar, then a concept such as “whichth”, which is so common in Indian languages, should have been derivable in English from that common universal grammar, just as it is in South Indian languages.

It seems more likely that languages evolve from societal and environmental needs (needs to express things from a cultural or practical perspective) and are nothing but a set of shared signals.

These shared signals eventually evolve to allow for the use of parameters, to allow for a fitting of expressions into slots recursively, that gives rise to an appearance of grammar.

Each language evolves that appearance of grammar independently and there’s nothing more to it.  Or at least, that’s someone’s pet theory.

For some other surprisingly non-universal language features, you might want to take a look at two of our articles on deictic references and ‘possessive verbs’:

  1. Funky language features – the third spatial deictic reference in Japanese, Korean and Tamil
  2. Funky language features – the mystery of the missing possessive verb

Funky language features – the third spatial deictic reference in Japanese, Korean and Tamil

The words ‘here’ and ‘there’ are spatial deictic references that are familiar to all English speakers.

‘Here’ means ‘near the speaker’.

‘There’ means ‘not near the speaker’.

Two words related to ‘here’ and ‘there’ are ‘this’ and ‘that’ which work much like ‘the’ but refer to things that are ‘near the speaker’ or ‘not near the speaker’.

So, in English, all spatial deictic references are relative to the speaker.

Here is an illustration of spatial deixis taken from the Wikipedia article on deixis.

But there are languages in which there are more than two spatial deictic references.

Japanese, Korean and Tamil have three each.

In Japanese, they are koko, soko and asoko.

In Korean, they are yogi, kugi and chogi.  (Here is a very nice lesson on deixis in Korean http://www.talktomeinkorean.com/lessons/l1l7).

In Tamil, they are inge, unge and ange.

The reason for the additional deictic reference is that in these languages, distances are perceived not just with respect to the speaker, but also with respect to the listener.

So,  in Japanese, Korean and Tamil respectively, koko, yogi and inge mean ‘near the speaker’.

Then, soko, kugi and unge mean ‘near the listener’.

Finally, asoko, chogi and ange mean ‘far from both the speaker and the listener’.
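The three-way system can be summed up as a small lookup table.  The Python sketch below is only an illustration: the dictionary layout, the `deictic` function and the rule that “near the speaker” wins when something is near both people are all assumptions, not claims about how speakers actually choose between the words.

```python
# Words as given in this post; English has only a two-way split.
DEICTICS = {
    "English": {"near_speaker": "here", "near_listener": "there", "far_from_both": "there"},
    "Japanese": {"near_speaker": "koko", "near_listener": "soko", "far_from_both": "asoko"},
    "Korean": {"near_speaker": "yogi", "near_listener": "kugi", "far_from_both": "chogi"},
    "Tamil": {"near_speaker": "inge", "near_listener": "unge", "far_from_both": "ange"},
}


def deictic(language, near_speaker, near_listener):
    """Pick the spatial deictic word for a location.

    Assumption: if the location is near both people, the
    'near the speaker' word is used.
    """
    words = DEICTICS[language]
    if near_speaker:
        return words["near_speaker"]
    if near_listener:
        return words["near_listener"]
    return words["far_from_both"]


print(deictic("Japanese", near_speaker=False, near_listener=True))  # soko
print(deictic("English", near_speaker=False, near_listener=True))   # there
```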

The “near the listener” deixis seems like a rather useless feature to have in a language (it is disappearing from modern Tamil).

In the modern world, when you talk to someone face to face (not on the phone), you are usually standing just a few feet from them.

So, anything “near the speaker” is also “near the listener”.  One of those spatial references is therefore redundant.

But then, if one of the spatial references was so useless, why did it appear in Korean and Japanese in addition to Tamil?

Perhaps it has something to do with the fact that Korea and South India are peninsulas, and Japan is an island.

All three countries have long coastlines.

So, some ancestors of the inhabitants of Korea, Japan and South India might have lived off of deep-water fishing.

On the ocean there is an immediate use for the “near the listener” deictic.

Imagine a fleet of boats spread out on the ocean looking for fish to spear or net.

The boatmen would have no features to use to communicate directions.

The only features they’d have had to identify positions would have been their own boats.

So, they’d probably have had conversations with each other that went as follows:

Boat 1:  Are there any fish near you (the listener)?

Boat 2:  No, there are no fish near me (the speaker).  Are there any fish near you (the listener)?

Boat 1:  No, there are no fish near me (the speaker).  We should look for fish away from both of us (pointing).

In such conversations, all three deictics would have been used.

The sentence “Are there any fish near you (the listener)?” would have used the word soko (in Japanese), kugi (in Korean) and unge (in Tamil).

The sentence “No, there are no fish near me (the speaker)” would have used the word koko (in Japanese), yogi (in Korean) and inge (in Tamil).

The sentence “We should look for fish away from both of us (pointing)” would have used the word asoko (in Japanese), chogi (in Korean) and ange (in Tamil).

I am just guessing at all this, of course.  Part of the fun of working in linguistics is that you can extrapolate from tenuous linguistic clues, and indulge in wild flights of fantasy.

But what I am proposing is not entirely unimaginable.

In 2011, in a small cave (called the Jerimalai cave) in East Timor, archaeologists found bones from 2843 individual fish, some of which were caught 42000 years ago.  50% of the bones were those of deep-water tuna fish. The finds also included fish hooks dating from between 23000 and 16000 years ago.

More details on the Jerimalai find here: http://news.discovery.com/history/archaeology/ancient-human-fishermen-111128.htm

Should Cecilia have said “insecure” instead of “unsecure”?

In this funny PhD Comic, the main character – Cecilia (the girl in red) – says:

“Do you realize how unsecure your coffee distribution system is?”

That made me wonder – should she have said ‘insecure’?

Even the WordPress spell-checker has a problem with “unsecure”.

It thinks that “unsecure” is a spelling error.

However, the word “insecure” doesn’t sound as if it were the right term to use in the context of computer security.

That is because the word “insecure” is usually used in the context of a person to mean a person who is not confident and self-assured.

To call a computer “insecure” would be a bit like saying that the computer had self-image issues.

Others have written about this cognitive dissonance as well (see http://english.stackexchange.com/questions/19653/insecure-or-unsecure-when-dealing-with-security for a nice discussion).

Given the problem, the author of the cartoon seems to be justified in using a newly-minted word (one not found in any dictionary) in order to describe the lack of security.

This is also very interesting because it throws some light on how words are born.

Before I can explain what I mean, I’ll need you to take a look at the Oxford dictionary’s definitions of the word “insecure” (from the Oxford English Dictionary online search at http://oxforddictionaries.com/definition/english/insecure?q=insecure):



  • 1   uncertain or anxious about oneself; not confident:  a rather gauche, insecure young man; a top model who is notoriously insecure about her looks
  • 2   (of a thing) not firm or fixed; liable to give way or break:  an insecure footbridge
        not sufficiently protected; easily broken into:  an insecure computer system
  • 3   (of a job or situation) liable to change for the worse; not permanent or settled:  badly paid and insecure jobs; a financially insecure period

There are three ways in which the word “insecure” can be used.

The second usage would have been perfect for the context of computer security.

But the first usage might be conflated with the second in that context.

And that is because (sorry, I no longer recall the references to support this claim) computers appear to the human mind to have human-like characteristics (we say things like “Google tells me that …” or “my computer has gone to sleep”).

So, the only word in the dictionary that can do the job – the word “insecure” – has a conflict of interest.

And therefore, a new word needs to be coined that is not susceptible to the same sort of ambiguity.

And if the new word “unsecure” catches on, then one day, the second sense of the word “insecure” could become extinct in the context of computers.

Oh well, “it’s only words!”


A friend pointed out that the Google NGram Viewer shows a history of the use of the word “unsecure”: http://books.google.com/ngrams/graph?content=unsecure.

The word seems to have been in use between 1650 and 1850 (there is evidence of use in literature), and has in more recent times simply fallen out of circulation (being eclipsed by “insecure” in around 1750).  Thanks, Prashant.

(You can also search for those early usages in books – http://books.google.com/books?id=WmpCAAAAcAAJ&pg=PA12&dq=%22unsecure%22&hl=en&sa=X&ei=aOcLUq7aA-3iyAHu8YGwAg&ved=0CDMQ6AEwAA#v=onepage&q=%22unsecure%22&f=false)

Japanese and Tamil – The Work of Susumu Ohno


My father recently pointed me to the research work of Dr. Susumu Ohno, a Japanese linguist who studied ancient Japanese as well as ancient Tamil (a language spoken in South India).

Dr. Ohno (in a paper titled “The Genealogy of the Japanese Language”) made a number of interesting observations about phonological similarities and the existence of cognates (similar-sounding words) in some forms of both languages.

For example, he noted that in some dialects of Japanese, the words for “father”, “mother”, “elder brother” and “elder sister” are similar to the words used in Tamil.

In some Honshu and Ryukyu dialects of Japanese, the words for father, mother, elder brother and elder sister are “accha”, “aaya”, “annyaa” and “anne”.  Ohno argues that these words resemble the words “acchan”, “aaya”, “anna” and “annai” in Tamil.

I found that his observations supported some arguments that I had made in a blog entry in 2010 (I’d attempted to draw a 3-way comparison between Japanese, Tamil and Australian aboriginal languages).

He proposes a theory that in early Japanese, there were no e and o sounds – that these sounds were replacements for ai or ia and ua.

I quote:

The vowels in group B are believed to have resulted from the merging of two vowels, as follows:  ia>e, ai>e, ui>i, oi>i, ua>o

Though I don’t have a reference, I am told that T. P. Meenakshi Sundaram made an almost identical assertion in the case of Tamil.

You also see some evidence of such a transformation in the Tulu word “yan-ku” (to me). The corresponding word in Tamil is “en-akku”.  The correspondence makes you think that sometime in the past, they used to say “yan-akku” in Tamil instead of “en-akku”.

You see a similar correspondence in Kannada.  The Kannada word for why can be written and pronounced as “yaake” or as “eke”.  So “ia” seems to be replaceable with “e” there.

Similarly in Tamil, the word “evan” (who) can also be pronounced (colloquially) as “yaveng”.

So, if both ancient Tamil and Japanese used just a, i and u sounds, their phonetics begins to resemble that of Australian languages like Dyirbal.
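The vowel mergers quoted from Ohno above can be read as simple rewrite rules.  The Python sketch below applies them left to right to a romanized string.  The `merge_vowels` function is made up for this illustration, and the inputs are invented letter sequences rather than attested words (“ian” stands in for the “yan” of “yan-akku”, treating y as i, which is an assumption).

```python
# Merger rules as quoted from Ohno: ia>e, ai>e, ui>i, oi>i, ua>o
MERGERS = {"ia": "e", "ai": "e", "ui": "i", "oi": "i", "ua": "o"}


def merge_vowels(word):
    """Replace each listed vowel pair with its merged vowel, scanning left to right."""
    result = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in MERGERS:
            result.append(MERGERS[pair])
            i += 2  # both vowels of the pair are consumed
        else:
            result.append(word[i])
            i += 1
    return "".join(result)


print(merge_vowels("ian"))  # en
print(merge_vowels("kua"))  # ko
```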

Regarding consonants, Ohno notes the following correspondences:

Japanese

Consonants at head of word
k-, s-, t-, n-, F-, m-, y-, w-

Consonants mid-word
-k-, -s-, -t-, -n-, -F-, -m-, -y-, -w-,
-r-, -ng-, -nz-, -nd-, -nb-

Tamil

Consonants at head of word
k-, c-, t-, n-, n-, p-, m-, y-, v-

Consonants mid-word
-k-, -c-, -t-, -n-, -p-, -m-, -y-, -v-,
-t-, -n-, -r-, -l-, -r-, -l-, -r-,
-nt-, -nc-, -nt-, -mp-

Unfortunately, I don’t know Dyirbal or any Australian language.  So, I can’t check if these rules apply to them as well.  I can’t wait to get hold of a linguistic analysis of Dyirbal by an Indian or Japanese linguist.