Funky language features – some things that you can never say in English and what that might tell us about human languages
Inexpressibility in English
There is a common expression that is widely used in South Indian languages that can’t be translated into English no matter how hard you try. This post is about things that can’t be expressed in certain languages. There are some things that cannot be expressed in even the most eclectic of languages though they can in others.
Now I have the unenviable task of trying to tell you in English what cannot be said in English!
Imagine two grown-up people A and B who meet on the street in South India. B is with her son. When A meets B, A feels that it would be impolite to not inquire about B’s son.
So, A asks B an open question about B’s son.
B replies, with a big smile and slow polite nods: “This is my 2nd son.”
What is the question that A would have asked B, to elicit that response from B?
It is impossible to frame an open question in English that would elicit the answer that B gave.
But this exchange is something that South Indian parents have all the time.
When two South Indian parents run into each other, it is highly likely that one might ask the other (in their language) something like, “Oh, what a cute little boy/girl/child! Whichth son of yours is this?”
The other parent would then reply very proudly: “This is my eldest son/daughter/child” or “This is my 2nd son/daughter/child”.
There is no way to ask someone that question in English because the word, or even the concept, of “whichth” doesn’t exist in English (and possibly doesn’t exist in any European language).
Here’s how you would say that in Kannada (a language used in South India).
A: Ivanu nimma yeshtaneya maga? (This boy your whichth son?)
B: Ivanu nanna eradaneya maga. (This boy my 2nd son)
Acknowledgement: This phrase was something I overheard someone discussing when I was a child. I think it was someone working on translation theory. I have no recollection of who it was.
Conditional Inexpressibility in South Indian languages
In South Indian languages, there are two ways of saying “and” / “or”. One way is through a word meaning “and” or “or”. In Kannada, the words would be “matthu” (means “and”) and “athava” (means “or”).
Another way is using a suffix. In Kannada you can say something and add the suffix “aa” to indicate “or”. You can add the suffix “uu” to indicate “and”.
You will find that in South Indian languages you can only express ORs of ANDs using the suffixes. You cannot express ANDs of ORs.
So, using the suffix forms, we can say “(A and B) or (C and D)” but not “(A or B) and (C or D)”.
In Kannada, that would be “A-uu B-uu -aa, C-uu D-uu -aa”. You cannot say “A-aa B-aa -uu, C-aa D-aa -uu”.
You will find a similar restriction in Japanese (though Japanese does not have a suffix form for AND).
Implications for Practical Linguistics
Years ago, we worked on a research project related to natural language programming. We designed a programming language that would allow humans to program computers by saying things to them. So, you could say things like: “x égale 2. Si x multiplié par 3 est moins que 5, dis “Salut” sinon dis “Ciao”!” (In English: “x equals 2. If x multiplied by 3 is less than 5, say “Salut”, otherwise say “Ciao”!”)
The natural language programming system was designed to help students in rural India learn programming (they often don’t know English and so can’t use an English-based programming language).
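It works only in the domain of numbers. A Fibonacci number generator would look like this in bad German: “z ist gleich 1. y ist gleich 1. x ist gleich 0. während x ist weniger als 13, z wird y plus x. Danach x wird y und y wird z. Danach schreib z.” (In English: “z equals 1. y equals 1. x equals 0. While x is less than 13, z becomes y plus x. Then x becomes y and y becomes z. Then write z.”)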
(We didn’t put much work into it. It’s just a research prototype. But you can play with the technology yourself at http://www.aiaioo.com/cms).
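For readers who want to see what that German program computes, here is a rough Python rendering. It is only a sketch of one reading of the sentence: it assumes that all three “Danach” steps belong inside the “während” loop (otherwise x would never change and the loop would not end). The function name and the `limit` parameter are made up for this illustration and are not part of the prototype.

```python
def fibonacci_like(limit=13):
    """Return the values the German program would write, in order."""
    z, y, x = 1, 1, 0          # "z ist gleich 1. y ist gleich 1. x ist gleich 0."
    written = []
    while x < limit:           # "während x ist weniger als 13"
        z = y + x              # "z wird y plus x"
        x, y = y, z            # "Danach x wird y und y wird z"
        written.append(z)      # "Danach schreib z"
    return written


print(fibonacci_like())
```

Under that reading, the program writes a run of consecutive Fibonacci numbers and stops once x reaches 13.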
Anyway, since South Indian languages and Japanese favour AND over OR, in this programming language, we specified that AND gets precedence over OR.
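To make the precedence rule concrete, here is a small Python sketch. It is not the prototype’s parser; the function names and the plain-text “and”/“or” syntax are assumptions made for this example. Splitting on “or” first and on “and” second is one simple way to make AND bind more tightly than OR, so every sentence is read as an OR of ANDs.

```python
def parse_or_of_ands(expression):
    """Split 'A and B or C and D' into [['A', 'B'], ['C', 'D']]."""
    return [
        [term.strip() for term in group.split(" and ")]
        for group in expression.split(" or ")
    ]


def evaluate(expression, truth):
    """True if every term in at least one AND-group is true."""
    groups = parse_or_of_ands(expression)
    return any(all(truth[term] for term in group) for group in groups)


truth = {"A": True, "B": False, "C": False, "D": False}
print(parse_or_of_ands("A or B and C or D"))
print(evaluate("A or B and C or D", truth))
```

With these truth values, “A or B and C or D” comes out true because it is read as “A or (B and C) or D”. The reading “(A or B) and (C or D)”, which the suffix forms cannot express, would have been false.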
Implications for Universal Grammar
I recently read a small book on the latest efforts by Chomsky’s research group to find common grammatical frameworks that can be applied to all languages.
Personally, I do not much like the approach of using grammar to try to explain language.
People can speak a language even if they have only ever heard a few sentences in that language.
They would of course have to limit their use of the language to those few sentences and the variants thereof, but they are still generating language.
It is impossible to construct a grammar of a language from a few sentences.
So it is unlikely that the human language comprehension/generation system uses grammar as we formally understand the concept.
Chomsky believes that there is some language faculty that has a grammar of sorts that generates language and that the output of this faculty is transformed into Chinese or English as the case may be through the use of some simple transformation tools.
If this were true, then one could argue that whatever is expressible in one language must be expressible in another language.
This must be true at least for commonly used expressions.
But we find that it is not true.
The fact that obvious concepts can’t be expressed in a language with as large a vocabulary as English makes me wonder if there is a common universal grammar, and if languages are as comprehensive as we’d like to believe.
If all languages are derivable from a common grammar, then a concept such as “whichth”, which is so common in Indian languages, should have been derivable in English from that common universal grammar, just as it is in South Indian languages.
It seems more likely that languages evolve from societal and environmental needs (needs to express things from a cultural or practical perspective) and are nothing but a set of shared signals.
These shared signals eventually evolve to allow for the use of parameters, to allow for a fitting of expressions into slots recursively, that gives rise to an appearance of grammar.
Each language evolves that appearance of grammar independently and there’s nothing more to it. Or at least, that’s someone’s pet theory.
For some other surprisingly non-universal language features, you might want to take a look at two of our articles on deictic references and ‘possessive verbs’:
Funky language features – the mystery of the missing possessive verb
The verb ‘have’ is used to indicate possession. When a speaker of the English language says, “I have a car”, the listener can infer that the speaker possesses a car.
“Have” is a word that we use a lot. I doubt anyone can imagine English without the word “have” in it.
So, it will come as a surprise to many to know that many Indian languages have no such verb.
Yes, you heard it right. Many Indian languages have no verb like “have”.
Speakers of those languages say “There is a car near me” instead.
Below is “I have a vehicle” in three Indian languages:
Tamil: En kitta vandi irukku (translation into English: there is a vehicle near me)
Kannada: Nanna hatthira gaadi idhe (translation into English: there is a vehicle near me)
Hindi: Mere paas gaadi hai (translation into English: there is a vehicle near me)
Expressing Possession in Asian Languages
Some other Asian languages also lack a word for “have”.
Japanese does not have a word for “have”. Neither does Korean.
In Malay, the word for “is” is “ada”.
But “ada” can be used to mean “have” as well, as you can see from the examples below.
In the following examples, “saya” means “I/me”, or “my” when it follows a noun (the meanings of the other Malay words are obvious).
Malay: Guru saya ada motokar baru. (translation: My teacher has a new car)
Malay: Bapa saya ada di rumah. (translation: My father is in the house)
Mandarin Chinese is an exception to this pattern. It has a verb meaning “have”. It is 有 (yǒu). 有 (yǒu) can also mean “to exist”, but the word commonly used for “is” is different. It is 是 (shì) meaning “to be”.
So, a good number of widely spoken languages in Asia don’t use a possessive verb.
But this does not mean that these Asian languages lack a mechanism to express possession.
It only means that the expression of possession and ownership uses alternative mechanisms such as idiomatic expressions (“is near” in the case of Indic languages) and context (word order and semantics in the case of Malay) in large parts of South and South-East Asia.
Expressing Possession in European Languages
In Europe, the possessive verb seems to be the preferred tool to denote possession.
We’ve already encountered the verb “have” in English, and we know that it is distinct from the verb “is”.
Below are examples from a few other European languages:
French:
I am = Je suis
I have = J’ai
Polish:
I am = Jestem
I have = mam
Greek:
I am = Είμαι (Eímai)
I have = έχω (écho̱)
Latin:
I am = sum
I have = habeo
Expressing Possession in Sanskrit
Sanskrit, unlike ancient Greek and Latin, does not have a possessive verb.
I asked a Sanskrit scholar if possessive verbs like “have” appear anywhere in the Vedas.
He answered in the negative.
There is no evidence for the existence of possessive verbs in Vedic Sanskrit.
Some Interpretations and Flights of Fantasy
Some economists surmise that early human societies (hunter-gatherer societies) did not know the concept of ownership.
In early human societies, food from a hunt was shared, because it could not be hoarded (there was only so much food that one could eat, and what was not eaten would spoil).
So, early languages would not have had a verb like “have”.
The most important conversations in those languages would have been sort of like:
Person 1: “Is there food?”
Person 2: “Nope. There is no food today.”
Another type of conversation that would have been critical to self-preservation would have gone like this:
Person 1: “There is a tiger behind you! Run!”
Person 2: “There is an antelope to your right!”
In societies centered around herding, the herds could have been common property.
Daily conversations would have gone:
Person 1: “How many cows are there?”
Person 2: “There are 200 cows.”
Sentences like “I have thirty cows” weren’t yet needed.
Economists surmise that it was farming that gave rise to concepts like ownership and property.
Farming for the first time allowed people to have a surplus of food.
This excess food could be stored, divided and traded.
Trade might have motivated the invention of language tools for talking about ownership.
It seems that in Europe languages converged on one such tool – the possessive verb.
It seems that in India languages chose another such tool – the idiomatic usage of the verb “is near”.
Historical Linguistics Questions
There is no evidence for the use of possessive verbs in Sanskrit.
However, I do not know if ancient (Vedic) Sanskrit used the idiomatic “is near” mechanism found in modern Indian languages for expressing ownership.
If it didn’t, it would suggest that the Indian vernacular mechanism for expressing ownership evolved after the period of time when the Vedas were composed or in a different geographical area.
If it did, it would suggest that the Vedas were composed after the Indian mechanisms for expressing possession were developed and in the same geographical area (assuming accurate oral transmission that preserved ancient language features).
I’d be very grateful if someone with a better knowledge of Vedic Sanskrit would be able to tell me whether such an idiomatic usage of “is near” to indicate ownership is attested in Vedic Sanskrit texts.
I’d also love to find out what mechanisms for expressing the idea of ownership existed in Old and Avestan Persian.
(Modern Persian – Farsi – has a verb “daestaen” meaning “have”, but Farsi is very different from Old Persian).
I’ve made a lot of assumptions in proposing those historical implications. But this article was written merely to discuss possibilities.
I’ll add examples from other languages below as and when I get them from readers (with their permission to post them here).
Omar Khayyam (http://www.linkedin.com/profile/view?id=97267188) in a comment on LinkedIn (http://www.linkedin.com/groups/Funky-language-features-mystery-missing-1356867.S.5838734689329766403) said:
Arabic has no “have”. You don’t need a verb to say “I have a car” = “عِــنْــدِي سَــيَّـــارَةٌ” (By me a car). Nevertheless, there are the verbs “مَــلَــكَ” and “امْتَلَكَ” (to possess/own), which are used to stress that something belongs to someone, like, for example, in juridical documents. In a newspaper article you’d write “الأمير الوليد يمتلك طائرة خاصّة من نوع بوينغ ٧٤٧” (Prince Al-Walid owns a Boeing 747) rather than “عِنْدَ الأمير الوليد طائرة خاصّة من نوع بوينغ ٧٤٧”, even if it is grammatically correct.
As to the verb “to be”, Arabic has no need of it in the present tense. For example, “مَلِكُ الـمَـغْرِبِ غَـنِــيٌّ جِدَّا ” (word for word = The King of Morocco very rich). But in the past you need the verb “كَـانَ ” (to be/to exist). For example, “كَـانَ الملك الحسن الثّاني غنيّا جدّا ” (King Hassan II was very rich).
Funky language features – the third spatial deictic reference in Japanese, Korean and Tamil
The words ‘here’ and ‘there’ are spatial deictic references that are familiar to all English speakers.
‘Here’ means ‘near the speaker’.
‘There’ means ‘not near the speaker’.
Two words related to ‘here’ and ‘there’ are ‘this’ and ‘that’ which work much like ‘the’ but refer to things that are ‘near the speaker’ or ‘not near the speaker’.
So, in English, all spatial deictic references are relative to the speaker.
Here is an illustration of spatial deixis taken from the Wikipedia article on deixis.
But there are languages in which there are more than two spatial deictic references.
Japanese, Korean and Tamil have three each.
In Japanese, they are koko, soko and asoko.
In Korean, they are yogi, kugi and chogi. (Here is a very nice lesson on deixis in Korean http://www.talktomeinkorean.com/lessons/l1l7).
In Tamil, they are inge, unge and ange.
The reason for the additional deictic reference is that in these languages, distances are perceived not just with respect to the speaker, but also with respect to the listener.
So, in Japanese, Korean and Tamil respectively, koko, yogi and inge mean ‘near the speaker’.
Then, soko, kugi and unge mean ‘near the listener’.
Finally, asoko, chogi and ange mean ‘far from both the speaker and the listener’.
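The three-way split can be written down as a small lookup, sketched here in Python. The words come from the lists above. Everything else is an assumption made for illustration: the function name, the numeric distances, the `near` threshold, and the choice to prefer the speaker’s word when something is close to both people.

```python
# Words taken from the text above; the distance threshold is an assumption.
DEICTICS = {
    "Japanese": {"speaker": "koko", "listener": "soko", "neither": "asoko"},
    "Korean":   {"speaker": "yogi", "listener": "kugi", "neither": "chogi"},
    "Tamil":    {"speaker": "inge", "listener": "unge", "neither": "ange"},
}


def choose_deictic(language, dist_to_speaker, dist_to_listener, near=2.0):
    """Pick the spatial deictic for a place, given its distance to each person."""
    if dist_to_speaker <= near:
        zone = "speaker"       # 'here': near the speaker
    elif dist_to_listener <= near:
        zone = "listener"      # 'there, by you': near the listener
    else:
        zone = "neither"       # 'over there': far from both
    return DEICTICS[language][zone]


print(choose_deictic("Japanese", 1, 50))
print(choose_deictic("Korean", 50, 1))
print(choose_deictic("Tamil", 50, 50))
```

An English version of the same table would need only two entries, because English measures distance from the speaker alone.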
The “near the listener” deixis seems like a rather useless feature to have in a language (it is disappearing from modern Tamil).
In the modern world, when you talk to someone face to face (not on the phone), you are usually standing just a few feet from them.
So, anything “near the speaker” is also “near the listener”. One of those spatial references is therefore redundant.
But then, if one of the spatial references was so useless, why did it appear in Korean and Japanese in addition to Tamil?
Perhaps it has something to do with the fact that Korea and South India are peninsulas, and Japan is an island.
All three countries have long coastlines.
So, some ancestors of the inhabitants of Korea, Japan and South India might have lived off of deep-water fishing.
On the ocean there is an immediate use for the “near the listener” deictic.
Imagine a fleet of boats spread out on the ocean looking for fish to spear or net.
On open water, the boatmen would have had no landmarks with which to communicate directions.
The only reference points they’d have had to identify positions would have been their own boats.
So, they’d probably have had conversations with each other that went as follows:
Boat 1: Are there any fish near you (the listener)?
Boat 2: No, there are no fish near me (the speaker). Are there any fish near you (the listener)?
Boat 1: No, there are no fish near me (the speaker). We should look for fish away from both of us (pointing).
In such conversations, all three deictics would have been used.
The sentence “Are there any fish near you (the listener)?” would have used the word soko (in Japanese), kugi (in Korean) and unge (in Tamil).
The sentence “No, there are no fish near me (the speaker)” would have used the word koko (in Japanese), yogi (in Korean) and inge (in Tamil).
The sentence “We should look for fish away from both of us (pointing)” would have used the word asoko (in Japanese), chogi (in Korean) and ange (in Tamil).
I am just guessing at all this, of course. Part of the fun of working in linguistics is that you can extrapolate from tenuous linguistic clues, and indulge in wild flights of fantasy.
But what I am proposing is not entirely unimaginable.
In 2011, in a small cave (called the Jerimalai cave) in East Timor, archaeologists found bones from 2843 individual fish, some of which were caught 42000 years ago. 50% of the bones were those of deep-water tuna fish. The finds also included fish hooks dating from between 23000 and 16000 years ago.
More details on the Jerimalai find here: http://news.discovery.com/history/archaeology/ancient-human-fishermen-111128.htm
In our last post, we spoke about various control mechanisms that can be implemented to support direct democracy (which we interpreted to mean the control of the allocation of common resources by the people who pooled them).
We also examined how these controls could be used to curtail man-in-the-middle corruption.
In this article, we examine a more sophisticated form of direct democracy called a deliberative democracy.
In a deliberative democracy, in addition to the control mechanisms prescribed for direct democracy, there need to be mechanisms to allow deliberation (discussion) before a referendum or any other action is taken.
I quote from the Wikipedia article on deliberative democracy:
Deliberative democracy holds that, for a democratic decision to be legitimate, it must be preceded by authentic deliberation, not merely the aggregation of preferences that occurs in voting.
In elitist deliberative democracy, principles of deliberative democracy apply to elite societal decision-making bodies, such as legislatures and courts; in populist deliberative democracy, principles of deliberative democracy apply to groups of lay citizens who are empowered to make decisions.
The article on direct democracy had the following to say:
Democratic theorists have identified a trilemma due to the presence of three desirable characteristics of an ideal system of direct democracy, which are challenging to deliver all at once. These three characteristics are participation – widespread participation in the decision making process by the people affected; deliberation – a rational discussion where all major points of view are weighted according to evidence; and equality – all members of the population on whose behalf decisions are taken have an equal chance of having their views taken into account.
(Aside to computer scientists: doesn’t this trilemma remind you of the CAP theorem that applies to database systems? Here’s a simple explanation of the CAP theorem: http://ksat.me/a-plain-english-introduction-to-cap-theorem/).
So, for example, representative democracy satisfies the requirement for deliberation and equality but sacrifices participation.
Participatory democracy allows inclusive participation and deliberation but sacrifices equality.
And then there is direct democracy which supports participation and equality, but not deliberation.
The problem seems to be that when a large number of people are invited to participate in a deliberation (and given that deliberations take time), it is not possible to compensate them all for their time. Consequently, those with a greater interest in the issue being debated (or more to gain from one position or the other) are more likely to participate, which biases the sample in their favour: not all sections of the population are equally represented in the discussion or the decision.
So, it seems that all three properties desired in an ideal democratic system - participation, equality and deliberation - cannot be present at the same time in a real democratic system.
But then, a while ago, we began wondering whether this trilemma is merely a result of the lack of suitable technology rather than a fundamental property of democracy. So, we proposed a design (which we have not yet built) for a tool that can support the participation of a large number of people in deliberations. We call it the MCT (Mass Communication Tool).
It could be used as a method to enable direct democracies to support deliberations in which all citizens can participate, ahead of a vote on any subject.
It uses text clustering algorithms to solve the problems of volume as well as numeric asymmetry in the flow of communications between the deliberating participants and the moderators of the communications.
There’s a brief overview of the system in our lab profile.
MCTs could have a large impact on our experience of representative government. A typical use case would involve a public figure (say, President Obama) sounding out the electorate before introducing legislation on, say, healthcare reform. By first discussing the competing proposals with large numbers of people, the initiator of the discussion might get a sense of what might or might not work and what the response to the legislation is likely to be.
An MCT would have to be capable of supporting a live dialog involving a large number of people.
It would use natural language processing and machine learning to enable a few moderators (for example, the CEO of a company) to interact with a large number of people (for example, all the employees of the company) in real time (for example, during a virtual all-hands meeting), get a synopsis of a large number of concurrent discussions in real time, and participate in a significant fraction of the discussions as they are taking place.
The system would consist of:
- an aggregator of messages (built from natural language processing components) that groups together messages and discussions with identical semantic content;
- a hierarchical clustering system (built from natural language processing components) that assigns aggregated messages their place in a hierarchy by specificity with more general messages closer to the root of the hierarchy and more specific messages closer to the leaves of the hierarchy;
- a summarization system (built from natural language processing components) that creates a summary of the aggregate of all messages in a sub-tree; and
- a reply routing system (built from natural language processing components) that routes replies from cluster to cluster based on their relevance to the discussion threads.
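To give a feel for the first and third components, here is a deliberately crude Python sketch. It is not the MCT design. Word overlap (Jaccard similarity) stands in for real semantic matching, a “summary” is just a representative message plus a count, and the function names and the 0.5 threshold are assumptions chosen for the example. The hierarchy and reply-routing components are left out.

```python
import re


def tokens(message):
    """Lower-cased set of words; a crude stand-in for semantic content."""
    return set(re.findall(r"[a-z']+", message.lower()))


def jaccard(a, b):
    """Share of words two messages have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def aggregate(messages, threshold=0.5):
    """Greedily place each message in the first group whose seed it resembles."""
    groups = []  # each group is a list of messages; group[0] is its seed
    for message in messages:
        for group in groups:
            if jaccard(tokens(message), tokens(group[0])) >= threshold:
                group.append(message)
                break
        else:
            groups.append([message])
    return groups


def summarize(groups):
    """Report each group's seed message and size, largest group first."""
    return sorted(((g[0], len(g)) for g in groups), key=lambda pair: -pair[1])


messages = [
    "We need more parking spaces",
    "we need more parking",
    "The cafeteria food is bad",
]
for seed, count in summarize(aggregate(messages)):
    print(count, seed)
```

A moderator looking at this output sees two topics instead of three messages. That reduction is what would let a few people follow a very large number of concurrent discussions.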
Direct democracy can be broadly interpreted to mean the control of the allocation of common resources by the people who pooled them.
One common resource is tax money.
In most countries, those who pay taxes only have a say in whom they can elect to power.
Those who pay taxes rarely have a say in how the tax money is spent.
There is a middleman (someone who works in government) who decides how the tax money is spent.
The problem with having a middleman decide the allocation of common resources is that the resources could end up being allocated very inefficiently due to man-in-the-middle corruption.
Here is an article about man-in-the-middle corruption:
The way out is to let the people who contributed to the common pool decide on how the resources are allocated.
Tool 1: Apportioning
One way to do this is to embed direct democracy mechanisms into the contribution mechanism.
For example, tax-payers could be given the ability to tie a portion of their tax contribution to expenditure categories.
They could be given the right to apportion, out of every $100 that they have paid in taxes, a certain amount to each of the following major categories: education, healthcare, social security, infrastructure and defence (leaving a certain percentage to the finance minister’s discretion).
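Here is a minimal Python sketch of how such apportionments might be added up into a budget. The category names come from the paragraph above. The data layout, the function name and the validation rule are assumptions made for illustration: each taxpayer earmarks percentages of their own payment, and whatever they leave unearmarked goes to the finance minister’s discretion.

```python
CATEGORIES = ["education", "healthcare", "social security",
              "infrastructure", "defence"]


def apportion(returns):
    """Sum each taxpayer's earmarked amounts; the unearmarked rest is discretionary.

    `returns` is a list of (tax_paid, shares) pairs, where `shares` maps a
    category to the percentage of that taxpayer's payment earmarked for it.
    """
    budget = {category: 0.0 for category in CATEGORIES}
    budget["discretionary"] = 0.0
    for tax_paid, shares in returns:
        earmarked = sum(shares.values())
        if any(c not in CATEGORIES for c in shares) or earmarked > 100:
            raise ValueError("invalid apportionment")
        for category, percent in shares.items():
            budget[category] += tax_paid * percent / 100
        budget["discretionary"] += tax_paid * (100 - earmarked) / 100
    return budget


returns = [
    (1000, {"education": 50, "healthcare": 30}),   # leaves 20% unearmarked
    (500, {"defence": 100}),
]
print(apportion(returns))
```

Because the shares are percentages of what each person actually paid, a larger taxpayer moves more money, but nobody can direct more than they contributed.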
It could also be left to the tax-payers to specify how much money the government may borrow on their behalf.
This direct control could very likely have prevented debt crises such as those of Greece and Ireland (especially in countries where people are averse to taking on debt), and might also have given people in the USA some control over their government’s borrowings.
Tool 2: Referendum
Another mechanism is the referendum. It is already being used in all democratic countries, but mainly for the selection of the middleman.
Fortunately, things seem to be moving well beyond that stage. In India, baby steps are being taken towards bringing about a direct democratic model of government.
A new political party came to power in Delhi on an anti-corruption platform. The first thing they did was conduct an informal referendum to ask the people of Delhi if they should form a minority government.
So, referendums are one of the mechanisms of direct democracy.
This mechanism can also be used to prevent or reduce man-in-the-middle corruption.
Take the example of a road that needs to be surfaced. Normally, a government official would issue the contract based on the bribes paid to him by the contending contractors (making a selection on the basis of quality very unlikely).
If, instead, the people living on the street that needed to be surfaced were given all the information needed to make a good choice and asked to select the best contractor themselves, the middleman would be eliminated and the quality driven up rather than down.
Now, I am going to talk about some problems that I think affect research funding in the USA (and other countries with government-funded research). Most research funding in the USA comes from government bodies like the NSF, the ONR and DARPA.
Now the following is only my personal opinion, but I think that research efforts in some fields in those countries might be distorted to some extent by the needs of these funding bodies.
Well, I can only speak for the research areas in which Aiaioo Labs is active. We focus on a narrow research space – predominantly on text analytics and natural language processing.
In this space, I see a lot of low-hanging fruit that nobody in the USA or Europe ever picks. And that’s quite inexplicable as these are often problems with very obvious applications to the software products space. And yet I see no papers from California on them.
There are topics on which all the papers I see are from India (often by students who don’t even publish them in international conferences) and sometimes by researchers from Singapore – completely ignored by the main research community.
At other times, I’ve noticed areas of research that DARPA had spent much money on in the 1970s and that researchers had pursued very enthusiastically in that decade. I’ve seen those lines of research being abandoned in the 90s (possibly once the funding priorities changed) and not being revived again, though product firms are working on those technologies again in California in 2013.
I find it hard to explain why these areas of research are being ignored, except by the remote possibility that they have passed under the radar of the guys in the Office of Naval Research, which makes it unlikely that grants will be provided for them.
That is again, if my conjecture is right, a man-in-the-middle problem. The agenda for research is possibly not being driven by the research community or by the market (the needs of start-ups in California) but by people guessing at what sort of proposals might get funded (and that might be encouraging people to stay with what government knows).
So, I shall propose another direct democracy tool to solve this problem as well:
Tool 3: Suggestions + Referendum
Here, each of the participants (researchers) bidding for the grants would put in suggestions about what the next important thing to focus on as a research community might be. Then they could all vote on the suggestions. The allocation of research grants could then be guided by the suggestions and the votes received by each suggestion.
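As a rough illustration, here is a Python sketch of one way the votes could guide the allocation. The text above only says the grants would be “guided by” the suggestions and votes, so the proportional split is an assumption, as are the function names and the example topics.

```python
from collections import Counter


def tally(ballots):
    """Count how many researchers voted for each suggestion."""
    return Counter(ballots)


def allocate_grants(total_funds, votes):
    """Split `total_funds` across suggestions in proportion to votes received."""
    total_votes = sum(votes.values())
    if total_votes == 0:
        return {suggestion: 0.0 for suggestion in votes}
    return {s: total_funds * n / total_votes for s, n in votes.items()}


# Hypothetical ballots: one suggestion named per researcher.
ballots = ["parsing", "dialogue", "parsing", "summarization"]
print(allocate_grants(1000, tally(ballots)))
```

A proportional split is only one option. A funding body could just as well fund the top few suggestions in full, or reserve a share for topics that received no votes.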
Controls as Rights
In a sense, you can think of these three control mechanisms as three rights that people who contribute toward a common pool of resources will have in a direct democracy:
1) The right to apportion
2) The right to be consulted
3) The right to suggest
There is a nice article on Wikipedia on direct democracy. The article talks of two of the control mechanisms proposed in this article – referendum and initiative (which corresponds somewhat to suggestions) – and proposes one that I hadn’t mentioned – the right to recall. It doesn’t talk about apportioning.
Here is an interesting video of a Mohalla Sabha (it’s an interesting participation mechanism that a political organization is experimenting with in Delhi).