Languages and Numbers and Ways of Counting to 8!

This article is about how small numbers are represented in various languages.
Acknowledgement: much of this article is taken from the Wikipedia page about positional notation.
Bases
The base is the mathematical term for the number of distinct digits used to write numbers or, for a spoken language, the size of the group you count in.
For example, if you used the fingers of both hands to count, you would be using a base of 10.
If you used the fingers of one hand to count, you would be using a base of 5.
If you used the fingers of both hands and the toes of both feet, you would be using a base of 20.
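To make the idea concrete, here is a minimal Python sketch (ours, not from the Wikipedia article) that splits a number into its digits in any base. The function name `to_base` is just an illustrative choice.

```python
def to_base(n: int, base: int) -> list[int]:
    """Return the digits of a non-negative integer in the given base,
    most significant digit first."""
    if n < 0 or base < 2:
        raise ValueError("n must be >= 0 and base must be >= 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the lowest digit
        digits.append(remainder)
    return digits[::-1]  # reverse so the highest digit comes first


print(to_base(87, 10))  # [8, 7]    -> 8 tens and 7
print(to_base(87, 20))  # [4, 7]    -> 4 twenties and 7
print(to_base(87, 5))   # [3, 2, 2] -> 3 twenty-fives, 2 fives and 2
```

The same quantity simply gets grouped differently depending on whether you count in fives, tens or twenties.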
Base-20
Some languages have names for numbers that lead you to suspect that their users might have thought in terms of groups of 20.
French has an interesting way of describing numbers above 60.  In French, the word for 60 is “soixante”, the word for 75 is “soixante-quinze” (sixty-fifteen), while 80 is “quatre-vingts” (four twenties) and 95 is “quatre-vingt-quinze” (four-twenties-fifteen).
And it is not just French.  English uses the word ‘score’ to describe a group of 20 things.  So, when we talk of “two score” we mean forty, and when we say “four score and seven” we mean 87.
The article also talks about Welsh and Irish and Maori:
The Irish language also used base-20 in the past, twenty being fichid, forty dhá fhichid, sixty trí fhichid and eighty ceithre fhichid. A remnant of this system may be seen in the modern word for 40, daoichead.
The Welsh language continues to use a base-20 counting system, particularly for the age of people, dates and in common phrases. 15 is also important, with 16–19 being “one on 15”, “two on 15” etc. 18 is normally “two nines”. A decimal system is commonly used.
Danish numerals display a similar base-20 structure.
The Maori language of New Zealand also has evidence of an underlying base-20 system as seen in the terms Te Hokowhitu a Tu referring to a war party (literally “the seven 20s of Tu”) and Tama-hokotahi, referring to a great warrior (“the one man equal to 20”).
Base-12
Another interesting system is the base-12 system.
The Wikipedia article says:
Twelve is a useful base because it has many factors. It is the smallest common multiple of one, two, three, four and six. There is still a special word for “dozen” in English, and by analogy with the word for 10², hundred, commerce developed a word for 12², gross. The standard 12-hour clock and common use of 12 in English units emphasize the utility of the base. In addition, prior to its conversion to decimal, the old British currency Pound Sterling (GBP) partially used base-12; there were 12 pence (d) in a shilling (s), 20 shillings in a pound (£), and therefore 240 pence in a pound. Hence the term LSD or, more properly, £sd.
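The arithmetic in that quote is easy to check. Below is a small Python sketch (the helper names `divisors` and `pence_to_lsd` are our own, chosen for illustration) that lists the factors of 10 and 12 and converts an amount in old pence into pounds, shillings and pence.

```python
def divisors(n: int) -> list[int]:
    """All positive whole numbers that divide n exactly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pence_to_lsd(pence: int) -> tuple[int, int, int]:
    """Split an amount in old pence into (pounds, shillings, pence),
    using 12 pence to the shilling and 20 shillings to the pound."""
    pounds, rest = divmod(pence, 240)      # 240 pence in a pound
    shillings, pennies = divmod(rest, 12)  # 12 pence in a shilling
    return pounds, shillings, pennies


print(divisors(10))       # [1, 2, 5, 10]
print(divisors(12))       # [1, 2, 3, 4, 6, 12]
print(12 ** 2)            # 144, i.e. one gross
print(pence_to_lsd(500))  # (2, 1, 8)
```

Twelve can be split evenly into halves, thirds, quarters and sixths, while ten can only be split into halves and fifths.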
Base-2
Some languages even make use of a base-2 (binary) system for counting.  Base-2 is mainly used in computers today, because switches can represent binary numbers: a switch that is off represents the 0 digit and a switch that is on represents the 1 digit.  But apparently, some Australian Aboriginal languages use binary too.
A number of Australian Aboriginal languages employ binary or binary-like counting systems. For example, in Kala Lagaw Ya, the numbers one through six are urapon, ukasar, ukasar-urapon, ukasar-ukasar, ukasar-ukasar-urapon, ukasar-ukasar-ukasar.
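The six words quoted above (urapon, ukasar, ukasar-urapon, ukasar-ukasar, ukasar-ukasar-urapon, ukasar-ukasar-ukasar) follow a simple rule: count in twos (“ukasar”) and add a one (“urapon”) if something is left over. Here is a Python sketch of that rule. It is our own reading of the pattern in those six words, and we make no claim about how the language handles numbers beyond six.

```python
def binary_like_name(n: int) -> str:
    """Build a number name from pairs ("ukasar") plus a leftover one ("urapon").

    This only mirrors the pattern of the six quoted words; it is an
    assumption, not a description of the full counting system.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = ["ukasar"] * (n // 2)  # one "ukasar" for every pair
    if n % 2 == 1:
        parts.append("urapon")     # a single leftover one
    return "-".join(parts)


for n in range(1, 7):
    print(n, binary_like_name(n))
```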
Base-8
The article also says that there is some evidence of the use of base-8 in language:
A base-8 system (octal) was devised by the Yuki tribe of Northern California, who used the spaces between the fingers to count, corresponding to the digits one through eight.[6] There is also linguistic evidence which suggests that the Bronze Age Proto-Indo Europeans (from whom most European and Indic languages descend) might have replaced a base-8 system (or a system which could only count up to 8) with a base-10 system. The evidence is that the word for 9, newm, is suggested by some to derive from the word for “new”, newo-, suggesting that the number 9 had been recently invented and called the “new number”.[7]
So much for bases.
Some languages have two sets of names for numerals!
Two Sets of Names for Numbers in Japanese and Korean
Japanese and Korean use two sets of names for numbers while counting.
In Japanese, there is a set of names that are typically used when small quantities are involved:
“hitotsu”, “futatsu”, “mittsu”, “yottsu”, “itsutsu”, “muttsu”, “nanatsu”, “yattsu”, “kokonotsu”, “to” (1 to 10).
But for larger numbers and for zero, the names used are ones derived from Chinese.
“ichi”, “ni”, “san”, “shi”, “go”, “roku”, “shichi”, “hachi”, “kyu”, “ju”.
These numbers correspond to the Chinese digits:
“yī”, “èr”, “sān”, “sì”, “wǔ”, “liù”, “qī”, “bā”, “jiǔ”, “shí”.
And similarly in Korean, you would use one set of names for small quantities (for example, hours in the day):
“hana”, “dul”, “seth”, “neth”, “thasoth”, “yosoth”, “ilgop”, “yodolp”, “ahop”, “yol”.
But to describe larger quantities, like minutes or the days in a month, you’d go with names based on Chinese:
“il”, “i”, “sam”, “sa”, “o”, “yug”, “chhil”, “phal”, “ku”, “ship”.
Finally, we come to some interesting irregularities in south Indian languages.
Irregular Numbering
In Tamil (a language spoken in south India), the word for 90 is “pre-hundred”.
The first ten numbers in Tamil go:
“ondru”, “irendu”, “muundru”, “naangu”, “aindhu”, “aaru”, “eelu”, “ettu”, “ombadhu”, “patthu”
But the word “ombadhu” which means 9 is not used in 90.
Tamil
In Tamil, the name for 80 is derived from the name for 8 by adding a suffix like in English.  Just as “eight” becomes “eight-y”, in Tamil, “ettu” becomes “embathu”.
But the name for 90 is not derived from the number for 9.  Instead, it is “pre-hundred”.  (In Tamil, 90 is “thonnuuru”, with hundred being “nuuru”.)  So, when counting from 90 to 99, you use a word one would normally associate with the hundreds position.
So 91 is “pre-hundred and one”.  It is pronounced “thonnuutri-ondru” in Tamil.  92 is “pre-hundred and two”.  It is pronounced “thonnuutri-rendu” in Tamil.
I’ve not come across many languages in which 90 is described as pre-hundred.  But Hindi (a language from the north of India) has a similar feature.
Hindi
In many Indian languages spoken in the north of India, the names of the first ten numbers are similar to their names in Latin.  For example, Hindi has:
“ek”, “dho”, “thiin”, “chaar”, “paanch”, “che”, “saath”, “aaT”, “nov”, “dhas”
The Hindi names for various numbers are similar to the Sanskrit names of those numbers:
“ekam”, “dve”, “thriini”, “chathvaari”, “pancha”, “shath”, “saptha”, “ashta”, “nava”, “dhasha”
But when you get to 29 in Hindi, you say “pre-30”.  The word in Hindi is “unthees” (“thees” means 30 in Hindi).
Similarly, 39 is “pre-40” (“unchaaliis” where “chaaliis” means 40).
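Here is a tiny Python sketch of that “pre-decade” rule. It only knows the two decade names given in this post (“thees” and “chaaliis”, in our informal spelling), and the function name `pre_decade_name` is made up for illustration.

```python
# Decade names as spelled in this post; only these two are covered.
DECADES = {30: "thees", 40: "chaaliis"}


def pre_decade_name(n: int) -> str:
    """Name a number ending in 9 as "un" + the name of the next decade."""
    next_decade = n + 1
    if next_decade not in DECADES:
        raise ValueError("this sketch only covers 29 and 39")
    return "un" + DECADES[next_decade]


print(pre_decade_name(29))  # unthees
print(pre_decade_name(39))  # unchaaliis
```

The point is that the name for 29 is built from the name for 30, not from the name for 9.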
This is different from how you count in Sanskrit.
Sanskrit
In Sanskrit, 39 is “navatrimshat” (nine and thirty) and 29 is “navavimshatihi” (nine and twenty).
Now the absence of a regular name for numbers with 9 in them supports a theory that Indic languages might once have used base-8 for counting.
I quote from the Wikipedia article again:
There is also linguistic evidence which suggests that the Bronze Age Proto-Indo Europeans (from whom most European and Indic languages descend) might have replaced a base-8 system (or a system which could only count up to 8) with a base-10 system. The evidence is that the word for 9, newm, is suggested by some to derive from the word for “new”, newo-, suggesting that the number 9 had been recently invented and called the “new number”.[7]
The assertion seems to have been made in an article titled “The Indo-European system of numerals from ‘1’ to ‘10’” by Eugenio Ramón Luján Martínez.
Eugenio argues that each of the numerals in Indo-European languages gradually came into use when required by necessity, starting with the numbers 2 and 3 (which started as deictics – like in the words ‘duo’ and ‘trio’).
There’s an overview of his arguments in this article: http://smallislandnotesan.blogspot.in/2008/01/indo-european-numbers-1-10.html
Counting on the Fingers
To a twenty-first century human, a base-10 system of counting seems like the natural way to count.
But to early humans, a base-8 system could have felt more natural to count with than a base-10 system.
This is because it is only possible to count to ten on the fingers of one’s hands if one has developed the technique of bending them to mark the number up to which one has counted.
If a person uses the technique of touching the thumb to a finger to mark a count, then one can only count up to 4 on each hand (and therefore only up to 8 on both hands).
Indian musicians still keep count of the rhythmic patterns in music (the thaalas) by touching the tips of their fingers with the thumb (counting in multiples of 3 or 4).
So it is quite possible that at some point in the distant past, speakers of Indo-European languages counted in groups of 8.

Funky language features – some things that you can never say in English and what that might tell us about human languages

Inexpressibility in English

There is a common expression that is widely used in South Indian languages that can’t be translated into English no matter how hard you try.  This post is about things that can’t be expressed in certain languages.  There are some things that cannot be expressed in even the most eclectic of languages though they can in others.

Now I have the unenviable task of trying to tell you in English what cannot be said in English!

Here goes.

Imagine two grown-up people A and B who meet on the street in South India.  B is with her son.  When A meets B, A feels that it would be impolite to not inquire about B’s son.

So, A asks B an open question about B’s son.

B replies, with a big smile and slow polite nods:  “This is my 2nd son.”

What is the question that A would have asked B, to elicit that response from B?

It is impossible to frame an open question in English that would elicit the answer that B gave.

But this exchange is something that South Indian parents have all the time.

When two South Indian parents run into each other, it is highly likely that one might ask the other (in their language) something like, “Oh, what a cute little boy/girl/child!  Whichth son of yours is this?”

The other parent would then reply very proudly: “This is my eldest son/daughter/child” or “This is my 2nd son/daughter/child”.

There is no way to ask someone in English that question because the word or even the concept of “whichth” doesn’t exist in English (and possibly doesn’t exist in any European language).

Here’s how you would say that in Kannada (a language used in South India).

A:  Ivanu nimma yeshtaneya maga?  (This boy your whichth son?)

B:  Ivanu nanna eradaneya maga.  (This boy my 2nd son)

Acknowledgement:  This phrase was something I overheard someone discussing when I was a child.  I think it was someone working on translation theory.  I have no recollection of who it was.

Conditional Inexpressibility in South Indian languages

In South Indian languages, there are two ways of saying “and” / “or”.  One way is through a word meaning “and” or “or”.  In Kannada, the words would be “matthu” (means “and”) and “athava” (means “or”).

Another way is using a suffix.  In Kannada you can say something and add the suffix “aa” to indicate “or”.  You can add the suffix “uu” to indicate “and”.

You will find that in South Indian languages you can only express ORs of ANDs using the suffixes.  You cannot express ANDs of ORs.

So, using the suffix forms, we can say “A and B or C and D” but not “A or B and C or D”.

In Kannada, that would be “A-uu B-uu -aa, C-uu D-uu -aa”.  You cannot say “A-aa B-aa -uu, C-aa D-aa -uu”.

You will find a similar restriction in Japanese (though Japanese does not have a suffix form for AND).

Implications for Practical Linguistics

Years ago, we worked on a research project related to natural language programming.  We designed a programming language that would allow humans to program computers by saying things to them.  So, you could say things like: “x égale 2. Si x multiplié par 3 est moins que 5, dis ‘Salut’ sinon dis ‘Ciao’!” (in English: “x equals 2. If x multiplied by 3 is less than 5, say ‘Hi’, otherwise say ‘Ciao’!”)

The natural language programming system was designed to help students in rural India learn programming (they often don’t know English and so can’t use an English-based programming language).

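It works only in the domain of numbers.  A Fibonacci number generator would look like this in bad German: “z ist gleich 1. y ist gleich 1. x ist gleich 0. während x ist weniger als 13, z wird y plus x. Danach x wird y und y wird z. Danach schreib z.” (in English: “z equals 1. y equals 1. x equals 0. While x is less than 13, z becomes y plus x. Then x becomes y and y becomes z. Then write z.”)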
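For comparison, here is roughly how the German program above would read in Python. This is our own hand translation, not the output of the prototype, and it assumes that every “Danach” step (including “schreib z”) belongs inside the loop. The function name `run_german_program` is hypothetical.

```python
def run_german_program(limit: int = 13) -> list[int]:
    """Hand translation of the German sketch; returns what it would write."""
    z = 1  # z ist gleich 1
    y = 1  # y ist gleich 1
    x = 0  # x ist gleich 0
    written = []
    while x < limit:       # während x ist weniger als 13
        z = y + x          # z wird y plus x
        x = y              # Danach x wird y
        y = z              # und y wird z
        written.append(z)  # Danach schreib z
    return written


print(run_german_program())  # [1, 2, 3, 5, 8, 13, 21]
```

If “schreib z” were read as happening only once, after the loop, the program would write just the final value instead.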

(We didn’t put much work into it.  It’s just a research prototype.  But you can play with the technology yourself at http://www.aiaioo.com/cms).

Anyway, since South Indian languages and Japanese favour AND over OR, in this programming language, we specified that AND gets precedence over OR.
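Python happens to make the same choice, which gives us an easy way to show what “AND gets precedence over OR” means in practice. The sketch below is ours (the function names are illustrative); it checks that an unparenthesized “A and B or C and D” is always read as an OR of ANDs, and that this differs from the AND-of-ORs reading.

```python
from itertools import product


def unparenthesized(a: bool, b: bool, c: bool, d: bool) -> bool:
    # No parentheses: the language's precedence rules decide the grouping.
    return a and b or c and d


def or_of_ands(a: bool, b: bool, c: bool, d: bool) -> bool:
    return (a and b) or (c and d)


def and_of_ors(a: bool, b: bool, c: bool, d: bool) -> bool:
    return a and (b or c) and d


combos = list(product([False, True], repeat=4))

# AND binds tighter than OR, so these two agree on every input.
print(all(unparenthesized(*v) == or_of_ands(*v) for v in combos))  # True

# The AND-of-ORs reading gives a different answer for some inputs.
print(unparenthesized(True, True, False, False))  # True
print(and_of_ors(True, True, False, False))       # False
```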

Implications for Universal Grammar

I recently read a small book on the latest efforts by Chomsky’s research group to find common grammatical frameworks that can be applied to all languages.

Personally, I do not much like the approach of using grammar to try to explain language.

People can speak a language even if they have only ever heard a few sentences in that language.

They would of course have to limit their use of the language to those few sentences and the variants thereof, but they are still generating language.

It is impossible to construct a grammar of a language from a few sentences.

So it is unlikely that the human language comprehension/generation system uses grammar as we formally understand the concept.

Chomsky believes that there is some language faculty that has a grammar of sorts that generates language and that the output of this faculty is transformed into Chinese or English as the case may be through the use of some simple transformation tools.

If this were true, then one could argue that what is expressible in one language must be expressible in another language.

This must be true at least for commonly used expressions.

But we find that it is not true.

The fact that obvious concepts can’t be expressed in a language with as large a vocabulary as English makes me wonder if there is a common universal grammar, and if languages are as comprehensive as we’d like to believe.

If all languages are derivable from a common grammar, then a concept such as “whichth” which is so common in Indian languages, should have been derivable from that common universal grammar in English just as it is in South Indian languages.

It seems more likely that languages evolve from societal and environmental needs (needs to express things from a cultural or practical perspective) and are nothing but a set of shared signals.

These shared signals eventually evolve to allow for the use of parameters, to allow for a fitting of expressions into slots recursively, that gives rise to an appearance of grammar.

Each language evolves that appearance of grammar independently and there’s nothing more to it.  Or at least, that’s someone’s pet theory.

For some other surprisingly non-universal language features, you might want to take a look at two of our articles on deictic references and ‘possessive verbs’:

  1. Funky language features – the third spatial deictic reference in Japanese, Korean and Tamil
  2. Funky language features – the mystery of the missing possessive verb

The Heisenberg Uncertainty Principle of Social Media B2B Marketing

Reposted with minor modifications from the Selasdia Blog: http://www.selasdia.com/blog/?p=228

We have on numerous occasions come across B2B social media marketers pondering a seemingly inexplicable phenomenon.  Their social media leads just don’t convert!

Being a vendor of B2B marketing and sales tools, we thought long and hard about it, and finally came up with a possible answer with a touch of quantum physics to it.

Here goes:

All our studies so far suggest that it’s possible to find leads, but that the leads will not convert into a sale immediately, that some time will elapse before a sale takes place.

That is in and of itself a very interesting phenomenon.  It appears that using social media, we can find the “who” or the “when”, but not the two together (at least not very often).

This observation reminded me of Heisenberg’s Uncertainty Principle in quantum physics. The principle places a lower bound on the product of the uncertainties in the position and the momentum of a particle.

[Figure: position (blue) and momentum (red) probability densities, courtesy of Wikipedia]

In other words, it is possible to tell where a particle is or how fast it is going as accurately as desired, but never both.
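For readers who like to see the formula, the principle is usually written as the inequality below. This is the standard textbook form rather than anything derived in this post; σ_x and σ_p stand for the standard deviations (the “uncertainties”) of position and momentum, and ħ is the reduced Planck constant.

```latex
\sigma_x \, \sigma_p \;\ge\; \frac{\hbar}{2}
```

Making one of the two uncertainties very small forces the other to be large, since their product can never drop below ħ/2.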

Our work on Selasdia for the past few years leads us to conjecture that there is something similar going on in the space of B2B lead generation using social media.

I don’t have an equation worked out that I can use to prove the conjecture.

But, I’ll describe two experiments that we ran and argue that what we observed in those experiments (and what others have observed in other sales conversion measurements) can be explained by the who/when uncertainty conjecture.

Let’s start with the experiments.

Experiment 1:  Nailing the When but not the Who

Selasdia is a robotic salesman that can build lists of potential buyers and support marketing and sales efforts with very focused nurturing.

In one of our experiments with Selasdia, we were able to use its intention analysis capabilities to spot prospects asking on social media for what one of our clients was selling.

[ For those who don't know, intention analysis is a filtering capability centered around intentions, for example, an intention to purchase, inquire, complain, quit, etc.  Here's a demo of intention analysis on our research lab's web-page. ]

Well, it worked really well.  People were asking for that product quite frequently on social media and Selasdia was picking up as many as four requests each day.

4 leads a day sounds like bliss, right?  Well, wait till you hear what happened when the salespeople called up the leads.

When they called up the social media leads, they invariably found that the people who had asked for vendors on social media did not have a large enough budget (or were not willing to pay quite enough) for what they had asked for on social media.

So, here was a case of being able to tell “when” precisely someone experienced a pain or felt a need, but not being able to find such expressions of interest from the right people (the ones with the right budget).

Experiment 2:  Nailing the Who but not being able to tell When

In another engagement, a company that needed to sell a tool to online clothes retailers approached us.

Selasdia was able to automatically build a list of online clothes retailers and their CEOs. So, all the whos were known.

However, Selasdia’s intention analysis component was never able to catch any of the prospects expressing a need for the product on social media.

It was an example of a situation where we could tell “who” we needed to reach out to, but not “when” we needed to reach out to them.

Arguments in support of the uncertainty conjecture

In a previous article titled “Learning from our failures – two reasons why social media sales conversion rates could be low” we proposed an explanation for the outcome of Experiment 1.

We proposed that people who know the value of what they want would not use a process where a vendor would have to find them and would instead actively look for a vendor.

So, people posting their needs on social media probably don’t take it very seriously.  That would explain the low conversion rates.

[Note that the poor conversion rate of social media marketing was noted in an article by an agency called Inside Sales a while ago (http://www.insidesales.com/insider/lead-generation/why-social-media-is-overrated-for-lead-generation).]

We’ve tried to establish through our reasoning so far that having accurate knowledge of the when (knowledge of an explicitly expressed need) might be only half the battle.

That is because the person who expressed the need might not be willing to spend a lot of money on it.

Now regarding Experiment 2, we noticed that as the ticket value of the item being sold goes up, the probability of seeing someone asking for it on social media seems to go down.

So, the cost of the item being sold seems to be inversely related to the number of buyers asking publicly for it.

For very expensive offerings (typical of B2B software sales) there is a very small set of buyers who need it and who can take a decision on buying it.

So, it is easy to make a list of them (as shown in Experiment 2), to prioritize them and to reach out to them.

So we can build comprehensive lists of the whos to sell to.

But the whens become difficult to determine (because high-priced purchases don’t get discussed openly on social media).

Explanation in terms of BANT

A popular set of criteria used for B2B lead qualification is the BANT criteria.

BANT stands for Budget, Authority, Need and Timing.

The Budget and Authority concepts attach to the person / the who.  Does the person who might make the purchase have the budget and the authority?

The Need and Timing concepts attach to the time / the when.  Does the person have a need at this time?  If not, when will the person have the need?

So, we can break down the BANT criteria into BA and NT as follows:

BA = who

NT = when

What we have argued above is that through social media marketing alone, it might be difficult to find both the BA (the who) and NT (the when) at the same time.

So what can a B2B marketer do in the absence of one half of the BANT qualifying criteria for lead generation?

Strategies for Compensation

Strategy 1:  One strategy for compensation is nurturing.

When the BA is known but the NT is not, it makes sense to nurture the prospect for a suitable period of time.

The idea is that engaging prospects who fulfill the BA requirements of the BANT qualification criteria can make them aware of their need, so that when they eventually realize that they have a need (the NT part of the BANT criteria), they pick the brand that they have come into contact with most.

This is typically a strategy that the marketing team would execute.  Any leads resulting from the strategy would come in through the inbound marketing channels that have been put in place.

Strategy 2:  Another strategy for compensation is switching channels

Since there is evidence that social media channels result in low conversion, the strategy is to switch from social media to email channels for pursuing a lead where the need and the timing are known.

Since it is not known who the right decision maker is, it makes sense to analyse the decision makers in the organization and then to approach the most suitable ones through email.

This is typically a strategy that the marketing team would execute with the help of the inside sales team of an organization.

Selasdia’s Support for Compensation Strategies

In support of Strategy 1, Selasdia now not only helps B2B marketing/sales teams build lists of customers, but also supports nurturing strategies that are very tightly integrated with content publishing and social listening channels.

In support of Strategy 2, Selasdia now has information gathering processes that can locate decision makers who are not on social media and can help engage them using email in addition to social media.


Funky language features – the mystery of the missing possessive verb

The verb ‘have’ is used to indicate possession.  When a speaker of the English language says, “I have a car”, the listener can infer that the speaker possesses a car.

“Have” is a word that we use a lot.  I doubt anyone can imagine English without the word “have” in it.

So, it will come as a surprise to many to know that many Indian languages have no such verb.

Yes, you heard it right.  Many Indian languages have no verb like “have”.

Speakers of those languages say “There is a car near me” instead.

Below is “I have a vehicle” in three Indian languages:

Tamil:  En kitta vandi irukku  (translation into English: there is a vehicle near me)

Kannada:  Nanna hatthira gaadi idhe   (translation into English: there is a vehicle near me)

Hindi:  Mere paas gaadi hai    (translation into English: there is a vehicle near me)

Expressing Possession in Asian Languages

Some other Asian languages lack a word for “have”.

Japanese does not have a word for “have”.  Neither does Korean.

In Malay, the word for “is” is “ada”.

But “ada” can be used to mean “have” as well, as you can see from the examples below.

In the following examples, “saya” means “I” or “my” (the meanings of the other Malay words are obvious).

Malay: Guru saya ada motokar baru.   (translation:  My teacher has a new car)

Malay: Bapa saya ada di rumah.      (translation:  My father is in the house)

Mandarin Chinese is an exception to this pattern.  It has a verb meaning “have”.  It is 有 (yǒu).  有 (yǒu) can also mean “to exist”, but the word commonly used for “is” is different.  It is 是 (shì) meaning “to be”.

So, a good number of widely spoken languages in South Asia don’t use a possessive verb.

But this does not mean that these Asian languages lack a mechanism to express possession.

It only means that the expression of possession and ownership uses alternative mechanisms such as idiomatic expressions (“is near” in the case of Indic languages) and context (word order and semantics in the case of Malay) in large parts of South and South-East Asia.

Expressing Possession in European Languages

In Europe, the possessive verb seems to be the preferred tool to denote possession.

We’ve already encountered the verb “have” in English, and we know that it is distinct from the verb “is”.

Below are examples from a few other European languages:

French:

I am = Je suis

I have = J’ai

Polish:

I am = Jestem

I have = mam

Modern Greek:

I am = Είμαι (Eímai)

I have = έχω (écho̱)

Latin:

I am = sum

I have = habeo

Expressing Possession in Sanskrit

Sanskrit, unlike ancient Greek and Latin, does not have a possessive verb.

I asked a Sanskrit scholar if possessive verbs like “have” appear anywhere in the Vedas.

He answered in the negative.

There is no evidence for the existence of possessive verbs in Vedic Sanskrit.

Some Interpretations and Flights of Fantasy

Some economists surmise that early human societies (hunter-gatherer societies) did not know the concept of ownership.

In early human societies, food from a hunt was shared, because it could not be hoarded (there was only so much food that one could eat, and what was not eaten would spoil).

So, early languages would not have had a verb like “have”.

The most important conversations in those languages would have been sort of like:

Person 1:  “Is there food?”

Person 2:  “Nope.  There is no food today.”

Another type of conversation that would have been critical to self-preservation would have gone like this:

Person 1:  “There is a tiger behind you!  Run!”

Person 2:  “There is an antelope to your right!”

In societies centered around herding, the herds could have been common property.

Daily conversations would have gone:

Person 1:  “How many cows are there?”

Person 2:  “There are 200 cows.”

Sentences like “I have thirty cows” weren’t yet needed.

Economists surmise that it was farming that gave rise to concepts like ownership and property.

Farming for the first time allowed people to have a surplus of food.

This excess food could be stored, divided and traded.

Trade might have motivated the invention of language tools for talking about ownership.

It seems that in Europe languages converged on one such tool – the possessive verb.

It seems that in India languages chose another such tool – the idiomatic usage of the verb “is near”.

Historical Linguistics Questions

There is no evidence for the use of possessive verbs in Sanskrit.

However, I do not know if ancient (Vedic) Sanskrit used the idiomatic “is near” mechanism found in modern Indian languages for expressing ownership.

If it didn’t, it would suggest that the Indian vernacular mechanism for expressing ownership evolved after the period of time when the Vedas were composed or in a different geographical area.

If it did, it would suggest that the Vedas were composed after the Indian mechanisms for expressing possession were developed and in the same geographical area (assuming accurate oral transmission that preserved ancient language features).

I’d be very grateful if someone with a better knowledge of Vedic Sanskrit would be able to tell me whether such an idiomatic usage of “is near” to indicate ownership is attested in Vedic Sanskrit texts.

I’d also love to find out what mechanisms for expressing the idea of ownership existed in Old and Avestan Persian.

(Modern Persian – Farsi – has a verb “daestaen” meaning “have”, but Farsi is very different from Old Persian).

I’ve made a lot of assumptions in proposing those historical implications.  But this article was written merely to discuss possibilities.

ADDENDUM:

I’ll add examples from other languages below as and when I get them from readers (with their permission to post them here).

Arabic

Omar Khayyam (http://www.linkedin.com/profile/view?id=97267188) in a comment on LinkedIn (http://www.linkedin.com/groups/Funky-language-features-mystery-missing-1356867.S.5838734689329766403) said:

Arabic has no “have”. You don’t need a verb to say “I have a car” = “عِــنْــدِي سَــيَّـــارَةٌ” (By me a car). Nevertheless, there are the verbs “مَــلَــكَ” and “امْتَلَكَ” (to possess/own), which are used to stress that something belongs to someone, like, for example, in juridical documents. In a newspaper article you’d write “الأمير الوليد يمتلك طائرة خاصّة من نوع بوينغ ٧٤٧” (Prince Al-Walid owns a Boeing 747) rather than “عِنْدَ الأمير الوليد طائرة خاصّة من نوع بوينغ ٧٤٧”, even if it is grammatically correct.
As to the verb “to be”, Arabic has no need of it in the present tense. For example, “مَلِكُ الـمَـغْرِبِ غَـنِــيٌّ جِدَّا ” (word for word = The King of Morocco very rich). But in the past you need the verb “كَـانَ ” (to be/to exist). For example, “كَـانَ الملك الحسن الثّاني غنيّا جدّا ” (King Hassan II was very rich).


Funky language features – the third spatial deictic reference in Japanese, Korean and Tamil

The words ‘here’ and ‘there’ are spatial deictic references that are familiar to all English speakers.

‘Here’ means ‘near the speaker’.

‘There’ means ‘not near the speaker’.

Two words related to ‘here’ and ‘there’ are ‘this’ and ‘that’ which work much like ‘the’ but refer to things that are ‘near the speaker’ or ‘not near the speaker’.

So, in English, all spatial deictic references are relative to the speaker.

Here is an illustration of spatial deixis taken from the Wikipedia article on deixis.

But there are languages in which there are more than two spatial deictic references.

Japanese, Korean and Tamil have three each.

In Japanese, they are koko, soko and asoko.

In Korean, they are yogi, kugi and chogi.  (Here is a very nice lesson on deixis in Korean http://www.talktomeinkorean.com/lessons/l1l7).

In Tamil, they are inge, unge and ange.

The reason for the additional deictic reference is that in these languages, distances are perceived not just with respect to the speaker, but also with respect to the listener.

So,  in Japanese, Korean and Tamil respectively, koko, yogi and inge mean ‘near the speaker’.

Then, soko, kugi and unge mean ‘near the listener’.

Finally, asoko, chogi and ange mean ‘far from both the speaker and the listener’.
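The three-way system can be summed up as a small decision rule. The Python sketch below is only an illustration of the description above; the table and function names are our own, and we assume that “near the speaker” wins when something is near both people.

```python
# (near speaker, near listener, far from both), as listed in this post.
DEICTICS = {
    "Japanese": ("koko", "soko", "asoko"),
    "Korean": ("yogi", "kugi", "chogi"),
    "Tamil": ("inge", "unge", "ange"),
}


def deictic(language: str, near_speaker: bool, near_listener: bool) -> str:
    """Pick the spatial deictic for a place, given who it is near."""
    by_me, by_you, over_there = DEICTICS[language]
    if near_speaker:   # assumption: the speaker takes priority
        return by_me
    if near_listener:
        return by_you
    return over_there


print(deictic("Japanese", near_speaker=False, near_listener=True))  # soko
print(deictic("Korean", near_speaker=True, near_listener=False))    # yogi
print(deictic("Tamil", near_speaker=False, near_listener=False))    # ange
```

English would collapse the last two cases into a single “there”.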

The “near the listener” deixis seems like a rather useless feature to have in a language (it is disappearing from modern Tamil).

In the modern world, when you talk to someone face to face (not on the phone), you are usually standing just a few feet from them.

So, anything “near the speaker” is also “near the listener”.  One of those spatial references is therefore redundant.

But then, if one of the spatial references was so useless, why did it appear in Korean and Japanese in addition to Tamil?

Perhaps it has something to do with the fact that Korea and South India are peninsulas, and Japan is an island.

All three countries have long coastlines.

So, some ancestors of the inhabitants of Korea, Japan and South India might have lived off deep-water fishing.

On the ocean there is an immediate use for the “near the listener” deictic.

Imagine a fleet of boats spread out on the ocean looking for fish to spear or net.

The boatmen would have had no landmarks or other features with which to communicate directions.

The only features they’d have had to identify positions would have been their own boats.

So, they’d probably have had conversations with each other that went as follows:

Boat 1:  Are there any fish near you (the listener)?

Boat 2:  No, there are no fish near me (the speaker).  Are there any fish near you (the listener)?

Boat 1:  No, there are no fish near me (the speaker).  We should look for fish away from both of us (pointing).

In such conversations, all three deictics would have been used.

The sentence “Are there any fish near you (the listener)?” would have used the word soko (in Japanese), kugi (in Korean) and unge (in Tamil).

The sentence “No, there are no fish near me (the speaker)” would have used the word koko (in Japanese), yogi (in Korean) and inge (in Tamil).

The sentence “We should look for fish away from both of us (pointing)” would have used the word asoko (in Japanese), chogi (in Korean) and ange (in Tamil).

I am just guessing at all this, of course.  Part of the fun of working in linguistics is that you can extrapolate from tenuous linguistic clues, and indulge in wild flights of fantasy.

But what I am proposing is not entirely unimaginable.

In 2011, in a small cave (called the Jerimalai cave) in East Timor, archaeologists found bones from 2,843 individual fish, some of which were caught 42,000 years ago.  Half of the bones were those of deep-water tuna fish. The finds also included fish hooks dating from between 23,000 and 16,000 years ago.

More details on the Jerimalai find here: http://news.discovery.com/history/archaeology/ancient-human-fishermen-111128.htm


Text Analytics Tools for Deliberative Democracy

In our last post, we spoke about various control mechanisms that can be implemented to support direct democracy (which we interpreted to mean the control of the allocation of common resources by the people who contributed them).

We also examined how these controls could be used to curtail man-in-the-middle corruption.

In this article, we examine a more sophisticated form of direct democracy called deliberative democracy.

In a deliberative democracy, in addition to the control mechanisms prescribed for direct democracy, there need to be mechanisms to allow deliberation (discussion) before a referendum or any other action is taken.

I quote from the Wikipedia article on deliberative democracy:

Deliberative democracy holds that, for a democratic decision to be legitimate, it must be preceded by authentic deliberation, not merely the aggregation of preferences that occurs in voting.

In elitist deliberative democracy, principles of deliberative democracy apply to elite societal decision-making bodies, such as legislatures and courts; in populist deliberative democracy, principles of deliberative democracy apply to groups of lay citizens who are empowered to make decisions.

The article on direct democracy had the following to say:

Democratic theorists have identified a trilemma due to the presence of three desirable characteristics of an ideal system of direct democracy, which are challenging to deliver all at once. These three characteristics are participation – widespread participation in the decision making process by the people affected; deliberation – a rational discussion where all major points of view are weighted according to evidence; and equality – all members of the population on whose behalf decisions are taken have an equal chance of having their views taken into account.

(Aside to computer scientists: doesn’t this trilemma remind you of the CAP theorem that applies to distributed data stores? Here’s a simple explanation of the CAP theorem: http://ksat.me/a-plain-english-introduction-to-cap-theorem/).

So, for example, representative democracy satisfies the requirement for deliberation and equality but sacrifices participation.

Participatory democracy allows inclusive participation and deliberation but sacrifices equality.

And then there is direct democracy which supports participation and equality, but not deliberation.

The problem seems to be that when a large number of people are invited to participate in a deliberation (and given that deliberations take time), it will not be possible to compensate them all for their time. Consequently, those who are more interested in the issue being debated (or who stand to benefit from one position or the other) are more likely to participate, biasing the sample in their favour (all sections of the population are no longer equally represented in the discussion/decision).

So, it seems that all three properties desired in an ideal democratic system – participation, equality and deliberation – cannot be present at the same time in a real democratic system.

But then, a while ago, we began wondering if this trilemma is merely a result of the lack of suitable technology and not really a fundamental property of democracy.  So, we proposed a design for a tool (which we have not yet built) that can support the participation of a large number of people in deliberations.  We call it the MCT (Mass Communication Tool).

It could be used as a method to enable direct democracies to support deliberations in which all citizens can participate, ahead of a vote on any subject.

It uses text clustering algorithms to solve the problems of volume as well as numeric asymmetry in the flow of communications between the deliberating participants and the moderators of the communications.

There’s a brief overview of the system in our lab profile.

MCTs could have a huge impact on our experience of representative government.  A typical use case would involve a public figure (say, President Obama) sounding out the electorate before introducing legislation on, say, healthcare reform.

By first discussing the competing proposals with large numbers of people, it might be possible for the initiator of the discussion to get a sense of what might or might not work and what the response to the legislation is likely to be.

An MCT would have to be capable of supporting a live dialog involving a large number of people.

It would use natural language processing and machine learning to enable a few moderators (for example, the CEO of a company) to interact with a large number of people (for example, all the employees of the company) in real time (for example, during a virtual all-hands meeting), get a synopsis of a large number of concurrent discussions in real time, and participate in a significant fraction of the discussions as they are taking place.

The system would consist of:

  1. an aggregator of messages (built from natural language processing components) that groups together messages and discussions with identical semantic content;
  2. a hierarchical clustering system (built from natural language processing components) that assigns aggregated messages their place in a hierarchy by specificity with more general messages closer to the root of the hierarchy and more specific messages closer to the leaves of the hierarchy;
  3. a summarization system (built from natural language processing components) that creates a summary of the aggregate of all messages in a sub-tree; and
  4. a reply routing system (built from natural language processing components) that routes replies from cluster to cluster based on their relevance to the discussion threads.
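
To make the aggregator and reply routing components a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the MCT design itself: word-overlap (Jaccard) similarity stands in for real semantic matching, the 0.5 threshold is arbitrary, the "summary" is simply the first message of each cluster, and the hierarchy of component 2 is left out entirely. All the function names (tokens, jaccard, aggregate, synopsis, route_reply) are hypothetical.

```python
import re


def tokens(text):
    """Lower-cased set of words; a crude stand-in for semantic analysis."""
    return frozenset(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Overlap of two token sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def aggregate(messages, threshold=0.5):
    """Greedy one-pass grouping (an assumption, not the MCT algorithm).

    Each message joins the first cluster whose first message is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_tokens, [messages])
    for msg in messages:
        toks = tokens(msg)
        for seed, members in clusters:
            if jaccard(toks, seed) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((toks, [msg]))
    return [members for _, members in clusters]


def synopsis(clusters):
    """One representative message per cluster plus a head count,
    largest cluster first. This is what a moderator would read."""
    ranked = sorted(clusters, key=len, reverse=True)
    return [(members[0], len(members)) for members in ranked]


def route_reply(reply, clusters):
    """Index of the cluster whose messages best match the reply text."""
    reply_toks = tokens(reply)
    scores = [
        max(jaccard(reply_toks, tokens(m)) for m in members)
        for members in clusters
    ]
    return max(range(len(clusters)), key=scores.__getitem__)
```

In use, a moderator would call aggregate on the incoming messages, read the output of synopsis instead of every individual message, and let route_reply decide which group of participants should see each answer. A real system would need a far better similarity measure than shared words.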

Direct Democracy and Implications for Research

Direct democracy can be broadly interpreted to mean the control of the allocation of common resources by the people who contributed them.

One common resource is tax money.

In most countries, those who pay taxes only have a say in whom they can elect to power.

Those who pay taxes rarely have a say in how the tax money is spent.

There is a middleman (someone who works in government) who decides how the tax money is spent.

The problem with having a middleman decide the allocation of common resources is that the resources could end up being allocated very inefficiently due to man-in-the-middle corruption.

Here is an article about man-in-the-middle corruption:

The way out is to let the people who contributed to the common pool decide on how the resources are allocated.

Control Mechanisms

Tool 1: Apportioning

One way to do this is to embed direct democracy mechanisms into the contribution mechanism.

For example, tax-payers could be given the ability to tie a portion of their tax contribution to expenditure categories.

They could be given the right to apportion, out of every $100 that they have paid in taxes, a certain amount to each of the following major categories: education, healthcare, social security, infrastructure and defence (leaving a certain percentage to the finance minister’s discretion).
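
As a rough illustration of how such apportioning might be tallied, here is a small Python sketch. It rests on assumptions of my own: each tax-payer states an amount per 100 units of tax for any of the five categories, whatever is left unassigned goes to the finance minister's discretion, and the national figure is just the sum over all tax-payers. The names apportion and national_budget are hypothetical.

```python
CATEGORIES = ("education", "healthcare", "social security",
              "infrastructure", "defence")


def apportion(tax_paid, shares):
    """Split one tax-payer's payment across the expenditure categories.

    shares maps a category to the amount assigned to it out of every
    100 units of tax; anything not assigned is left as 'discretionary'.
    """
    unknown = set(shares) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    if any(v < 0 for v in shares.values()) or sum(shares.values()) > 100:
        raise ValueError("shares must be non-negative and total at most 100")
    allocation = {c: tax_paid * shares.get(c, 0) / 100 for c in CATEGORIES}
    allocation["discretionary"] = tax_paid - sum(allocation.values())
    return allocation


def national_budget(returns):
    """Add up the allocations of all tax-payers.

    returns is an iterable of (tax_paid, shares) pairs.
    """
    total = {}
    for tax_paid, shares in returns:
        for category, amount in apportion(tax_paid, shares).items():
            total[category] = total.get(category, 0) + amount
    return total
```

Because each allocation is scaled by the tax actually paid, larger contributors have more influence over the totals in this sketch. Whether that is desirable is a policy question, not a technical one.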

It could also be left to the tax-payers to specify how much money the government may borrow on their behalf.

This direct control might well have prevented the debt crises of Greece and Ireland (it would be most effective in countries where people are averse to taking on debt), and might also have given people in the USA some control over their government’s borrowings.

Tool 2: Referendum

Another mechanism is the referendum.  It is already being used in all democratic countries, but mainly for the selection of the middleman.

Fortunately, things seem to be moving well beyond that stage.  In India, baby steps are being taken towards bringing about a direct democratic model of government.

A new political party came to power in Delhi on an anti-corruption platform.  The first thing they did was conduct an informal referendum to ask the people of Delhi if they should form a minority government.

So, referendums are one of the mechanisms of direct democracy.

This mechanism can also be used to prevent or reduce man-in-the-middle corruption.

Take the example of a road that needs to be surfaced.  Normally, a government official would have issued contracts based on the bribes paid to him by the contending contractors (rendering a selection on the basis of quality very unlikely).

If, instead, the people living on the street that needed to be surfaced had been given all the relevant information and asked to select the best contractor themselves, the middleman would have been eliminated and the quality driven up rather than down.

Now, I am going to talk about some problems that I think affect research funding in the USA (and other countries with government-funded research).  Much research funding in the USA comes from government bodies like the NSF, the ONR and DARPA.

Now the following is only my personal opinion, but I think that research efforts in some fields in those countries might be distorted to some extent by the needs of these funding bodies.

Well, I can only speak for the research areas in which Aiaioo Labs is active.  We focus on a narrow research space – predominantly on text analytics and natural language processing.

In this space, I see a lot of low-hanging fruit that nobody in the USA or Europe ever picks.  And that’s quite inexplicable as these are often problems with very obvious applications to the software products space.  And yet I see no papers from California on them.

There are topics on which all the papers I see are from India (often by students who don’t even publish them at international conferences) and sometimes by researchers from Singapore – completely ignored by the main research community.

At other times, I’ve noticed areas of research that DARPA had spent much money on in the 1970s and that researchers had pursued very enthusiastically in that decade.  I’ve seen those lines of research being abandoned in the 90s (possibly once the funding priorities changed) and not being revived again, though product firms are working on those technologies again in California in 2013.

I find it hard to explain why these areas of research are being ignored, except by the remote possibility that they have passed under the radar of the people at the Office of Naval Research, which would make it unlikely that grants will be provided for them.

That is again, if my conjecture is right, a man-in-the-middle problem.  The agenda for research is possibly not being driven by the research community or by the market (the needs of start-ups in California) but by people guessing at what sort of proposals might get funded (and that might be encouraging people to stay with what government knows).

So, I shall propose another direct democracy tool to solve this problem as well:

Tool 3:  Suggestions + Referendum

Here, each of the participants (researchers) bidding for the grants would put in suggestions about what the next important thing to focus on as a research community might be.  Then they could all vote on the suggestions.  The allocation of research grants could then be guided by the suggestions and the votes received by each suggestion.
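
Here is a minimal Python sketch of how such a scheme might be tallied. The details are my own assumptions and only one of many possible rules: each researcher casts a single vote, votes for anything not on the list of suggestions are ignored, and the budget is split in proportion to the votes each suggestion receives. The names tally and allocate are hypothetical.

```python
def tally(suggestions, votes):
    """Count votes per suggestion.

    votes maps a researcher to the one suggestion they voted for;
    votes for anything not in suggestions are ignored.
    """
    counts = dict.fromkeys(suggestions, 0)
    for researcher, choice in votes.items():
        if choice in counts:
            counts[choice] += 1
    return counts


def allocate(budget, counts):
    """Split the budget in proportion to votes (an assumed rule)."""
    total = sum(counts.values())
    if total == 0:
        return {s: 0.0 for s in counts}
    return {s: budget * n / total for s, n in counts.items()}
```

A funding body could of course use the vote counts only as guidance instead of a strict formula; the proportional split is just the simplest rule to write down.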

Controls as Rights

In a sense, you can think of these three control mechanisms as three rights that people who contribute toward a common pool of resources will have in a direct democracy:

1)  The right to apportion

2)  The right to be consulted

3)  The right to suggest

There is a nice article on Wikipedia on direct democracy.  It covers two of the control mechanisms proposed here – referendum and initiative (which corresponds somewhat to suggestions) – and proposes one that I hadn’t mentioned – the right to recall.  It doesn’t talk about apportioning.

Here is an interesting video of a Mohalla Sabha (it’s an interesting participation mechanism that a political organization is experimenting with in Delhi).
