Category: Intention Analysis

Wishful Thinking and Leprechauns

I recently came across a lovely cartoon on Leprechauns and social media.

Fortunately for us, we have a leprechaun in the office.

(So, now you know where we get our startup funding from).

Here’s a picture of the guy (that’s the cubicle he shares with Selasdia):

Just kidding!

One of our business partners brought the little pewter leprechaun in the picture back to India for us from Ireland.

It might have once been popularly believed in Ireland that leprechauns had the ability to grant Wishes.

And we find Wishes immensely interesting because some of the earliest work on Intention Analysis started out as an attempt to detect and classify Wishes.

In fact, one of the loveliest papers on the subject started out with an attempt to study what people wished for (wanted) on New Year's Day.

You can read the paper here: http://pages.cs.wisc.edu/~jerryzhu/pub/wish.pdf

It has a very beautiful title: “May All Your Wishes Come True: A Study of Wishes and How to Recognize Them”

You also find the word Wishes in the title of one of the first attempts in research literature to find “buy” intentions:

http://www.aclweb.org/anthology-new/W/W10/W10-0207.pdf

It is a paper titled, again quite poetically (what's with Wishes and beautiful titles!), “Wishful Thinking – Finding suggestions and ‘buy’ wishes from product reviews”.

This paper was written by a research team working at Cognizant (India) in 2010.

Sentiment Analysis, Intention Analysis and the Direction of Fit

As you know, for some years now, all of us who form part of the NLP research team at Aiaioo Labs have been working on a technology for text analysis called ‘Intention Analysis’.

It was something few had heard of when we started.

Today, a lot more people know the term.

But not a great deal of research on Intention Analysis has been published in the last 20 years.

So, we’re really happy to be one of the first research teams in ‘recent times’ to delve into the subject again.

We’re also really thrilled to let you know that we’ve just been given the opportunity to demonstrate our work on Intention Analysis at the COLING 2012 conference, which will be held in Bombay (now officially known as Mumbai).

I hope I shall get to meet many of you in Bombay in a couple of weeks.

The theory that defines and shapes our work on Intention Analysis is known as ‘Speech Act Theory’.

One of the earlier philosophers to work on it was John Rogers Searle.

He augmented the theory with the concept of Intentional States. (Incidentally, Intentional States are not even defined on Wikipedia.)

According to Searle, Intentional States could be either Beliefs or Desires.

He differentiated Beliefs from Desires by their direction of fit.

The direction of fit of an intentional state is said to be ‘mind-to-world’ if through the performance of the speech act, a mental state is established, revealed or altered.

The direction of fit of a speech act or intentional state is said to be ‘world-to-mind’ if the performance of the speech act alters the state of the world.

So, sentiment analysis is all about Beliefs. The direction of fit is mind-to-world. You see things in the world, and form opinions about them.

Intention analysis, on the other hand, is all about Desires. The direction of fit is world-to-mind. You try to make the world fit a model, held in your mind, of how the world should be.
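To make the Belief/Desire distinction a little more concrete, here is a toy sketch that labels a sentence by its direction of fit using a handful of hand-written cue phrases. The cue lists and the function name `direction_of_fit` are hypothetical and chosen only for illustration; they are not taken from our COLING 2012 paper, and a practical system would learn such cues from annotated data rather than rely on a fixed list.

```python
import re

# Hypothetical cue phrases, for illustration only.
# Desires (world-to-mind): the speaker wants the world to change.
DESIRE_CUES = re.compile(
    r"\b(i|we)\s+(want|wish|need|would like|am looking for|plan)\b"
    r"|\bwhere can i (buy|get)\b",
    re.IGNORECASE,
)
# Beliefs (mind-to-world): the speaker reports an opinion about the world.
BELIEF_CUES = re.compile(
    r"\b(is|was|are|were)\s+(great|good|bad|terrible|awful|amazing)\b"
    r"|\b(i|we)\s+(love|hate|like)\b",
    re.IGNORECASE,
)


def direction_of_fit(sentence: str) -> str:
    """Return 'world-to-mind', 'mind-to-world' or 'unknown' for a sentence."""
    # Desire cues are checked first, so "I would like a refund" is a desire
    # even though it also contains the word "like".
    if DESIRE_CUES.search(sentence):
        return "world-to-mind"   # intention analysis territory
    if BELIEF_CUES.search(sentence):
        return "mind-to-world"   # sentiment analysis territory
    return "unknown"


for s in ["I want to buy a new phone", "This phone is great", "The train leaves at noon"]:
    print(s, "->", direction_of_fit(s))
```

Sentences that match neither list fall through to "unknown", which is where a keyword approach like this one breaks down fastest.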

If you would like to learn more, you can find our paper here: www.aiaioo.com/publications/coling2012.pdf

Discovering intent for retargeting

Text-based Intention Analysis

At Aiaioo Labs, we have been developing text analytics tools for the discovery of intent for a number of years, as explained in the following articles:

In recent months and years, we have seen other firms come up with similar technologies, studies and initiatives:

These are all text-based studies, using the content of conversations on social media to discover intentions.

However, the most recent attempts at improving the targeting of ads have focused on methods for discovering intent without using any unstructured information. Below, we look at two of the methods.

One of the methods is advertisement retargeting.

Retargeting

Retargeting involves showing ads to customers based on intent discovered from past visits to sales portals. For instance, if you visited Amazon’s books section, you have demonstrated the intent to purchase books. It’s clear as crystal, and there is no text analysis involved.

Now if Amazon tells Facebook that a particular user visited Amazon’s books section, Facebook can use the information to show suitable Amazon ads to that user, even after the user has left Amazon. That is called retargeting. It is in essence a kind of intention analysis, but without the use of text.
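The data flow described above can be sketched in a few lines. This is a minimal illustration under assumed inputs: a hypothetical log of (user, section) visits shared by a retailer, and a hypothetical mapping from site sections to ads. The function names are made up for the example, and the sketch does not reflect how Amazon, Facebook or any real ad platform actually exchanges data.

```python
from collections import defaultdict


def build_intent_segments(visits):
    """Map each site section to the set of users who visited it.

    `visits` is an iterable of (user_id, section) pairs, e.g. a log that a
    retailer might share with an ad platform (hypothetical format).
    """
    segments = defaultdict(set)
    for user_id, section in visits:
        segments[section].add(user_id)
    return segments


def ads_for_user(user_id, segments, ads_by_section):
    """Return the ads that could be retargeted to this user, sorted for stable output."""
    return sorted(
        ad
        for section, users in segments.items()
        if user_id in users                      # intent shown by a past visit
        for ad in ads_by_section.get(section, [])
    )


visits = [("u1", "books"), ("u2", "electronics"), ("u1", "electronics")]
ads_by_section = {"books": ["book-sale"], "electronics": ["tv-deal"]}
segments = build_intent_segments(visits)
print(ads_for_user("u1", segments, ads_by_section))  # ['book-sale', 'tv-deal']
```

Note that no text is analysed anywhere: the visit itself is taken as the signal of intent.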

Here is an article about retargeting on Facebook:

It’s Become Tragically Clear That Facebook Chased The Wrong Business For Years

Another method is the use of special purpose social networks.

Special Purpose Social Networks

When most people go on LinkedIn, they have careers or business contacts on their minds. So, LinkedIn might work well for job ads or B2B branding exercises.

This article on Forbes talks about this type of intent analysis: Why Consumer Intent Drives The Value Of Social Networks

It’s very interesting to see wholly new ways to guess at social networking users’ intentions.

Text-Based

Advantages: doesn’t need data from outside the website
Disadvantages: might be perceived as an invasion of privacy, needs conversations, low volume, low accuracy

Retargeting

Advantages: high volume, high accuracy, no privacy issues, needs no conversations
Disadvantages: needs data from outside the website

Special purpose networks

Advantages: high volume, no privacy issues, needs no conversations
Disadvantages: low accuracy, the space of intentions that can be assumed is limited per network

A, AA, I, II, U, UU and their role in some ancient languages

Vowels in Sanskrit are very interesting. You have long and short forms of the a, i and u vowel sounds, but the e and o sounds come in only one length.

The obvious question is, “Why don’t you see long and short forms of the e and o vowel sounds in Sanskrit (and in Hindi)?”

It turns out that many Australian Aboriginal languages have only three vowel sounds: a, i and u. In many of them, these sounds appear in both long and short forms.

For example, the phonology of the Warlpiri language has six vowel sounds: a, aa, i, ii, u and uu.

In a 2010 blog post on the similarities between the languages of Australia and those of South India and Japan, I mentioned that Old Tamil was believed to have had only a, i and u sounds (in both long and short forms), and that the phonology of Tamil (with its five nasals, a retroflex t/r sound and l variants) very closely resembles that of languages like Warlpiri.

This possibility of a native Australian substratum for Tamil suggests that a similar process might have occurred in Sanskrit.

It might have acquired the use of long and short vowel sounds from the languages of the people who eventually settled in Australia 60,000 – 80,000 years ago.

But where could the e and o sounds have come from?

I picked up a book on Persian, because Old Persian is very closely related to Sanskrit (words like Asura, Deva, Mitra, Ashwa, Bratr, Pitr and the numbers are shared by the two languages). But modern Persian turned out to be a very different kettle of fish. Modern Persian has no long vowels at all.

Then, I examined the phonology of Old Persian. And when I did so, I was, to put it mildly, quite shocked. Old Persian, it turned out, had only six simple vowel forms. And there are no prizes for guessing what they are …

a, aa, i, ii, u, uu

And then it turned out that not just Old Persian, but also Classical Arabic, had only six vowel sounds, and they are … yeah … a, aa, i, ii, u and uu.

That was very interesting.

The Arabian Peninsula, Persia and India were on the coastal route to Australia from Africa.

So, it may be that Arabic, Persian and Tamil developed similar vowel sounds in their classical forms because of shared 80,000-year-old memories.

The e and o sounds entered Persian (Early New Persian) at a later date. In the case of Tamil, I believe (though I am not sure about this) that some linguists think the ia sound became e and the ua sound became o. (I’ve been referred to the works of T P Meenakshi Sundaram, but I can’t get hold of his books; they seem to be out of print.)

For example, in Tulu, a language related to Tamil, the word that means ‘for me’ is ‘yanku’ whereas in Tamil, it is ‘enakku’. It is possible that the ya of the Tulu word became e in Tamil.

So, in Early New Persian, you had the sounds a, aa, i, ii, u, uu, e and o … (either by transformation or through borrowing) … and that’s identical to the set of vowel sounds that we have in Sanskrit.
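The comparison being made here is really a comparison of sets, so it can be written down as one. The sketch below simply encodes the vowel inventories as this post describes them (simplified to vowel symbols with long vowels written double, ignoring diphthongs and tone); the names `INVENTORIES` and `group_by_inventory` are illustrative, and this is not a phonological database.

```python
# Vowel inventories as described in this post (simplified, illustrative only).
SIX = {"a", "aa", "i", "ii", "u", "uu"}

INVENTORIES = {
    "Old Tamil": SIX,
    "Old Persian": SIX,
    "Classical Arabic": SIX,
    "Early New Persian": SIX | {"e", "o"},
    "Sanskrit": SIX | {"e", "o"},
}


def group_by_inventory(inventories):
    """Group languages whose vowel inventories are identical sets."""
    groups = {}
    for language, vowels in inventories.items():
        # frozenset is hashable, so identical inventories share one key
        groups.setdefault(frozenset(vowels), []).append(language)
    return list(groups.values())


for members in group_by_inventory(INVENTORIES):
    print(members)
# ['Old Tamil', 'Old Persian', 'Classical Arabic']
# ['Early New Persian', 'Sanskrit']

# The vowels Early New Persian has that Old Persian lacked
print(sorted(INVENTORIES["Early New Persian"] - INVENTORIES["Old Persian"]))  # ['e', 'o']
```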

Later, the long forms of Early New Persian – the aa, ii and uu sounds – were lost in the Iranian dialect of Persian (but still exist in the Dari and Tajik dialects).

There’s a matrix showing this transformation in the article on Persian phonology on Wikipedia.

So, can we conclude from all this that native Australian languages formed the substratum of the oldest languages of the Semitic, Indo-Iranian and South Indian language families?

The reason we can’t make that claim is that there are two other hypotheses which are equally plausible and which can explain the similar vowel inventories of these languages: a) that all humans start speaking by making a, i and u sounds mixed with clicks or consonants, and b) that these three languages happened to arrive at identical vowel inventories entirely by chance and entirely independently.

Until we are able to refute these two hypotheses, we cannot say with any certainty that they had a common ancestor.

But it remains a very tempting hypothesis.

The Process of Change

The progression of vowel sounds in Persian is very interesting. You had Old Persian, where the vowel sounds were identical to those of Old Tamil. Then you had a period when Persian vowel sounds were identical to those of Sanskrit. The vowels used in Iranian Persian today are identical to those in English. How did that come to be?

I couldn’t help surmising that two forces might have been at work:

a) The creation/inclusion of sounds (whenever it was necessary to accommodate borrowed words)

b) The deletion of sounds when maintaining certain distinctions was no longer necessary

The first force enters the picture when new words are encountered by a culture. Let’s say the original speakers of Indo-European languages (who might have been among the first to tame horses) began to travel farther and farther from their homes in Scythia (the grasslands to the north of Persia) and ran into the people on the coasts who used words comprising a, aa, i, ii, u and uu sounds.

The new language born from this confluence would have needed all the sounds of the Scythian language as well as those of the coastal language (the a, aa, i, ii, u and uu sounds).

Could that have led to other vowel sounds in the combined language disappearing completely?

I think it is possible that such a mechanism did exist.

Some of the older words in Sanskrit and in the Romance and Germanic languages exhibit a phenomenon called ‘ablaut’: a word can keep its core meaning when different vowel sounds are fitted into it (such words are sometimes considered ‘irregular’). An example in English is the set of words ‘sing’, ‘sang’, ‘sung’ and ‘song’. Another example is the pair of words for ‘is’ and ‘are’: in German, they are ‘ist’ and ‘sind’; in Spanish, ‘es’ and ‘son’; in Sanskrit, ‘asti’ and ‘santi’. The constants in all these patterns are the consonants s and t or n. It seems as if the meaning of ‘to be’ is preserved as vowels are inserted into this pattern of consonants in various ways. This is somewhat similar to how words in Semitic languages are formed (kitaab ‘book’, kaatib ‘writer’ and maktab ‘office’ are three Arabic words with related meanings, all built on the consonants k-t-b).
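The consonant-pattern idea can be illustrated by stripping the vowel letters from each word and looking at what remains. The sketch below works on ordinary spellings rather than phonemes, and the helper names `consonant_skeleton` and `fits_root` are hypothetical; it shows the pattern, it does not establish any historical relationship between the words.

```python
VOWELS = set("aeiou")


def consonant_skeleton(word: str) -> str:
    """Drop vowel letters to expose the consonant frame of a word (spelling-based)."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def fits_root(word: str, root: str) -> bool:
    """True if the root consonants occur, in order, within the word's skeleton."""
    remaining = iter(consonant_skeleton(word))
    # `c in remaining` consumes the iterator, so this is an in-order subsequence test
    return all(c in remaining for c in root)


for word in ["sing", "sang", "sung", "song", "kitaab", "kaatib", "maktab"]:
    print(word, consonant_skeleton(word))
# sing, sang, sung and song all reduce to 'sng'; kitaab and kaatib to 'ktb'; maktab to 'mktb'

print(fits_root("maktab", "ktb"))  # True: the k-t-b frame survives the prefix m-
print(fits_root("ist", "sn"))      # False: 'ist' uses the s-t variant, not s-n
```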

So, it is possible that in the early Iranian languages, the consonant patterns were not difficult to fit over the a, aa, i, ii, u and uu vowel patterns of the substrate language. If that had indeed happened, the other vowel sounds might have fallen into disuse because they were no longer really necessary. This could have happened any time between the domestication of horses, some 5,000 to 6,000 years ago, and the period of Old Persian, some 2,500 years ago.

The next phase in the transformation might have begun when, with the passage of time, the old words of the coastal languages fell into disuse. As they disappeared, there would no longer have been any need to maintain the distinction between long and short vowels. That might explain why, in modern Iran, these vowel distinctions have largely disappeared, but not in India (where the close proximity of South Indian languages keeps those old words alive and consequently makes it necessary to continue maintaining the distinction between long and short vowels).

After-speculation

The story I told above posits roving horse-riders, and it seems such people did exist. Some vowels in Vedic Sanskrit apparently had tones. In Asia, tonal languages are found mostly in China and South East Asia, and outside Asia they occur mainly in Africa and the Americas. So, it is quite possible that the tonal sounds in Vedic Sanskrit were picked up when some roving horsemen came into contact with people in China.

The more a population of people traveled through lands with different language structures, the more complex their own language would have become.

An interesting story is that of the Kingdom of the Mitanni, which existed in what is now Kurdistan (in northern Iraq and eastern Turkey). The rulers of that kingdom just might have had some connection to India. Here are some excerpts from Wikipedia:

“The names of the Mitanni aristocracy frequently are of Indo-Aryan origin, but it is specifically their deities which show Indo-Aryan roots (Mitra, Varuna, Indra, Nasatya), though some think that they are more immediately related to the Kassites. The common people’s language, the Hurrian language is neither Indo-European nor Semitic, but a Language Isolate.

Some theonyms, proper names and other terminology of the Mitanni exhibit an Indo-Aryan superstrate, suggesting that an Indo-Aryan elite imposed itself over the Hurrian population in the course of the Indo-Aryan expansion. In a treaty between the Hittites and the Mitanni, the deities Mitra, Varuna, Indra, and Nasatya (Ashvins) are invoked.

Kikkuli’s horse training text includes technical terms such as aika (eka, one), tera (tri, three), panza (pancha, five), satta (sapta, seven), na (nava, nine), vartana (vartana, turn, round in the horse race). The numeral aika “one” is of particular importance because it places the superstrate in the vicinity of Indo-Aryan proper as opposed to Indo-Iranian or early Iranian (which has “aiva”) in general.

Sanskritic interpretations of Mitanni royal names render Artashumara (artaššumara) as Arta-smara “who thinks of Arta/Ṛta” (Mayrhofer II 780), Biridashva (biridašṷa, biriiašṷa) as Prītāśva “whose horse is dear” (Mayrhofer II 182), Priyamazda (priiamazda) as Priyamedha “whose wisdom is dear” (Mayrhofer II 189, II378), Citrarata as citraratha “whose chariot is shining” (Mayrhofer I 553), Indaruda/Endaruta as Indrota “helped by Indra” (Mayrhofer I 134), Shativaza (šattiṷaza) as Sātivāja “winning the race price” (Mayrhofer II 540, 696), Šubandhu as Subandhu ‘having good relatives” (a name in Palestine, Mayrhofer II 209, 735), Tushratta (tṷišeratta, tušratta, etc.) as *tṷaiašaratha, Vedic Tvastr “whose chariot is vehement” (Mayrhofer, Etym. Wb., I 686, I 736).”

Did you notice that the Mitanni names don’t use long vowels?

How can you find out who wants to buy your products?

One way to understand a brand’s strengths is to see how often people say they want to buy that brand on Twitter or other social media.

We have a demo of Twitter analysis for finding customers.

The following link takes you to a live demonstration of purchase intention detection being run on a Twitter stream:

http://www.aiaioo.com/demos/demo_use_case.html

On the page, you will see tweets from people who want to purchase something.

It is a fully automatic way of learning how many people want to buy your products, and who they are.
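A very small sketch of the counting idea, under heavy assumptions: tweets arrive as (user, text) pairs, and purchase intent is approximated by one hand-written regular expression that captures the word after "want/need/going/plan to buy/get/order". The pattern and the function names are hypothetical and are not the model behind the Aiaioo demo; a pattern like this misses paraphrases and catches false positives, which is why a trained classifier would be used in practice.

```python
import re
from collections import Counter

# Hypothetical purchase-intent pattern; group 1 captures the product word.
BUY_PATTERN = re.compile(
    r"\b(?:want|need|going|plan(?:ning)?)\s+to\s+(?:buy|get|order)\s+"
    r"(?:a\s+|an\s+|the\s+|some\s+)?(\w+)",
    re.IGNORECASE,
)


def purchase_intents(tweets):
    """Yield (user, product) pairs for tweets that express a wish to buy something."""
    for user, text in tweets:
        match = BUY_PATTERN.search(text)
        if match:
            yield user, match.group(1).lower()


def count_by_product(tweets):
    """How many tweets express purchase intent for each product word."""
    return Counter(product for _, product in purchase_intents(tweets))


tweets = [
    ("@ana", "I want to buy an iPhone"),
    ("@raj", "Really need to get a Kindle"),
    ("@li", "Loving the weather today"),
    ("@sam", "planning to buy the iPhone soon"),
]
print(list(purchase_intents(tweets)))  # [('@ana', 'iphone'), ('@raj', 'kindle'), ('@sam', 'iphone')]
print(count_by_product(tweets))        # Counter({'iphone': 2, 'kindle': 1})
```

The first function answers "who", the second "how many"; both depend entirely on how good the intent detector in the middle is.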