How communication devices could turn into personal assistants

Mobile phones are no longer just communication devices. They are also used as follows:
a) As consoles for entertainment
b) As personal task management and planning tools

Keeping the second use in mind, it would be in the cell phone manufacturers’ best interests to develop highly integrated task management tools for cell phones.

Cell phones today lack seamless integration between the email and SMS applications and the task management applications.

When SMS messages or Email messages are received by the user about a task, there is little to no assistance in transferring the relevant data to a task management application.

What seems to be needed is a way to assist the user in capturing the information related to the task and moving it to a third-party task management application on the cell phone.

Intention recognition algorithms can do just that!

If an SMS says “What is the name of the person we met the other day?” it should be possible to detect that this is an inquiry and that the recipient now has the task of sending a response.

Another example is the following SMS “Can you please send me the draft report by 2 pm”. This SMS contains a directive and the recipient now has a task to complete by a deadline.

Yet another example is the following “Would 4 pm work for you?” This is a meeting request.

These categories of tasks can be identified by looking at incoming and outgoing SMS messages and classifying them into various categories of tasks.

There is also an entity extraction task involved (identifying the deadlines and time ranges).

Once task intentions are identified, the phone could take the following steps:
a) The phone confirms with the user whether or not a task needs to be created.
b) The phone passes the task to the user’s preferred task management tool.
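The classification and entity extraction steps described above can be sketched in a few lines of Python. This is only a minimal rule-based illustration, assuming a handful of hand-written patterns; the names `classify_intent`, `extract_times` and `propose_task` and the category labels are hypothetical, and a real system would more likely use a trained classifier.

```python
import re

# Matches simple clock times such as "2 pm" or "10:30 am" (a deliberately narrow, assumed pattern).
TIME_PATTERN = re.compile(r"\b\d{1,2}(?::\d{2})?\s*(?:am|pm)\b", re.IGNORECASE)

# Hand-written cues, chosen only to cover the three example messages.
DIRECTIVE_PATTERN = re.compile(r"^(?:(?:can|could|would|will) you\b|please\b)", re.IGNORECASE)
MEETING_PATTERN = re.compile(r"\b(?:work for you|meet|are you free)\b", re.IGNORECASE)


def extract_times(text):
    """Entity extraction: return the time expressions found in the message."""
    return TIME_PATTERN.findall(text)


def classify_intent(text):
    """Classify a message as a directive, meeting request, inquiry or other."""
    stripped = text.strip()
    if DIRECTIVE_PATTERN.search(stripped):
        return "directive"
    if extract_times(stripped) and MEETING_PATTERN.search(stripped):
        return "meeting_request"
    if stripped.endswith("?"):
        return "inquiry"
    return "other"


def propose_task(text):
    """Build a task proposal that the phone could ask the user to confirm."""
    intent = classify_intent(text)
    if intent == "other":
        return None
    return {"intent": intent, "times": extract_times(text), "source": text}


if __name__ == "__main__":
    for sms in [
        "What is the name of the person we met the other day?",
        "Can you please send me the draft report by 2 pm",
        "Would 4 pm work for you?",
    ]:
        print(propose_task(sms))
```

The proposal returned by `propose_task` corresponds to step a): the phone would show it to the user, and only after confirmation hand it to the preferred task management tool.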

I just wanted to point you to an interesting dissenting view from a Google engineer. In the blog post “Will Google fight Apple’s Siri with Alfred?”, Alexei Oreskovic quotes Google’s head of mobile Andy Rubin as saying:

“I don’t believe that your phone should be your assistant. Your phone is a tool for communicating. You shouldn’t be communicating with the phone; you should be communicating with somebody on the other side of the phone.”

However, the same article also says the following:

“On Tuesday Google said it had acquired the tech company that has developed Alfred, a smartphone app that acts as a “personal assistant” to make recommendations based on your interests and your “context,” such as location, time of day, intent and social information.”


Intentions and Information

In the post “Intent on Intentions”, I’d talked a bit about the Speech Act Theory of Searle and Winograd.

In this blog post, I’d like to look at all other utterances. What purpose do utterances have if they are meaningful, but are not a Speech Act?

It turns out that meaningful utterances that do not convey Speech Acts, typically convey information. Information in turn comes in two flavours – events and facts. Facts represent states of the world (they describe relations between entities or describe properties of entities). Events represent changes.

For example, “London is in England” is a fact, whereas “London Bridge is falling down” is an event.

Entities are the things being talked about. In the sentences used to illustrate events and facts above, the following entities may be observed: “London”, “England” and “London Bridge”.

The distinction between intentions, events and facts is not watertight. There are times when utterances can cross the boundaries and fall into more than one of these categories.

Interestingly, there are different uses for the three kinds of text analysis (analysis of intention, analysis of events, and analysis of fact) and types of data that they may be applied to.

Data Sources

  • Event Analysis: News articles, because news reports are always about important happenings or changes in the state of the world, and hence are rich with events and also with facts.
  • Fact Analysis: Wikipedia, other encyclopedias and knowledge bases are full of facts, but don’t necessarily report current events. They may contain information on events that took place in another age.
  • Intention Analysis: Email Messages, Customer Feedback, Social Media Messages

Enterprise Applications

  • Event Analysis: Media Monitoring Tools, Opportunity Identification Tools, Conformance and Discovery Tools
  • Fact Analysis: Enterprise Search, Semantic Web, Logic and Inference Engines
  • Intention Analysis: CRM Tools, Collaboration Tools, Task Management Tools, Communication Devices

Here is a link to a whitepaper on the topic of doing a 360 degree analysis of text.


Intent to Opine

In the last blog post, I talked about intention analysis and what it does.

Intention analysis is the identification of intentions from text. Some examples of intentions are:

a) intention to complain
b) intention to inquire
c) intention to issue a directive
d) intention to buy

In this post, I am claiming that Sentiment Analysis needs Intention Analysis.

Yes, the results of sentiment analysis will be inaccurate unless you know that the intent of the speaker is to express an opinion.

Background

When sentiment analysis was initially proposed by researchers, they applied it to the analysis of product reviews.

The intention of a reviewer is obvious. Reviewers have only one intention: the intention to opine (either to praise or to criticize).

However, with the growth of social media, especially Twitter, the same sentiment analysis methods began to be applied to the analysis of Twitter streams and other social media streams.

Now that’s where there is a problem.

Not every message on Twitter that mentions a particular product or brand intends to express an opinion about the brand!

Below are a few illustrative examples.

Example 1: “Is the Canon EOS 5 a good camera?”

This sentence is not an expression of positive opinion, but an inquiry about the Canon EOS 5.

In other words, the intent of the speaker is not to express an opinion, but to inquire.

Example 2: “I am looking to buy me a good Canon camera”

Here, the intention of the user is to purchase a product (people only indicate a preference for good things … no one really looks to buy a bad camera).

However, most sentiment analysis tools will identify this sentence as an expression of positive sentiment.

Example 3: “Take me to a good movie.”

Here, the speaker’s intent is to direct someone to do something.

A directive is not an assertion, and so does not always imply an intention to opine.

Example 4: “My good old Porsche for sale (cheap)”

Here, the speaker’s intent is to talk up something they’re selling.

The intent here is not to express sentiment about the brand.

Conclusion

So, what we can learn from the above examples is that sentiment analysis should not be applied to social media without reservations.

In other words, for sentiment analysis to be accurate when applied to social media, it needs to be supported by intention analysis.
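To make the point concrete, here is a minimal sketch of an intention filter placed in front of a naive sentiment scorer. Everything in it is an assumption made for illustration: the tiny word list, the regular expressions and the function names are hypothetical, and this is not how VakSent works internally.

```python
import re

# A toy lexicon; real sentiment lexicons are far larger.
POSITIVE_WORDS = {"good", "great", "excellent"}

# Hand-written cues for intentions that are NOT an intention to opine.
NON_OPINION_PATTERNS = {
    "inquiry": re.compile(r"\?\s*$"),
    "purchase": re.compile(r"\b(?:looking to buy|want to buy)\b", re.IGNORECASE),
    "directive": re.compile(r"^(?:take|send|give|bring|show)\b", re.IGNORECASE),
    "sale": re.compile(r"\bfor sale\b", re.IGNORECASE),
}


def detect_intent(text):
    """Return the first non-opinion intent that matches, else 'opinion'."""
    stripped = text.strip()
    for label, pattern in NON_OPINION_PATTERNS.items():
        if pattern.search(stripped):
            return label
    return "opinion"


def naive_sentiment(text):
    """Keyword spotting: 'positive' if any positive word occurs."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "positive" if words & POSITIVE_WORDS else "neutral"


def filtered_sentiment(text):
    """Score sentiment only when the speaker intends to opine."""
    if detect_intent(text) != "opinion":
        return None
    return naive_sentiment(text)


if __name__ == "__main__":
    examples = [
        "Is the Canon EOS 5 a good camera?",
        "I am looking to buy me a good Canon camera",
        "Take me to a good movie.",
        "My good old Porsche for sale (cheap)",
        "The Canon EOS 5 is a good camera.",
    ]
    for text in examples:
        print(text, "|", naive_sentiment(text), "|", filtered_sentiment(text))
```

The naive scorer marks all four examples from this post as positive because each contains the word "good", while the filtered version declines to score them and only scores the plain assertion at the end.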

Demonstration

We recently released a sentiment analysis API that has the ability to filter out many kinds of intention including the ones listed above. We’d love to get your thoughts on our work. The demo is available at the following URL:

Demonstration of VakSent (a Sentiment Analysis API from Aiaioo Labs)

Do write me at cohan@aiaioo.com with intent to opine!


Intent on Intentions – Vakintent API

We have been exploring intention analysis for some time now and we are pleased to announce the launch of the first ever commercial API for broad-based intention analysis, called Vakintent.

Here is a demo of the Vakintent Intention Analysis API:  Demonstration of VakIntent, the Intention Analysis API from Aiaioo Labs

Definition

Intention Analysis is the identification of intentions from text, be it the intention to purchase or the intention to sell or to complain, accuse or to inquire, in incoming customer messages or in call center transcripts.


Uses

Intention Analysis has already given us some evidence of its usefulness.

In July 2011, we used intention analysis to study the GooglePlus launch.  We especially looked at quit intentions to see how frequently people were threatening to quit FB over time and saw how the number dropped sharply once people got to try GooglePlus (once the by-invite-only period ended).

This was a powerful observation, because in just four days, we could tell that GooglePlus couldn’t replace Facebook, at least not yet. Here is the study: http://www.aiaioo.com/cami


Background

The work that intention analysis is based on goes as far back as 1962 when J. L. Austin noted that not all utterances are statements whose truth and falsity are at stake, and that there was a class of utterances like “I pronounce you man and wife” that are actions [taken from Winograd, 1987].

In 1975, Searle identified the following broad categories of illocutionary (causing an action to happen) speech acts [from Winograd, 1987]:

  • Assertive – Committing the speaker to the truth of a proposition
  • Directive – Attempting to get the listener to do something
  • Commissive – Committing the speaker to a course of action
  • Declaration – Bringing about something (e.g., pronouncing someone married)
  • Expressive – Expressing a psychological state

Interestingly, the expressives include expression of opinion which corresponds to the modern day task of sentiment analysis.

However, utterances have more uses than purely informative uses like “They’re planning to remodel the west wing next summer” or purely expressive uses like expression of sentiment.

In a paper in 1987, titled “A Language/Action Perspective on the Design of Cooperative Work”, Winograd proposes the concept of a “Conversation for Action (CfA)”.

Prior Work

Cognizant Technologies

There was a paper at ACL 2010 titled “Wishful Thinking – Finding suggestions and ‘buy’ wishes from product reviews” (http://aclweb.org/anthology/W/W10/W10-0207.pdf) by Krishna Bhavsar et al. from Cognizant Technologies.

Lampert and Dale

Another recent attempt to build computer systems capable of analysing intention was made by Robert Dale and Andrew Lampert at Macquarie University. A paper that I’d recommend to you is their work on detecting emails containing requests for action: “Andrew Lampert, Robert Dale and Cécile Paris [2010] Detecting Emails Containing Requests for Action. Pages 984–992 in Proceedings of NAACL 2010, 1st–6th June 2010, Los Angeles, USA”. Our own work leads us to believe that the difficulty of detecting directives is rather higher than for other intentions, so what they’ve done in this project is quite impressive.

WisdomTap

WisdomTap (www.wisdomtap.com) has a very interesting buy intention offering. Their value proposition is “Your Customers announce their intent to buy by asking for product and service recommendations on Twitter.  We find customers who need your products and services.  We connect you to your customers at the right time.”

Twitchell

Twitchell et al have studied “Using Speech Act Theory to Model Conversations for Automated Classification and Retrieval”.

Carnegie Mellon

CMU has released a speech act corpus through the Jangada and Ciranda projects.


Vakintent Demonstration Consoles

Here are some links to demos:

Name                       Description (URL)
Vakintent Intention Demo   Demonstration of VakIntent, the Intention Analysis API from Aiaioo Labs
Vaksent Sentiment Demo     Demonstration of VakSent, the Sentiment Analysis API from Aiaioo Labs

Case Study                 URL
Competitive Analysis       http://www.aiaioo.com/cami

Vakintent API

The Vakintent API offered by Aiaioo Labs can identify 11 intentions, the objects of those intentions and their holders.

Please feel free to write me at cohan@aiaioo.com for more information.


Vaksent API for Sentiment Analysis

Aiaioo Labs has just released an API for fine-grained sentiment analysis.

A demonstration of the Vaksent Sentiment Analysis Engine is available here: http://www.aiaioo.com:8080/annotator-0.1/automation/demoView/1

The key features of the sentiment analysis system are: a) identification of the holder of the opinion (who holds that opinion), and b) identification of the object of the sentiment (what exactly is the sentiment expressed about).

Technology

We use a cascade of algorithms to identify sequentially 1) sentiment-conveying phrases, 2) entities (to identify objects being spoken about), 3) relations (to identify which sentiment applies to which entity) and 4) negations (to identify which relations are negated). This combination makes for a very sophisticated sentence level and entity level analysis of sentiment.

The main goal of this system was to have roughly domain independent behaviour (no imbalance in performance when used on financial data, product data or entertainment). Such a balance is pretty hard to achieve (some measurements suggest that human annotators agree with each other only 79% of the time when attempting to identify the sentiment of sentences/entities in certain types of text).

Evaluation

The accuracies that we measured for different domains are as follows.

Domain of Entertainment:

Accuracy = 0.7103

Precision = {negative=0.7222, positive=0.6997}

Recall = {negative=0.6837, positive=0.737}

F-Score = {negative=0.7027, positive=0.7181}

Tested on a total of 10662 sentences.

This was evaluated using the Bo Pang data set. As you can see the errors are roughly balanced on the positive and the negative side to get what we hope is a fairly unskewed error curve. This allows averaging to work as a strategy to cancel out noise.
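For readers who want to check the numbers, the F-score for each class is conventionally the harmonic mean of that class's precision and recall. The sketch below assumes that standard definition and recomputes it from the rounded entertainment figures above; the function name `f_score` is my own, and because the inputs are already rounded, the result can differ from the reported values in the last digits.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the entertainment-domain results.
entertainment = {
    "negative": (0.7222, 0.6837),
    "positive": (0.6997, 0.737),
}

if __name__ == "__main__":
    for label, (precision, recall) in entertainment.items():
        print(label, round(f_score(precision, recall), 4))
```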

Domain of Products:

Accuracy = 0.7266

Precision = {negative=0.5963, positive=0.8462}

Recall = {negative=0.7807, positive=0.6953}

F-Score = {negative=0.6823, positive=0.7671}

Tested on a total of 3731 sentences.

The data set used was the Bing Liu corpora (the first two) covering mostly electronic products. We have roughly the same performance again on products, but the curve is now slightly skewed.

Domain of Finance (evaluation incomplete):

Accuracy = 0.6896

Precision = {negative=0.7037, positive=0.6666}

Recall = {negative=0.7755, positive=0.5789}

F-Score = {negative=0.7387, positive=0.6212}

Tested on a total of 87 sentences.

We have roughly the same performance again on finance, but the evaluation data set is very small. We’re working on performing a more reliable evaluation.

Examples

Here is what Vaksent http://www.aiaioo.com:8080/annotator-0.1/automation/demoView/1 says about two sentences provided as examples:

I {- deny -} that [- it can never [+ be said that this is not [- a {!+ beautiful +!} ( car ) -] +] -] . = [ negative ]

( John ) and not [- ( Bruce ) -] said that this is not [- a {!- bad -!} ( car ) -] . = [ positive ]
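The nested brackets in these two outputs show the polarity flipping each time a negation wraps the phrase. A very simplified way to picture that behaviour, assuming each negator in scope simply inverts the polarity, is sketched below. This is an illustration of the idea only, not the Vaksent algorithm, and the function names are hypothetical.

```python
def scoped_polarity(base_polarity, negators_in_scope):
    """Flip the base polarity (+1 or -1) once per negator that scopes over it."""
    polarity = base_polarity
    for _ in negators_in_scope:
        polarity = -polarity
    return polarity


def label(polarity):
    """Turn a signed polarity into a readable label."""
    return "positive" if polarity > 0 else "negative"


if __name__ == "__main__":
    # "beautiful" is positive; "not", "never" and "deny" all scope over it.
    print(label(scoped_polarity(+1, ["not", "never", "deny"])))
    # "bad" is negative; only the "not" before "a bad car" scopes over it.
    # The "not" before "Bruce" applies to Bruce, so it is left out here.
    print(label(scoped_polarity(-1, ["not"])))
```

Deciding which negators actually scope over which sentiment phrase is the hard part, and is what the relation and negation stages of the cascade are for.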


Speech Recognition using PocketSphinx on Win32

The zeroth thing you need is the Pocketsphinx binaries.
Just download the win32 binaries from the Sphinx website (download pocketsphinx, sphinxbase, sphinxtrain and cmuclmtk from the Sphinx website).

The first thing you need to do is build a language model or a grammar.

The grammar can be something simple in a format called JSGF, and this is the easier way to get a speech recognizer up and running. Alternatively, you can use a language model. The language model can be built using the instructions on the Sphinx site. You can create it starting from a file with sentences like this:

<s> I WANT A NEXTCUBE ZERO FOUR ZERO </s>
<s> I WANT THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I NEED A NEXTCUBE ZERO FOUR ZERO </s>
<s> I NEED THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM LOOKING FOR A NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM LOOKING FOR THE NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM SEEKING A NEXTCUBE ZERO FOUR ZERO </s>
<s> I AM SEEKING THE NEXTCUBE ZERO FOUR ZERO </s>
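If the sentence file gets longer than a handful of lines, it can be easier to generate it than to type it. Here is a small sketch that produces the eight sentences above from templates; the variable and function names are my own, and the output still has to be fed to the language modelling tools as described on the Sphinx site.

```python
from itertools import product

# Building blocks of the example corpus, all in capitals to match the CMU dictionary.
VERB_PHRASES = ["I WANT", "I NEED", "I AM LOOKING FOR", "I AM SEEKING"]
ARTICLES = ["A", "THE"]
PRODUCT = "NEXTCUBE ZERO FOUR ZERO"


def build_corpus():
    """Return one sentence per line, wrapped in <s> ... </s> boundary markers."""
    return [
        f"<s> {verb} {article} {PRODUCT} </s>"
        for verb, article in product(VERB_PHRASES, ARTICLES)
    ]


if __name__ == "__main__":
    print("\n".join(build_corpus()))
```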

A sample JSGF file would be (modified from the sample on the Sphinx website) … note that I’ve made all the words capitals because the CMU phonetic dictionary has all the words listed in caps (make sure that any language model is all caps as well, except for the sentence boundaries):

#JSGF V1.0;
/**
 * JSGF Grammar for Hello World example
 */
grammar hello;
public <greet> = (GOOD MORNING | HELLO | HI) ( PAUL | RITA | WILL );

The second thing you need is an Acoustic Model

An acoustic model maps sound features from the speech recognizer to phonemes.
Voxforge provides a free acoustic model for Pocketsphinx that you can use.

The third thing you need is a phonetic dictionary

The phonetic dictionary maps the recognized phonemes to actual words in your language. For English, there is a phonetic dictionary available from CMU.

You will just need to download one file: cmudict.0.7a_SPHINX_40

Now, you have all the components you need!

Running Pocketsphinx

With JSGF:

$ pocketsphinx-0.7-win32/pocketsphinx_continuous.exe \
-hmm voxforge-en-r0_1_3/model_parameters/voxforge_en_sphinx.cd_cont_3000 \
-jsgf greet.jsgf \
-dict cmudict.0.7a_SPHINX_40

With a language model:

$ pocketsphinx-0.7-win32/pocketsphinx_continuous.exe \
-hmm voxforge-en-r0_1_3/model_parameters/voxforge_en_sphinx.cd_cont_3000 \
-lm cmuclmtk-0.7-win32/output.lm.DMP \
-dict cmudict.0.7a_SPHINX_40
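If you would rather launch the recognizer from a script, the two command lines above can be assembled programmatically. The helper below is a hypothetical convenience wrapper of my own: it only builds the argument list from the flags shown above (-hmm, -jsgf, -lm, -dict) and does not run anything itself.

```python
def build_command(exe, hmm_dir, dict_path, jsgf_path=None, lm_path=None):
    """Assemble the pocketsphinx_continuous argument list for one of the two modes."""
    # Exactly one of the grammar or the language model must be supplied.
    if (jsgf_path is None) == (lm_path is None):
        raise ValueError("pass exactly one of jsgf_path or lm_path")
    command = [exe, "-hmm", hmm_dir]
    if jsgf_path is not None:
        command += ["-jsgf", jsgf_path]
    else:
        command += ["-lm", lm_path]
    command += ["-dict", dict_path]
    return command


if __name__ == "__main__":
    print(build_command(
        "pocketsphinx-0.7-win32/pocketsphinx_continuous.exe",
        "voxforge-en-r0_1_3/model_parameters/voxforge_en_sphinx.cd_cont_3000",
        "cmudict.0.7a_SPHINX_40",
        jsgf_path="greet.jsgf",
    ))
```

The resulting list could then be handed to something like Python's `subprocess.run`, assuming the paths match where you unpacked the binaries and models.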

Any additional phonetic entries in the phonetic dictionary can be created using the CMU dictionary phoneme set.

Education

  1. Videos on speech recognition
  2. Lectures on speech recognition
  3. Voxforge has an article on what an acoustic model is

NLP Workshop

The IASNLP 2011 workshop turned out to be a good opportunity to learn a little bit about speech research.

(See the article: http://www.aiaioo.com/cms/index.php?id=28)

Here are two of the faculty who work on speech at IIIT-H:

1. Yegnanarayana: http://speech.iiit.ac.in/~yegna (many publications on signal processing, noise cancellation, feature extraction, ANNs).

2. Kishore Prahallad: http://www.iiit.net/people/faculty/kishore (speech synthesis and spoken dialog systems)

IIIT-H also has research on grammar and translation.

Dr. Rajeev Sangal (http://www.iiit.net/~sangal/) works on Dependency Parsing, Transfer Based Machine Translation and Anaphora Resolution.


Robotics Workshop

On the first day of a three-day workshop, I built a line-follower robot that successfully navigated what the instructor promised was a very difficult course (he said it would be impossible to navigate using a simple on-off algorithm).

The trick I used to complete the course was to run the DC motors on half-voltage and adjust sensor angles so that both always fed the ‘brain’ an excellent set of signals.
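For readers who haven't built one, a simple on-off line follower looks roughly like the sketch below. This is a guess at the general shape of such a controller, not the code from the workshop: I am assuming two sensors that straddle the line, and the function name and the default speed of 0.5 (standing in for the half-voltage trick) are purely illustrative.

```python
def on_off_step(left_on_line, right_on_line, speed=0.5):
    """Return (left_motor, right_motor) speeds for one control step.

    Assumes two sensors straddling the line: a sensor seeing the line means
    the robot has drifted the other way and must turn towards that sensor.
    """
    if left_on_line and not right_on_line:
        # Line is under the left sensor: stop the left wheel to pivot left.
        return (0.0, speed)
    if right_on_line and not left_on_line:
        # Line is under the right sensor: stop the right wheel to pivot right.
        return (speed, 0.0)
    # Neither or both sensors see the line: keep going straight.
    return (speed, speed)


if __name__ == "__main__":
    for reading in [(False, False), (True, False), (False, True)]:
        print(reading, on_off_step(*reading))
```

With a controller this simple, everything depends on how clean the two sensor signals are, which is why the speed and the sensor angles mattered so much.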

I came up with the idea owing to my experience with text analytics. The most critical task in text analysis is feature engineering. With a good set of features, you can get excellent results even if the machine learning algorithm is very simple. Unfortunately, very little work goes into feature engineering and feature combination methods for NLP.

So, I guess my weekend dabbling in robotics taught me an important lesson: no matter how good your machine learning algorithms (the brains of the system) are, they can’t do anything without eyes.


An echo of voices

A long time ago, on a different blog, I’d written about the grammatical and semantic similarities between Tamil and Japanese (and Korean).

Recently, I read that Tamil bears a striking resemblance to the aboriginal/native languages of Australia.

What I found was (thanks dad for some valuable assistance) that Tamil has or is thought to have had sound patterns that are considered distinguishing features of the languages of Australia.

Before I list the resemblances, let me give you a quick overview of some characteristics of Australian languages (most of this information has been gleaned from Wikipedia):

Feature 1

Their languages have four to six ‘n’ sounds, and these sounds are associated with places of articulation (where the tongue touches the roof of the mouth). So, in the language called Dyirbal, we have the following consonants (note the row of nasal sounds):

                     Bilabial  Alveolar  Alveolo-Palatal  Retroflex  Velar
Plosive              p         t         c                           k
Nasal                m         n         ɲ                           ŋ
Trill                          r
Flap                                                      ɽ
Approximant central                      j                           w
Approximant lateral            l

In the languages of the Pama-Nyungan family, we have the following consonant sounds (again, note the row of nasal sounds):

           Bilabial  Apico-alveolar  Apico-postalveolar  Laminal  Dorso-velar
Stop       p         t               rt                  c, cʸ    k
Nasal      m         n               rn                  ñ        ng
Lateral              l               rl                  λ
Rhotic               rr              r
Semivowel  w                                             y

Feature 2

Australian languages are characterised by an absence of fricatives (hissing/rubbing sounds like ‘s’, ‘h’ and ‘sh’) as you can see from the tables above.

Feature 3

Australian languages have only three vowel sounds: ‘a’, ‘i’, and ‘u’.

Now, you will notice that the above three characteristics of Australian languages are pretty distinctive. They’re extraordinary, distinguishing features.

You would probably agree that if any other language had the above features, it might be said to resemble Australian languages in how it sounds.

Now, let me list the characteristics of Tamil that I think can help one make the argument that at the phonetic level, Tamil resembles languages spoken by native Australians:

Feature 1 in Tamil

There are six nasal sounds in Tamil:

Plosives p (b) t̪ (d̪)   ʈ (ɖ) tʃ (dʒ) k (ɡ)
 
Nasals m n ɳ ɲ ŋ

This feature is also found in Malayalam but not in the other languages of South India.

Now for those of you who are surprised by the number of nasals, don’t be.  English has four nasals.  It’s just that the language does not use them to distinguish between different words.

Don’t believe me?  Oh well, here goes!  The first nasal sound in English is ‘m’.  The second is ‘n’ as in ‘bang’.  The third is ‘n’ as in ‘hand’.  There is a fourth (very rare) nasal.  This is the ‘n’ in ‘London’ (when the word is pronounced in a pompous manner, the ‘n’ gets to be more plosive/hard than otherwise).

Ok, I made a mistake.  English does distinguish between ‘m’ and ‘n’.  Notice how the script gives it away.

Feature 2 in Tamil

The Tamil script does not have letters for ‘h’, ‘s’ and ‘sh’. The lack of the corresponding consonants in the script does evoke suspicions that the sounds were not present and therefore the corresponding characters not needed at the time the early Tamil scripts came into being.

Another interesting observation that supports this hypothesis is that some dialects of Tamil prefer the use of ‘ch’ sounds to the use of the standard Tamil ‘s’ and ‘sh’ sounds. In these dialects, ‘seri’ becomes ‘cheri’, ‘sAppAdu’ becomes ‘chAppAdu’, and ‘sonnAn’ becomes ‘chonnAn’.

Feature 3 in Tamil

Establishing the third feature in Tamil is a bit difficult. Modern Tamil has five simple vowel sounds ‘a’, ‘i’, ‘u’, ‘e’, ‘o’ (taught in that order to kids, just like in Japanese; notice how ‘a’, ‘i’ and ‘u’ come before ‘e’ and ‘o’). However, there is another tentative link.

In a 1960s book, one Dr. T P Meenakshi Sundaram performed a comparative historical linguistic study of Tamil, and he surmised that early forms of Tamil had only three vowel sounds!

According to Dr. Sundaram, those three sounds were … surprise, surprise … ‘a’, ‘i’ and ‘u’! He said that the sound for ‘e’ was originally composed of ‘i’ and ‘a’ sounds.

This I have personally observed. In some rural dialects of Tamil/Malayalam, ‘Enna pEchi pEsurAn’ is still pronounced as ‘Yanna PiAchi PiAsurAn’ (come and talk to my grandma!)

Semantics

All the similarities I have listed are at a purely phonological level.

However, I did look at whole words (nouns and verbs) in Australian languages and they did not resemble corresponding Tamil words at all.  But there is another level of similarity – semantic.  Semantics is the way word distinctions are used to convey meaning.

One interesting pattern is the use of words to convey distinctions of importance to prevalent kinship systems.  Let me explain.

Kinship Terms

The Australian languages of the Western Desert have the following words for parents and uncles and aunts (from a post on the Australian Anthropology forum by someone called Laurent Dousset):

I’ll give you an Australian example (Western Desert):

Mother: ngunytju
Mother’s sister: ngunytju
Mother’s brother: kamuru
Father: mama
Father’s brother: mama
Father’s sister: kurntili.

A mama is married to a ngunytju and a kamuru is married to a kurntili. These do not have to be actual kamuru(s) and kurntili(s), but are usually classificatory ones.

Now you will agree that this is very similar to the use of words for parents and their brothers and sisters in Tamil, Malayalam and Kannada.

Now, back to Japanese.  I have a test that I wish to perform to help me determine if Australian languages might really be related to Tamil, and I’m going to turn to Japanese for help.

Deictic References

One feature of Japanese that I found incredibly fascinating was the way words were used to refer to distances.

In Japanese, there are three types of distances [I believe these terms are also called deictic references, so I'm going to call them such, though I'm not really sure] and they are (koko – near the speaker, soko – near the listener, and asoko – far from both).

Such deictic references, it turns out, also used to exist in Tamil. Sri Lankan Tamil still uses this three-way deictic reference (ivan – he who is near the speaker, uvan – he who is near the listener, and avan – he who is far from both).

You also notice this distinction in the old saying: ‘ikkara ukkara pachcha’ which means ‘from the shore near me, the shore near you looks green’, and you can also argue that you see a bit of it in ‘unnai’ (you-accusative) and ‘avanai’ (him-accusative).

What I would love to do is find someone from Australia who can tell me if these triple deictic references are also features of Australian languages.

Conclusion

Well, I am not going to comment on the interesting question of what this means/implies.  These similarities could simply mean nothing.  The similarities could have been the result of random language mutations.

But then again, maybe, just maybe, the ancestors of the native inhabitants of Australia stood on these very shores a hundred thousand years ago.  And just maybe, as I listen to my grandmother, I am hearing an echo of voices long gone from this world.

Acknowledgements

Thanks to dad for telling me about the work of T P Meenakshi Sundaram.  Thanks to mom for helping me with the thoughts on deictic references.

Counterargument

One of my friends wrote to me with excellent counterarguments, so I’m adding them to this post, just so you have a complete picture.

The problem he discovered with my logic is as follows.

My main claim is that the three features (which I’ll refer to as F1, F2 and F3) occurring together is a very very rare event, making their occurring together in two unrelated languages even rarer. However, for this claim to hold, the joint probability p(F1, F2, F3) would have to be very very low.

My friend pointed out that p(F1,F2,F3) need not be a very low number if the features are strongly interdependent, that is, when you see one such feature, you’re bound to see the others as well.

Now my friend also mentioned that F3 is a universal feature – all languages initially started with only three vowels, so if you take any language and drill back in time far enough, you’ll be left with just ‘a’, ‘i’ and ‘u’. This also implies that F3 is independent of F1 and F2 and p(F3) is 1.

Now, because of the independence of F3, p(F1, F2, F3) can be written as p(F2|F1)p(F1)p(F3).  Since p(F3) == 1, we can take it out of the picture and think of p(F1, F2, F3) as p(F2|F1)p(F1).

Now my friend pointed out that phonological features occur in clusters.  So, a large number of alveolar articulation points in a language would be a good indicator that the language has a paucity of fricatives.  So, p(F2|F1) is also close to 1.  So we’re left with p(F1, F2, F3) = p(F1).  p(F1) is not likely to be low enough to establish beyond reasonable doubt that the two languages are interrelated.
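To see how much the dependence matters, here is the same factorisation, p(F1, F2, F3) = p(F2|F1)p(F1)p(F3), worked through with made-up numbers. The value 0.2 is purely hypothetical and is only there to contrast the two scenarios; I have no real estimates for any of these probabilities.

```python
def joint_probability(p_f1, p_f2_given_f1, p_f3):
    """p(F1, F2, F3) = p(F2|F1) * p(F1) * p(F3), with F3 independent of F1 and F2."""
    return p_f2_given_f1 * p_f1 * p_f3


if __name__ == "__main__":
    p_f1 = 0.2  # hypothetical probability of a language having many nasals

    # What I had implicitly assumed: three independent, individually rare features.
    independent = joint_probability(p_f1, p_f2_given_f1=0.2, p_f3=0.2)

    # My friend's scenario: F2 almost always accompanies F1, and F3 is universal.
    dependent = joint_probability(p_f1, p_f2_given_f1=1.0, p_f3=1.0)

    print("independent features:", independent)
    print("dependent features:  ", dependent)
```

In the second scenario the joint probability is no smaller than p(F1) itself, which is exactly the counterargument.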

In order to complete my case, I’d still have to do all of the following:

a)  find more such features
b)  show that p(F) is low
c)  show that the conditional probabilities are low (high feature independence)

Thanks Dr. M___ C___ for pointing this out!

Now the traditional methods of comparative historical linguistics use features of languages called cognates (similar sounding words).  In doing so, they are biased in how they assign languages to language families.  Using cognates alone, Japanese would be assigned to the same language family as Chinese, but not if we looked at the syntactic, semantic and phonological features of Japanese.  So, I feel that the comparative methodology is incomplete and would need to be supplemented by some other features at the semantic/syntactic levels maybe wrapped into some kind of probabilistic framework.


Aiaioo World!

The default first post on WordPress goes ‘Hello world!’ I thought I’d keep to that general theme, but include the name of the lab in place of ‘Hello’, so it became ‘Aiaioo World!’

This is an introduction to the Lab, and the funky name, so it just seemed very apt.  Another reason for this seeming aptness (I was about to say ‘aptitude’) was that both Hello and Aiaioo are interjections.  At a broad level, words belonging to the same ‘Part of Speech’ can be interchanged (or used instead of each other) in semantically meaningful but linguistically similar situations.  However, the devil is in the semantics.  For example, both ‘aptness’ and ‘aptitude’ are nouns, but in the second sentence of this paragraph, aptness is more apt than aptitude!

So, back to Aiaioo! Aiaioo is an interjection whose use seems to run across Asia, from China to India. Actually, I should take that back. I do not know if the word is used in the Koreas, Thailand, Vietnam, Cambodia and any other country in Asia other than China and India. So, what I have essentially done is generalized from a sample of two. That’s a bad thing to do in pure statistical terms, because I have not smoothed my estimators and accounted for the sparseness of my data over the sample space.

Another thing you might have noticed is that the name of the firm is made up entirely of vowels!  Now that’s indeed very surprising!  There are not a lot of words (in English) that are that way.  How many vowels have I used?  In English I would count six.  In any of the languages of India, I would, most probably, count no more than three!

