Fun with Text – Managing Text Analytics

The year is 2016.

I’m a year older than when I designed the text analytics lecture titled “Fun with Text – Hacking Text Analytics”.

Yesterday, I found myself giving a follow-on lecture titled “Fun with Text – Managing Text Analytics”.

Here are the slides:

“Hacking Text Analytics” was meant to help students understand a range of text analytics problems by reducing them to simpler problems.

But it was designed with the understanding that they would hack their own text analytics tools.

However, in project after project, I was seeing that engineers tended not to build their own text analytics tools, but instead rely on handy and widely available open source products, and that the main thing they needed to learn was how to use them.

So, when I was asked to lecture at the NASSCOM Big Data and Analytics Summit in Hyderabad, and was advised that a large part of the audience might be non-technical and that the talk should be based on use cases, I tried a different tack.

I designed another lecture, “Fun with Text – Managing Text Analytics”, about:

  • 3 types of opportunities for text analytics that typically exist in every vertical
  • 3 use cases dealing with each of these types of opportunities
  • 3 mistakes to avoid and 3 things to embrace

The takeaway from it is how to go about solving a typical business problem involving text, using text analytics.

Enjoy the slides!

Visit Aiaioo Labs

Kabir and Language

Kabir
Image from Wikipedia

Yesterday, I went to a concert of songs belonging to the tradition of a 15th-century saint-poet called Kabir, and came across a very interesting song that he is said to have composed.

It went something like this.

The cow was milked

Before the calf was born

But after I sold the curd in the market

and this:

The ant went to its wedding

Carrying a gallon of oil

And an elephant and a camel under its arms

From the perspective of natural language processing and machine learning, the incongruous situations depicted in these poems turn out to have an interesting pattern, as I will explain below.

I found more examples of Kabir’s “inverted verses” online.

The poems at http://www.sriviliveshere.com/mapping-ulat-bansi.html come with beautiful illustrations as well.

Here are a few more lines from Kabir’s inverted verse:

A tree stands without roots

A tree bears fruit without flowers

Someone dances without feet

Someone plays music without hands

Someone sings without a tongue

Water catches fire

Someone sees with blind eyes

A cow eats a lion

A deer eats a cheetah

A crow pounces on a falcon

A quail pounces on a hawk

A mouse eats a cat

A dog eats a jackal

A frog eats snakes

What’s interesting about all of these is that they’re examples of entity-relationships that are false.

Let me first explain what entities and relationships are.

Entities are the real or conceptual objects that we perceive as existing in the world we live in.  They are usually described by a noun phrase, which may be qualified by adjectives.

Relationships are functions that apply to an ordered list of entities and return a true or false value.

For example, if you take the sentence “The hunter hunts the fox,” there are two entities (1. the hunter, 2. the fox).  The relationship is “hunts”, and it returns true for the two entities presented in that order.

The relationship “hunts” would return false if the entities were inverted (as in 1. the fox and 2. the hunter … as in the sentence “The fox hunts the hunter”).

The relationship and its entities can be stored in a database, and hence can be considered the structured form of an unstructured plain-language utterance.

In fact, it was speculated that entities and relationships such as these would some day make up the semantic web.
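To make the idea of a relationship as a true-or-false function concrete, here is a minimal Python sketch. The names Relation, KNOWN_TRUE and holds are my own illustrative choices, and the "database" is just a set holding one record, so treat it as a toy under those assumptions rather than a real extraction system.

```python
from typing import NamedTuple, Tuple


class Relation(NamedTuple):
    """One structured record: a relationship name plus its ordered entities."""
    name: str
    entities: Tuple[str, ...]


# A toy "database" of relationships taken to be true (illustrative only).
KNOWN_TRUE = {
    Relation("hunts", ("the hunter", "the fox")),
}


def holds(name: str, *entities: str) -> bool:
    """Return True only if the relationship is stored for the entities in this exact order."""
    return Relation(name, tuple(entities)) in KNOWN_TRUE


print(holds("hunts", "the hunter", "the fox"))  # True
print(holds("hunts", "the fox", "the hunter"))  # False
```

Because the entities are kept as an ordered tuple, swapping them produces a different record, which is why the inverted sentence comes back false.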

Most of Kabir’s inverted verse seems to be based on false entity relationships of arity two (involving two entities). Often, it is a violation of entity order that causes the relationship function to return the value false.

In the “cow was milked” song, the relationship that is violated is the temporal relationship: “takes place before”.

In the “ant’s wedding” song, the relationship that is violated is that of capability: “can do”.

In the rest of the examples, relationships like “eats”, “hunts”, “plays”, “dances”, “bears fruit”, etc., are violated.
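The order-violation pattern can be sketched in a few lines of Python. The PLAUSIBLE set below is a hypothetical handful of common-sense facts I made up to mirror the verses, and is_inverted is an illustrative name. The sketch only covers verses that swap two entities, not the “without roots” or “without feet” kind.

```python
from typing import Set, Tuple

# (relationship, first entity, second entity)
Triple = Tuple[str, str, str]

# Hypothetical common-sense facts, chosen to mirror the verses quoted above.
PLAUSIBLE: Set[Triple] = {
    ("eats", "lion", "cow"),
    ("eats", "cat", "mouse"),
    ("eats", "snake", "frog"),
    ("pounces on", "falcon", "crow"),
}


def invert(triple: Triple) -> Triple:
    """Swap the two entities while keeping the relationship name."""
    rel, first, second = triple
    return (rel, second, first)


def is_inverted(triple: Triple) -> bool:
    """True when the triple is not plausible but its entity-swapped form is."""
    return triple not in PLAUSIBLE and invert(triple) in PLAUSIBLE


print(is_inverted(("eats", "cow", "lion")))  # True: "A cow eats a lion"
print(is_inverted(("eats", "lion", "cow")))  # False: the ordinary order
```

A triple that matches nothing in the set in either order, such as ("eats", "dog", "jackal") here, is simply reported as not inverted, since this toy set knows nothing about it.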

Other Commentary

In Osho’s “The Revolution”, he talks about Kabir’s interest in and distrust of language, quoting the poet as saying:

I HAVE BEEN THINKING OF THE DIFFERENCE BETWEEN WATER

AND THE WAVES ON IT. RISING,

WATER’S STILL WATER, FALLING BACK,

IT IS WATER. WILL YOU GIVE ME A HINT

HOW TO TELL THEM APART?

BECAUSE SOMEONE HAS MADE UP THE WORD ‘WAVE’,

DO I HAVE TO DISTINGUISH IT FROM ‘WATER’?

And Osho concludes with:

Kabir is not interested in giving you any answers — because he knows perfectly well there is no answer. The game of question and answers is just a game — not that Kabir was not answering his disciples’ questions; he was answering, but answering playfully. That quality you have to remember. He is not a serious man; no wise man can ever be serious. Seriousness is part of ignorance, seriousness is a shadow of the ego. The wise is always non-serious. There can be no serious answers to questions, not at least with Kabir — because he does not believe that there is any meaning in life, and he does not believe that you have to stand aloof from life to observe and to find the meaning. He believes in participation. He does not want you to become a spectator, a speculator, a philosopher.

Notes

This genre of verse seems to have been a tradition in folk religious movements in North India.  In “The Tenth Rasa: An Anthology of Indian Nonsense”, Michael Heyman, Sumanyu Satpathy and Anushka Ravishankar mention that Namdev, a 13th-century saint-poet, authored such verses as well.

Event and Fact Analysis

In the post “Intent on Intentions”, I’d talked a bit about the Speech Act Theory of Searle and Winograd.

In this blog post, I’d like to look at all other utterances. What purpose do utterances have if they are meaningful but are not Speech Acts?

It turns out that meaningful utterances that do not convey Speech Acts typically convey information. Information in turn comes in two flavours: events and facts. Facts represent states of the world (they describe relations between entities or properties of entities). Events represent changes.

For example, “London is in England” is a fact, whereas “London Bridge is falling down” is an event.

Entities are the things being talked about. In the sentences used to illustrate events and facts above, the following entities may be observed: “London”, “England” and “London Bridge”.

The distinction between intentions, events and facts is not watertight. There are times when utterances can cross the boundaries and fall into more than one of these categories.

Interestingly, the three kinds of text analysis (analysis of intentions, analysis of events, and analysis of facts) have different uses and apply to different types of data.

Data Sources

  • Event Analysis: News articles, because news reports are always about important happenings or changes in the state of the world, and hence are rich in events and also in facts.
  • Fact Analysis: Wikipedia, other encyclopedias and knowledge bases, which are full of facts but don’t necessarily report current events. They may contain information on events that took place in another age.
  • Intention Analysis: Email Messages, Customer Feedback, Social Media Messages

Enterprise Applications

  • Event Analysis: Media Monitoring Tools, Opportunity Identification Tools, Conformance and Discovery Tools
  • Fact Analysis: Enterprise Search, Semantic Web, Logic and Inference Engines
  • Intention Analysis: CRM Tools, Collaboration Tools, Task Management Tools, Communication Devices

Here is a link to a whitepaper on the topic of doing a 360 degree analysis of text.