Yesterday, I found myself giving a follow-on lecture titled “Fun with Text – Managing Text Analytics”.
Here are the slides:
“Hacking Text Analytics” was meant to help students understand a range of text analytics problems by reducing them to simpler problems.
But it was designed with the understanding that they would hack their own text analytics tools.
However, in project after project, I was seeing that engineers tended not to build their own text analytics tools, but instead rely on handy and widely available open source products, and that the main thing they needed to learn was how to use them.
So, when I was asked to lecture at the NASSCOM Big Data and Analytics Summit in Hyderabad, was advised that a large part of the audience might be non-technical, and was asked to base the talk on use cases, I tried a different tack.
I designed another lecture, “Fun with Text – Managing Text Analytics”, about:
3 types of opportunities for text analytics that typically exist in every vertical
3 use cases dealing with each of these types of opportunities
3 mistakes to avoid and 3 things to embrace
The takeaway from it is how to go about solving a typical business problem involving text by using text analytics.
Here are a few more lines from Kabir’s inverted verse:
A tree stands without roots
A tree bears fruit without flowers
Someone dances without feet
Someone plays music without hands
Someone sings without a tongue
Water catches fire
Someone sees with blind eyes
A cow eats a lion
A deer eats a cheetah
A crow pounces on a falcon
A quail pounces on a hawk
A mouse eats a cat
A dog eats a jackal
A frog eats snakes
What’s interesting about all of these is that they’re examples of entity-relationships that are false.
Let me first explain what entities and relationships are.
Entities are the real or conceptual objects that we perceive as existing in the world we live in. They are usually described using a noun phrase and qualified using an adjective.
Relationships are the functions that apply to an ordered list of entities and return a true or false value.
For example, if you take the sentence “The hunter hunts the fox,” there are two entities (1. the hunter, 2. the fox). The relationship is “hunts”, and it returns true for the two entities presented in that order.
The relationship “hunts” would return false if the entities were inverted (as in 1. the fox and 2. the hunter … as in the sentence “The fox hunts the hunter”).
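The definition above can be sketched in a few lines of Python, treating a relationship as a function that takes an ordered list of entities and returns true or false. The tiny knowledge base KNOWN_TRUE and the helper holds() are made up for this illustration, and the sketch assumes that anything not listed as true is false.

```python
# Illustrative sketch only: KNOWN_TRUE is a hand-made knowledge base (an assumption),
# storing each true statement as (relationship, ordered tuple of entities).
KNOWN_TRUE = {
    ("hunts", ("the hunter", "the fox")),
}

def holds(relationship, *entities):
    """Return True if the relationship is known to hold for the entities in this order."""
    return (relationship, tuple(entities)) in KNOWN_TRUE

print(holds("hunts", "the hunter", "the fox"))  # True
print(holds("hunts", "the fox", "the hunter"))  # False
```

Because the entities are stored as an ordered tuple, swapping them produces a different statement, which is why the second call returns False.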
In fact, it was once speculated that entities and relationships such as these would some day make up the semantic web.
Most of Kabir’s inverted verse seems to be based on false entity relationships of arity two (involving two entities). Often, it is a violation of entity order that causes the relationship to return the value false.
In the “cow was milked” song, the relationship that is violated is the temporal relationship: “takes place before”.
In the “ant’s wedding” song, the relationship that is violated is that of capability: “can do”.
In the rest of the examples, relationships like “eats”, “hunts”, “plays”, “dances”, “bears fruit”, etc., are violated.
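One way to make the idea of a violated entity order concrete is to check whether a statement is false as written but true once its two entities are swapped. The sketch below does this in Python. The COMMON_SENSE set and the function is_inverted() are hypothetical and hand-built for illustration; a real system would need a far larger source of such knowledge.

```python
# Hypothetical common-sense statements, written as (subject, relationship, object).
COMMON_SENSE = {
    ("lion", "eats", "cow"),
    ("cat", "eats", "mouse"),
    ("snake", "eats", "frog"),
    ("falcon", "pounces on", "crow"),
}

def is_inverted(subject, relationship, obj):
    """True when the statement is not known to hold but its swapped-order form is."""
    stated = (subject, relationship, obj)
    swapped = (obj, relationship, subject)
    return stated not in COMMON_SENSE and swapped in COMMON_SENSE

# A few lines of the verse, reduced to binary relationships.
verse = [
    ("cow", "eats", "lion"),
    ("mouse", "eats", "cat"),
    ("frog", "eats", "snake"),
    ("crow", "pounces on", "falcon"),
]

for line in verse:
    print(line, is_inverted(*line))
```

Lines such as “A tree stands without roots” would not be caught by this check, since they involve a missing property rather than a swapped pair of entities.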
In Osho’s “The Revolution”, he talks about Kabir’s interest in and distrust of language, quoting the poet as saying:
I HAVE BEEN THINKING OF THE DIFFERENCE BETWEEN WATER
AND THE WAVES ON IT. RISING,
WATER’S STILL WATER, FALLING BACK,
IT IS WATER. WILL YOU GIVE ME A HINT
HOW TO TELL THEM APART?
BECAUSE SOMEONE HAS MADE UP THE WORD ‘WAVE’,
DO I HAVE TO DISTINGUISH IT FROM ‘WATER’?
And Osho concludes with:
Kabir is not interested in giving you any answers — because he knows perfectly well there is no answer. The game of question and answers is just a game — not that Kabir was not answering his disciples’ questions; he was answering, but answering playfully. That quality you have to remember. He is not a serious man; no wise man can ever be serious. Seriousness is part of ignorance, seriousness is a shadow of the ego. The wise is always non-serious. There can be no serious answers to questions, not at least with Kabir — because he does not believe that there is any meaning in life, and he does not believe that you have to stand aloof from life to observe and to find the meaning. He believes in participation. He does not want you to become a spectator, a speculator, a philosopher.
This genre of verse seems to have been a tradition in folk religious movements in North India. In “The Tenth Rasa: An Anthology of Indian Nonsense”, Michael Heyman, Sumanyu Satpathy and Anushka Ravishankar mention Namdev, a 13th-century saint-poet, as having authored such verses as well.
In the post “Intent on Intentions”, I’d talked a bit about the Speech Act Theory of Searle and Winograd.
In this blog post, I’d like to look at all other utterances. What purpose do utterances have if they are meaningful, but are not Speech Acts?
It turns out that meaningful utterances that do not convey Speech Acts typically convey information. Information in turn comes in two flavours – events and facts. Facts represent states of the world (they describe relations between entities or describe properties of entities). Events represent changes.
For example, “London is in England” is a fact, whereas “London Bridge is falling down” is an event.
Entities are the things being talked about. In the sentences used to illustrate events and facts above, the following entities may be observed: “London”, “England” and “London Bridge”.
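The distinction can be sketched as a small data structure in Python. The class names Fact and Event, their fields, and the helper entities_mentioned() are assumptions made for this illustration rather than part of any particular text analytics library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """A state of the world: a relation between (or property of) entities."""
    relation: str
    entities: tuple

@dataclass(frozen=True)
class Event:
    """A change in the state of the world involving one or more entities."""
    change: str
    entities: tuple

fact = Fact("is in", ("London", "England"))
event = Event("is falling down", ("London Bridge",))

def entities_mentioned(items):
    """Collect the distinct entities across facts and events, in first-seen order."""
    seen = []
    for item in items:
        for entity in item.entities:
            if entity not in seen:
                seen.append(entity)
    return seen

print(entities_mentioned([fact, event]))  # ['London', 'England', 'London Bridge']
```

Keeping entities as a separate field in both classes reflects the point above: the same entity can appear in facts and in events.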
The distinction between intentions, events and facts is not watertight. There are times when utterances can cross the boundaries and fall into more than one of these categories.
Interestingly, the three kinds of text analysis (analysis of intention, analysis of events, and analysis of fact) have different uses and apply to different types of data.
Event Analysis: News articles, because news reports are always about important happenings or changes in the state of the world, and hence are rich with events and also with facts.
Fact Analysis: Wikipedia, other Encyclopedias and Knowledge Bases are full of facts, but don’t necessarily report current events. They may contain information on events that took place in another age.
Intention Analysis: Emails Messages, Customer Feedback, Social
Event Analysis: Media Monitoring Tools, Opportunity Identification Tools, Conformance and Discovery Tools
Fact Analysis: Enterprise Search, Semantic Web, Logic and Inference Engines