Fun with Text – Managing Text Analytics

The year is 2016.

I’m a year older than when I designed the text analytics lecture titled “Fun with Text – Hacking Text Analytics”.

Yesterday, I found myself giving a follow-on lecture titled “Fun with Text – Managing Text Analytics”.

Here are the slides:

“Hacking Text Analytics” was meant to help students understand a range of text analytics problems by reducing them to simpler problems.

But it was designed with the understanding that they would hack their own text analytics tools.

However, in project after project, I saw that engineers tended not to build their own text analytics tools. Instead, they relied on handy, widely available open source products, and the main thing they needed to learn was how to use them.

So, when I was asked to lecture at the NASSCOM Big Data and Analytics Summit in Hyderabad, and was advised that a large part of the audience might be non-technical and that the talk should be based on use cases, I tried a different tack.

I designed another lecture, “Fun with Text – Managing Text Analytics”, about:

  • 3 types of opportunities for text analytics that typically exist in every vertical
  • 3 use cases dealing with each of these types of opportunities
  • 3 mistakes to avoid and 3 things to embrace

And the takeaway is how to go about solving a typical business problem involving text using text analytics.

Enjoy the slides!

Event and Fact Analysis

In the post “Intent on Intentions”, I’d talked a bit about the Speech Act Theory of Searle and Winograd.

In this blog post, I’d like to look at all other utterances. What purpose do utterances have if they are meaningful, but are not a Speech Act?

It turns out that meaningful utterances that do not convey Speech Acts typically convey information. Information in turn comes in two flavours – events and facts. Facts represent states of the world (they describe relations between entities or describe properties of entities). Events represent changes.

For example, “London is in England” is a fact, whereas “London Bridge is falling down” is an event.

Entities are the things being talked about. In the sentences used to illustrate events and facts above, the following entities may be observed: “London”, “England” and “London Bridge”.
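To make the distinction concrete, here is a minimal sketch of how entities, facts and events might be represented in code. The class names (`Entity`, `Fact`, `Event`) and the helper `entities_of` are hypothetical and chosen only for illustration; they do not come from any particular text analytics library. As a simplification, a fact here only models a relation between two entities, not a property of a single entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A thing being talked about."""
    name: str


@dataclass(frozen=True)
class Fact:
    """A state of the world: a relation between two entities."""
    subject: Entity
    relation: str
    obj: Entity


@dataclass(frozen=True)
class Event:
    """A change in the world involving an entity."""
    subject: Entity
    change: str


def entities_of(item):
    """Return the entities mentioned in a fact or an event."""
    if isinstance(item, Fact):
        return [item.subject, item.obj]
    return [item.subject]


london = Entity("London")
england = Entity("England")
london_bridge = Entity("London Bridge")

# "London is in England" is a fact.
fact = Fact(london, "is in", england)
# "London Bridge is falling down" is an event.
event = Event(london_bridge, "is falling down")

# Collect the distinct entity names across both utterances.
entity_names = sorted(
    {e.name for item in (fact, event) for e in entities_of(item)}
)
print(entity_names)
```

Keeping facts and events as separate types mirrors the state-versus-change distinction above, while both still point back to the same pool of entities.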

The distinction between intentions, events and facts is not watertight. There are times when utterances can cross the boundaries and fall into more than one of these categories.

Interestingly, the three kinds of text analysis (analysis of intentions, analysis of events, and analysis of facts) have different uses and apply to different types of data.

Data Sources

  • Event Analysis: News articles, because news reports are always about important happenings or changes in the state of the world, and hence are rich with events and also with facts.
  • Fact Analysis: Wikipedia, other encyclopedias and knowledge bases are full of facts, but don’t necessarily report current events. They may contain information on events that took place in another age.
  • Intention Analysis: Email Messages, Customer Feedback, Social Media Messages

Enterprise Applications

  • Event Analysis: Media Monitoring Tools, Opportunity Identification Tools, Conformance and Discovery Tools
  • Fact Analysis: Enterprise Search, Semantic Web, Logic and Inference Engines
  • Intention Analysis: CRM Tools, Collaboration Tools, Task Management Tools, Communication Devices

Here is a link to a whitepaper on the topic of doing a 360-degree analysis of text.