Fun with Text – Managing Text Analytics

The year is 2016.

I’m a year older than when I designed the text analytics lecture titled “Fun with Text – Hacking Text Analytics”.

Yesterday, I found myself giving a follow-on lecture titled “Fun with Text – Managing Text Analytics”.

Here are the slides:

“Hacking Text Analytics” was meant to help students understand a range of text analytics problems by reducing them to simpler problems.

But it was designed with the understanding that they would hack their own text analytics tools.

However, in project after project, I saw that engineers tended not to build their own text analytics tools but to rely on handy, widely available open source products, and that the main thing they needed to learn was how to use those products.

So, when I was asked to lecture to an audience at the NASSCOM Big Data and Analytics Summit in Hyderabad, and was advised that a large part of the audience might be non-technical, and could I please base the talk on use cases, I tried a different tack.

I designed another lecture, “Fun with Text – Managing Text Analytics”, about:

  • 3 types of opportunities for text analytics that typically exist in every vertical
  • 3 use cases dealing with each of these types of opportunities
  • 3 mistakes to avoid and 3 things to embrace

And the takeaway from it is how to go about solving a typical business problem (involving text) using text analytics.

Enjoy the slides!

Visit Aiaioo Labs

Fun with Text – Hacking Text Analytics


I’ve always wondered if there was a way to teach people to cobble together quick and dirty solutions to problems involving natural language, from duct tape, as it were.

Having worked in the field for donkey’s years as of 2015, and having taught a number of text analytics courses along the way, I’ve seen students of text analysis stumble mostly on one of two hurdles:

1.  Inability to Reduce Text Analytics Problems to Machine Learning Problems

I’ve seen students, after hours of training, still revert to rule-based thinking when asked to solve new problems involving text.

You can spend hours teaching people about classification and feature sets, but when you ask them to apply their learning to a new task, say segmenting a resume, you’ll hear them very quickly falling back to thinking in terms of programming steps.

Umm, you could write a script to look for a horizontal line, followed by capitalized text in bold, big font, with the words “Education” or “Experience” in it!!!

2.  Inability to Solve the Machine Learning (ML) Problems

Another task that I have seen teams getting hung up on has been solving ML problems and comparing different solutions.

My manager wants me to identify the ‘introduction’ sections.  So, I labelled 5 sentences as introductions.  Then, I trained a maximum entropy classifier with them.  Why isn’t it working?
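One way to make “isn’t it working” measurable, and to compare two candidate solutions, is to score each of them against labelled examples that were held out from training. The following is only a minimal sketch: the function name, the label names and the toy label lists are hypothetical, and a real evaluation would need far more than a handful of examples.

```python
def precision_recall_f1(gold, predicted, positive="intro"):
    """Score predictions for one category against the gold labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical held-out labels and the output of two competing systems.
gold     = ["intro", "other", "intro", "other", "other", "intro"]
system_a = ["intro", "other", "intro", "other", "intro", "intro"]
system_b = ["intro", "intro", "intro", "intro", "intro", "intro"]

for name, predicted in [("A", system_a), ("B", system_b)]:
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"system {name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

System B labels everything as an introduction, so its recall is perfect but its precision suffers; the F1 score makes that trade-off visible in one number.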

One Machine Learning Algorithm to Rule Them All

One day, when I was about to give a lecture at Barcamp Bangalore, I had an idea.

Wouldn’t it be fun to try to use just one machine learning algorithm, show people how to code up that algorithm themselves, and then show them how a really large number of text analytics problems (almost every single problem related to the semantic web) could be solved using it?

So, I quickly wrote up a set of problems in order of increasing complexity, and went about trying to reduce them all to one ML problem, and surprised myself!  It could be done!

Just about every text analytics problem related to the semantic web (which is, by far, the most important commercial category) could be reduced to a classification problem.
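As an illustration of such a reduction, here is a minimal sketch of how an extraction task (finding entity mentions, an example chosen for this sketch rather than taken from the slides) can be recast as classification: every token becomes one instance to be labelled ENTITY or OTHER. The function name, the feature names and the label names are all assumptions.

```python
def to_classification_instances(tokens, entity_spans):
    """Turn one annotated sentence into (features, label) pairs, one per token.

    entity_spans is a list of (start, end) token offsets, end exclusive.
    """
    inside = set()
    for start, end in entity_spans:
        inside.update(range(start, end))

    instances = []
    for i, token in enumerate(tokens):
        features = {
            "word": token.lower(),
            "is_capitalized": token[:1].isupper(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
            "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        }
        label = "ENTITY" if i in inside else "OTHER"
        instances.append((features, label))
    return instances

# One hand-annotated sentence: tokens 0 and 1 form an entity mention.
tokens = ["London", "Bridge", "is", "falling", "down"]
instances = to_classification_instances(tokens, [(0, 2)])
for features, label in instances:
    print(label, features)
```

Once the data is in this shape, any classifier that accepts feature/label pairs can be trained on it.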

Moreover, you could tackle just about any problem using just two steps:

a) Modeling the problem as a machine learning problem

Spot the machine learning problem underlying the text analytics problem (and, if it is a classification problem, the relevant categories), and you’ve reduced the text analytics problem to a machine learning problem.

b) Solving the problem using feature engineering

To solve the machine learning problem, you need to come up with a set of features that allows the machine learning algorithm to separate the desired categories.

That’s it!
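The two steps can be sketched on the resume segmentation task mentioned earlier. In step (a) each line of the resume becomes one instance to be classified as HEADING or BODY; in step (b) a handful of hand-picked features describes each line. The perceptron used here is a stand-in chosen for brevity, not necessarily the algorithm used in the slides, and the feature names, word list and training lines are made up for illustration.

```python
from collections import defaultdict

HEADING_WORDS = {"education", "experience", "skills", "projects"}

def line_features(line):
    """Step (b): describe a resume line with features a classifier can use."""
    words = line.split()
    features = {"bias"}
    if line.isupper():
        features.add("all_caps")
    if len(words) <= 3:
        features.add("short")
    if any(w.lower().strip(":") in HEADING_WORDS for w in words):
        features.add("has_heading_word")
    if any(ch.isdigit() for ch in line):
        features.add("has_digit")
    if line.rstrip().endswith("."):
        features.add("ends_with_period")
    return features

def train_perceptron(examples, epochs=10):
    """Learn one weight per feature; a positive score means HEADING."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for line, label in examples:
            target = 1 if label == "HEADING" else -1
            feats = line_features(line)
            score = sum(weights[f] for f in feats)
            if target * score <= 0:  # wrong or undecided: nudge the weights
                for f in feats:
                    weights[f] += target
    return weights

def classify(weights, line):
    score = sum(weights.get(f, 0.0) for f in line_features(line))
    return "HEADING" if score > 0 else "BODY"

# Step (a): the categories are HEADING and BODY, one instance per line.
training = [
    ("EDUCATION", "HEADING"),
    ("B.Tech in Computer Science, 2009.", "BODY"),
    ("Work Experience", "HEADING"),
    ("Built a search engine for legal documents.", "BODY"),
    ("SKILLS", "HEADING"),
    ("Five years of experience in Java.", "BODY"),
]
weights = train_perceptron(training)
print(classify(weights, "Projects"))                    # HEADING
print(classify(weights, "Led a team of 4 engineers."))  # BODY
```

Note that the word “experience” appears in a body line of the training data as well; the classifier learns to weigh it against the other features instead of treating it as a hard rule.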

Check it out for yourself!

Here’s a set of slides.

It’s called “Fun with Text – Hacking Text Analytics”.

How can you find out who wants to buy your products?

One way to understand a brand’s strengths is to see how often people say they want to buy that brand, on Twitter or other social media.

We have a demo of Twitter analysis for finding customers.

The following link takes you to a live demonstration of purchase intention detection being run on a Twitter stream:


What you will see on the page is tweets of people wanting to purchase something.

It is a fully automatic way of learning how many people want to buy your products, and who they are.
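Purchase intention detection can itself be framed as a classification problem over tweets. The sketch below is not the method behind the demo; it is a toy bag-of-words naive Bayes classifier, with made-up tweets and hypothetical label names (INTENT and NO_INTENT), meant only to show the shape of such a solution.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(examples):
    """Count words per label; examples is a list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training tweets, for illustration only.
tweets = [
    ("i want to buy a new phone", "INTENT"),
    ("looking to buy running shoes this weekend", "INTENT"),
    ("i need to get a new laptop", "INTENT"),
    ("my phone battery died again", "NO_INTENT"),
    ("these shoes hurt my feet", "NO_INTENT"),
    ("the laptop launch event was boring", "NO_INTENT"),
]
model = train_naive_bayes(tweets)
print(predict(model, "i want to buy a laptop"))  # INTENT
print(predict(model, "my laptop died"))          # NO_INTENT
```

Counting the tweets labelled INTENT gives the “how many”, and the authors of those tweets give the “who”.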

Event and Fact Analysis

In the post “Intent on Intentions”, I’d talked a bit about the Speech Act Theory of Searle and Winograd.

In this blog post, I’d like to look at all other utterances. What purpose do utterances have if they are meaningful, but are not a Speech Act?

It turns out that meaningful utterances that do not convey Speech Acts typically convey information. Information, in turn, comes in two flavours: events and facts. Facts represent states of the world (they describe relations between entities or describe properties of entities). Events represent changes.

For example, “London is in England” is a fact, whereas “London Bridge is falling down” is an event.

Entities are the things being talked about. In the sentences used to illustrate events and facts above, the following entities may be observed: “London”, “England” and “London Bridge”.
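These three notions map naturally onto simple data structures. The sketch below is one possible representation, not a standard one: the class and field names are assumptions, and for brevity a fact is modelled only as a relation between two entities (properties of a single entity are left out).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str

@dataclass(frozen=True)
class Fact:
    """A state of the world: a relation between two entities."""
    subject: Entity
    relation: str
    obj: Entity

@dataclass(frozen=True)
class Event:
    """A change in the state of the world."""
    subject: Entity
    change: str

def entities_in(items):
    """Collect the distinct entity names mentioned by facts and events."""
    names = []
    for item in items:
        for value in vars(item).values():
            if isinstance(value, Entity) and value.name not in names:
                names.append(value.name)
    return names

london = Entity("London")
england = Entity("England")
london_bridge = Entity("London Bridge")

fact = Fact(london, "is in", england)             # "London is in England"
event = Event(london_bridge, "is falling down")   # "London Bridge is falling down"

print(entities_in([fact, event]))  # ['London', 'England', 'London Bridge']
```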

The distinction between intentions, events and facts is not watertight. There are times when utterances can cross the boundaries and fall into more than one of these categories.

Interestingly, the three kinds of text analysis (analysis of intention, analysis of events, and analysis of fact) have different uses and apply to different types of data.

Data Sources

  • Event Analysis: News articles, because news reports are always about important happenings or changes in the state of the world, and hence are rich with events and also with facts.
  • Fact Analysis: Wikipedia, other encyclopedias and knowledge bases are full of facts, but don’t necessarily report current events. They may contain information on events that took place in another age.
  • Intention Analysis: Email Messages, Customer Feedback, Social Media Messages

Enterprise Applications

  • Event Analysis: Media Monitoring Tools, Opportunity Identification Tools, Conformance and Discovery Tools
  • Fact Analysis: Enterprise Search, Semantic Web, Logic and Inference Engines
  • Intention Analysis: CRM Tools, Collaboration Tools, Task Management Tools, Communication Devices

Here is a link to a whitepaper on doing a 360-degree analysis of text.