Using text analytics to detect fraud related to government tenders

In my last article, I talked about using statistics to detect fraud.  I’d promised to write about methods for detecting and preventing fraud in the issuing of tenders.

The floating of tenders is the primary mechanism by which governments – often the biggest economic force in a geographical area – procure services from private organizations.

If the bidding process is compromised, contracts may end up going to someone other than the best or most efficient vendor, which is what the real stakeholder, the taxpayer, would want.

That, in turn, results in bad roads, poor infrastructure, delayed projects, under-spending on education, over-spending on military purchases and the other problems associated with bad governance.

It also stands to reason that if you could detect tendering fraud, you could solve quite a few of the problems that affect places where corruption is rife.

So how do you tell if a tender has been fairly issued or if it has been gamed to the benefit of a certain participant?

An unfair procurement process can use one of the following methods to ensure that a contract is awarded to a favored party:

Method 1: Choice of pre-selection criteria

One way to favour a certain party is to introduce unnecessary qualifying conditions into the tender, conditions that have nothing whatsoever to do with the end product or service being procured.

These conditions are added to ensure that only a small, hand-picked set of bidders can meet all the requirements for taking part in the tender process.

Method 2: Cancellation and reissuing of the tender

Various sources have led me to believe that in India and China, a kickback of 3% of the size of the deal is the norm.

If a very efficient bid brings the cost of the service down so far that a 3% kickback no longer amounts to much money, or if the winning bidder refuses to pay a bribe, procurement officials may be able to subvert the process by coming up with reasons to cancel or terminate the tender.

They can then reissue the tender with tighter criteria intended to disqualify the uncooperative bidder.

My Experiences

Now, I must tell you that since the beginning of my career as an entrepreneur in India, I have come across numerous stories of terminated tenders, or of the disqualification of firms from a bidding process because they bid too low to be able to pay much by way of bribes.

I have personally walked into a tendering meeting where the government officials began with the words: “Ladies and gentlemen, we are proud to welcome you to our campus today.  We are extremely sorry that we cannot entertain you the way you entertain us when we come to your campuses.”

The tender was being issued for a software project that I felt should have taken a team of 3 engineers no more than 6 months to deliver.  But the tender stated that only firms with a minimum of “100 crores in revenue each year for the past 5 years,” (approximately 20 million USD each year) could bid for the project.  There were only 6 other firms in the room.

When the officials realised that Aiaioo Labs was a small firm, they suggested we leave.

They said, “There should be some other things we can work on with you.  Let’s meet some other time.”

Over the years, I began to wonder whether there was any way that I, as a taxpayer, might protect myself from bad deals (corrupt or price-inefficient ones) entered into by government middlemen with my tax money.

Fortunately, it seems possible to use text analytics to detect and alert an ombudsman to possible fraud in the issuing of tenders.  Below is a description of how such a method might work.

Using text analytics to detect irrelevant selection constraints

If tenders for procuring very different products have very similar pre-selection criteria, they could be flagged as suspicious.

The reason this method might work is that relationships between corrupt officials and client firms can take a considerable amount of time to form (because of the risks involved and the consequent need for caution). So it is not easy for corrupt officials to change the favored vendor frequently.

That would mean they would tend to keep the criteria for selecting firms more or less unchanged across widely varying tenders and over long periods of time. So, if the tender process has become unfair, you might find that tenders small and large, for hardware or for software (in other words, for quite different products and services), but issued by the same organization, all use more or less the same set of selection criteria irrespective of what is being purchased.

Tools could be developed to detect these similarities and flag them for review. Such tools would need to tell apart the portions of a tender document that relate to the bidder from the portions that relate to the product or service requested, and then measure the similarity between the bidder-related sections of different tender documents. It might also be possible to extract only the qualifying criteria and look for similarities there, or to analyse the bidder selection criteria to see whether any of them are irrelevant to the project or incompatible with its requirements.
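The following is a minimal Python sketch of the similarity check, not a production tool. It assumes the hard part is already done: each tender has been split into a `product` section and a `criteria` section (these field names, the thresholds and the sample data are all hypothetical). It compares sections using TF-IDF cosine similarity, one common choice among many, and flags pairs from the same issuer whose criteria are near-identical while their product descriptions are not.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case the text and split it into word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF, so terms occurring in every document keep a small positive weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def flag_similar_criteria(tenders, criteria_min=0.8, product_max=0.3):
    """Flag same-issuer tender pairs whose bidder criteria are near-identical
    even though the products or services being bought are dissimilar."""
    criteria_vecs = tfidf_vectors([t["criteria"] for t in tenders])
    product_vecs = tfidf_vectors([t["product"] for t in tenders])
    flagged = []
    for i, j in combinations(range(len(tenders)), 2):
        if tenders[i]["issuer"] != tenders[j]["issuer"]:
            continue
        crit_sim = cosine(criteria_vecs[i], criteria_vecs[j])
        prod_sim = cosine(product_vecs[i], product_vecs[j])
        if crit_sim >= criteria_min and prod_sim <= product_max:
            flagged.append((tenders[i]["id"], tenders[j]["id"],
                            round(crit_sim, 2), round(prod_sim, 2)))
    return flagged


# Hypothetical sample data, with sections already split into "product" and "criteria".
sample_tenders = [
    {"id": "T1", "issuer": "Dept A",
     "product": "payroll software development and maintenance",
     "criteria": "minimum annual revenue of 100 crores for the past 5 years; registered office in the state"},
    {"id": "T2", "issuer": "Dept A",
     "product": "supply of office furniture and filing cabinets",
     "criteria": "minimum annual revenue of 100 crores for the past 5 years; registered office in the state"},
    {"id": "T3", "issuer": "Dept A",
     "product": "payroll software upgrade",
     "criteria": "at least two completed payroll software projects"},
]

print(flag_similar_criteria(sample_tenders))
```

On this sample it should flag only the T1/T2 pair, where the same revenue-based criteria appear in both a software tender and a furniture tender. The thresholds are guesses that would need tuning against real tender documents.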

Using text analytics to detect reissued tenders

If the product or service description sections of a new tender document resemble those of an older tender, and the issuing organization is the same, the tender may have been reissued. If it is further found that a very low bid had won the previous round of tendering, and that the earlier tender had then been cancelled, this could be used as a flag to alert an ombudsman.
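A sketch of the reissue check, assuming each tender record carries its issuer, its description section, whether it was cancelled, and the winning and median bids of that round (all hypothetical fields). It uses `difflib.SequenceMatcher` from the Python standard library as a simple stand-in for description similarity, and treats a winning bid at or below 70% of the median bid as "very low"; both that cut-off and the 0.8 similarity threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Tender:
    tender_id: str
    issuer: str
    description: str                     # product/service description section only
    cancelled: bool = False
    winning_bid: Optional[float] = None  # bid that had won before cancellation, if any
    median_bid: Optional[float] = None   # median of all bids received in that round


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so formatting changes don't hide a match."""
    return " ".join(text.lower().split())


def description_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two description sections."""
    return SequenceMatcher(None, _normalise(a), _normalise(b)).ratio()


def flag_reissue(new: Tender, history: List[Tender],
                 sim_min: float = 0.8, low_bid_ratio: float = 0.7) -> List[str]:
    """Return ids of earlier tenders from the same issuer that `new` resembles,
    where the earlier tender was cancelled after an unusually low bid had won."""
    flagged = []
    for old in history:
        if old.issuer != new.issuer or not old.cancelled:
            continue
        if description_similarity(old.description, new.description) < sim_min:
            continue
        if old.winning_bid is not None and old.median_bid:
            if old.winning_bid <= low_bid_ratio * old.median_bid:
                flagged.append(old.tender_id)
    return flagged


# Hypothetical history (bid amounts in arbitrary currency units).
portal = "Development of a web portal for vehicle registration with online payments"
history = [
    Tender("T10", "Dept A", portal, cancelled=True, winning_bid=40.0, median_bid=100.0),
    Tender("T11", "Dept A", "Supply of 200 desktop computers", cancelled=True,
           winning_bid=90.0, median_bid=100.0),
    Tender("T12", "Dept B", portal, cancelled=True, winning_bid=40.0, median_bid=100.0),
]
new_tender = Tender("T20", "Dept A",
                    "Development of a web portal for vehicle registration, with online payments")
print(flag_reissue(new_tender, history))  # expected to flag only T10
```

T12 has the same description and the same low winning bid, but comes from a different issuer, so it is not flagged; T11 comes from the same issuer but describes a different purchase.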

Using text analytics to detect vendor-oriented constraints

If many of the conditions for participation in a tender are company-specific (properties of companies such as size or earnings) rather than capability-specific (such as experience in a certain technology space), that could be a red flag.
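A rough sketch of how participation conditions might be sorted into company-specific and capability-specific ones. The keyword lists and the 50% threshold below are hypothetical, chosen only for illustration; keyword matching is a crude substitute for a proper classifier.

```python
import re
from typing import List, Tuple

# Hypothetical keyword lists, for illustration only. A real system would need a
# curated lexicon or a trained classifier.
COMPANY_PATTERNS = [
    r"\bturnover\b", r"\brevenue\b", r"\bnet worth\b", r"\bemployees\b",
    r"\bheadcount\b", r"\bregistered office\b", r"\bpaid[- ]up capital\b",
]
CAPABILITY_PATTERNS = [
    r"\bexperience\b", r"\bcompleted\b", r"\bcertifi", r"\bdelivered\b",
    r"\bexpertise\b", r"\bprojects?\b",
]


def classify_condition(condition: str) -> str:
    """Label one participation condition as 'company', 'capability' or 'unclear'."""
    text = condition.lower()
    company = any(re.search(p, text) for p in COMPANY_PATTERNS)
    capability = any(re.search(p, text) for p in CAPABILITY_PATTERNS)
    if company and not capability:
        return "company"
    if capability and not company:
        return "capability"
    return "unclear"  # matched both lists or neither


def vendor_orientation(conditions: List[str], threshold: float = 0.5) -> Tuple[float, bool]:
    """Share of classifiable conditions that are company-specific,
    and whether that share exceeds `threshold`."""
    labels = [classify_condition(c) for c in conditions]
    decided = [label for label in labels if label != "unclear"]
    if not decided:
        return 0.0, False
    share = decided.count("company") / len(decided)
    return share, share > threshold


# Hypothetical conditions modelled loosely on the software tender described earlier.
sample_conditions = [
    "Minimum annual revenue of 100 crores in each of the past 5 years",
    "At least 500 employees on the payroll",
    "Registered office in the state capital",
    "Experience delivering at least one e-governance portal",
]
print(vendor_orientation(sample_conditions))  # (0.75, True)
```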

Systems to manage tenders

It might be possible to analyse tenders for fraud if they are stored and managed in a tender management system with built-in fraud detection analytics. Such a system would serve as a repository for tender documents, and would also manage the submission of bids and monitor both the selection procedure and the life-cycle of the resulting projects. This would allow governments to maintain a history not just of tender issuers but also of vendors, and so to determine which vendors are reliable and which are not.

Moreover, it would give the people issuing and evaluating tenders more confidence in a low bidder (there is always a danger that someone bids too low, wins the project and then cannot execute it), and hence help reduce costs. So a tender fraud detection tool could help governments make better decisions about vendors and reduce corruption in the procurement of products and services.
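One way such a system might turn vendor history into a reliability score is sketched below. The record format and the choice of a Laplace-smoothed completion rate are assumptions for illustration, not features of any existing tender management system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProjectRecord:
    vendor: str
    completed: bool  # True if the project was delivered as contracted (hypothetical field)


def vendor_reliability(records: List[ProjectRecord],
                       prior_success: float = 1.0,
                       prior_failure: float = 1.0) -> Dict[str, float]:
    """Smoothed completion rate per vendor: (successes + a) / (projects + a + b).

    With a = b = 1 (Laplace smoothing), a vendor with a single completed project
    scores 2/3 rather than a perfect 1.0, so short histories are not over-trusted.
    """
    totals = Counter(r.vendor for r in records)
    successes = Counter(r.vendor for r in records if r.completed)
    return {
        vendor: (successes[vendor] + prior_success) / (n + prior_success + prior_failure)
        for vendor, n in totals.items()
    }


# Hypothetical project history recorded by the system.
history = [
    ProjectRecord("V1", True), ProjectRecord("V1", True), ProjectRecord("V1", True),
    ProjectRecord("V2", True), ProjectRecord("V2", False),
]
scores = vendor_reliability(history)
print(scores)  # V1: (3+1)/(3+2) = 0.8, V2: (1+1)/(2+2) = 0.5
```

A low bidder with a strong track record could then be accepted with more confidence than one with no history at all.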

Graph Algorithms for Fraud Detection

Text analytics algorithms are difficult and expensive to develop. Fortunately, there are other ways to detect tendering fraud. Patterns of favoritism in tender outcomes can be detected from a bipartite graph of issuing organizations and beneficiaries. If tenders from a particular issuing organization are found to favour a specific vendor from a large field of vendors more often than chance would allow, the organization and the vendor could be flagged. Prices can also be compared across tenders to find any that exceed the usual range for similar purchases (though this will again require text analytics).
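The favoritism check can be sketched as a one-sided binomial test on the edges of the issuer-vendor graph. The null model here is an assumption: every award is taken to be equally likely to go to any of a fixed number of eligible vendors. Real vendor pools differ in size and capability, so this is at best a screening heuristic. The function names, sample data and significance level are hypothetical.

```python
from collections import Counter
from math import comb
from typing import List, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_favoritism(awards: List[Tuple[str, str]], pool_size: int,
                    alpha: float = 0.01) -> List[Tuple[str, str, int, int, float]]:
    """Each (issuer, vendor) pair is one award, i.e. one edge in the bipartite graph.

    Null model (an assumption): each award goes to any of `pool_size` eligible
    vendors with equal probability, so one vendor's wins ~ Binomial(n, 1/pool_size).
    """
    awards_per_issuer = Counter(issuer for issuer, _ in awards)
    edge_weights = Counter(awards)  # (issuer, vendor) -> number of awards
    flagged = []
    for (issuer, vendor), wins in edge_weights.items():
        n = awards_per_issuer[issuer]
        p_value = binomial_tail(wins, n, 1 / pool_size)
        if p_value < alpha:
            flagged.append((issuer, vendor, wins, n, p_value))
    return flagged


# Hypothetical award history; assume 20 eligible vendors for every tender.
awards = ([("Dept A", "V1")] * 8 + [("Dept A", "V2"), ("Dept A", "V3")]
          + [("Dept B", "V1"), ("Dept B", "V2"), ("Dept B", "V3"), ("Dept B", "V4")])
for issuer, vendor, wins, n, p in flag_favoritism(awards, pool_size=20):
    print(f"{issuer} -> {vendor}: {wins}/{n} awards, p = {p:.2e}")
```

Here only Dept A and V1 should be flagged: winning 8 of 10 awards out of a pool of 20 is extremely unlikely under the null model, while single wins are not. Because many issuer-vendor pairs are tested at once, some will cross any fixed threshold by chance, so in practice a stricter level (for example, a Bonferroni correction) would be advisable.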

Some Theory

There has been a lot of work on corruption by economists in the last 10 years. One interesting model of corruption is the Klitgaard Corruption Equation, C = R + D − A, where C stands for Corruption, R for Rent (the quantum of possible illegal earnings from holding a position of responsibility where corruption is possible), D for Discretion and A for Accountability. These concepts are explained very well in the following article. But Klitgaard does not model one variable that can affect corruption, and that variable is choice. If you increase the choices available to a purchaser, the opportunities for avoiding corruption increase and the likelihood of corrupt transactions occurring decreases.

For example, if everyone in a certain location must only obtain a service from the government office serving their locality, then a person who does not want to pay a bribe does not have the option of travelling to a different office to obtain the service without paying a bribe.  So, if the officer at the local office is corrupt and demands a bribe for rendering a service, then the person has no choice but to cough up the bribe.  This happens in a lot of government offices where registrations have to be performed.  Increasing the choice of service provider lessens the likelihood of people being trapped into giving bribes.

The same forces are at work in the case of tenders.  The strategy of a corrupt tendering official is to artificially reduce the choices of the selectors to only firms that will pay a bribe. Computer systems that fight tendering corruption work by preventing the artificial restriction of choices.

