Adding clarity through fuzz

Thursday 25th October 2018, 9:22am

Data duplication could be ruining up to 30 percent of your company’s database – and simply throwing software at it won’t fix it.

Generally speaking, life is amazing. Sometimes, however, it can be annoying.

Missing a motorway turnoff and realising that the next exit is 19 miles away – that’s annoying. So is discovering that the cheap flight you were after has somehow doubled in price in the two minutes since your last site visit.

Receiving two identical letters from your bank addressed to two different versions of yourself might not seem that high on the peskiness scale. When you think about it a little deeper though, you’ll see that your double-identity problem is only the tip of a vast, melting iceberg that is negatively impacting both you and the bank.

As a thinking customer, you’ll know that the money spent on generating, printing and posting one of those two bank letters was utterly wasted. Multiply that mistake by a few thousand, or a few hundred thousand depending on just how much duplication there is on the bank’s database, and you’re talking about serious amounts of wasted cash.

Wasted cash hurts the bank’s bottom line. As a customer, you’ll end up covering that through increased fees, eroding not just your own personal bottom line but also whatever goodwill there might have been between you and your bank. You might easily end up becoming annoyed enough to take your business elsewhere.

In short, if you’re a bank or any other company holding personal data for comms or other purposes, data duplication is not good news. It’s especially annoying because it’s entirely avoidable.

How? To understand that, first you need to understand how data duplication arises and why, once it’s established, it’s rarely tackled inside any company.

The first thing to know is that it’s not the computer’s fault. ‘Computer errors’ are rarely errors made by the computer itself; they’re usually made by the operator. When data is entered in a hurry, mistakes occur. You see it every week in time-pressured retail environments like supermarket checkouts, but it’s also happening on a much bigger scale in closed commercial environments everywhere.

Duplication happens because, despite our fond sci-fi imaginings, regular computers really aren’t very bright. When Bob Builder is put onto a database not just as Bob Builder but also as Robert Builder, Rob Builder, Robbie Builder, or even Roberto Builder, we all know that’s just one person – but the computer sees five people. Without the help of costly artificial intelligence, computers can’t do the remarkable ‘dot connecting’ identification trick that we human beings take for granted.
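To see why the computer counts five people, here’s a minimal sketch in Python. The names come from the example above; the scorer is the standard library’s SequenceMatcher, used purely as an illustration of fuzzy matching, not as any particular product’s engine:

```python
from difflib import SequenceMatcher

# Five ways the same customer might appear on a database.
variants = ["Bob Builder", "Robert Builder", "Rob Builder",
            "Robbie Builder", "Roberto Builder"]

def similarity(a, b):
    """Crude fuzzy score between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Exact matching sees five distinct people...
print(len(set(variants)))  # 5

# ...while even a naive fuzzy comparison shows how close they really are.
reference = "Robert Builder"
for name in variants:
    print(f"{name}: {similarity(name, reference):.2f}")
```

Exact comparison treats every spelling as a separate person; the fuzzy score, by contrast, puts all five variants well above the similarity you’d see between two unrelated names.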

That’s how the corporate sickness of data duplication takes hold. But why is it so rarely cured? It’s to do with separation and silo thinking. Customer data is collected in many different ways. It might be through a survey, through a customer services or order system, through a website that requires customers to log in, through digital TV, through a smartphone app or through any number of whizzo wheezes that are yet to be dreamt up. These touchpoint-based information gathering systems usually operate quite independently of each other. To put it another way, they don’t talk. It’s classic silo stuff.

Bringing all that disparate data into one accessible-to-all place and then matching the records up so they make sense is a challenge that a gung-ho IT department will sometimes try to take on. That’s brave of them, but the result will often either take an age to deliver, be unfit for purpose when it is finally delivered, or actually make the problem worse.

Outside agencies will propose do-it-all software solutions that promise automatic record matching without any human intervention – and, it would seem, without paying much heed to the oversimplification that results.

Optima Connect takes a different approach. By using advanced ‘fuzzy matching’ techniques and the human eyeballing that we know is essential for truly accurate data matching, we steer around the main danger of oversimplified matching – namely, customer leakage caused by inaccuracy or by a failure to identify and eliminate duplicates.  

We run a sample of data through a matching tool that one of the Optima team helped develop, which gives each pair of records an accuracy score. High scores are best, but inevitably there’ll be a bit in the middle – a grey area – that needs some attention. All variables are examined and analysed. We then agree a set of business rules with the client to improve the matching (and reduce that grey area) before running the process again, rescoring the records according to the new rules and re-examining the grey areas that cropped up in the previous iteration. And we’ll keep repeating that process, refining, rescoring and re-honing, until we reach an iteration – and a set of rules – that everyone is happy with.
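The score-and-review loop above can be sketched in miniature. Everything here is an illustrative assumption rather than the actual tool: the field weights stand in for the agreed business rules, and the two thresholds carve scores into duplicates, distinct records, and the grey area that gets human eyeballs:

```python
from difflib import SequenceMatcher

def score_pair(rec_a, rec_b, weights):
    """Weighted fuzzy accuracy score across record fields, 0.0 to 1.0."""
    total = 0.0
    for field, weight in weights.items():
        ratio = SequenceMatcher(None,
                                rec_a.get(field, "").lower(),
                                rec_b.get(field, "").lower()).ratio()
        total += weight * ratio
    return total

def classify(score, match_above=0.90, distinct_below=0.60):
    """High scores are duplicates, low scores are distinct people,
    and everything in between lands in the grey area for human review."""
    if score >= match_above:
        return "duplicate"
    if score < distinct_below:
        return "distinct"
    return "grey"

# Hypothetical sample records and weights; each iteration of agreed
# business rules would adjust these and shrink the grey area.
weights = {"name": 0.5, "postcode": 0.3, "email": 0.2}
a = {"name": "Rob Builder", "postcode": "SW1A 1AA", "email": "bob@example.com"}
b = {"name": "Robert Builder", "postcode": "SW1A 1AA", "email": "bob@example.com"}

print(classify(score_pair(a, b, weights)))  # duplicate
```

Matching postcode and email lift this pair comfortably over the duplicate threshold despite the name variation; a pair agreeing on name alone would fall into the grey area and be passed to a human reviewer.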

If you liken your database to a car engine, going for the average de-duplicating ‘fix’ will be like putting slightly dodgy fuel into the tank and not worrying about the fact that the throttle only goes two-thirds of the way down. Treat a real car engine like that and it will go into limp-home mode and eventually die.

The key Optima difference is that we tune your database engine to get it running at peak efficiency and at full throttle. Over a period measured in weeks rather than months or years, our unique data-matching process will turn your database from a flabby, wasteful millstone into a tight, muscular resource that adds serious power and genuine value to your business.

This last bit might come as a shock. Optima’s experience indicates that the data duplication rate within a medium-sized company can be as high as 30 percent. That’s three in ten campaign actions incurring inappropriate, ineffective and wholly unnecessary costs. You probably wouldn’t want to be the one announcing that little titbit at the next shareholders’ meeting.


Written by Tony Middlehurst