Businesses have more data than ever; that much is unquestionable. For many, however, the challenge remains extracting its potential value.
Companies are well aware of that potential value. Actionable insights from their data can improve decision-making on customer strategy, staff retention and profitability, and can guide business strategy through difficult economic conditions. That’s why many businesses continue to invest heavily in their analytics capabilities, building teams of dozens or even hundreds of analysts and equipping them with expensive analytics software.
And yet their analysts’ critical work is often slow, frustrating and inefficient, and their models and reports fail to deliver the business impact that leadership envisaged.
The central issue is rarely the quality of the analysts, or of the tools they use.
Analytics software can struggle with large data sets, true, but the software is usually only a small part of the equation. The single biggest problem is the poor state of the data sources that analysts are asked to work with. Poor-quality source data forces analysts to invest huge portions of their time (and expend valuable mental energy) pulling data into suitable formats before they can begin any analysis.
One survey of more than 23,000 data professionals found that analysts spend only around 12% of their time extracting insights from data.
Successful, efficient analytics relies on having a trustworthy, clean data set to work with.
This isn’t news. Data science requires its practitioners to manipulate data, often in very large volumes. Data gathering (also known as ‘wrangling’ or ‘munging’) is a vital step: it refines raw data into formats better suited to consumption by downstream systems and users.
Here's a simplified workflow of a typical data science project:
Data gathering is highlighted here because it is so important to the efficiency and quality of the project. It is the first truly technical stage of a data science project, yet the skills it demands are the ones least taught (or sought) in data science. They are, in fact, data engineering skills. Data gathering can include exploration, transformation, enrichment and validation.
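To make those four stages a little more concrete, here is a minimal, illustrative sketch in Python using only the standard library. Everything in it is an assumption for the sake of the example: the sample orders, the column names, the region lookup table and the rule that slash-separated dates are day-first. It shows the shape of the work, not how any particular team does it.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: stray whitespace, two date formats,
# mixed-case region codes and one missing amount.
RAW_CSV = """order_id,order_date,region,amount
1001, 2023-01-05 ,uk,120.50
1002,05/01/2023,UK , 80
1003,2023-01-07,de,
1004,2023-01-08,FR,42.00
"""

# Hypothetical lookup table used for enrichment.
REGION_NAMES = {"UK": "United Kingdom", "DE": "Germany", "FR": "France"}


def explore(rows):
    """Exploration: count empty values per column to see what needs fixing."""
    missing = {}
    for row in rows:
        for key, value in row.items():
            if not (value or "").strip():
                missing[key] = missing.get(key, 0) + 1
    return missing


def parse_date(text):
    """Accept ISO (YYYY-MM-DD) or, by assumption, day-first (DD/MM/YYYY) dates."""
    text = text.strip()
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def transform(row):
    """Transformation: trim whitespace, normalise case and convert types."""
    amount = row["amount"].strip()
    return {
        "order_id": int(row["order_id"]),
        "order_date": parse_date(row["order_date"]),
        "region": row["region"].strip().upper(),
        "amount": float(amount) if amount else None,
    }


def enrich(row):
    """Enrichment: add a readable region name from the lookup table."""
    row["region_name"] = REGION_NAMES.get(row["region"], "Unknown")
    return row


def validate(row):
    """Validation: keep only rows with an amount and a known region."""
    return row["amount"] is not None and row["region_name"] != "Unknown"


def wrangle(raw_csv):
    """Run transformation, enrichment and validation over every row."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    cleaned = [enrich(transform(r)) for r in rows]
    return [r for r in cleaned if validate(r)]


if __name__ == "__main__":
    print("Missing values:", explore(list(csv.DictReader(io.StringIO(RAW_CSV)))))
    for record in wrangle(RAW_CSV):
        print(record)
```

Even on four rows, most of the code is about getting data into a trustworthy shape rather than analysing it, which is the point the article is making.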
We’ve talked about the relationship between data science and data engineering in a previous blog series. For now, it’s enough to appreciate that these are two different skillsets, and there are good reasons why a data scientist wouldn’t claim to be a data engineer, or vice versa. Data engineers are highly skilled at organising data and integrating different data sources so that they sit neatly alongside one another. In short, they’re ideally suited to the data wrangling stage of a data science project: they’re comfortable working with messy data and finding solutions. Crucially, they also have the technical skills to get the job done fast and to automate the process for ongoing efficiency.
By contrast, data scientists’ core skills lie in understanding business problems and knowing where to look for the right data, interpreting that data through advanced models and algorithms, and communicating the important facts with a high level of clarity and brevity.
The two skillsets are both technical and highly complementary, yet they rarely reside in one individual.
That’s why at Optima we have built teams of both data scientists and data engineers, and we often find that the best results come when we deploy both sets of skills. We complete projects quickly and to a high standard, and the data pipelines our engineers build keep analysts working efficiently long after we’ve handed over the keys.
You might say that data engineers set the stage for data scientists to do what they do best.
In most situations, you can’t really have one without the other. Strictly speaking you can (and businesses often do), but it leaves analysts doing a lot of data heavy lifting outside their core skillset, and that is the root cause of many problems with data science projects. Organisations continue to invest in data science and analytics and expect their staff to work magic. In many ways they probably do; it’s just that the analysts’ work is much slower and more expensive than it should be.
So next time you wonder why your analysts aren’t delivering as much value as you might expect, consider that if the data they're dealing with isn't properly managed, they’re likely spending over half of their precious time battling problems that their engineering counterparts would eat for breakfast.
Here’s a real-world example of the synergy working well at Optima:
A business appointed us to help it understand how its global marketing team could work more efficiently. The project required us to extract operational data from the team’s Kanban boards and turn it into meaningful dashboards. The data was complex and voluminous, and we recognised that trying to bring it straight into a visualisation tool like Power BI would be fruitless: tools like Power BI aren’t designed for heavy transformation of large volumes of complex data. Similarly, while R and Python can do data extraction, using them at this scale would have been the equivalent of using a Swiss Army knife to chop down a tree. Instead, we started by bringing in a data engineer (equipped with an axe!) to restructure the data into new tables. That took less than a week. By the end of the second week, we had the initial dashboards set up in Power BI, ready for analysis. Had we sent a data scientist to tackle the problem alone, they might have struggled for months with nothing to show for it.
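We haven’t described the client’s data or the engineer’s tooling here, so the following is only a hypothetical illustration of what “restructuring into new tables” can mean for Kanban data. It assumes an invented nested export (boards containing cards, each with a history of column moves) and flattens it into one row per card-per-column visit, with the days spent in that column, then writes CSV that a BI tool can import. It is written in Python purely for readability; at real-world scale this kind of reshaping would more likely run in a database or dedicated data platform.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical Kanban export. Field names are invented for illustration;
# they do not describe any real tool's schema.
EXPORT = json.loads("""
{
  "boards": [
    {"name": "EMEA Campaigns",
     "cards": [
       {"id": "C-1", "title": "Spring launch",
        "history": [
          {"column": "To do",       "entered": "2023-03-01T09:00:00"},
          {"column": "In progress", "entered": "2023-03-03T09:00:00"},
          {"column": "Done",        "entered": "2023-03-08T09:00:00"}
        ]}
     ]}
  ]
}
""")


def flatten(export):
    """Turn nested boards/cards/history into one row per column visit.

    days_in_column is None for the column a card currently sits in.
    """
    rows = []
    for board in export["boards"]:
        for card in board["cards"]:
            history = card["history"]
            # Pair each move with the next one to measure time in the column.
            for current, nxt in zip(history, history[1:] + [None]):
                entered = datetime.fromisoformat(current["entered"])
                left = datetime.fromisoformat(nxt["entered"]) if nxt else None
                days = (left - entered).total_seconds() / 86400 if left else None
                rows.append({
                    "board": board["name"],
                    "card_id": card["id"],
                    "column": current["column"],
                    "entered": entered.isoformat(),
                    "days_in_column": days,
                })
    return rows


def to_csv(rows):
    """Serialise the flat table to CSV for import into a BI tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten(EXPORT)))
```

A flat, one-row-per-event table like this is far easier for a dashboard to aggregate (for example, average days in “In progress” per board) than the nested original.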
Of course, that Kanban project was an extreme case. In the majority of cases, analysts can get the job done. But the inefficiency is real, and the problem compounds with every analyst and every project.
With better-formatted data to work with, analysts might spend more than 12% of their time doing the job they’re actually paid to do: extracting insights to drive the business forwards.
The quality of the data that goes into data science projects is a huge factor in determining the quality of the output. The old saying ‘garbage in, garbage out’ still stands, and every data scientist will agree. In large organisations with multiple data centres and legacy systems to contend with, analysts and data scientists typically don’t have all the skills required to format the data quickly. That’s an engineer’s job. And while asking a scientist to do an engineer’s job might eventually yield some insight, it will also produce several much less desirable outcomes: a lot of wasted time, higher costs, and deeply frustrated scientists.
Once you’ve experienced Optima, we’re confident you’ll realise there’s no better team to trust with your data. That’s why we’re offering free one-hour consultations: meet our experts one-to-one. No salespeople, just advice.