Data Scientists and Data Engineers: Role Comparison (Part 2)

Lifting the bonnet on what it takes to be a Data Scientist or a Data Engineer.

19 June 2020

Did you hear the one about the surgeon who performed a leg amputation without an anaesthetist?

No, neither did we. Not in the last couple of hundred years, anyway. Tasks like that simply can’t be done by just one person. Okay, technically they can, but you probably wouldn’t want to be on the operating table in that hospital.

Point being that there’s a huge difference between just doing something and doing it properly. The difference between a proper operation and one that really hurts (and that is unlikely to have a brilliant long-term prognosis) is like the difference between getting real value from your data and merely collecting it. In today’s ultra-competitive business environment, that can highlight another critical difference: the one between survival and success.

A single ‘Jack of all trades’ type might be able to cosh your data into submission, hack it into some sort of functional state and then slap it back into life, but the whole-life ramifications for your business could be dire. To painlessly use your data with far fewer post-operative complications, you don’t need one generalist. You need two specialists: a data scientist and a data engineer.

Part 1 of this blog series discusses why data scientists and data engineers make a formidable team, and why both roles are essential for any business serious about getting more value from its data. But what is it that distinguishes a data scientist from a data engineer?

Like the surgeon and the anaesthetist, data scientists and data engineers actually have a lot in common. High-performing scientists and engineers both excel at solving problems, albeit logical rather than physiological. They also share an aptitude for mathematics and other science-based disciplines, a level of curiosity bordering on the obsessive, a facility for lateral thinking, and a healthy dose of technical flair.

So where do the two paths diverge? Essentially, it comes down to the type of problem each specialist solves. Data engineers focus on technical problems: they design and build data platforms with repeatable, automated processes that connect data sets from multiple sources and optimise their quality. The currencies they deal in are facts, diagrams and absolutes.

Data scientists untangle important business problems using insights derived from data. They design and deploy statistical analysis techniques that allow the data to answer the questions behind those problems. Their currencies are confidence levels, graphs, stories and discoveries.

Engineers integrate data from multiple sources, migrating, importing, matching and de-duplicating data. Scientists employ machine learning and AI techniques to build the analytical models that will create profit-building or cost-reducing results. They possess the surgeon’s ability not only to understand the challenge at hand but also to make the incisive moves that will reliably deliver the right results. Through sophisticated analytical methods and visual storytelling, data scientists can accurately predict trends, steer decision-making and measure the effectiveness of new strategies.

Engineers help scientists do what they do best

Engineers smooth the scientists’ path by building high-quality data pipelines, tracking data lineage, and keeping data integrity high. Through strong problem-solving and lateral thinking, particularly when working with ‘legacy’ systems, a top data engineer will be able to construct an ideal conduit combining fast, free-flowing processing with watertight security. They also possess reporting skills, though typically these are geared more towards operational considerations.

Another distinguishing feature of the two roles is the toolkits they bring to work.

Modern business demands that data be used more intelligently, and with more certainty, than ever. Data engineers’ toolkits have developed to allow them to perform hugely complex data manipulation tasks, combining and transforming data sets from multiple sources into formats that aren’t just consistent and useable but also notably fast and accurate. Assisting engineers in their work are query languages like SQL and database systems like SQL Server, Oracle and Teradata; Azure and AWS in the Cloud environment; data exchange formats such as CSV, JSON and XML; web API styles and protocols like REST and SOAP; and customer relationship and intelligence platforms like Salesforce and SAS.
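
To make that a little more concrete, here is a minimal sketch in Python of the kind of task described above: combining records from two sources, putting them into one consistent format and de-duplicating them. It is an illustration only, not a production pipeline. The sample data, the field names and the `normalise` and `integrate` helpers are all hypothetical, and matching on a lower-cased email address is simply an assumed rule for the example.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two source systems:
# a CSV export and a JSON feed.
CRM_CSV = """email,name
Ada@Example.com,Ada Lovelace
grace@example.com,Grace Hopper
"""

WEB_JSON = (
    '[{"email": "ada@example.com ", "name": "Ada Lovelace"},'
    ' {"email": "alan@example.com", "name": "Alan Turing"}]'
)


def normalise(record):
    """Return a record with a trimmed, lower-cased email and a trimmed name."""
    return {
        "email": record["email"].strip().lower(),
        "name": record["name"].strip(),
    }


def integrate(csv_text, json_text):
    """Combine both sources and keep the first record seen for each email."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows += json.loads(json_text)
    unique = {}
    for row in rows:
        clean = normalise(row)
        # Assumed matching rule: two records with the same email are one customer.
        unique.setdefault(clean["email"], clean)
    return list(unique.values())


customers = integrate(CRM_CSV, WEB_JSON)
for customer in customers:
    print(customer["email"], "-", customer["name"])
```

In this toy example the two ‘Ada’ records collapse into one because their email addresses match once they have been cleaned. Real matching rules are usually far messier, which is exactly where an engineer’s experience counts.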

Meanwhile, data scientists need to be able to interrogate data, identify significant variables and run models to identify patterns. They need to test and refine their statistical models, identifying the ones that give the most reliable results. They then need to be able to convey their findings in the most understandable, compelling manner possible, often to colleagues who have considerably less technical knowledge and business-critical decisions to make. These requirements have led to the expansion of their toolkit to include languages such as R and Python, and business intelligence (BI) tools such as PowerBI, Qlik and Tableau.
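
As a rough illustration of what ‘testing and refining’ a model can look like, the Python sketch below fits two very simple models to some made-up figures and checks which one predicts held-back data more reliably. Everything here is an assumption for the sake of the example: the numbers are invented, the function names are hypothetical, and mean absolute error on a small holdout set is just one of many ways a scientist might compare models.

```python
from statistics import mean

# Invented figures: (monthly ad spend, resulting sales), both in thousands.
train = [(1, 12.1), (2, 13.8), (3, 16.2), (4, 17.9), (5, 20.1), (6, 21.9)]
holdout = [(7, 24.2), (8, 25.8)]  # kept back to test the models


def fit_mean(data):
    """Baseline model: always predict the average sales seen in training."""
    average = mean(y for _, y in data)
    return lambda x: average


def fit_line(data):
    """Least-squares straight line through the training points."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum(
        (x - x_bar) ** 2 for x, _ in data
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_abs_error(model, data):
    """Average size of the prediction error on the given data."""
    return mean(abs(model(x) - y) for x, y in data)


models = {"mean baseline": fit_mean(train), "linear trend": fit_line(train)}
errors = {name: mean_abs_error(model, holdout) for name, model in models.items()}
best = min(errors, key=errors.get)
print("Most reliable on held-back data:", best)
```

The design choice worth noting is the held-back data: a model is judged on figures it has not seen, which is what gives the scientist grounds to call one model more reliable than another.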

While there’s no set path to becoming either a data scientist or a data engineer, the latter’s educational CV will usually revolve around a broad range of ‘STEM’ subjects (Science, Technology, Engineering and Mathematics). Data scientists are more likely to come from a slightly narrower STEM subset, such as Maths, Stats or Software Engineering.

Data scientists and data engineers are no more interchangeable than surgeons and anaesthetists. Scientists may please senior managers and the board, whereas engineers tend to be loved by operations, marketing, analytics and IT teams. But at the end of the day it’s the whole team that grateful relatives will thank when their loved ones come home safe from the hospital.