What Does a Data Engineer Do?
March 10, 2021
Data scientists make headlines, but data engineers make data science possible. All the information that data scientists analyze passes through the hands of oft-overlooked data engineers first.
Data engineers are the frontline warriors of the analytics world. They go by many titles, including technical architects, BI developers, data science software engineers, and ETL developers. No matter what they're called, these professionals take the first pass at the massive amounts of structured data, unstructured data, and semi-structured data that businesses, researchers, and government agencies access in our increasingly connected world.
Data engineers develop the data infrastructure and interfaces necessary to collect data from different sources. They also design the systems used to transform that information into clean data sets that data analysts and data scientists can sort in ways that lead to useful conclusions. That data transformation makes everything that happens in the data science world possible.
Given that, you may be wondering why data engineers don't make headlines the way data scientists do. The reason may be that while even laypeople can understand what data scientists do, it's not always clear what role data engineers play in data analysis. Data engineering isn't any more difficult to understand than data science, however, as you'll discover below.
In this article, we discuss what a data engineer does and answer the following questions:
- What are a data engineer's key responsibilities?
- What does a data engineer's typical day look like?
- Do data engineers and data scientists do the same things?
- What skills does a data engineer use?
- Which degrees do data engineers usually have?
- What tools do data engineers use?
- How much do data engineers get paid?
- What kinds of people are most likely to succeed in this career?
What are a data engineer's key responsibilities?
Simply put, data engineers manage data. They construct and oversee database architecture. They determine how data is collected and stored. They prepare that data for analysis by creating the pipelines that transform raw data into useful formats. And sometimes, they even create the systems through which people who aren't part of the data science or data engineering team can access essential data (via, for instance, a custom real-time analytics dashboard).
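To make the idea of a pipeline concrete, here is a minimal sketch of the extract, transform, and load steps in Python, using only the standard library. The sample data, the `orders` table, and the function names are all hypothetical and chosen for illustration; production pipelines typically rely on dedicated ETL tooling and far more robust error handling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace, one missing amount.
RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,BOB,5.00
3,carol,
"""


def extract(text):
    """Read raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize names, convert types, and drop rows that have no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return clean


def load(rows, conn):
    """Write the cleaned rows into a SQLite table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load, then return what was stored."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping each stage in its own function is one common design choice: it lets an engineer test the cleaning rules without touching a database, and swap the in-memory SQLite target for a real warehouse later.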
According to Dataquest, there are three main types of data engineers, and their responsibilities differ somewhat. First, there is the generalist. These data engineers tend to work at small companies where there aren't many data-focused employees, so they have to manage data and analyze it, too. Then there are the pipeline-centric data engineers, who work closely with data scientists, building whatever custom tools they might need to accomplish certain big data analytics goals. Finally, there are database-centric data engineers. They may do some pipeline building, but they spend the majority of their time creating large-scale data warehouses that make it easier for analysts and data scientists to do their jobs.
What does a data engineer's typical day look like?
Data engineers spend their days:
- Building custom data pipelines based on business logic
- Collaborating with a database administrator to create data stores
- Collecting data from various sources
- Creating new data validation methods
- Developing frameworks to serve data
- Evaluating, parsing, and cleaning data sets
- Gathering requirements for data models
- Identifying and analyzing new data sources
- Maintaining computer clusters
- Making sure data is secure
- Preparing data as part of ETL (extract, transform, and load) processes
- Managing real-time data processing
- Sorting through raw data
- Stitching data from various sources together
- Writing ETL logic
- Writing queries that deliver accurate results
Most data engineers won't do all of these things every single day, however. A data engineer's day might begin with maintenance tasks like checking logs and looking into pipeline failures. Fixing issues and adding in any missing features required to keep a data warehouse well-stocked might keep our hypothetical data engineer busy until lunchtime. Next, they'll turn their attention to whatever big project (e.g., creating a new ETL pipeline) is currently in their inbox. If they've already mapped out an implementation plan, they'll start coding.
During all this, however, data scientists and other data users may be submitting tickets related to missing data, duplicate data, and errors. Sometimes the data engineer will prioritize the big project. At other times, tickets will take priority. Every day is different when you become a data engineer.
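Tickets about missing or duplicate data often lead to small automated checks like the ones sketched below. This is an illustrative example only: the record layout, field names, and helper functions are assumptions, and real teams usually run comparable checks inside their pipeline or data validation tooling.

```python
from collections import Counter


def find_duplicates(records, key):
    """Return the sorted values of `key` that appear in more than one record."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def find_missing(records, required):
    """Return (row index, field) pairs where a required field is absent or empty."""
    problems = []
    for i, record in enumerate(records):
        for field in required:
            if record.get(field) in (None, ""):
                problems.append((i, field))
    return problems


# Hypothetical records with one duplicated id and two missing emails.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3},
]

if __name__ == "__main__":
    print("Duplicate ids:", find_duplicates(records, "id"))
    print("Missing fields:", find_missing(records, ["id", "email"]))
```

Running checks like these on a schedule, rather than waiting for a ticket, is one way a data engineer might catch problems before data users notice them.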
Do data engineers and data scientists do the same things?
Data science is a team sport. Data engineers work with data architects, data warehouse engineers, data platform engineers, analytics engineers, DevOps engineers, and yes, data scientists. In the smallest organizations, a single data-focused professional may be responsible for everything related to data acquisition, data pipeline creation, and data analysis. At most companies, however, all of these professionals do different things.
Data scientists use math and statistics to interpret data and deliver insights related to it. A data scientist might be given a business problem to solve, and they will need to decide which data can help solve it. In other cases, they'll be given data and then asked to extract as much meaning from it as possible. Data engineers, on the other hand, use programming to build the infrastructure that serves up data in ready-to-interpret formats. Both have important roles to play in the data science world.
What skills does a data engineer use?
Data engineers utilize a broad range of skills every day. It's not unreasonable to say that technical skills are the most critical. Indeed, the more technical skills a data engineer has, the better. They need solid coding chops, and they need to understand how to use ETL tools. Automation is an increasingly important part of data engineering, and data engineers must have both the knowledge and the problem-solving skills necessary to automate tedious data collection and sorting processes.
However, data engineers also need soft skills—especially communication skills and people skills. "A data engineer serves internal teams, so he or she has to understand the business goal that the data analyst wants to achieve to best support them," Paul Lappas, co-founder and CEO of Intermix, told the Stitch blog. "If a data scientist has a specific tool they want to use, the data engineer has to set up the environment in a way that lets them use it. So you have to be really good at interacting with the rest of the data team."
Data engineering is a collaborative enterprise. Almost everything data engineers do involves communicating and interacting with others.
Which degrees do data engineers usually have?
Some data engineers launch their careers without degrees, but most have at least a bachelor's degree. That's because even though it's all but impossible to learn everything you'll need to know to become a data engineer in a degree program, most employers prefer to hire candidates with degrees.
Deciding which degree to pursue can be a challenge because there are very few dedicated data engineering degrees at the undergraduate or graduate levels in the US. Most data engineers study computer science, data analytics, information systems, or data science, and if a program offers a data engineering specialization, all the better. Kent State University at Kent, for instance, offers a Bachelor of Science in Computer Science program with a data engineering concentration, and Wellesley College offers a Bachelor of Data Science with a dual concentration in computer science and data engineering.
Meanwhile, Northeastern University, Stevens Institute of Technology, and the University of Wisconsin - Madison offer some of the only master's programs focused specifically on data engineering.
The best degree programs for data engineers cover:
- Architectural projections
- Data mining techniques
- Data modeling techniques
- Database clustering techniques
- Data platforms
- ETL design
- Programming languages
- Relational and non-relational database theory
It's also possible to learn much of what you'll need to know to succeed in this role outside of school. There are many online bootcamp-style courses designed for aspiring data engineers, and sites like Udemy offer courses that teach the most common data engineering tools. It's not unusual for data engineers to pick up the technical skills they need on their own.
What tools do data engineers use?
Data engineers draw on many different tools and technical concepts, and most employers expect them to have experience with nearly all of the following:
- Algorithms and data structures
- Apache Spark
- Data warehousing tools
- Distributed systems
- ETL tools
- Google Cloud Platform (GCP), Amazon Web Services (AWS), and/or Microsoft Azure
- Machine learning
- SQL and NoSQL
- UNIX, Linux, and Solaris
Continuing education for data engineers often involves learning to use the tech tools and programming languages that weren't covered in their degree program (whether because they didn't enroll in a data engineering program or because the tools weren't standard when they were in school). Many certifications for data engineers are tool-specific, like:
- Cloudera's Certified Professional Data Engineer credential
- Google's Cloud Data Engineer Certification
- IBM's Certified Data Engineer credential
- Microsoft's Certified Solutions Associate in Data Engineering with Azure credential
- Microsoft's MCSE: Data Management and Analytics credential
How much do data engineers get paid?
Data engineers are paid quite well. The average entry-level data engineer salary is about $77,000, and the highest-paid data engineers earn close to $160,000. Average salaries for mid-career data engineers can be anywhere from $89,000 to $124,000, which is a big range. It may be that higher earners have certifications or have completed a higher level of education, or that data engineers working in major metro areas are earning a lot more than their peers in suburban locales.
The good news is that $89,000 can go pretty far in many parts of the US, and there are more than 100,000 job openings for data engineers on Indeed.com—many of which were posted by employers who are looking to fill those positions quickly. That means this is a role that comes with a reassuring level of job security. The Dice 2020 Tech Job Report called data engineer the fastest-growing job in technology.
What kinds of people are most likely to succeed in this career?
To do what they do, data engineers need a firm understanding of programming, database design and configuration, and systems architecture, but that's not all. They also need patience, determination, and a certain amount of grit. As Paul Lappas put it in the Stitch interview quoted above, data engineering "is very difficult. It's an unsexy job, but it's super-critical. Data engineers are kind of like the unsung heroes of the data world. Their job is incredibly complex, involving new skills and new tech. It's really hard to build new ETL pipelines."
It's not a job for wannabe rock stars. Data engineers do work that's every bit as challenging and technically rigorous as that of data scientists, but they're seldom acknowledged for their work in the same way. That means data engineers have to love cleaning up raw data just because it's there. They have to get a kick out of developing beautiful data architectures. And they have to be cool with the fact that if they want a pat on the back, they're probably going to have to do the patting themselves. Successful data engineers don't seem to mind, however. Possibly because they're too busy munging data and bringing home the big bucks.