What Is Data Science? [In-Depth Guide to Data Science Degrees, Jobs, Salaries + More]
November 13, 2022
Data science is an amalgam of disciplines spanning statistics, mathematics, computer science, and business analytics—but is it a science? In this guide, we dig into this evolving field so you can decide whether it's for you.
Businesses, organizations, and governments have always collected data, but never on today's scale. The extraordinary popularity of the internet—alongside the integration of computing devices in manufacturing, medicine, and many other sectors—has resulted in an unprecedented information explosion. Internet users alone generate about 2.5 quintillion bytes (that's 2.5 billion gigabytes) of digital data each and every day. Add other data generation sources and you're looking at 7.5 sextillion gigabytes of data. Every. 24. Hours.
We're only now figuring out how to process data sets this large for analysis, and the vast majority of that data goes untouched.
Data science is trying to change that. Not long ago it was a niche discipline, and in 2012 the Harvard Business Review called data scientist 'the sexiest job of the 21st century.' Executives regarded the field as a form of arcane sorcery.
These days, most people understand that the field isn't populated by wizards but rather hardcore numbers and automation nerds. Unfortunately, that realization hasn't helped clarify what data scientists do.
One issue is that everyone is still trying to answer questions like:
- Exactly what is data science good for?
- How does data science differ from data analytics, data engineering, and data architecture?
- Should data scientists be responsible for collecting and processing information, or should they analyze it, too?
- Is data science a team sport?
It's not surprising that there are still so many unanswered questions, given how much technology has changed in the past four decades. So, let's start with an easy one: What is data science? Oracle does an excellent job of answering it concisely when it calls data science a discipline that "combines multiple fields, including statistics, scientific methods, and data analysis to extract value from data." But it's not that simple, of course. Let's explore further.
In this article, we cover:
- What is data science?
- A brief history of data science
- What does a data scientist do?
- What skills does a data scientist need?
- How to become a data scientist
- Where do data scientists work?
- How much money does a data scientist make?
- Where is data science headed?
- Is data science right for me?
What is data science?
Data science is a branch of computer science that uses powerful software and tools like machine learning algorithms to sort through and find useful patterns in immense batches of information.
Put that way, it sounds simple enough. What makes data science so complex is that it borrows from many other disciplines. Dr. Ganapathi Pulipaka, Chief Data Scientist at Accenture, describes data science as a field that blends "software engineering, predictive analytics, machine learning, deep learning, HPC, supercomputing, mathematics, data mining, databases (SQL, NoSQL), Hadoop, streaming analytics platforms for live analysis (Apache Kafka, Apache Flink, Apache Spark, Apache Impala), IoT platforms, edge computing, fog computing, networks, statistics, web development, cloud computing, data engineering, and data visualization."
Data science applications span industries. With enough information and the right tools, data science can solve abstract problems, predict future events, and boost profits for companies that adopt a data-driven approach to problem-solving. For example:
- Netflix uses data science to generate recommendations that keep us watching
- Amazon uses it to develop recommendations that prompt impulse buys
- Data science is why most of us are never more than a few blocks from the nearest Starbucks
- It is the foundation upon which Uber's surge pricing model was built
As the University of Virginia says in the guide to its online Master of Science in Data Science program, "From healthcare to government to business, data has never been so widely accessible or relied-upon to help people do their best work."
Is data science real science?
This question generates a fair amount of controversy. Those who assert that data science is a genuine science point to the processes it uses. In their view, data science is the study of information. Data scientists observe raw data, develop testable predictions, and design experiments to test those predictions, just as other scientists and researchers do. Some even insist that data science is the mother of all science because data scientists work with pure data.
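One common way to test a prediction is a two-group experiment, such as an A/B test. The sketch below uses a permutation test, which is only one of many possible tests. It asks how often randomly relabeling the observations produces a difference in means at least as large as the one actually observed. The function name `permutation_test` and the sample measurements are hypothetical. The code uses only Python's standard library.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute mean
    difference is at least as large as the observed one (a p-value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign observations to groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_resamples


# Hypothetical measurements from a control group and a variant group.
control = [12, 15, 11, 14, 13, 12, 16, 14]
variant = [18, 21, 19, 17, 20, 22, 19, 18]
p = permutation_test(control, variant)
print(f"p-value: {p:.4f}")
```

A small p-value suggests that chance alone is an unlikely explanation for the observed difference. It is still a probability statement, though, not proof.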
On the other side of the divide is the view that data science isn't an actual science because most data scientists don't do novel research the same way scientists do. Proponents of this view sometimes argue that real scientists don't study pure data, but rather whatever generates that data, to understand how the world works in the here and now. Data science uses statistics and mathematics to build models that make better guesses about the future, but it can only generate probabilities, not facts.
Ultimately, there's no definitive answer. Some data scientists are motivated by scientific principles and take a very scientific approach to their work. Others aren't especially concerned with scientific truth; they use data to find competitive advantages in business.
The data science life cycle
The data science life cycle is a highbrow way of describing the phases of work in a data science project. There are multiple data science life cycles, but the most widely known is probably CRISP-DM (Cross-Industry Standard Process for Data Mining). It lays out six distinct project phases:
- Business understanding
- Data understanding
- Data preparation (data cleaning/data processing)
- Modeling
- Evaluation
- Deployment
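CRISP-DM is usually drawn as a cycle rather than a straight line, because projects often loop back to earlier phases. The minimal Python sketch below encodes CRISP-DM's six phases and the moves allowed between them. It assumes the arrows of the commonly published CRISP-DM diagram. The names `Phase`, `TRANSITIONS`, and `is_valid_path` are hypothetical and do not come from any standard tool.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed moves between phases (assumed from the usual CRISP-DM diagram).
# Some arrows point backward because the process is iterative.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


project = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,  # back to preparation after a first modeling attempt
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(project))  # → True
print(is_valid_path([Phase.BUSINESS_UNDERSTANDING, Phase.MODELING]))  # → False
```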
There are other data science life cycle frameworks, however, including:
- Team Data Science Process (TDSP)
- The Digital Curation Centre (DCC) Curation Lifecycle Model
- USGS Data Management Plan Framework (DMPf)
A brief history of data science
Data science is a relatively new discipline. However, it has roots in data analysis, which is as old as counting. Paleolithic tribespeople used tally bones tens of thousands of years ago to track trading activities and supplies and may have used that data to make useful predictions about future events. In the 1600s, John Graunt used statistical data analysis to predict the spread of disease and mortality rates.
People began using digital computers for statistical analysis almost as soon as reliable machines became available in the 1950s and 1960s. Once computer use became commonplace, businesses and other organizations began collecting volumes of data so vast that traditional statistical methods could no longer keep up.
Scientists and businesses eventually started looking for novel ways to derive value from all that data, but technology wasn't up to the challenge until well into the 2000s. After 2010, data science tools became widely available, and the demand for data scientists exploded. More and more sectors and organizations realized how valuable their data could be. For many years, data science jobs were created faster than master's and PhD programs could train data scientists. Today, we're only scratching the surface of what data can do, and researchers worldwide are still searching for new ways to leverage it.
Who are the top data scientists?
- Yoshua Bengio is well known for his work on artificial intelligence and neural networks. He's regarded as one of the leading figures in deep learning and has contributed to major advances in natural language processing.
- Jeff Hammerbacher developed some of the main techniques used to capture, store, and analyze large amounts of data. He helped launch Facebook's first data science team and is credited with enabling the platform to process massive amounts of data.
- Geoffrey Hinton is called the Godfather of Deep Learning for his pioneering work on neural networks and artificial intelligence.
- Fei-Fei Li is a major figure in fields like cognitive neuroscience, machine learning, and AI. She led the team that created ImageNet, a massive visual database that influenced how deep learning evolved.
- Dr. Dhanurjay (DJ) Patil served as the first Chief Data Scientist of the United States, working in the White House Office of Science and Technology Policy. He's worked as a principal data consultant for companies like PayPal, eBay, LinkedIn, and Salesforce.
- Judea Pearl is known for his work on Bayesian networks and the probabilistic approach to artificial intelligence. He helped develop the techniques that made driverless cars and speech recognition possible.
- Alex Pentland is one of the most powerful data scientists in the world. He co-leads the World Economic Forum Big Data and Personal Data initiatives and is on the advisory boards of companies like Nissan and Motorola.
- Jürgen Schmidhuber has been called the Father of Modern Deep Learning. He's known for his work on self-improving AI systems and long short-term memory (LSTM) networks.
What does a data scientist do?
Data science is often defined in terms of what data scientists do. These professionals use math, statistics, programming, and other procedures and tools to find actionable insights in massive amounts of data. Their work involves collecting, cleaning, organizing, and analyzing information, though many data scientists are responsible for only some of these tasks. They might spend more time on data preparation (i.e., processing raw data for future analysis) or on writing reports designed to influence decision-making than on actually analyzing data.
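Data preparation often resembles the minimal sketch below, written with Python's standard library. The column names `city` and `sales`, the function name `clean_rows`, and the sample rows are all hypothetical. Real cleaning rules depend heavily on the data set.

```python
import csv
import io


def clean_rows(raw_csv):
    """Parse raw CSV text and return tidy records.

    Trims whitespace, normalizes capitalization, drops rows with missing
    or non-numeric sales values, and removes exact duplicates.
    """
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_csv)):
        city = (row.get("city") or "").strip().title()
        sales_text = (row.get("sales") or "").strip()
        if not city or not sales_text:
            continue  # drop incomplete records
        try:
            sales = float(sales_text)
        except ValueError:
            continue  # drop values that are not numbers, such as "n/a"
        key = (city, sales)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append({"city": city, "sales": sales})
    return cleaned


raw = """city,sales
 boston ,120
Boston,120
denver,
Austin,n/a
austin,95.5
"""
print(clean_rows(raw))
# → [{'city': 'Boston', 'sales': 120.0}, {'city': 'Austin', 'sales': 95.5}]
```

Five messy rows reduce to two usable records. On a small scale, this shows why preparation can take up so much of a project.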
What are the different roles in data science?
There are many roles to fill in data science—data scientist is only one of them. Some data science professionals have titles like:
- Big Data engineer
- Business analyst
- Data analyst
- Data analytics manager
- Data architect
- Data ecologist
- Data engineer
- Data visualizer
- Machine learning engineer
General job description of a data scientist
Job postings for data scientists are often quite vague—possibly because hiring managers and executives don't have the technical expertise to understand what it is they're looking for.
Companies and organizations know they need professionals who can gather, organize, and analyze large amounts of information to solve business challenges. In a Reddit thread about confusing messages in data science, the original poster wrote:
If you're interested in data science, you can overcome some confusion by reading job listings to see what employers are looking for across sectors.
A day in the life of a data scientist
A typical workday in data science involves email, meetings with other departments, and prioritization. Most data scientists spend some time each day:
- Examining databases
- Collecting and cleaning data
- Sorting through data
- Running regressions and other analyses
- Creating visualizations
- Looking for insights
- Drafting reports
Don't expect that you'll spend most of your time on data analysis. Some surveys have found that data science professionals spend only 20 percent of their time analyzing data. They spend the other 80 percent finding, cleaning, and reorganizing it.
Data science applications by industry
Many people see data science mainly as a tool corporations use to drive business decisions, but it is useful across industries:
- Health insurance companies use data science to identify candidates for early screenings or reduce treatment costs over the long term
- In medicine, data science helps doctors diagnose cancers and other illnesses earlier and more reliably
- Shipping and logistics companies use data science to speed up delivery times and reduce the risk that unexpected issues like storms will cause delays
- The financial sector uses data science to make predictions about how political events, natural disasters, and other large-scale events will affect financial markets
These represent just a handful of the ways data science is used to optimize processes, answer strategy questions, and identify trends. There are thousands of other applications across fields as diverse as entertainment and aerospace engineering.
What skills does a data scientist need?
The skills used in data science vary by job title, but data scientists typically have technical skill sets related to manipulating unstructured data, machine learning algorithms, and data visualization. They need strong communication skills, too, because sharing findings with executives and other stakeholders is part of the job.
Increasingly, data scientists also have to be domain experts. Employers aren't just hiring data scientists; they're hiring data scientists who know enough about a domain to ask the right questions and understand how answers can be used to generate value.
What tools are used for data science?
Today's data scientists use tools like:
- Data visualization tools like D3.js and Tableau
- Apache frameworks like Hadoop, Mahout, Hive, and Pig
- Programming language interfaces like Jupyter Notebooks
- Programming languages like R, Java, Python, and SQL
- TensorFlow and other machine learning platforms
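Two of the languages above, Python and SQL, are often used together. The small sketch below uses Python's built-in `sqlite3` module to run a SQL aggregation. The `orders` table and its values are hypothetical. Production work would more likely query a shared database or data warehouse than an in-memory SQLite database.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 60.0), ("west", 40.0), ("north", 25.0)],
)

# SQL does the grouping and summing; Python receives the summarized rows.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
"""
summary = conn.execute(query).fetchall()
conn.close()

for region, n_orders, revenue in summary:
    print(f"{region}: {n_orders} orders, ${revenue:.2f}")
```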
How to become a data scientist
While more colleges and universities have created dedicated data science bachelor's and master's degree programs, it's still possible to launch a career in this field with a degree in data analytics, data engineering, applied statistics, or business intelligence. Many data scientists are at least partially self-taught because even the most comprehensive on-campus and online data science master's degree programs don't cover all the skills and tools data scientists use.
Is a master's degree necessary for data science?
A whopping 90 percent of data scientists have advanced degrees, suggesting you'll need to earn a master's degree to advance in data science, if not a doctorate. There are entry-level roles in analytics and business intelligence, but most junior positions in data science aren't really junior at all.
What training do you need to be a data scientist?
Data science is an evolving field. To work in it, you have to commit to a life of learning. Your data science training will include not only earning a degree, but also learning new programming languages, staying up-to-date on new tools as they're developed, and doing whatever is necessary to stay relevant as data science becomes increasingly automated. There are also data science certifications that can teach you new skills and boost your hireability, including:
- CAS Institute Predictive Analytics and Data Science credential
- Data Science Council of America Principal Data Scientist credential
- Data Science Council of America Senior Data Scientist credential
- SAS Big Data Professional credential
- SAS Certified Data Scientist credential
Where do data scientists work?
According to Diffbot's State of Data Science, Engineering & AI Report, the companies with the largest data-related workforces are:
Becoming a data scientist doesn't always mean working in tech, however. Retail companies, big banking and investment firms, pharmaceutical companies, healthcare networks, and other employers all have teams of data science professionals working behind the scenes. As Stevens Institute of Technology puts it in the guide to its online MS in Data Science program, a data science master's can prepare "students for careers in fintech, business intelligence and analytics, academia, and database management, as well as government positions requiring strong skills in data analysis."
How much money does a data scientist make?
Data science jobs typically pay well, though pinning down just how well can be tough. Attention-grabbing headlines make it seem like all data scientists are earning big bucks and that six-figure data science salaries are the norm—even for early-career professionals. While data scientists do earn more than the average American, this field isn't churning out one-percenters. Experienced data scientists earn just over $100,000. Data scientists in managerial roles can earn close to $200,000. That's good money, but it's not private jet money.
What's the average data science salary?
The average data science salary is $123,000 according to Indeed, but Glassdoor says $113,000. PayScale's figure is lowest at $96,000, but closest to the median salary for data scientists reported by the US Bureau of Labor Statistics.
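Part of the gap between these figures may come from the statistic each source reports. Salary distributions typically have a long right tail, so an average tends to sit above the median. The toy example below uses five made-up salaries, not survey data.

```python
from statistics import mean, median

# Hypothetical salaries for five data scientists (illustrative only).
# One highly paid manager skews the distribution to the right.
salaries = [85_000, 92_000, 98_000, 105_000, 260_000]

print(f"mean:   ${mean(salaries):,.0f}")    # pulled up by the single outlier
print(f"median: ${median(salaries):,.0f}")  # the middle value, unaffected by it
# → mean:   $128,000
# → median: $98,000
```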
What's a typical starting salary in data science jobs?
Starting salaries in data science tend to be fairly high, but keep in mind that truly entry-level data science jobs are rare, and few junior data scientists are genuinely just starting out. Entry-level data scientists earn as much as they do (around $70,000, or even more at big tech firms) because they typically have data science master's degrees and years of analytics experience.
What's the average data science salary in 2021?
The most recent Robert Half Technology Salary Guide predicts that the median national salary for data scientist roles in 2021 will be about $129,000. That's roughly what data scientists were earning in 2016, but it's still a lot, especially since affordable data science degree programs (including affordable online data science master's programs) are available.
Where is data science headed?
Headlines proclaiming the imminent death of data science are plentiful; the cause of death varies from article to article. Some blame the proliferation of new data science degree programs churning out data scientists at a rate higher than new jobs are being created. Others blame the way employers treat data science as an arcane art and expect too much of their data scientists. Still others point to tools like Google's AutoML and pre-packaged algorithms and APIs, which are replacing modeling work and making it easier for software engineers and data analysts to conduct data science.
There's no real evidence, however, that this discipline is dying. It is changing, but that's to be expected. Technological shifts always have an impact on tech jobs. Soon, there will probably be fewer data scientists working on model development and more working on analysis. In the far-flung future, when artificial intelligence has gotten smarter, more data scientists may spend their days adapting it for specific technical implementations.
The key to succeeding in this fast-changing field will be keeping up. There will come a time when the demand for data scientists cools off and there are plenty of them to fill available roles. When that happens, you'll need to differentiate yourself from the competition. You might specialize in deep learning, software engineering, or the next big innovation in data science. Another option might be to flesh out your business skills with an MBA.
One Redditor put it this way in a thread about whether data science is a dead end:
Is data science right for me?
Only you can answer this question. Being a talented programmer with a head for statistics used to be all it took to land a six-figure data science job when data science was the hot new buzzword and companies treated data scientists like sorcerers. As time passed, however, many of those companies began to realize that they were paying for a whole lot of nothing. Data scientists without domain knowledge were processing massive amounts of data but delivering few actionable insights.
Employers increasingly want their data scientists to do more than just slice and dice data. You have to be good at turning data into dollars in this role. As another commenter wrote in the Reddit thread above, "the rest of the work hinges not on executing data science, but on figuring out how to make data science matter for the business. It's not about answering the question, it's about knowing what questions to ask."
In other words, if you're considering a career in data science because you want to spend your days building special-purpose algorithms, this might not be the discipline for you. The gulf between data science and data analytics is narrowing slowly but surely, and that means you'll probably be happiest in this field if you enjoy using technology not just for technology's sake but to solve real-world puzzles.