Businesses, organizations, and governments have always collected data, but never on today's scale. The extraordinary popularity of the internet—alongside the integration of computing devices in manufacturing, medicine, and many other sectors—has resulted in an unprecedented information explosion. Internet users alone generate about 2.5 quintillion bytes (that's 2.5 billion gigabytes) of digital data each and every day. Add other data generation sources and you're looking at 7.5 sextillion gigabytes of data. Every. 24. Hours.
We're only now figuring out how to process these enormous data sets for analysis, and the vast majority of that data goes untouched.
Data science is trying to change that. Not long ago, it was a niche discipline; the Harvard Business Review famously called data scientist 'the sexiest job of the 21st century,' and executives regarded the work as a form of arcane sorcery.
These days, most people understand that the field is populated not by wizards but by hardcore number crunchers and automation nerds. Unfortunately, that realization hasn't made it any clearer what data scientists actually do.
One issue is that everyone is still trying to answer questions like:
It's not surprising that there are still so many unanswered questions, given how much technology has changed in the past four decades. So, let's start with an easy one: What is data science? Oracle does an excellent job of answering it concisely when it calls data science a discipline that "combines multiple fields, including statistics, scientific methods, and data analysis to extract value from data." But it's not that simple, of course. Let's explore further.
In this article, we cover:
Data science is a branch of computer science that uses powerful software and tools like machine learning algorithms to sort through and find useful patterns in immense batches of information.
Put that way, it sounds simple enough. What makes data science so complex is that it borrows from many other disciplines. Dr. Ganapathi Pulipaka, Chief Data Scientist at Accenture, describes data science as a field that blends "software engineering, predictive analytics, machine learning, deep learning, HPC, supercomputing, mathematics, data mining, databases (SQL, NoSQL), Hadoop, streaming analytics platforms for live analysis (Apache Kafka, Apache Flink, Apache Spark, Apache Impala), IoT platforms, edge computing, fog computing, networks, statistics, web development, cloud computing, data engineering, and data visualization."
Data science applications span industries. With enough information and the right tools, data science can solve abstract problems, predict future events, and boost profits for companies that adopt a data-driven approach to problem-solving. For example:
As the University of Virginia says in the guide to its online Master of Science in Data Science program, "From healthcare to government to business, data has never been so widely accessible or relied-upon to help people do their best work."
This question generates a fair amount of controversy. Those who assert that data science is a genuine science point to the processes it uses. In their view, data science is the study of information: data scientists observe raw data, develop testable predictions, and design experiments to test those predictions, just like other scientists and researchers. Some even insist that data science is the mother of all sciences because data scientists work with pure data.
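To make the "testable predictions" side of this argument concrete, suppose a data scientist predicts that a redesigned checkout page raises order values. One common way to test that kind of prediction is a permutation test, which asks how often randomly shuffled group labels produce a difference at least as large as the one actually observed. The Python sketch below is our own illustration, not a method from any source quoted here; the function name `permutation_test` is hypothetical and the order values are made up. Notice that the result is a probability, not a proof.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the fraction of random relabelings whose
    absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign every value to a group
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_resamples


# Hypothetical order values (in dollars) from an A/B test.
control = [12, 15, 11, 14, 13, 12, 16, 14]
variant = [18, 17, 20, 16, 19, 21, 17, 18]
p_value = permutation_test(control, variant)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; it does not prove that the redesign caused it, which is exactly the point skeptics of "data science as science" tend to raise.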
On the other side of the divide is the view that data science isn't an actual science because most data scientists don't do novel research the same way scientists do. Proponents of this view sometimes argue that real scientists don't study pure data, but rather whatever generates that data, to understand how the world works in the here and now. Data science uses statistics and mathematics to build models that make better guesses about the future, but it can only generate probabilities, not facts.
Ultimately, there's no definitive answer. Some data scientists are motivated by scientific principles and take a very scientific approach to their work. Others aren't especially concerned with scientific truth; they use data to find competitive advantages in business.
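The data science lifecycle is a highbrow way of describing the phases of work in a data science project. There are multiple data science lifecycles, but the most widely known is probably CRISP-DM (the Cross-Industry Standard Process for Data Mining). It lays out six distinct project phases beyond data collection: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.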
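CRISP-DM is usually drawn as a cycle rather than a straight line: later phases can send a project back to earlier ones. As a minimal sketch (our own modeling choice, not part of the CRISP-DM documents), the phases and the arrows from the commonly reproduced CRISP-DM diagram can be encoded in Python as an enum plus an allowed-moves table. The names `Phase`, `TRANSITIONS`, and `is_valid_path` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, numbered in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed moves, following the arrows in the usual CRISP-DM diagram.
# The backward arrows are what make the process iterative.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    # Assumption: the diagram's outer circle means a deployed project can
    # feed a new round of business understanding.
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# A project that loops back from modeling to data preparation once.
P = Phase
project = [P.BUSINESS_UNDERSTANDING, P.DATA_UNDERSTANDING, P.DATA_PREPARATION,
           P.MODELING, P.DATA_PREPARATION, P.MODELING, P.EVALUATION, P.DEPLOYMENT]
print(is_valid_path(project))  # True
print(is_valid_path([P.BUSINESS_UNDERSTANDING, P.MODELING]))  # False: skips phases
```

Treating the lifecycle as a graph rather than a list makes the key property visible: going back to data preparation after a disappointing model is a normal move, not a failure of the process.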
There are other data science lifecycle frameworks, however, including:
Data science is a relatively new discipline. However, it has roots in data analysis, which is as old as counting. Paleolithic tribespeople used tally bones tens of thousands of years ago to track trading activities and supplies and may have used that data to make useful predictions about future events. In the 1600s, John Graunt used statistical data analysis to predict the spread of disease and mortality rates.
People began using digital computers to conduct statistical analysis as soon as reliable computers became widely available in the 1960s. Once computer use became commonplace, businesses and other organizations began collecting volumes of data so large that traditional statistical analysis was all but useless.
Scientists and businesses eventually started looking for novel ways to derive value from all that data, but the technology wasn't up to the challenge until well into the 2000s. After 2010, data science tools became widely available, and demand for data scientists exploded as more and more sectors and organizations realized how valuable their data could be. For many years, data science jobs were created faster than master's and PhD programs could train people to fill them. Today, we're still only scratching the surface of what data can do, and researchers worldwide are searching for new ways to leverage it.
Data science is often defined in terms of what data scientists do. These professionals use math, statistics, programming, and other procedures and tools to find actionable insights in massive amounts of data. Their work involves collecting, cleaning, organizing, and analyzing information, though many data scientists are only responsible for some of these tasks. They might spend more time in data preparation (i.e., processing raw data for future analysis) or writing reports designed to influence decision-making than actually manipulating data.
There are many roles to fill in data science—data scientist is only one of them. Some data science professionals have titles like:
Job postings for data scientists are often quite vague—possibly because hiring managers and executives don't have the technical expertise to understand what it is they're looking for.
Companies and organizations know they need professionals who can gather, organize, and analyze large amounts of information to solve business challenges. In a Reddit thread about confusing messages in data science, the original poster wrote:
If you're interested in data science, you can overcome some confusion by reading job listings to see what employers are looking for across sectors.
A typical workday in data science involves email, meetings with other departments, and prioritization. Most data scientists spend some time each day:
Don't expect to spend most of your time on data analysis. Some surveys have found that data science professionals spend only about 20 percent of their time analyzing data and the other 80 percent finding, cleaning, and reorganizing it.
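Much of that 80 percent is unglamorous work like the following. This is a minimal sketch in plain Python (no pandas, to keep it self-contained), assuming hypothetical records with `customer` and `amount` fields; the function name `clean_records` and the sample rows are made up for illustration. It trims and normalizes names, parses dollar strings, drops rows with missing or unparseable values, and removes duplicates that only become visible after normalization.

```python
def clean_records(raw):
    """Normalize, validate, and de-duplicate a list of raw record dicts."""
    seen = set()
    cleaned = []
    for rec in raw:
        # Normalize whitespace and capitalization so "  ada lovelace " == "Ada Lovelace".
        name = str(rec.get("customer") or "").strip().title()
        amount_raw = rec.get("amount")
        if not name or amount_raw is None:
            continue  # drop rows missing a required field
        try:
            # Strip currency formatting such as "$1,200.50" before parsing.
            amount = float(str(amount_raw).replace("$", "").replace(",", "").strip())
        except ValueError:
            continue  # drop rows whose amount can't be parsed
        key = (name, amount)
        if key in seen:
            continue  # drop duplicates revealed by normalization
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


# Hypothetical messy input, typical of data pulled from several sources.
raw_rows = [
    {"customer": "  ada lovelace ", "amount": "$1,200.50"},
    {"customer": "Ada Lovelace", "amount": 1200.5},  # duplicate once normalized
    {"customer": "grace hopper", "amount": None},    # missing amount
    {"customer": "", "amount": "15"},                # missing name
    {"customer": "alan turing", "amount": "n/a"},    # unparseable amount
    {"customer": "Alan Turing", "amount": "300"},
]
print(clean_records(raw_rows))
# [{'customer': 'Ada Lovelace', 'amount': 1200.5}, {'customer': 'Alan Turing', 'amount': 300.0}]
```

Real pipelines usually lean on a library such as pandas, and the right rules (drop a bad row, or fix it?) always depend on the data and the question being asked.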
Many people see data science mainly as a tool corporations use to drive business decisions, but it is useful across industries:
These represent just a handful of the ways data science is used to optimize processes, answer strategy questions, and identify trends. There are thousands of other applications across fields as diverse as entertainment and aerospace engineering.
The skills used in data science vary by job title, but data scientists typically have technical skill sets related to manipulating unstructured data, machine learning algorithms, and data visualization. They need strong communication skills, too, because sharing findings with executives and other stakeholders is part of the job.
Increasingly, data scientists also have to be domain experts. Employers aren't just hiring data scientists; they're hiring data scientists who know enough about a domain to ask the right questions and understand how answers can be used to generate value.
Today's data scientists use tools like:
While more colleges and universities have created dedicated data science bachelor's and master's degree programs, it's still possible to launch a career in this field with a degree in data analytics, data engineering, applied statistics, or business intelligence. Many data scientists are at least partially self-taught because even the most comprehensive on-campus and online data science master's degree programs don't cover all the skills and tools data scientists use.
A whopping 90 percent of data scientists have advanced degrees, suggesting you'll need to earn a master's degree to advance in data science, if not a doctorate. There are entry-level roles in analytics and business intelligence, but most junior positions in data science aren't really junior at all.
Data science is an evolving field. To work in it, you have to commit to a life of learning. Your data science training will include not only earning a degree, but also learning new programming languages, staying up-to-date on new tools as they're developed, and doing whatever is necessary to stay relevant as data science becomes increasingly automated. There are also data science certifications that can teach you new skills and boost your hireability, including:
According to Diffbot's State of Data Science, Engineering & AI Report, the companies with the largest data-related workforces are:
Becoming a data scientist doesn't always mean working in tech, however. Retail companies, big banking and investment firms, pharmaceutical companies, healthcare networks, and other employers all have teams of data science professionals working behind the scenes. As Stevens Institute of Technology puts it in the guide to its online MS in Data Science program, a data science master's can prepare "students for careers in fintech, business intelligence and analytics, academia, and database management, as well as government positions requiring strong skills in data analysis."
Data science jobs typically pay well, though pinning down just how well can be tough. Attention-grabbing headlines make it seem like all data scientists are earning big bucks and that six-figure data science salaries are the norm—even for early-career professionals. While data scientists do earn more than the average American, this field isn't churning out one-percenters. Experienced data scientists earn just over $100,000. Data scientists in managerial roles can earn close to $200,000. That's good money, but it's not private jet money.
The average data science salary is $123,000 according to Indeed, but Glassdoor says $113,000. PayScale's figure is the lowest, at $96,000, but it is also the closest to the median salary for data scientists reported by the US Bureau of Labor Statistics.
Starting salaries in data science tend to be fairly high, but keep in mind that truly entry-level data science jobs are rare, and newly hired data scientists are seldom just starting out. Entry-level data scientists earn as much as they do (around $70,000, or even more at big tech firms) because they typically have data science master's degrees and years of analytics experience.
The most recent Robert Half Technology Salary Guide predicts that the median national salary for data scientist roles in 2021 will be about $129,000. That's roughly what data scientists were earning in 2016, but it's still a strong salary, especially given that affordable data science degree programs (including affordable online master's programs) are available.
Headlines proclaiming the imminent death of data science are plentiful; the cause of death varies from article to article. Some blame the proliferation of new data science degree programs churning out data scientists at a rate higher than new jobs are being created. Others blame the way employers treat data science as an arcane art and expect too much of their data scientists. Still others point to tools like Google's AutoML and pre-packaged algorithms and APIs, which are replacing modeling work and making it easier for software engineers and data analysts to conduct data science.
There's no real evidence, however, that this discipline is dying. It is changing, but that's to be expected. Technological shifts always have an impact on tech jobs. Soon, there will probably be fewer data scientists working on model development and more working on analysis. In the more distant future, when artificial intelligence has gotten smarter, more data scientists may spend their days adapting it to specific technical implementations.
The key to succeeding in this fast-changing field will be keeping up. There will come a time when the demand for data scientists cools off and there are plenty of them to fill available roles. When that happens, you'll need to differentiate yourself from the competition. You might specialize in deep learning, software engineering, or the next big innovation in data science. Another option might be to flesh out your business skills with an MBA.
As one Redditor put it in a thread about whether data science is a dead end:
Only you can answer this question. Being a talented programmer with a head for statistics used to be all it took to land a six-figure data science job when data science was the hot new buzzword and companies treated data scientists like sorcerers. As time passed, however, many of those companies began to realize that they were paying for a whole lot of nothing. Data scientists without domain knowledge were processing massive amounts of data but delivering few actionable insights.
Employers increasingly want their data scientists to do more than just slice and dice data. You have to be good at turning data into dollars in this role. As another commenter wrote in the Reddit thread above, "the rest of the work hinges not on executing data science, but on figuring out how to make data science matter for the business. It's not about answering the question, it's about knowing what questions to ask."
In other words, if you're considering a career in data science because you want to spend your days building special-purpose algorithms, this might not be the discipline for you. The gulf between data science and data analytics is narrowing slowly but surely, and that means you'll probably be happiest in this field if you enjoy using technology not just for technology's sake but to solve real-world puzzles.