17 Must-Read Books for Data Science Grad Students [2021 Edition]

They may not make the best beach reading, but data science books deliver when it comes to essential information on this critical, emerging computer science field.
Tom Meltzer February 16, 2021

From how-to's to histories, we've got 17 books every graduate student in data science should know about. Read on to learn what might be missing from your bookshelf.

There aren’t a lot of page-turners in the data science genre, but you probably already knew you weren’t going to find any Stephen King-like works on this list. You’re looking for the best books to inform you about the latest data science trends or teach you the basics of NumPy or pandas. In data science, a great book is one that delivers the information you need, preferably without putting you to sleep in the process.

We’ve compiled a list of 17 essential data science books for the library of anyone considering a master’s in data science. We’ve subdivided them by subject as follows:

  • Data science books: artificial intelligence
  • Data science books: big data
  • Data science books: data analysis
  • Data science books: data mining
  • Data science books: data visualization
  • Data science books: deep learning
  • Data science books: natural language processing
  • Data science books: programming
  • Data science books: statistics

Data science books: artificial intelligence

Driven

Alex Davies

Self-driving cars represent one of the Holy Grails of artificial intelligence. It seems everyone—from Google to the big automobile manufacturers to garage tinkerers—is pursuing this quest. And why wouldn’t they? It’s a guaranteed gold mine. Alex Davies, a Wired editor, chronicles the past two decades of the race to design and build a safe and effective autonomous car in Driven. He explores not only the design, engineering, and data aspects of this story but also the human side: the infighting over credit and money, the intellectual property battles, the inevitable ego trips that result from so many brilliant people focused on a single endeavor. It’s a good read with many lessons in both data science and business.

If Then

Jill Lepore

Harvard history professor and New Yorker writer Jill Lepore examines the early days of data science through the story of the Simulmatics Corporation, a pioneer in political data analysis. The company rose to prominence assisting John F. Kennedy’s 1960 presidential campaign (it concluded that Kennedy could not win without the Black vote and advised the candidate to take a stronger pro-Civil Rights stance) and remained a political force for a decade until it went bankrupt in 1970. Lepore, an engaging and persuasive writer, argues that despite its ultimate failure, Simulmatics was highly influential, helping entrench assumptions about the value of segmentation and projections based on data analytics. The story of Simulmatics’ dramatic rise (at one point, it helped formulate Vietnam War strategy) and rapid decline makes for an entertaining and instructive read.

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again

Eric Topol

Artificial intelligence promises real-world benefits across disciplines, few more enticing than medicine. What if computers could catch doctor error as it occurs, predict undetectable but imminent medical conditions, discover new medications and treatments, or eliminate uncertainty from test results? Topol examines the possibilities, offering a hopeful vision of a future in which AI liberates doctors from tedious, error-prone procedures so that they can spend more time with patients. The book isn’t all pollyannaish forecasts; Topol acknowledges challenges and potential shortcomings of AI-based medicine. Still, Deep Medicine’s overall outlook is encouraging.

Data science books: big data

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities

Thomas H. Davenport

Davenport writes more for business leaders than data scientists, so don’t expect a deep dive into Python programming or Bayesian statistics. Rather, the value of Big Data at Work is the way it effectively uses case studies to survey and summarize business applications of big data in plain, engaging language. By taking a clear-eyed, 30,000-foot view of the field, this book provides data scientists the blueprint for explaining what they do and why it’s important to stakeholders who likely would not otherwise understand. That’s why Forbes described this book as “required reading for managers that need a straightforward, hype-free introduction to big data.”

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Cathy O’Neil

Cathy O’Neil, a Harvard math PhD and one-time Wall Street quantitative analyst, pulls back the curtain on the algorithms that drive so many critical decisions in our society. The view she exposes isn’t pretty. From the models that initiated and accelerated the 2008 financial crisis to automated evaluations of public-school teachers to flawed credit ratings, imperfect algorithms reinforce societal and economic inequality. An Occupy Wall Street activist, O’Neil voices strong opinions about the damage caused by these mostly unseen forces; fortunately, she has the chops to make her critiques stick and to lend her suggested remedies authority.

Data science books: data analysis

The Model Thinker: What You Need to Know to Make Data Work For You

Scott E. Page

Using data sets to model projections is a critical data science application. The problem is, which model do you use to attack which problem? Social science PhD Scott E. Page’s book covers numerous models, their applications, strengths, and drawbacks. He suggests that data scientists more aggressively apply multi-model solutions to problems rather than choosing and relying on a single model for analysis. The final chapter applies this approach to real-world challenges (economic disparity, the opioid crisis) with compelling results.

Predict and Surveil: Data, Discretion, and the Future of Policing

Sarah Brayne

Predict and Surveil explores the intersection of data analytics and predictive policing. Unlike similar books aimed at general audiences, Brayne’s book does not shy away from the algorithms and other technical details that power data-driven policing. Her unsettling conclusion: predictive policing fails to deliver its promised transparency and equitability and, in fact, creates feedback loops that intensify racial and other demographic biases. She also explores the significant privacy issues created through big-data aggregation by the government (and its private partners). Brayne offers prescriptive remedies: this isn’t just a jeremiad.

Data science books: data mining

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Seth Stephens-Davidowitz

Who says data mining can’t be fun? Seth Stephens-Davidowitz applies data mining to try to determine everything from the exact impact your schooling has on your future success to the extent to which people lie about their sex lives. Stephens-Davidowitz asks—and attempts to answer—whether violent entertainment inspires violent crime, whether parents favor sons over daughters, and whether it’s possible to game the stock market. If you’re a fan of Freakonomics and books of that ilk, you’ll almost certainly enjoy Everybody Lies.

Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

Are you looking for a one-stop data mining reference? Mining of Massive Datasets, written by three Stanford-affiliated data scientists, covers a lot of ground, including MapReduce systems and algorithms, locality-sensitive hashing, algorithms for data streams, PageRank and web-link analysis, frequent itemset analysis, clustering, computational advertising, recommendation systems, social-network graphs, dimensionality reduction, and machine-learning algorithms. Better still, the book corresponds to an online course you can audit for free (or take for credit for a small fee).

Data science books: data visualization

Information Graphics

Sandra Rendgen

Data science explores massive data sets to reveal profound, hidden insights. Those insights are of no use, however, if you can’t find a way to convey them to users. Information Graphics extensively covers the many options for visualizing data, offering both analysis and copious beautifully reproduced examples from the profession’s leaders. This is a large, handsome book that would look as appropriate on a coffee table as on an office shelf.

Data science books: deep learning

Deep Learning

Ian Goodfellow, Yoshua Bengio, Aaron Courville

Deep learning involves creating artificial neural networks to mimic the workings of the human brain, including the process of synthesizing previous learning to drive subsequent knowledge and deductions. Deep Learning, by Goodfellow et al., covers the mathematical principles and programming techniques underlying this machine learning field, including deep feedforward neural networks, optimization algorithms, sequence modeling, natural language processing, and speech recognition. Many readers and reviewers consider it the definitive text on the subject.

Deep Learning and the Game of Go

Max Pumperla, Kevin Ferguson

If you’re looking for a book that will deliver both an overview of deep learning and a tutorial on applying it to a real-world project, Deep Learning and the Game of Go is the book for you. The authors summarize the foundations of machine learning and deep learning, then walk readers through the process by which AlphaGo Zero, a Go-bot capable of beating world champions in this Chinese game of strategy, was created. While the book focuses on the challenges of creating a Go-playing AI, the principles it presents can be applied to a multitude of problems.

Data science books: natural language processing

Introduction to Natural Language Processing

Jacob Eisenstein

There aren’t a ton of breezy, fun-to-read books on natural language processing. It’s just not that kind of subject, unfortunately. The bookshelves offer a little more in the way of thick textbooks, of which Eisenstein’s Introduction to Natural Language Processing is a solid example. For a thorough review of how to teach machines to understand human language and apply this technology to create real-world solutions, Introduction is a good, well, introduction. Don’t expect it to be introductory-level, however. You’ll need a solid background in computer science, programming languages, and mathematics to make sense of it.

Data science books: programming

Python Tricks, The Book: A Buffet of Awesome Python Features

Dan Bader

Python is a complex and challenging programming language that many programmers struggle with. While there are plenty of Python books available, Bader’s work stands out for both its thoroughness and its clear, engaging writing style. Python Tricks assumes a basic knowledge of Python; this is not a book for beginners but rather for those who already have some grounding and are looking to make their programming cleaner, more efficient, and more powerful. According to Amazon reviewers, Bader’s book is “an excellent resource for someone with modest to moderate Python experience looking to round out their knowledge of some of the more subtle features/behaviors” of Python.

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

Hadley Wickham

Python is one of the two most popular programming languages in data science; R is the other. Most data scientists know both, even if they specialize in one or the other, and both can handle the vast majority of data science tasks. According to DataCamp, R programming excels in statistical modeling; it’s also better for dashboard creation (Python, in contrast, is the primary language for deep learning). Wickham’s R for Data Science is an R bible of sorts, and it’s supported by a free website that provides a quick and easy reference source. It is, at least according to one Goodreads user, a “clear and intuitive” “gamechanger.”

Data science books: statistics

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Trevor Hastie, Robert Tibshirani, Jerome Friedman

The fundamentals of statistics underlie data science’s major disciplines, including bioinformatics, data mining, and machine learning. The Elements of Statistical Learning covers crucial statistical principles, explaining how they apply across data science. Best of all, you can download a free PDF of the book from the Stanford University website. Hastie and Tibshirani also co-wrote, with Gareth James and Daniela Witten, An Introduction to Statistical Learning: with Applications in R.

Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python

Peter Bruce, Andrew Bruce, Peter Gedeck

Practical Statistics brings a foundational understanding of statistics to data science programming language instruction. For those with baseline knowledge of statistics, R, and Python, this book explains the value of exploratory data analysis, random sampling in big data sets, experimental design, linear regression analysis, classification techniques, and statistical machine learning.

Questions or feedback? Email editor@noodle.com

About the Author

Tom Meltzer began his career in education publishing at The Princeton Review, where he authored more than a dozen titles (including the company's annual best colleges guide and two AP test prep manuals) and produced the musical podcast The Princeton Review Vocab Minute. A graduate of Columbia University (English major), Tom lives in Chapel Hill, NC.

About the Editor

Tom Meltzer spent over 20 years writing and teaching for The Princeton Review, where he was lead author of the company's popular guide to colleges, before joining Noodle.
