There aren’t a lot of page-turners in the data science genre, but you probably already knew you weren’t going to find any Stephen King-like works on this list. You’re looking for the best books to inform you about the latest data science trends or teach you the basics of NumPy or pandas. In data science, a great book is one that delivers the information you need, preferably without putting you to sleep in the process.
We’ve compiled a list of 17 essential data science books for the library of anyone considering a master’s in data science, subdivided by subject.
Self-driving cars represent one of the Holy Grails of artificial intelligence. It seems everyone—from Google to the big automobile manufacturers to garage tinkerers—is pursuing this quest. And why wouldn’t they? It’s a guaranteed gold mine. Alex Davies, a Wired editor, chronicles the past two decades of the race to design and build a safe and effective autonomous car in Driven. He explores not only the design, engineering, and data aspects of this story but also the human side: the infighting over credit and money, the intellectual property battles, the inevitable ego trips that result from so many brilliant people focused on a single endeavor. It’s a good read with many lessons in both data science and business.
Harvard history professor and New Yorker writer Jill Lepore examines the early days of data science through the story of the Simulmatics Corporation, a pioneer in political data analysis. The company rose to prominence assisting John F. Kennedy’s 1960 presidential campaign (it concluded that Kennedy could not win without the Black vote and advised the candidate to take a stronger pro-Civil Rights stance) and remained a political force for a decade until it went bankrupt in 1970. Lepore, an engaging and persuasive writer, argues that despite its ultimate failure, Simulmatics was highly influential, helping entrench assumptions about the value of segmentation and projections based on data analytics. The story of Simulmatics’ dramatic rise (at one point, it helped formulate Vietnam War strategy) and rapid decline makes for an entertaining and instructive read.
Artificial intelligence promises real-world benefits across disciplines, few more enticing than medicine. What if computers could catch doctor error as it occurs, predict undetectable but imminent medical conditions, discover new medications and treatments, or eliminate uncertainty from test results? Topol examines the possibilities, offering a hopeful vision of a future in which AI liberates doctors from tedious, error-prone procedures so that they can spend more time with patients. The book isn’t all pollyannaish forecasts; Topol acknowledges challenges and potential shortcomings of AI-based medicine. Still, Deep Medicine’s overall outlook is encouraging.
Data science professionals can use their knowledge and skills in many ways and in almost every industry. You might specialize in business intelligence or robotics or healthcare informatics. There are almost too many options.
90 percent of data scientists hold master’s degrees, and 47 percent hold doctoral degrees.
The Bureau of Labor Statistics sets median data scientist annual pay at just over $100,000. Top-paying sectors include:
- Computer and peripheral equipment manufacturing ($148,290)
- Semiconductor and other electronic equipment manufacturing ($142,150)
- Specialized information services ($139,600)
- Data processing, hosting, and related services ($126,160)
- Accounting, tax preparation, bookkeeping, payroll services ($124,440)
Thomas H. Davenport
Davenport writes more for business leaders than data scientists, so don’t expect a deep dive into Python programming or Bayesian statistics. Rather, the value of Big Data at Work is the way it effectively uses case studies to survey and summarize business applications of big data in plain, engaging language. By taking a clear-eyed, 30,000-foot view of the field, this book provides data scientists the blueprint for explaining what they do and why it’s important to stakeholders who likely would not otherwise understand. That’s why Forbes described this book as “required reading for managers that need a straightforward, hype-free introduction to big data.”
Cathy O’Neil, a Harvard math PhD and one-time Wall Street quantitative analyst, pulls back the curtain on the algorithms that drive so many critical decisions in our society. The view she exposes isn’t pretty. From the models that initiated and accelerated the 2008 financial crisis to automated evaluations of public-school teachers to flawed credit ratings, imperfect algorithms reinforce societal and economic inequality. An Occupy Wall Street activist, O’Neil voices strong opinions about the damage caused by these mostly unseen forces; fortunately, she has the chops to make her critiques stick and to lend her suggested remedies authority.
Scott E. Page
Using data sets to model projections is a critical data science application. The problem is, which model do you use to attack which problem? Social science PhD Scott E. Page’s book covers numerous models along with their applications, strengths, and drawbacks. He suggests that data scientists more aggressively apply multi-model solutions to problems rather than choosing and relying on a single model for analysis. The final chapter applies this approach to real-world challenges (economic disparity, the opioid crisis) with compelling results.
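To make the multi-model idea concrete, here is a minimal Python sketch. It is not taken from Page’s book: the three toy forecasters, the function names, and the sample numbers are all assumptions chosen for illustration. Each model predicts the next value in a short series, and the combined forecast is simply the average of the three.

```python
from statistics import mean

# Three deliberately simple, hypothetical forecasters (illustration only).

def mean_model(history):
    """Predict the next value as the average of all past values."""
    return mean(history)

def last_value_model(history):
    """Predict the next value as a repeat of the most recent value."""
    return history[-1]

def linear_trend_model(history):
    """Fit a least-squares line through (index, value) and extend it one step."""
    n = len(history)
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(history)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept + slope * n

def many_model_forecast(history, models):
    """Run every model and return the individual and averaged predictions."""
    predictions = {model.__name__: model(history) for model in models}
    combined = mean(list(predictions.values()))
    return predictions, combined

if __name__ == "__main__":
    sales = [10, 12, 14, 16]  # made-up data
    models = [mean_model, last_value_model, linear_trend_model]
    individual, combined = many_model_forecast(sales, models)
    for name, value in individual.items():
        print(f"{name}: {value:.2f}")
    print(f"combined: {combined:.2f}")
```

An unweighted average is the simplest way to combine models; weighting each model by its past accuracy is a common refinement.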
Predict and Surveil explores the intersection of data analytics and predictive policing. Unlike similar books aimed at general audiences, Brayne’s book does not shy away from the algorithms and other technical details that power data-driven policing. Her unsettling conclusion: predictive policing fails to deliver its promised transparency and equitability and, in fact, creates feedback loops that intensify racial and other demographic biases. She also explores the significant privacy issues created through big-data aggregation by the government (and its private partners). Brayne offers prescriptive remedies: this isn’t just a jeremiad.
Who says data mining can’t be fun? Seth Stephens-Davidowitz applies data mining to try to determine everything from the exact impact your schooling has on your future success to the extent to which people lie about their sex lives. Stephens-Davidowitz asks—and attempts to answer—whether violent entertainment inspires violent crime, whether parents favor sons over daughters, and whether it’s possible to game the stock market. If you’re a fan of Freakonomics and books of that ilk, you’ll almost certainly enjoy Everybody Lies.
Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman
Are you looking for a one-stop data mining reference? Mining of Massive Datasets, written by three Stanford-affiliated data scientists, covers a lot of ground, including: MapReduce systems and algorithms, locality-sensitive hashing, algorithms for data streams, PageRank and web-link analysis, frequent itemset analysis, clustering, computational advertising, recommendation systems, social-network graphs, dimensionality reduction, and machine-learning algorithms. Better still, the book corresponds to an online course you can audit for free (or take for credit for a small fee).
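For a taste of one topic on that list, PageRank can be sketched in a few lines of plain Python. This is a toy illustration rather than the book’s own code: the four-page graph, the function name, the damping factor of 0.85, and the fixed iteration count are assumptions, and the sketch expects every linked page to appear as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to. Every page
    that is linked to must also appear as a key (an assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # Hypothetical four-page web: most links point at page "c".
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(f"{page}: {score:.3f}")
```

Real systems work on graphs far too large for a dictionary in memory, which is where the book’s material on MapReduce comes in.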
Data science explores massive data sets to reveal profound, hidden insights. Those insights are of no use, however, if you can’t find a way to convey them to users. Information Graphics extensively covers the many options for visualizing data, offering both analysis and copious beautifully reproduced examples from the profession’s leaders. This is a large, handsome book that would look as appropriate on a coffee table as on an office shelf.
Ian Goodfellow, Yoshua Bengio, Aaron Courville
Deep learning involves creating artificial neural networks to mimic the workings of the human brain, including the process of synthesizing previous learning to drive subsequent knowledge and deductions. Deep Learning, by Goodfellow et al., covers the mathematical principles and programming techniques underlying this machine learning field, including deep feedforward neural networks, optimization algorithms, sequence modeling, natural language processing, and speech recognition. Many readers and reviewers consider it the definitive text on the subject.
Max Pumperla, Kevin Ferguson
If you’re looking for a book that will deliver both an overview of deep learning and a tutorial on applying it to a real-world project, Deep Learning and the Game of Go is the book for you. The authors summarize the foundations of machine learning and deep learning, then walk readers through the process by which AlphaGo Zero, a Go-bot capable of beating world champions in this Chinese game of strategy, was created. While the book focuses on the challenges of creating a Go-playing AI, the principles forwarded can be applied to a multitude of problems.
There aren’t a ton of breezy, fun-to-read books on natural language processing. It’s just not that kind of subject, unfortunately. The bookshelves offer a little more in the way of thick textbooks, of which Eisenstein’s Introduction to Natural Language Processing is a solid example. For a thorough review of how to teach machines to understand human language and apply this technology to create real-world solutions, Introduction is a good, well, introduction. Don’t expect it to be introductory-level, however. You’ll need a solid background in computer science, programming languages, and mathematics to make sense of it.
Python is a complex and challenging programming language that many programmers struggle with. While there are plenty of Python books available, Bader’s work stands out for both its thoroughness and its clear, engaging writing style. Python Tricks assumes a basic knowledge of Python; this is not a book for beginners but rather for those who already have some grounding and are looking to make their programming cleaner, more efficient, and more powerful. According to Amazon reviewers, Bader’s book is “an excellent resource for someone with modest to moderate Python experience looking to round out their knowledge of some of the more subtle features/behaviors” of Python. This is one of several books on this list published by O’Reilly.
Python is one of the two most popular programming languages in data science; R is the other. Most data scientists know both, even if they specialize in one or the other, and both can handle the vast majority of data science tasks. According to DataCamp, R programming excels in statistical modeling; it’s also better for dashboard creation (Python, in contrast, is the primary language for deep learning). Wickham’s R for Data Science is an R bible of sorts, and it’s supported by a free website that provides a quick and easy reference source. It is, at least according to one Goodreads user, a “clear and intuitive” “gamechanger.”
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Jerome Friedman
The fundamentals of statistics underlie data science’s major disciplines, including bioinformatics, data mining, and machine learning. The Elements of Statistical Learning covers crucial statistical principles, explaining how they apply across data science. Best of all, you can download a free PDF of the book from the Stanford University website. Also by the same authors: An Introduction to Statistical Learning: with Applications in R.
Peter Bruce, Andrew Bruce, Peter Gedeck
Practical Statistics brings a foundational understanding of statistics to data science programming language instruction. For those with baseline knowledge of statistics, R, and Python, this book explains the value of exploratory data analysis, random sampling in big data sets, experimental design, linear regression analysis, classification techniques, and statistical machine learning.