Reddit's favorite books on data analysis

This is a guest project by my friend Yashwant Rahul Reddy. He scraped the popular data analysis communities on Reddit to find their favorite books on marketing and data analytics.

Yashwant is a former marketing specialist who is currently transitioning into the analytics world by studying data science and business analytics. With his experience as a marketing professional and his grounding in data analytics, he is well placed to combine the best of both worlds and to translate business challenges into actionable data insights. Yashwant is one of the brightest marketing analytics students I know, with a strong focus on providing actual business value. He is always looking for marketing + data science collaborations, so make sure to give him a shout.

What are the best books on data analysis?

This is probably one of the most frequently asked questions in the prominent data analysis subreddits on Reddit. Seeing it so often, I asked myself: What are actually the most recommended books on data analytics on Reddit? It seems that every time the question is asked, a whole different set of answers is given.

Wouldn’t it be great to have an aggregate view of all recommended books in all related threads?

That’s exactly what Yashwant did. He collected all the answers and compiled them into a list of Reddit’s favorite data analysis books. And even better: He took a data-driven approach and turned it into a data scraping project.

The methodology

The project was quite a typical data analysis project: The data was pulled, cleaned, enriched, and ultimately analyzed and presented in a useful way. 

All Reddit comments that mentioned books were scraped from the respective subreddits (all-time timeframe). The score for each book was then calculated by adding up the upvotes across its mentions. In addition, the authors’ names were pulled and added to the book titles.
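The scoring step can be sketched roughly as follows. This is a minimal illustration, not Yashwant's actual code: the sample comments are made up, and matching books through a fixed list of known titles is an assumption, since the post does not say how mentions were identified. It also uses only the standard library, whereas the project itself used Pandas.

```python
from collections import defaultdict

# Hypothetical sample data: (comment text, comment upvotes). Not real Reddit data.
comments = [
    ("I'd start with Storytelling with Data.", 12),
    ("Storytelling With Data and The Art of Statistics are both great.", 7),
    ("Honestly, just practice SQL.", 30),
]

# Hypothetical title list; how the project detected mentions is an assumption.
known_titles = ["Storytelling with Data", "The Art of Statistics"]


def score_books(comments, titles):
    """Sum the upvotes of every comment that mentions a title (case-insensitive)."""
    scores = defaultdict(int)
    for body, upvotes in comments:
        lowered = body.lower()
        for title in titles:
            if title.lower() in lowered:
                scores[title] += upvotes
    # Highest aggregate score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = score_books(comments, known_titles)
print(ranking)
```

A comment that names two books counts toward both, and a comment without any known title (like the SQL one here) is ignored no matter how many upvotes it has.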

The following are the tools that were used:

  1. Python
    1. PRAW – A Python Reddit API wrapper to connect to the Reddit API and scrape data off comments using Python
    2. Pandas (Python package) to modify the dataset
    3. Beautiful Soup (Python package) to find authors for each book
  2. Microsoft Excel to manually clean and transform some parts of the data
  3. Google Sheets to generate combined insights
  4. Tableau to present the data
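To give an idea of what the PRAW part of such a project might look like, here is a rough sketch. It is not the code used for this project: the function names, the search query "book", and the limit are all assumptions chosen for illustration. The `reddit` argument is expected to be a `praw.Reddit` instance created with your own API credentials.

```python
from types import SimpleNamespace


def comment_to_record(comment):
    """Keep only the fields needed for scoring: the text and the upvote score."""
    return {"body": comment.body, "score": comment.score}


def fetch_comment_records(reddit, subreddit_name, query="book", limit=50):
    """Collect comments from all-time search results in one subreddit.

    `reddit` is assumed to be an authenticated praw.Reddit instance, e.g.
    praw.Reddit(client_id=..., client_secret=..., user_agent=...).
    The query and limit are illustrative defaults, not the project's settings.
    """
    records = []
    submissions = reddit.subreddit(subreddit_name).search(
        query, time_filter="all", limit=limit
    )
    for submission in submissions:
        # Remove "load more comments" placeholders so only real comments remain.
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            records.append(comment_to_record(comment))
    return records


# Offline check with a stand-in object instead of a real PRAW comment.
fake_comment = SimpleNamespace(body="Read The Data Warehouse Toolkit.", score=5)
record = comment_to_record(fake_comment)
print(record)
```

Keeping the record conversion in its own small function means it can be tried out without network access or credentials, as the stand-in object at the end shows.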

And the subreddits analyzed:



r/analytics
r/dataanalysis
r/businessintelligence
r/analyticalmarketing (that’s the companion subreddit to this site)

The best books on data analysis according to Reddit

The following are the top five (actually six, but the first two are by the same author) recommended books in r/analytics, r/dataanalysis, and r/businessintelligence. If you want the full list in Google Sheets, sign up below:

The Data Warehouse Toolkit & The Data Warehouse Lifecycle Toolkit by Ralph Kimball

Both books are very comprehensive guides to dimensional modeling for data warehousing and are considered to be among the most authoritative in this space. They are classics and offer many examples of data structures. Be aware though that they can be quite dry at times, and it feels like Kimball is praising himself in every other sentence.

Storytelling With Data by Cole Nussbaumer Knaflic

Another classic and one of my favorite books on data visualization and how to communicate with data. I can definitely recommend this book. It is written in an easily understandable way and focuses on applicability. If you want to improve your effectiveness in storytelling – this is the book.

Thinking, Fast and Slow by Daniel Kahneman

Funnily enough, this isn’t a book on data analysis, but it is very good nevertheless, and you have probably seen it recommended a million times. And rightly so. Written by a winner of the Nobel Prize in economics, this book gives practical and enlightening insights into how choices are made in both our business and our personal lives, which is a good foundation when it comes to recommending decisions based on data.

The Art of Statistics: How to Learn from Data by David Spiegelhalter

A good primer on statistics if you are coming from another professional background. It explains statistical methods and concepts in a way that a reader with neither a statistical nor a mathematical background can understand. If you are trying to transition into data analysis, this is a good starting point. Definitely recommended!

Business Intelligence: The Savvy Manager’s Guide by David Loshin

A useful introduction to the vocabulary, practices, and key concepts of data analysis and business intelligence. It does a good job of providing an overview of the concept of business intelligence without going too much into technicalities. As the title already suggests: It’s a good book for business managers or marketers to get a general understanding of the subject.