This curated list of data science projects offers real-life problems that will help you master the skills needed to demonstrate that you are technically sound and know how to conduct data science projects that add business value.
It’s been almost two years since I started writing articles, which adds up to just over 175 articles! One fault in some of my previous articles is that I suggested data science projects that were interesting but not practical.
One of the easiest ways to get a job as a data scientist is to show that you’ve already completed projects similar to the work described in the job posting. Therefore, I want to share with you some practical data science projects that I’ve personally done throughout my career that will beef up your experience and your portfolio.
1. Customer Propensity Modeling
A propensity model is a model that predicts the likelihood that someone will do something. To give a few examples:
- The likelihood that website visitors will register an account.
- The likelihood that a registered user will pay and subscribe.
- The likelihood that a user will refer another user.
Propensity modeling does not only entail “who” and “what” — it also entails “when” (when should you target the users you’ve identified) and “how” (how should you deliver your message to your targeted users?).
Propensity modeling allows you to allocate your resources more wisely, resulting in greater efficiency and better results. To give an example: instead of sending an email advertisement to every user, where each user’s chance of clicking could be anywhere from 0% to 100%, propensity modeling lets you target only the users with a 50%+ chance of clicking. Fewer emails, more conversions!
Below are two code walkthroughs that demonstrate how to build basic propensity models:
Here are two datasets that you can use to build a propensity model. Take note of the type of features that are offered in each dataset:
- Customer propensity to purchase dataset – A dataset logging shoppers’ interactions on an online store
- Marketing Campaign – Boost the profit of a marketing campaign
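To make the targeting idea above concrete, here is a minimal sketch of a propensity model. It assumes a plain logistic regression fitted by gradient descent, which is only one of several possible model choices. The two features, the toy data, and the user names are all hypothetical and chosen purely for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity(x, w, b):
    # Predicted probability that the user takes the action (e.g. clicks).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical history: [emails_opened_last_month, site_visits_last_week]
X = [[0, 0], [1, 0], [0, 1], [1, 1], [4, 3], [5, 2], [3, 4], [6, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = clicked a past campaign (assumed label)

w, b = fit_logistic(X, y)

# Score new users and keep only those with a 50%+ predicted chance of clicking.
candidates = {"user_a": [0, 1], "user_b": [5, 4], "user_c": [1, 0]}
scores = {user: propensity(x, w, b) for user, x in candidates.items()}
targets = [user for user, score in scores.items() if score >= 0.5]
```

In a real project you would likely use an established library and a held-out test set; the point here is only that the model outputs a probability per user, and the campaign goes to the users above a chosen threshold.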
2. Metric Forecasting
Metric forecasting is self-explanatory — it refers to forecasting a given metric, like revenue or the total number of users, in the short-term future.
Specifically, forecasting involves techniques that use historical data as inputs to generate a predicted output. Even if the output itself is not entirely accurate, forecasting can be used to gauge the general trend of where a particular metric is going.
Forecasting is basically like looking into the future. By predicting (with some level of confidence) what will happen in the future, you can make more informed decisions more proactively. The result of this is that you’ll have more time to make decisions and ultimately reduce the likelihood of failure.
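As a rough illustration of using historical data to project a metric forward, the sketch below fits a straight-line trend by ordinary least squares and extends it a few periods ahead. This is a deliberately simple stand-in, not Prophet or any specific time-series model, and the revenue figures are made up.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast(values, horizon):
    """Extend the fitted trend `horizon` periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(horizon)]

# Hypothetical monthly revenue (in thousands) for the last six months.
monthly_revenue = [102.0, 108.0, 121.0, 127.0, 142.0, 149.0]
next_quarter = forecast(monthly_revenue, horizon=3)
```

A linear trend ignores seasonality and gives no uncertainty estimate, so treat it as a way to gauge direction rather than as an accurate prediction.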
The first resource provides a summary of several time-series models:
The second resource provides a step-by-step walkthrough in creating a time-series model using Prophet, a Python library built by Facebook specifically for time-series modeling:
3. Recommendation Systems
Recommendation systems are algorithms whose objective is to suggest the most relevant information to users, whether that be similar products on Amazon, similar TV shows on Netflix, or similar songs on Spotify.
There are two main types of recommendation systems: collaborative filtering and content-based filtering.
- Content-based recommendation systems recommend items based on the features of items a user previously chose. For example, if I watched a lot of action movies previously, the system would rank other action movies higher.
- Collaborative filtering, on the other hand, filters items that a user might like based on the reactions of similar users. For example, if I liked Song A and someone else liked Song A and Song C, then I would be recommended Song C.
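The Song A / Song C example can be sketched as a tiny user-based collaborative filter. This version assumes Jaccard similarity between users’ sets of liked songs, which is one simple choice among many; the listener names and the extra songs are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, likes, top_n=3):
    """Rank items the target has not seen by the similarity of users who liked them."""
    seen = likes[target]
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(seen, items)
        if sim == 0:
            continue  # users with nothing in common contribute no signal
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

likes = {
    "me": {"Song A"},
    "listener_1": {"Song A", "Song C"},
    "listener_2": {"Song B", "Song D"},
}

print(recommend("me", likes))  # ['Song C']
```

Because listener_2 shares no liked songs with me, their tastes are ignored, and only Song C from the similar listener is recommended.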
Recommendation systems are among the most widely used and most practical data science applications. They also have one of the highest ROIs when it comes to data products. It’s estimated that Amazon increased its sales by 29% in 2019, specifically due to its recommendation system. Netflix also claimed that its recommendation system was worth a staggering $1 billion in 2016!
But what makes them so profitable? As I alluded to earlier, it’s about one thing: relevancy. By providing users with more relevant products, shows, or songs, you’re ultimately increasing their likelihood to purchase more and/or stay engaged longer.
Resources and Datasets
- Introduction To Recommender Systems- 1: Content-Based Filtering And Collaborative Filtering
- Netflix Movies and TV Shows
- Restaurant Recommendation Challenge
- Spotify Recommendation
4. Deep Dive Analyses
A deep dive analysis is simply an in-depth analysis of a particular problem or topic. It can be exploratory in nature, to discover new information and insights, or investigative, to understand the cause of a problem.
It’s not a widely talked about skill, partially because it comes with experience, but that doesn’t mean you can’t improve it! Like anything else, it’s just a matter of practice.
Deep dives are essential for any data-related professional. Being able to figure out why something doesn’t work, or being able to find the silver bullet, is what differentiates great from good.
Resources and Datasets
Below are several deep dive tasks that you can try on your own:
5. Customer Segmentation
Customer segmentation is the practice of dividing a customer base into several segments.
The most common type of segmentation is by demographic, but there are many other types of segmentation, including geographic, psychographic, needs-based, and value-based.
Segmentation is extremely valuable to a business for several reasons:
- It allows you to conduct more targeted marketing and deliver more personalized messaging to each segment. Young teenagers value very different things than parents of several kids do.
- It allows you to prioritize particular segments when resources are limited, particularly those that are more profitable.
- Segmentation also serves as a basis for other applications like upselling and cross-selling.
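As one small illustration of value-based segmentation, the sketch below ranks customers by total spend and splits them into equal-sized tiers. The tier labels, customer IDs, and spend figures are hypothetical, and equal-sized tiers are an assumption; clustering or business-defined thresholds are equally valid ways to draw the boundaries.

```python
def segment_by_value(spend_by_customer, labels=("low", "mid", "high")):
    """Assign each customer to a tier based on their rank by total spend."""
    ranked = sorted(spend_by_customer, key=spend_by_customer.get)
    n, k = len(ranked), len(labels)
    # rank * k // n maps ranks 0..n-1 onto tier indexes 0..k-1 in equal chunks.
    return {customer: labels[rank * k // n] for rank, customer in enumerate(ranked)}

# Hypothetical total spend per customer over the last year.
spend = {"c1": 20, "c2": 950, "c3": 310, "c4": 45, "c5": 1200, "c6": 280}
segments = segment_by_value(spend)

# Customers to prioritize when resources are limited.
high_value = sorted(c for c, tier in segments.items() if tier == "high")
```

The resulting tiers can then drive the uses listed above, such as tailoring messaging per segment or focusing a limited budget on the high-value group.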