Starbucks Offer Type Strategy

Andri Sumitro
8 min read · Jun 21, 2020

1. Project Overview

This project is part of the Udacity Capstone Challenge and the given data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app.

Users of the Starbucks rewards mobile app receive an offer once every few days. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). However, offers are not sent to every user, and those who receive one do not all get the same offer.

One of Starbucks' goals is to send the right offers to increase user response in the mobile app and to maximize profit by choosing the offer type that prompts users to spend more.

This data set contains three main files:

  • Portfolio - file describes the characteristics of each offer, including its duration and the amount a customer needs to spend to complete it (difficulty).
  • Profile - file contains customer demographic data including their age, gender, income, and when they created an account on the Starbucks rewards mobile application.
  • Transcript - file describes customer purchases and when they received, viewed, and completed an offer. An offer is only successful when a customer both views an offer and meets or exceeds its difficulty within the offer’s duration.
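The success rule described above can be sketched as a small function. This is only an illustration, not the project's actual code: the function name, the argument names, and the use of hours as the time unit are assumptions, and the sketch assumes that all spending between receipt and expiry counts toward the difficulty.

```python
def offer_successful(received_at, viewed_at, spend_events, difficulty, duration_hours):
    """Return True if the offer was viewed and completed within its duration.

    Times are hours since the start of the test period (assumed unit).
    spend_events is a list of (time, amount) tuples for one customer.
    """
    if viewed_at is None:
        return False  # an offer that was never viewed cannot be successful
    deadline = received_at + duration_hours
    if not (received_at <= viewed_at <= deadline):
        return False  # viewed outside the offer window
    # Sum everything the customer spent while the offer was active.
    spent = sum(amount for t, amount in spend_events if received_at <= t <= deadline)
    return spent >= difficulty


# Viewed after 6 hours, spent 11 in total within a 72-hour window, difficulty 10.
result = offer_successful(0, 6, [(12, 4.0), (30, 7.0)], difficulty=10, duration_hours=72)
```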

2. Exploratory Data Analysis (EDA)

2.1 Portfolio Dataset

There are ten different offers in three distinct categories offered by Starbucks:

  • Buy-one-get-one-free (BOGO) offers (4 types)
  • Informational offers (2 types)
  • Discounts (4 types)

However, to keep the project simple, we will not distinguish between the individual offers within each category. BOGO and informational offers do not guarantee a positive effect on financial results, since both have a minimum profit margin of 0%.

This is because the reward for completing a BOGO offer equals the amount paid for the drinks, while informational offers do not return any reward to the customer. In contrast, discounts are the only offers with a positive minimum profit margin for completed offers (at least 71%). They work like a mission: customers have to accumulate a certain spend before they get a reward, and the difficulty is fairly high on average while the rewards are relatively low.
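To make the margin argument concrete, here is a minimal sketch that assumes the minimum margin is reached when a customer spends exactly the required amount. The function name and the difficulty and reward values are hypothetical and are not taken from the portfolio file, so the sketch does not reproduce the 71% figure.

```python
def min_profit_margin(difficulty, reward):
    """Margin when the customer spends exactly the required amount."""
    if difficulty == 0:
        # Informational offers: no required spend and no reward.
        return 0.0
    return (difficulty - reward) / difficulty


# Hypothetical BOGO-style offer: the reward equals the required spend.
bogo_margin = min_profit_margin(difficulty=10, reward=10)

# Hypothetical discount-style offer: the reward is small relative to the spend.
discount_margin = min_profit_margin(difficulty=10, reward=2)

print(bogo_margin, discount_margin)
```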

2.2 Profile Dataset

The profile data consists of 17,000 users. App users skew older: the median age is 55 and the median income is $64,000, which is quite high compared to the median American income.

This is reasonable as we know that age and income have a positive relationship. The chart below also shows that users started using the app approximately one and a half years ago. Note that 13% of all mobile app users do not provide any demographic information at all.

2.3 Transcript Dataset

Each record in the transcript dataset is one of 306,137 events generated by the 17,000 registered Starbucks app users over a period of 29 days. These records show the actions users took in response to offers: an event is an offer being received, viewed, or completed, or a transaction.

  • Offers were sent out roughly every 6 days. The conversion rate is quite low (57%), even though offers were sent to 99% of all app users at least once.
  • App users make a transaction every 3–4 days and spend $13.68 per transaction on average.
  • Roughly every third transaction completes an offer, leading to an average reward of $4.75.
  • Starbucks still achieves an overall profit margin of 91% on its offers, even though the distribution of transaction amounts is quite skewed.

The graph above shows how app events (the action taken by users) are distributed over time.

  • Most of the users view the offer on the day of send-out.
  • Transactions peak with a slight delay, within a few days after an offer is sent out.
  • Shorter timespans between offer send-outs past day 14 seem to cause a larger overall number of transactions.
  • Increased transaction volumes also increase revenues, as shown in Figure 6 below. However, transaction profits per user remain stable, as the increased exposure to offers also leads to more offer completions and thus to more frequently paid rewards.

Discounts are completed more often than BOGO offers (60% vs. 51% conversion rate). However, BOGO offers are more consistent, as the standard deviations of the two conversion rates show (12.27% for discounts vs. 6.39% for BOGO).

3. Data Pre-Processing

3.1 Portfolio Data

  • Added an abbreviation of “offer_id” because the original one is quite long

3.2 Profile Data

  • Member age: add a column that represents how long the user has been using the Starbucks app. Note that there might be a slight deviation at the time of viewing or completing an offer, since timestamps in the transcript data are given only in hours, whereas the profile data contains dates.
  • Replace the age value 118 with NaN, since it is implausible that users are actually 118 years old; these missing values are imputed with the median afterward.
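The two profile steps above could look roughly like this in pandas. The column names, the date format, and the reference date used to measure membership duration are assumptions for illustration; the toy rows are not real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the profile data (assumed column names and values).
profile = pd.DataFrame({
    "age": [118, 55, 42],
    "became_member_on": ["2017-02-12", "2017-07-15", "2018-04-26"],
})

# Treat the age 118 as a placeholder for "missing".
profile["age"] = profile["age"].replace(118, np.nan)

# Membership duration in days, relative to a hypothetical snapshot date.
reference_date = pd.Timestamp("2018-07-26")
joined = pd.to_datetime(profile["became_member_on"])
profile["member_age_days"] = (reference_date - joined).dt.days

print(profile)
```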

3.3 Transcript Data

Steps to evaluate whether one or more offers had an impact on a user's purchasing behavior:

  • Calculate rewards received by app users
  • Add offer numbers to transcript data to keep track of situations where users receive the same offer more than once
  • Add unique transaction IDs to transcript data to keep track of “transaction” events
  • Separate transaction events including amounts paid and received from offer events
  • Separate offer events including timestamps from transaction events
  • Make a comparison between transaction and offer events
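As a rough sketch of the numbering and separation steps above, the snippet below tags repeated offers and transactions on a toy transcript. The column names and event labels are assumptions, and the later matching of transactions to offers is left out.

```python
import pandas as pd

# Toy transcript (assumed column names and event labels).
transcript = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u1", "u2"],
    "event": ["offer received", "transaction", "offer received",
              "transaction", "offer received"],
    "offer_id": ["A", None, "A", None, "B"],
    "time": [0, 12, 168, 180, 0],
})

received = transcript["event"] == "offer received"
is_txn = transcript["event"] == "transaction"

# Number repeated receipts of the same offer per user: 1, 2, ...
# Rows that are not "offer received" stay NaN after index alignment.
transcript["offer_number"] = (
    transcript[received].groupby(["person", "offer_id"]).cumcount() + 1
)

# Running transaction ID, defined only on transaction rows.
transcript["transaction_id"] = is_txn.cumsum().where(is_txn)

# Separate the two kinds of events for later comparison.
offers = transcript[~is_txn]
transactions = transcript[is_txn]
```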

3.4 Missing values imputation

  • Gender (filled with mode)
  • Age (filled with median)
  • Income (filled with median)
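A minimal pandas sketch of this imputation, using toy values and assumed column names:

```python
import numpy as np
import pandas as pd

# Toy profile rows with gaps (values are illustrative only).
profile = pd.DataFrame({
    "gender": ["M", "F", None, "M"],
    "age": [55.0, np.nan, 40.0, 61.0],
    "income": [64000.0, 72000.0, np.nan, 50000.0],
})

# Categorical column: fill with the most frequent value.
profile["gender"] = profile["gender"].fillna(profile["gender"].mode()[0])

# Numeric columns: fill with the median of the observed values.
profile["age"] = profile["age"].fillna(profile["age"].median())
profile["income"] = profile["income"].fillna(profile["income"].median())
```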

3.5 Feature engineering

I. Dummy encoding

  • Gender
  • Channel

II. Create new features based on existing features

  • Clustering (group customers based on their behavior which will help in model prediction)
  • reward_per_difficulty: the ratio between expected reward and difficulty
  • profit: amount minus reward_received
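The encoding and the two derived features can be sketched as follows. The toy table and its column names are assumptions, and setting the ratio to 0 for offers with zero difficulty (informational offers) is a choice made for this sketch rather than something the article specifies. The channel column would be encoded the same way as gender.

```python
import numpy as np
import pandas as pd

# Toy offer-level rows (assumed columns and values).
offers = pd.DataFrame({
    "gender": ["M", "F", "O"],
    "difficulty": [10, 0, 20],
    "reward": [10, 0, 5],
    "amount": [12.5, 4.0, 25.0],
    "reward_received": [10.0, 0.0, 5.0],
})

# Dummy-encode the categorical column into gender_F, gender_M, gender_O.
encoded = pd.get_dummies(offers, columns=["gender"], prefix="gender")

# Ratio of reward to difficulty; a zero difficulty would divide by zero,
# so it is mapped to NaN first and the result is set to 0 (assumption).
encoded["reward_per_difficulty"] = (
    encoded["reward"] / encoded["difficulty"].replace(0, np.nan)
).fillna(0.0)

# Profit is what the customer paid minus the reward paid out.
encoded["profit"] = encoded["amount"] - encoded["reward_received"]
```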

3.6 Manipulating and merging all datasets for modeling

Since the objective is to predict which offer type prompts each customer to spend more, each row holds the information for one individual user:

  • Demographic data for each user (age, income, member age)
  • Purchasing behavior towards each offer type
    - Frequency of BOGO, discount, informational, and no offer
    - Average profit of BOGO, discount, informational, and no offer
    - Reward_per_difficulty of BOGO, discount, informational, and no offer
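One way to arrive at such a one-row-per-user table is to aggregate per user and offer type and then spread the offer types into columns. The sketch below does this for frequency and average profit only; the input table, its column names, and the choice to fill missing combinations with 0 are assumptions.

```python
import pandas as pd

# Toy event-level table: one row per transaction, tagged with the offer
# type it was attributed to (assumed columns and values).
events = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u2", "u2"],
    "offer_type": ["bogo", "discount", "bogo", "informational", "no_offer"],
    "profit": [2.0, 8.0, 4.0, 5.0, 3.0],
})

# Frequency and average profit per user and offer type.
stats = events.groupby(["person", "offer_type"])["profit"].agg(["count", "mean"])

# Spread offer types into columns; combinations a user never had become 0.
user_features = stats.unstack("offer_type", fill_value=0)

# Flatten the two-level column index into names such as "mean_bogo".
user_features.columns = [f"{stat}_{offer}" for stat, offer in user_features.columns]

print(user_features)
```

Demographic columns such as age, income, and member age could then be joined onto this table by user ID.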

4. Model Building

In this project, an XGBoost multi-class classifier is used to predict which offer type is most effective, in the sense that it generates the most profit from each user. The target variable the model predicts is:

Target variable = the offer that generates the highest profit for each user
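A minimal sketch of how such a target could be derived from per-user average profits, assuming one profit column per offer type (the column names and values are illustrative):

```python
import pandas as pd

# Toy per-user average profit by offer type (assumed layout).
avg_profit = pd.DataFrame(
    {
        "bogo": [3.0, 0.0],
        "discount": [8.0, 1.5],
        "informational": [0.0, 5.0],
        "no_offer": [2.0, 3.0],
    },
    index=["u1", "u2"],
)

# Target: the name of the column with the highest profit in each row.
target = avg_profit.idxmax(axis=1)

# Integer class labels, which multi-class classifiers typically expect.
labels = pd.Categorical(target, categories=list(avg_profit.columns)).codes

print(target)
```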

  • The accuracy score is 92%, meaning that the model predicts 92 out of 100 cases correctly, which is quite decent.

4.1 Hyperparameter Tuning

The default parameters may not give the best accuracy score for every problem, which is why we search for the combination of parameters that achieves higher accuracy by using a randomized grid search.

Randomized grid search is a method in which random combinations of hyperparameters are tried in order to find the combination that gives the highest accuracy score on the training set while still generalizing well to unseen data. The best hyperparameters are as follows:

  • max_depth=5
  • min_child_weight=1
  • n_estimators=1000
  • learning_rate=0.02
  • The model overfits slightly: it fits the training dataset very well but performs less well on the test dataset, with a gap of about 5% between the two.
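The idea behind the randomized search can be sketched in plain Python. The candidate values in the grid and the scoring function are hypothetical: toy_score merely stands in for training a model with the sampled parameters and measuring its cross-validated accuracy.

```python
import random

# Candidate values per hyperparameter (hypothetical grid).
param_grid = {
    "max_depth": [3, 5, 7],
    "min_child_weight": [1, 3, 5],
    "n_estimators": [200, 500, 1000],
    "learning_rate": [0.01, 0.02, 0.1],
}


def random_search(param_grid, score_fn, n_iter=20, seed=0):
    """Try n_iter random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sample one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Stand-in for cross-validated accuracy; higher is better.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.02)


best_params, best_score = random_search(param_grid, toy_score, n_iter=30)
print(best_params, best_score)
```

In practice, scikit-learn's RandomizedSearchCV wraps this kind of loop together with cross-validation, so the score of each combination comes from held-out folds rather than from the training fit alone.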

4.2 Feature Importance

  • Demographic data seems to play very little role in predicting the offer type
  • In contrast, the statistics of each offer type show a much larger influence
  • The least important features are in fact the clustering features, which did not add any value to the model

Conclusion

The goal of this project was to optimize Starbucks' promotion strategy, or more specifically to maximize transaction profits per user by giving the right offer type. We used an XGBoost multi-class classifier to make the prediction and achieved an accuracy score of 94%, which is quite decent.

However, the data used for modeling is simulated and represents only a small portion of Starbucks data. It is therefore advisable to run an A/B test in order to improve the promotion strategy within the Starbucks rewards app and to optimize the model along the way. Before running the A/B test, we can also run an A/A test to make sure that the final conclusion is valid and that the potential risk is minimized before we scale up to the whole population.

Limitations & Improvements

First of all, as mentioned in the conclusion, the data at hand is not only simulated but, more importantly, a simplified version of the tracking data from the Starbucks rewards app, so it remains unclear whether the developed solutions are applicable in real life. Apart from that, there is still room to improve the model so that it not only predicts well on this data but also generalizes well to unseen data from a much bigger population.

Clustering: The results clearly show that the groups we found did not add any value to the model. This might be due to the sparsity of the data, which prevents K-means from capturing any valuable information, since the variance of many columns is small.

Modeling: My objective here was to build a model that predicts the most effective offer type for each user, so I did not focus much on using a more advanced model to increase accuracy. Further exploration is possible here, for example stacking, which combines several algorithms to predict the target variable.
