Evaluation of Starbucks Offers and Analysis Results

Mahmut Duruş
9 min read · Jun 5, 2021


I would like to share my findings from the Starbucks Capstone Challenge project.

Introduction of the Problem

The shared data comes in three files and represents a simplified version of the real Starbucks app, so we will evaluate offers for only one product. A short loading sketch follows the file descriptions below.

  • portfolio.json — containing offer ids and meta data about each offer (duration, type, etc.)
  • profile.json — demographic data for each customer
  • transcript.json — records for transactions, offers received, offers viewed, and offers completed
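For reference, the three files can be loaded with pandas roughly as follows (a minimal sketch; reading with orient='records' and lines=True is an assumption about the export format):

```python
import pandas as pd

# Read the three JSON files (line-delimited records are assumed)
portfolio = pd.read_json('portfolio.json', orient='records', lines=True)
profile = pd.read_json('profile.json', orient='records', lines=True)
transcript = pd.read_json('transcript.json', orient='records', lines=True)

print(portfolio.shape, profile.shape, transcript.shape)
print(portfolio.head())
```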

With the help of machine learning algorithms and heuristic approaches, I will try to find ways to increase the effectiveness of the offers.

Strategy and Project Goal

I followed these steps to get insights from the data: data understanding, data cleaning, visualization, and finally model construction.

First, I tried to understand the data so I could determine the cleaning strategy; in this step I also started to shape a modeling strategy. Then I cleaned the data, identified the outliers, and divided the data into different subsets for further analysis. As a third step, I checked the transaction data and the offer types to understand how Starbucks currently selects users for offers, since understanding the current strategy is a good starting point before constructing a new one. However, I saw that Starbucks sends offers to the same demographics that appear in the transactions, so I could not derive a strategy from that alone. As a fourth step, I visualized and compared different demographic groups to get insight into customers' offer choices. Finally, I built two models and selected the more efficient one with the help of metrics. Throughout the project, I tried to answer two questions:

  • What are the main features of demographics that affect offer effectiveness?
  • Is there an efficient model to predict users' responses?

Data Understanding

As described above, there are three different data sets. I would like to show the raw fields to give readers an understanding of the data.

portfolio.json

  • id (string) — offer id
  • offer_type (string) — type of offer, i.e. BOGO, discount, informational
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings)
(Figure: portfolio data)

profile.json

  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer id
  • income (float) — customer’s income
(Figure: profile data)

transcript.json

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer id
  • time (int) — time in hours since start of test. The data begins at time t=0
  • value — (dict of strings) — either an offer id or transaction amount depending on the record
(Figure: transcript data)

Data Cleaning

  • The duration column, given in days, was converted to hours so it matches the hour-based time column of the transcript.
  • The ambiguous "id" column names were changed to reflect their meaning, e.g. "offer_id" and "customer_id".
  • Outliers were checked and removed: the age column contains an outlier value of 118, and I removed those rows because other information is also missing for them.
(Figure: age distribution)
  • The date column was converted to ease the analysis; only the year was kept for "became_member_on".
  • The "value" column was dropped.
  • A dictionary was used to map the long, hard-to-read ids to simple numbers, both for modeling purposes and to make the data easier to understand.
  • The three datasets were merged and filtered according to different needs for further evaluation (a minimal sketch of these steps follows below).
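As mentioned in the last bullet, here is a minimal sketch of these cleaning steps. The column names come from the raw files above, while the keys inside the 'value' dictionary and the id-to-number mapping are assumptions made for illustration:

```python
import pandas as pd

# Convert offer duration from days to hours so it matches the hour-based
# 'time' column of the transcript
portfolio['duration'] = portfolio['duration'] * 24

# Rename the ambiguous id columns
portfolio = portfolio.rename(columns={'id': 'offer_id'})
profile = profile.rename(columns={'id': 'customer_id'})
transcript = transcript.rename(columns={'person': 'customer_id'})

# Remove the age-118 outlier rows (gender and income are also missing there)
profile = profile[profile['age'] != 118]

# Keep only the membership year (the YYYYMMDD integer format is assumed)
profile['became_member_on'] = pd.to_datetime(
    profile['became_member_on'].astype(str), format='%Y%m%d').dt.year

# Pull the offer id out of the 'value' dict, then drop 'value'
# (the key names 'offer id' / 'offer_id' are assumptions)
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript = transcript.drop(columns=['value'])

# Map the long hash ids to small integers for modeling and readability
offer_map = {oid: i for i, oid in enumerate(portfolio['offer_id'], start=1)}
portfolio['offer_id'] = portfolio['offer_id'].map(offer_map)
transcript['offer_id'] = transcript['offer_id'].map(offer_map)

# Merge the three datasets for further analysis
merged = (transcript
          .merge(profile, on='customer_id', how='left')
          .merge(portfolio, on='offer_id', how='left'))
```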

Data Exploration

First of all, I wanted to understand Starbucks' target group by comparing the transaction and offer datasets. Starbucks generally sends offers to the same demographic distribution that appears in the transactions. I caught only one difference: the income distributions of the two datasets diverge, especially above 75K. This means Starbucks sent proportionally more offers to customers with 75K to 100K income; they targeted this customer set.

The age and membership-year distributions are nearly the same for the two datasets.

(Figure: income distribution for offers)
(Figure: income distribution of transactions)
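A short sketch of how the two income distributions were compared, assuming the merged dataframe from the cleaning sketch and the standard event labels of the transcript:

```python
import matplotlib.pyplot as plt

# Separate offer-received rows from plain transactions
offers = merged[merged['event'] == 'offer received']
transactions = merged[merged['event'] == 'transaction']

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].hist(offers['income'].dropna(), bins=20)
axes[0].set_title('Income distribution for offers')
axes[1].hist(transactions['income'].dropna(), bins=20)
axes[1].set_title('Income distribution of transactions')
for ax in axes:
    ax.set_xlabel('income')
plt.tight_layout()
plt.show()
```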

I also checked the offer-sending schedule and its effects. Looking at the graphs, offers were not sent every day, and the interval between sends decreased after the second offer. On the other hand, the number of offers received keeps decreasing after day 0, while transactions without offers increased as the offer frequency increased.

(Figure: offers received in the time domain)
(Figure: transactions in the time domain)

Discount and BOGO offers were sent in the same quantity. BOGO offers were viewed more often, while discount offers have a higher completion rate.

(Figure: events and offer types)
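A sketch of the view- and completion-rate comparison by offer type, again assuming the merged dataframe and the standard event labels:

```python
# Count events per offer type and derive view and completion rates
counts = (merged[merged['offer_type'].notna()]
          .groupby(['offer_type', 'event'])
          .size()
          .unstack(fill_value=0))

counts['view_rate'] = counts['offer viewed'] / counts['offer received']
counts['completion_rate'] = counts['offer completed'] / counts['offer received']
print(counts[['view_rate', 'completion_rate']])
```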

According to the distribution results, women have higher incomes than men.

Summary of Findings during Heuristic Analysis

First, I used visualization techniques to get some initial insights.

1) Transactions without offers increased as the offer frequency increased, so sending offers more frequently should attract customers more. Due to the limited size of the data, we cannot know the long-term bottlenecks of this action.

2) Increased offer frequency increases the completion ratio.

3) Discount and BOGO offers were sent in the same quantity. BOGO offers were viewed more often, while discount offers have a higher completion rate.

4) As difficulty increases, the offer completion ratio worsens, as expected, for 5 and 10 day durations. On the other hand, for 7 day durations the results are not correlated with difficulty, even though the more difficult offers carry smaller rewards.

5) If difficulty and reward are the same, completion increases with duration, as expected.

6) All offers were sent on nearly the same days, so there is no time-wise difference between them. There are also no significant statistical differences between received and viewed offers. I could not explain the difference in the receival ratio for 7 day duration offers; it does not look logical.

7) Women's completion ratio is significantly higher than men's. For discount offers in particular, women's completion ratio is even higher than their view ratio.

8) Women and men have the same top-3 offer choices.

9) Women are less sensitive to offer type, but offer type is important for men. If the difficulty is higher than 1 dollar per day, men's tendency to complete the offer decreases significantly.

10) Offer completion is not sensitive to women's age.

11) Younger men prefer low-difficulty offers, while middle-aged and older men (<70) may prefer high-difficulty ones.

12) Starbucks uses the same strategy for all users regardless of membership year. The offer completion ratio is higher for 3rd and 4th year members (2015 and 2016) but decreases after the 4th year, so offers should be made more specific for longer-standing members to keep the retention rate high.

13) Men with 35K to 50K income respond differently to different offers, and women with 65K to 85K income also respond differently to different offers.

As a conclusion of these findings, I can propose the strategy below. In the next steps I will check the correlations and build a model.

Male customers aged 20–45 responded to offers 7 and 8 more than to other offers.

Male customers aged 45–60 responded to offers other than 7 and 8.

Male customers with 35K to 50K income responded to offers other than 4, 6, and 9.

Women are less sensitive to offer type but tend toward discount offers, so increase discount offers for women.

Female customers with 65K to 85K income responded to offers 1 and 10.

Focus on offers 7, 8, 5, and 1 in general; their retention rate is higher.

3rd and 4th year members have a higher retention rate in general.

Correlation Results

Even though I found some insights with the help of visualization, I could not find strong enough correlations among the parameters. Age, became_member_on, gender, income, and time were checked to see if there is a correlation with offer_id, event, and offer parameters such as reward.
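A sketch of the correlation check, with gender, event, and offer type label-encoded just so they can enter a plain correlation matrix (the encoding is an assumption; the actual notebook may differ):

```python
# Label-encode the categorical columns and inspect pairwise correlations
corr_df = merged.copy()
for col in ['gender', 'event', 'offer_type']:
    corr_df[col] = corr_df[col].astype('category').cat.codes

cols = ['age', 'became_member_on', 'gender', 'income', 'time',
        'offer_id', 'event', 'reward', 'difficulty', 'duration']
print(corr_df[cols].corr().round(2))
```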

Model Evaluation and Validation

I used two different models and selected the more effective one. Decision trees are commonly used to evaluate prospective customers using demographic data, and random forests have similar use cases, so it is interesting to compare the two models on this data set.

https://corporatefinanceinstitute.com/resources/knowledge/other/decision-tree/#:~:text=Decision%20trees%20are%20used%20for,and%20continuous%20variable%20decision%20trees.

I divided the dataset into train and test splits and checked each model's results with the F1 score. I tried test sizes of 0.33 and 0.5, but the results (shown in the figures below) did not change significantly, so I did not try to optimize this parameter.

(Figure: results with test size 0.33)
(Figure: results with test size 0.5)
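A minimal sketch of the two-model comparison, assuming a feature matrix built from the merged dataframe; the exact feature set and the use of the event label as the target are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

features = ['age', 'became_member_on', 'gender', 'income',
            'reward', 'difficulty', 'duration', 'offer_id']
data = merged[features + ['event']].dropna()
X = pd.get_dummies(data[features], columns=['gender'])
y = data['event']  # target: how the customer responded to the offer

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)
    train_f1 = f1_score(y_train, model.predict(X_train), average='weighted')
    test_f1 = f1_score(y_test, model.predict(X_test), average='weighted')
    print(type(model).__name__, 'train F1:', round(train_f1, 3),
          'test F1:', round(test_f1, 3))
```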

Then I tried to optimize the hyperparameters to see whether they have an effect. I kept the decision tree as a base model and optimized the random forest classifier, using grid search to find the optimum values. The result table changed accordingly: we see a significant increase in the test F1 score, while the train score decreased dramatically.

(Figure: result table with the optimized random forest classifier added)
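A sketch of the grid search over the random forest, reusing the split from the previous sketch; the parameter grid is illustrative, not the grid used in the project:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Illustrative hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_leaf': [1, 5, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring='f1_weighted', cv=3)
search.fit(X_train, y_train)

print('best params:', search.best_params_)
print('test F1:', round(f1_score(y_test, search.predict(X_test),
                                 average='weighted'), 3))
```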

Accuracy is used when the true positives and true negatives are more important, while the F1 score is used when false negatives and false positives are crucial. Accuracy works when the class distribution is similar, while the F1 score is a better metric when the classes are imbalanced, as in this case. In most real-life classification problems the class distribution is imbalanced, so the F1 score is a better metric to evaluate our model. Reference: https://medium.com/analytics-vidhya/accuracy-vs-f1-score-6258237beca2#:~:text=Accuracy%20is%20used%20when%20the,as%20in%20the%20above%20case.
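For reference, the F1 score (per class) is the harmonic mean of precision and recall:

```latex
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}},
\qquad
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
```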

After the selection process, I built a dataframe to show the inputs, outputs, and expected outputs of the model.

Reflections & Justification

I followed the path described in the Strategy and Project Goal section. At first, I thought I could find useful insights with the help of visualization, but my expectations were not met by the quantitative and correlation results. This was a really interesting and challenging part for me; I still do not understand why not even one parameter shows a correlation. After that I moved on to modeling and got fair results. It was interesting that my visual insights and the correlation results did not agree, since I normally get good insights during the exploration phase.

I focused on two questions at the beginning. Even though I could not validate the main demographic features, I think an efficient model to predict user response can be built.

Improvements

I can use the insights from the visualization analysis and check which of them improve my modeling results. If that works, I can quantify these results and feed them into the modeling step.

If the dataset is enlarged to cover other products, it may be easier to find correlations between the parameters and the outputs. I would also like to extend the time period of the data to remove seasonal effects.

More feature engineering could be helpful; I did not have enough time to try different feature sets for the analysis.

XGBoost or k-NN could also be tried for classification.

You can reach the project repository from the GitHub link below.

