This is a continuation of my Sales Report in Pandas post, and it starts from the final table built there. Here it is for one randomly chosen customer:

report[report['CustomerID'] == 12347.0]

It shows this customer's activity over the whole available date range, aggregated by month. As you can see, this activity is not regular.
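To make the shape of that table concrete, here is a minimal sketch with a made-up stand-in for `report`. It assumes one row per customer per month and a 0/1 `active` flag, which is how the columns are used in this post; the real table comes from the previous post, and the values below are invented.

```python
import pandas as pd

# Hypothetical stand-in for `report`: one row per customer per month,
# with a 0/1 `active` flag. Values are made up for illustration.
report = pd.DataFrame({
    "CustomerID": [12347.0] * 4 + [12348.0] * 4,
    "month": pd.to_datetime(["2011-01-01", "2011-02-01",
                             "2011-03-01", "2011-04-01"] * 2),
    "active": [1, 0, 0, 1, 0, 1, 1, 1],
})

# Boolean mask: keep only the rows belonging to one customer
one_customer = report[report["CustomerID"] == 12347.0]
print(one_customer)

# Share of months in which this customer was active
print(one_customer["active"].mean())  # 0.5
```

Because inactive months are kept as rows with `active == 0`, gaps in a customer's activity stay visible instead of silently disappearing.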

First, I want to add a flag indicating whether or not the customer was active in the previous month.

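# Sort by month, then shift 'active' down one row within each customer, so each row
# gets the value from that customer's preceding row (the previous month if every month is present)
report['active_prev'] = (report.sort_values(by=['month'], ascending=True)
                         .groupby(['CustomerID'])['active'].shift(1))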

This tricky part works much like a window function in SQL. Because I have many customers, I have to somehow say to…
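The window-function analogy can be checked on a tiny example. This sketch uses made-up data with rows deliberately shuffled; the SQL in the comment is only the rough equivalent (`LAG` over a partition), not something pandas runs. It relies on the fact that `groupby(...).shift()` keeps the original row index, so the assignment lines up even though `report` itself is never re-sorted.

```python
import pandas as pd

# Made-up data: two customers, three months each, rows shuffled on purpose
report = pd.DataFrame({
    "CustomerID": [2.0, 1.0, 1.0, 2.0, 1.0, 2.0],
    "month": pd.to_datetime(["2011-03-01", "2011-02-01", "2011-01-01",
                             "2011-01-01", "2011-03-01", "2011-02-01"]),
    "active": [1, 0, 1, 1, 1, 0],
})

# Roughly the pandas counterpart of the SQL window function
#   LAG(active) OVER (PARTITION BY CustomerID ORDER BY month)
report["active_prev"] = (report.sort_values(by=["month"])
                         .groupby(["CustomerID"])["active"]
                         .shift(1))

# shift() returns a Series indexed like the original rows, so assignment
# aligns by index; the first month of each customer gets NaN.
print(report.sort_values(["CustomerID", "month"]))
```

The result column is float because the first month of each customer has no predecessor and is filled with NaN.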


I will use the dataset at https://www.kaggle.com/carrie1/ecommerce-data. I work in Colab and store the data on my Google Drive.

import pandas as pd

df = pd.read_csv('drive/My Drive/data/ecommerce-data.zip', encoding="cp1252")
df.head()
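`read_csv` can read the zipped file directly. A minimal sketch, assuming a zip archive that contains a single CSV encoded in cp1252; the rows below are invented, and only the column names are borrowed from the Kaggle dataset. pandas infers `compression="zip"` from the `.zip` extension, and `encoding` is applied to the decompressed bytes.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Made-up CSV content; "CAFÉ" contains a non-ASCII character to show why
# the encoding matters. The second row has no CustomerID.
csv_text = ("InvoiceNo,Description,CustomerID\n"
            "536365,CAFÉ MUG,17850\n"
            "536366,HAND WARMER,\n")

with tempfile.TemporaryDirectory() as tmpdir:
    zip_path = os.path.join(tmpdir, "ecommerce-data.zip")
    # Write one cp1252-encoded CSV into the archive
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("data.csv", csv_text.encode("cp1252"))
    # Compression is inferred from the .zip extension
    df = pd.read_csv(zip_path, encoding="cp1252")

print(df.head())
print(df["CustomerID"].dtype)  # float64
```

A single missing CustomerID turns the whole column into `float64`, which is likely why IDs show up as values like `12347.0`.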

Let's look at the share of missing values in each column.

df.isna().mean()
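Why `isna().mean()` gives a share: `isna()` returns a boolean frame, and `mean()` over each column counts `True` as 1 and `False` as 0. A small sketch on an invented frame:

```python
import numpy as np
import pandas as pd

# Invented frame with a few gaps
df = pd.DataFrame({
    "CustomerID": [17850.0, np.nan, 13047.0, np.nan],
    "Description": ["MUG", "CANDLE", None, "LANTERN"],
    "Quantity": [6, 2, 8, 1],
})

# Boolean frame -> per-column mean = fraction of missing values
missing_share = df.isna().mean()
print(missing_share)  # CustomerID 0.5, Description 0.25, Quantity 0.0
```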

Gleb Mikhaylov

I share my experience in data science, computational thinking, Python, Wolfram. https://www.glebmikhaylov.com/
