This post continues my Sales Report in Pandas post and starts from its final table. Here is that table for one random customer.

report[report['CustomerID'] == 12347.0]

It shows this customer's activity over the whole available date range, aggregated by month. You can see that the activity is not regular.

First, I want to add a flag showing whether the customer was active in the previous month.

report['active_prev'] = (report.sort_values(by=['month'], ascending=True)
.groupby(['CustomerID'])['active'].shift(1))

This tricky part works something like a window function in SQL. Because I have many customers, I have to somehow say to…
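To make the shift-within-group idea concrete, here is a minimal sketch on a toy table. The `toy` frame and its values are made up for illustration; only the column names and the `sort_values` / `groupby` / `shift` chain come from the code above. In SQL terms this roughly corresponds to `LAG(active) OVER (PARTITION BY CustomerID ORDER BY month)`.

```python
import pandas as pd

# Toy data (hypothetical values): one row per customer per month,
# with `active` marking whether the customer bought anything that month.
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2, 2, 2],
    'month': ['2011-01', '2011-02', '2011-03', '2011-01', '2011-02', '2011-03'],
    'active': [1, 0, 1, 0, 1, 1],
})

# Shuffle the rows to show that the result does not depend on row order.
toy = toy.sample(frac=1, random_state=0)

# Sort by month, then shift `active` down by one row within each customer.
# The assignment aligns on the index, so every value lands on its own row.
toy['active_prev'] = (toy.sort_values(by=['month'], ascending=True)
                         .groupby(['CustomerID'])['active'].shift(1))

result = toy.sort_values(['CustomerID', 'month']).reset_index(drop=True)
print(result)
```

The first month of each customer has no previous row, so `active_prev` is NaN there, and the column becomes float because of it.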


I will use the dataset at https://www.kaggle.com/carrie1/ecommerce-data. I work in Colab and store the data on my Google Drive.

import pandas as pd
df = pd.read_csv('drive/My Drive/data/ecommerce-data.zip', encoding="cp1252")
df.head()
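As a side note, `pd.read_csv` can read a zip archive directly when the path ends in `.zip` and the archive holds a single CSV file. Here is a small self-contained sketch of that behaviour. The file name `toy-data.zip` and the rows are made up for illustration, not taken from the Kaggle data.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a tiny zip archive holding one cp1252-encoded CSV (made-up rows).
csv_text = 'InvoiceNo,Description,UnitPrice\n1,CAFÉ MUG,2.5\n2,LAMP £5,5.0\n'
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, 'toy-data.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('toy-data.csv', csv_text.encode('cp1252'))

# Compression is inferred from the .zip extension; the encoding must match
# the one the file was written with, otherwise non-ASCII characters break.
toy = pd.read_csv(zip_path, encoding='cp1252')
print(toy.head())
```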

Next, look at the share of missing values in each column.

df.isna().mean()
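Why this gives a share rather than a count: `isna()` returns a table of True/False values, and the mean of booleans is the fraction of True. A minimal sketch on a made-up frame (the `toy` data is hypothetical, not the real dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (hypothetical values).
toy = pd.DataFrame({
    'Description': ['mug', None, 'lamp', 'bag'],
    'CustomerID': [12347.0, np.nan, np.nan, 12348.0],
    'Quantity': [6, 1, 2, 3],
})

# isna() marks missing cells as True; mean() averages them per column,
# which gives the fraction of missing values in each column.
missing_share = toy.isna().mean()
print(missing_share)
```

Replacing `mean()` with `sum()` would give the number of missing cells per column instead.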

Gleb Mikhaylov

I share my experience in data science, computational thinking, Python, Wolfram. https://www.glebmikhaylov.com/
