This is a continuation of my Sales Report in Pandas post, and it starts from the final table of that post. Here it is for one randomly picked customer.
report[report['CustomerID'] == 12347.0]
It shows this customer's activity over the whole available date range, aggregated by month. You can see that the activity is irregular.
First, I want to add a flag that says whether or not the customer was active in the previous month.
report['active_prev'] = (report.sort_values(by=['month'], ascending=True)
.groupby(['CustomerID'])['active'].shift(1))
This tricky bit works much like a window function in SQL. Because I have many customers, I have to tell pandas to treat them separately, and that's what groupby is for. I also want the months to be in order, and that's what sort_values is for. Finally, shift(1) gives the value from the previous row (the previous month in this case). …
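To make the groupby plus shift step easier to follow, here is a minimal sketch on a made-up table. The toy frame and its values are my own assumption, not the real report; only the column names CustomerID, month and active come from the code above.

```python
import pandas as pd

# Hypothetical stand-in for the report table: two customers, three months each,
# deliberately stored out of order.
toy = pd.DataFrame({
    'CustomerID': [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    'month': ['2011-03', '2011-01', '2011-01', '2011-03', '2011-02', '2011-02'],
    'active': [1, 0, 1, 1, 0, 1],
})

# Same pattern as above: order by month, then shift within each customer.
toy['active_prev'] = (toy.sort_values(by=['month'], ascending=True)
                         .groupby(['CustomerID'])['active'].shift(1))

# Sort only for display, so each customer's months read top to bottom.
result = toy.sort_values(['CustomerID', 'month'])
print(result)
```

The first month of every customer gets NaN because there is no earlier row to take a value from. Note that the assignment matches rows by index, so sorting inside the expression does not reorder the original table.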
I will use the dataset from https://www.kaggle.com/carrie1/ecommerce-data. I work in Colab and store the data on my Google Drive.
import pandas as pd
df = pd.read_csv('drive/My Drive/data/ecommerce-data.zip', encoding='cp1252')
df.head()
Look at the share of missing values in each column.
df.isna().mean()
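Why the mean gives a share: isna() turns every cell into True or False, and the mean of a boolean column is the fraction of True values. A minimal sketch on a made-up frame (the values and the choice of columns here are assumptions for illustration, not the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with a few gaps.
toy = pd.DataFrame({
    'CustomerID': [12347.0, np.nan, 12348.0, np.nan],
    'Description': ['A', 'B', None, 'D'],
    'Quantity': [1, 2, 3, 4],
})

# Fraction of missing cells per column, between 0.0 and 1.0.
missing_share = toy.isna().mean()
print(missing_share)
```

Multiply by 100 if you prefer percentages.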