多借据合并
原始数据只有user_id/lnid/fund_date三个字段
把每个user时间窗1个月内的借据进行合并,合并出来的组别如group_rank所示,超过1个月时间窗,重置start_date,2组的start_date是2020-06-01
当然pandas也能很方便的分组编号
用SQL有限时间未能想出怎么写,想了个笨办法,如果组别太多会比较蠢。
首先可以在SQL中用row_number()编号新增一个字段loan_order,然后用python实现,比较简单,如下:
df = pd.read_csv('xxx.csv',parse_dates=['fund_date'])
df['group_rank'] = -999
for i in range(df.shape[0]):
if df.loc[i,'loan_order'] == 1:
rank_num = 1
start_date = df.loc[i,'fund_date']
df.loc[i,'group_rank'] = rank_num
else:
if df.loc[i,'fund_date'] <= start_date + pd.DateOffset(months=1):
df.loc[i,'group_rank'] = rank_num
else:
rank_num += 1
df.loc[i,'group_rank'] = rank_num
start_date = df.loc[i,'fund_date']
当然pandas也能很方便的分组编号
df = df.sort_values(by=['user_id','fund_date'])
df['loan_order'] = df.groupby(['user_id']).cumcount()+1
留言
張貼留言