1) Scan through the steps below and import the Python libraries you need (don't worry, you can import any you forget later)
2) Load the data from the CSV file
3) Run basic exploratory commands to understand the data (for example, shape, column types, missing values, and summary statistics)
4) Bin the following features:
a) 'currentterm' into [0 to 11], [11 and more]
b) 'mrr_entry' into [0 to 14.99], [14.99 to 500], [500 to 5K], [5K and more]
c) 'account_age' into [0 to 90], [90 to 180], [180 to 360], [360 and more]
d) 'days_left_in_term' into [0 to 30], [30 to 360], [360 and more]
5) Set 'churn_next_90' as your target column
6) Set 'zoom_account_no' as an ID column; it should not be used as a feature
7) Set 'ahs_date' as a date column; it should not be used as a feature
8) Treat the binned features from step (4) and the following features as categorical features:
a) 'sales_group'
b) 'employee_count'
c) 'coreproduct'
9) Perform feature selection using your preferred method and ML algorithm. Choose 10 features and continue to step (10).
10) Split the new data frame (with the 10 selected features) into train and test subsets
11) Using a different algorithm from the one in step (9), tune its hyperparameters with cross-validation. Print the results.
12) Using the best parameters from step (11), fit your model on the train subset
13) Test your fitted model using the test subset
14) Print the feature importances, the ROC AUC score (roc_auc_score), and the confusion matrix (crosstab) for the predictions from step (13)
15) Save your trained model using pickle
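
Steps (4) to (8) could be sketched with pandas roughly as follows. This is only one possible reading of the task, not a reference solution. Assumptions: the bins are left-closed (a value of exactly 11 falls in "11 and more"), "5K" means 5000, each binned column replaces its raw column, and the three-row demo frame and its category values are made up purely for illustration. The helper name prepare is hypothetical.

```python
import numpy as np
import pandas as pd

# Bin edges from step (4); np.inf stands in for "and more".
# Assumption: "5K" is read as 5000.
BIN_EDGES = {
    "currentterm": [0, 11, np.inf],
    "mrr_entry": [0, 14.99, 500, 5000, np.inf],
    "account_age": [0, 90, 180, 360, np.inf],
    "days_left_in_term": [0, 30, 360, np.inf],
}
OTHER_CATEGORICALS = ["sales_group", "employee_count", "coreproduct"]
TARGET = "churn_next_90"
ID_COL = "zoom_account_no"
DATE_COL = "ahs_date"


def prepare(df):
    """Hypothetical helper: return (X, y) following steps (4) to (8)."""
    out = df.copy()
    for col, edges in BIN_EDGES.items():
        # Assumption: right=False gives left-closed bins, e.g. [0, 11), [11, inf).
        # pd.cut returns a categorical column, which covers step (8) for these.
        out[col] = pd.cut(out[col], bins=edges, right=False)
    for col in OTHER_CATEGORICALS:
        out[col] = out[col].astype("category")
    # Parse the date column so it is typed correctly, even though it is not a feature.
    out[DATE_COL] = pd.to_datetime(out[DATE_COL])
    y = out[TARGET]
    # The ID and date columns are kept out of the feature matrix (steps 6 and 7).
    X = out.drop(columns=[TARGET, ID_COL, DATE_COL])
    return X, y


# Made-up rows, only to show the shape of the input; real data comes from the CSV.
demo = pd.DataFrame({
    "zoom_account_no": [101, 102, 103],
    "ahs_date": ["2020-01-31", "2020-01-31", "2020-01-31"],
    "currentterm": [1, 11, 12],
    "mrr_entry": [14.99, 499.0, 6000.0],
    "account_age": [45, 90, 400],
    "days_left_in_term": [10, 30, 365],
    "sales_group": ["Online", "Direct", "Online"],
    "employee_count": ["1-10", "11-50", "51-250"],
    "coreproduct": ["Pro", "Business", "Pro"],
    "churn_next_90": [0, 1, 0],
})

X, y = prepare(demo)
print(X.dtypes)
```

If the bins are meant to be right-closed instead, changing right=False to right=True (and passing include_lowest=True so that 0 is kept) would be the corresponding adjustment. Most scikit-learn estimators still need the categorical columns encoded, for example with pd.get_dummies, before steps (9) to (13).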