class predictive_modelling_score(data, y_label, x_labels, model, synth_data=None, copy_model=True, preprocessor=None, df_meta=None, df_model=None, sample_size=None)

Calculates the R-squared or ROC AUC score of a given model trained on a given dataset.

This function fits a regressor or a classifier, depending on the data type of y_label. All necessary preprocessing (e.g. standard scaling, one-hot encoding) is done inside the function. The input data is automatically split into training and test sets so that model performance can be evaluated.
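The procedure described above can be sketched with pandas and scikit-learn. This is only an illustration under stated assumptions, not the library's implementation: the helper name sketch_score, the rule that a numeric target means regression, the 80/20 split, and the choice of random forests are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import r2_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def sketch_score(data, y_label, x_labels=None, test_size=0.2, seed=0):
    """Hypothetical helper: returns (score, metric, task)."""
    # Default to every column except the target.
    x_labels = x_labels or [c for c in data.columns if c != y_label]
    X, y = data[x_labels], data[y_label]

    # Scale numeric predictors and one-hot encode the rest.
    numeric = [c for c in x_labels if pd.api.types.is_numeric_dtype(X[c])]
    categorical = [c for c in x_labels if c not in numeric]
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])

    # Assumption: a numeric target means regression, anything else classification.
    is_regression = pd.api.types.is_numeric_dtype(y)
    if is_regression:
        estimator = RandomForestRegressor(random_state=seed)
    else:
        estimator = RandomForestClassifier(random_state=seed)
    pipeline = make_pipeline(preprocess, estimator)

    # Hold out part of the data for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed,
        stratify=None if is_regression else y,
    )
    pipeline.fit(X_train, y_train)

    if is_regression:
        return r2_score(y_test, pipeline.predict(X_test)), "r2", "regression"

    proba = pipeline.predict_proba(X_test)
    if len(estimator.classes_) == 2:
        # Column 1 holds the probability of the second (positive) class.
        return roc_auc_score(y_test, proba[:, 1]), "roc_auc", "binary"
    score = roc_auc_score(y_test, proba, multi_class="ovr", labels=estimator.classes_)
    return score, "roc_auc", "multinomial"
```

Calling sketch_score(df, "price") on a frame with a numeric price column would take the regression branch, while a string-valued target with two distinct values would take the binary branch.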

  • data (pd.DataFrame) – Input dataset.

  • y_label (str) – Name of the target variable column/response variable to predict.

  • x_labels (List[str], optional) – Input column names/explanatory variables. Defaults to None, in which case all columns in the dataset except y_label will be used as predictors.

  • model (Union[str, sklearn.base.BaseEstimator]) – One of ‘Linear’, ‘GradientBoosting’, ‘RandomForest’, ‘MLP’, ‘LinearSVM’, or ‘Logistic’. Note that ‘Logistic’ only applies to categorical response variables. Alternatively, a custom model class that inherits from sklearn.base.BaseEstimator can be specified.

  • synth_data (pd.DataFrame, optional) – If given, the model is trained on this synthetic data but its performance is evaluated on the original data.


Returns the score, the metric (‘r2’ or ‘roc_auc’), and the task (‘regression’, ‘binary’, or ‘multinomial’).
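The synth_data behaviour (train on synthetic data, evaluate on the original) can be illustrated with a small scikit-learn sketch. Everything here is an assumption made for illustration: the helper name train_on_synth_score is hypothetical, the data frames are randomly generated stand-ins, and the sketch scores against all of the original rows, which may differ from how the library selects its evaluation set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def train_on_synth_score(data, synth_data, y_label, x_labels):
    """Hypothetical helper: fit on synthetic rows, score on original rows."""
    model = LinearRegression().fit(synth_data[x_labels], synth_data[y_label])
    predictions = model.predict(data[x_labels])
    return r2_score(data[y_label], predictions)


def make_frame(rng, n=200):
    # Stand-in data: y depends linearly on x, plus a little noise.
    x = rng.normal(size=n)
    return pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(scale=0.1, size=n)})


rng = np.random.default_rng(0)
original = make_frame(rng)
synthetic = make_frame(rng)  # drawn from the same process, for illustration only

score = train_on_synth_score(original, synthetic, y_label="y", x_labels=["x"])
```

A score close to the one obtained by training on the original data suggests that the synthetic data preserves the relationship between the predictors and the target.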