Evaluation

Synthesized provides a variety of methods for assessing the quality and utility of generated synthetic data, both empirically and visually.

Univariate Metrics

KolmogorovSmirnovDistance()

Kolmogorov-Smirnov statistic between two continuous variables.
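For intuition, the two-sample Kolmogorov-Smirnov statistic is the largest absolute gap between the empirical CDFs of the original and synthetic columns. The following is a minimal, dependency-free sketch of that definition. `ks_statistic` is a hypothetical helper for illustration, not part of the Synthesized API, and the library's own implementation may differ in details.

```python
from bisect import bisect_right
from typing import Sequence


# Hypothetical helper for illustration; not part of the Synthesized API.
def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of `a` and `b`."""
    sa, sb = sorted(a), sorted(b)
    n, m = len(sa), len(sb)
    # Both ECDFs are step functions that only jump at observed values, so
    # the supremum is reached at one of the points of the pooled sample.
    return max(
        abs(bisect_right(sa, x) / n - bisect_right(sb, x) / m)
        for x in sa + sb
    )


original = [1.0, 2.0, 3.0, 4.0]
synthetic = [2.5, 3.5, 4.5, 5.5]
print(ks_statistic(original, synthetic))  # 0.5
```

A value of 0 means the two empirical distributions coincide; 1 means they do not overlap at all.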

EarthMoversDistance()

Earth mover's distance (also known as the 1-Wasserstein distance) between two nominal variables.
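The ground distance between categories is not specified above. Assuming every pair of distinct categories is exactly one unit apart, the earth mover's distance between two categorical distributions reduces to their total variation distance: half the L1 distance between the category frequencies. A sketch under that assumption (`categorical_emd` is a hypothetical name, not the library's implementation):

```python
from collections import Counter
from typing import Hashable, Sequence


# Hypothetical helper for illustration; ASSUMES a unit ground distance
# between any two distinct categories (and 0 between identical ones).
def categorical_emd(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    pa, pb = Counter(a), Counter(b)
    na, nb = len(a), len(b)
    categories = set(pa) | set(pb)
    # Under the unit ground distance, the mass that has to move is half the
    # L1 distance between the two frequency vectors.
    return 0.5 * sum(abs(pa[c] / na - pb[c] / nb) for c in categories)


print(categorical_emd(["a", "a", "b", "c"], ["a", "b", "b", "c"]))  # 0.25
```

Here a quarter of the probability mass has to be moved from category "a" to category "b".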

Multivariate Metrics

CramersV()

Cramér's V correlation coefficient between nominal variables.
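Cramér's V rescales the chi-squared statistic of the contingency table to the range [0, 1]: V = sqrt(χ² / (n·(k - 1))), where n is the number of rows and k is the smaller of the two category counts. The sketch below computes the plain, uncorrected form. Whether Synthesized applies a small-sample bias correction is not stated here, and `cramers_v` is a hypothetical helper.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


# Hypothetical helper for illustration; plain Cramér's V, no bias correction.
def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    n = len(x)
    row_counts, col_counts = Counter(x), Counter(y)
    cell_counts = Counter(zip(x, y))
    k = min(len(row_counts), len(col_counts))
    if k < 2:
        raise ValueError("both variables need at least two categories")
    # Pearson chi-squared over every cell of the contingency table,
    # including cells that were never observed (Counter returns 0).
    chi2 = 0.0
    for r, nr in row_counts.items():
        for c, nc in col_counts.items():
            expected = nr * nc / n
            chi2 += (cell_counts[(r, c)] - expected) ** 2 / expected
    # Normalise by the largest attainable chi-squared, n * (k - 1).
    return sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"]))  # 0.0
```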

KendallTauCorrelation([max_p_value, ...])

Kendall's Tau correlation coefficient between ordinal variables.
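Kendall's tau compares every pair of observations and counts whether the two variables order the pair the same way (concordant) or opposite ways (discordant). The sketch below assumes the tau-b variant, which corrects for ties and suits ordinal data with repeated values; which variant Synthesized uses is not stated. The `max_p_value` argument in the signature above suggests a significance threshold on the underlying test; that part is not reproduced here. `kendall_tau_b` is a hypothetical helper.

```python
from itertools import combinations
from math import sqrt
from typing import Any, Sequence


def _sign(a: Any, b: Any) -> int:
    # Works for any ordered type, not just numbers.
    return (a > b) - (a < b)


# Hypothetical helper for illustration; O(n^2) Kendall's tau-b.
def kendall_tau_b(x: Sequence[Any], y: Sequence[Any]) -> float:
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sx, sy = _sign(x1, x2), _sign(y1, y2)
        if sx == 0 and sy == 0:
            continue  # tied in both variables: excluded from every count
        if sx == 0:
            ties_x_only += 1
        elif sy == 0:
            ties_y_only += 1
        elif sx == sy:
            concordant += 1
        else:
            discordant += 1
    # Denominator: geometric mean of the pair counts not tied in x and in y.
    denom = sqrt((concordant + discordant + ties_y_only)
                 * (concordant + discordant + ties_x_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4]))  # 4/6, about 0.667
```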

CategoricalLogisticR2()

McFadden's pseudo R-squared coefficient between categorical and continuous variables.
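McFadden's pseudo R-squared compares the log-likelihood of a fitted classifier with that of an intercept-only (null) model: R² = 1 - ln L_model / ln L_null. The null model simply predicts the marginal class frequencies for every row. The sketch below assumes a classifier (for example, a logistic regression of the categorical variable on the continuous one) has already been fitted, and takes its predicted class probabilities as input; the fitting step is not shown, and `mcfadden_r2` is a hypothetical helper.

```python
from collections import Counter
from math import log
from typing import Hashable, Mapping, Sequence


# Hypothetical helper for illustration; expects already-fitted probabilities.
def mcfadden_r2(y: Sequence[Hashable],
                probs: Sequence[Mapping[Hashable, float]]) -> float:
    """`probs[i][c]` is the model's predicted probability of class c for row i."""
    n = len(y)
    freq = Counter(y)
    # Log-likelihood of the observed labels under the fitted model.
    ll_model = sum(log(p[label]) for label, p in zip(y, probs))
    # Null model: every row gets the marginal class frequencies.
    ll_null = sum(log(freq[label] / n) for label in y)
    return 1.0 - ll_model / ll_null


y = ["a", "a", "b", "b"]
fitted = [{"a": 0.8, "b": 0.2}, {"a": 0.8, "b": 0.2},
          {"a": 0.2, "b": 0.8}, {"a": 0.2, "b": 0.8}]
print(round(mcfadden_r2(y, fitted), 3))  # 0.678
```

A model no better than the class frequencies scores 0; a model that assigns probability 1 to every true label scores 1.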

SpearmanRhoCorrelation([max_p_value])

Spearman's rank correlation coefficient between ordinal variables.
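Spearman's rho is the Pearson correlation of the ranks rather than of the raw values, so it captures any monotonic relationship. Tied values conventionally share the mean of the rank positions they occupy. A minimal sketch of that definition (`average_ranks` and `spearman_rho` are hypothetical helpers, not the library's code):

```python
from math import sqrt
from typing import Sequence


# Hypothetical helper: 1-based ranks, ties share the mean of their positions.
def average_ranks(values: Sequence[float]) -> list[float]:
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Hypothetical helper: Pearson correlation computed on the ranks.
def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(sxx * syy)


# y = x**2 is nonlinear but monotonic, so rho is exactly 1.
print(spearman_rho([10, 20, 30, 40], [1, 4, 9, 16]))  # 1.0
```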

Modelling Metrics

predictive_modelling_score(data, y_label, ...)

Calculates the R-squared or ROC AUC score of a given model trained on a given dataset.
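The two scores named above are standard. Which one is reported presumably depends on whether the target is continuous (R-squared) or categorical (ROC AUC); that mapping is an assumption. The hand-rolled versions below only illustrate the definitions; the names are hypothetical, and in practice scikit-learn's `r2_score` and `roc_auc_score` compute the same quantities for a continuous target and binary labels respectively.

```python
from typing import Sequence


# Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot.
def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical helper: ROC AUC for binary labels (0/1), computed as the
# probability that a random positive scores above a random negative,
# with ties counted as half a win (Mann-Whitney formulation).
def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))   # 0.0
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Predicting the mean for every row gives R-squared 0, and a random ranking gives ROC AUC 0.5, so both scores have a natural "no skill" baseline.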

predictive_modelling_comparison(data, ...[, ...])

Compare the R-squared or ROC AUC score of a model trained on original data with that of a model trained on synthetic data, both tested on a hold-out sample of the original data.
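The comparison above follows a "train on synthetic, test on real" protocol: if a model fitted on synthetic data scores close to one fitted on the original training data, the synthetic data has preserved the relationship the model relies on. The sketch below illustrates the protocol only. A one-variable least-squares line stands in for whatever model the library actually trains, and `fit_line`, `r2` and `compare_models` are hypothetical names. To avoid leakage, the synthetic rows should be generated from the training split alone, never from the hold-out rows.

```python
from typing import Callable, List, Tuple

Row = Tuple[float, float]  # (feature x, target y)


# Hypothetical stand-in model: ordinary least squares for y = a + b * x.
def fit_line(rows: List[Row]) -> Callable[[float], float]:
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    b = (sum((x - mx) * (y - my) for x, y in rows)
         / sum((x - mx) ** 2 for x, _ in rows))
    a = my - b * mx
    return lambda x: a + b * x


# R-squared of `model` on the given rows.
def r2(model: Callable[[float], float], rows: List[Row]) -> float:
    mean = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - model(x)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean) ** 2 for _, y in rows)
    return 1.0 - ss_res / ss_tot


# Hypothetical helper: both models are scored on the SAME original hold-out.
def compare_models(train: List[Row], synthetic: List[Row],
                   holdout: List[Row]) -> Tuple[float, float]:
    return r2(fit_line(train), holdout), r2(fit_line(synthetic), holdout)


train = [(float(x), 2.0 * x + 1.0) for x in range(10)]
holdout = [(float(x), 2.0 * x + 1.0) for x in range(10, 15)]
synthetic = [(float(x), 2.0 * x + 2.0) for x in range(10)]  # shifted by 1
print(compare_models(train, synthetic, holdout))  # (1.0, 0.875)
```

The synthetic rows keep the slope but shift the intercept, and the hold-out score drops accordingly.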

Plotting & Analysis

Assessor(df_meta)

A universal set of utilities that let you assess the quality of synthetic data against the original data.