static Sanitizer.find_unique(df_synth, df_orig, distances, n_col_intersect=None, skip_categorical=False, df_meta=None)

Finds the rows in df_synth that do not appear in df_orig and returns a boolean mask for these rows.

  • df_synth (DataFrame) – pd.DataFrame, synthetic data to find unique rows from

  • df_orig (DataFrame) – pd.DataFrame, original data to compare against.

  • distances (Dict[str, Optional[float]]) – Dict[str, Union[float, None]], size of bins to use for each continuous column; a None value denotes a categorical column.

  • n_col_intersect – int, number of columns that need to match before two rows are considered duplicates (that is, not unique); by default, all columns must match.

  • skip_categorical (bool) – bool, whether to disallow matches that involve only categorical columns; by default, False.

  • df_meta (Optional[DataFrameMeta]) – DataFrameMeta to inform binning process, optional
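The idea behind the parameters above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name find_unique_sketch is hypothetical, the binning rule (floor division by the bin size) is an assumption, and it covers only the default case where every column must match, ignoring n_col_intersect, skip_categorical and df_meta. It also assumes the continuous columns contain no missing values.

```python
from typing import Dict, Optional

import numpy as np
import pandas as pd


def find_unique_sketch(
    df_synth: pd.DataFrame,
    df_orig: pd.DataFrame,
    distances: Dict[str, Optional[float]],
) -> pd.Series:
    """Hypothetical illustration only, not Sanitizer.find_unique itself."""

    def to_keys(df: pd.DataFrame) -> pd.DataFrame:
        keys = pd.DataFrame(index=df.index)
        for col, width in distances.items():
            if width is None:
                # Categorical column: values must match exactly.
                keys[col] = df[col].astype(str)
            else:
                # Continuous column (assumed rule): values that fall in the
                # same bin of size `width` are treated as matching.
                keys[col] = np.floor(df[col] / width).astype("int64")
        return keys

    cols = list(distances)
    synth_keys = to_keys(df_synth)
    # Deduplicate so the left merge keeps exactly one row per synthetic row.
    orig_keys = to_keys(df_orig).drop_duplicates()
    merged = synth_keys.merge(orig_keys, on=cols, how="left", indicator=True)
    # "left_only" marks synthetic rows with no matching original row.
    unique = (merged["_merge"] == "left_only").to_numpy()
    return pd.Series(unique, index=df_synth.index)


df_orig = pd.DataFrame({"age": [31.0, 45.0], "city": ["A", "B"]})
df_synth = pd.DataFrame({"age": [32.5, 45.0, 58.0], "city": ["A", "C", "B"]})
distances = {"age": 5.0, "city": None}

mask = find_unique_sketch(df_synth, df_orig, distances)
print(mask.tolist())  # [False, True, True]
```

In this toy data the first synthetic row shares an age bin and a city with an original row, so it is not flagged as unique; the other two rows differ in at least one column and are flagged.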

Return type