Sanitizer.find_unique
- static Sanitizer.find_unique(df_synth, df_orig, distances, n_col_intersect=None, skip_categorical=False, df_meta=None)
Finds the rows in df_synth that do not appear in df_orig and returns a boolean mask for these rows.
- Parameters
df_synth (DataFrame) – pd.DataFrame, synthetic data to find unique rows from.
df_orig (DataFrame) – pd.DataFrame, original data to compare against.
distances (Dict[str, Optional[float]]) – Dict[str, Union[float, None]], size of the bins to use for each continuous column; a None value denotes a categorical column.
n_col_intersect – int, number of columns that must match before a synthetic row is treated as a match for an original row (and therefore not unique); by default, all columns must match.
skip_categorical (bool) – bool, whether to disallow matches that involve only categorical columns; defaults to False.
df_meta (Optional[DataFrameMeta]) – DataFrameMeta to inform the binning process; optional.
- Return type
Index
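
The binning idea described by the distances parameter can be illustrated with a minimal pandas sketch. This is not the library's implementation: the helper name find_unique_sketch and the sample data are hypothetical, and the sketch assumes that continuous values are compared by flooring them into bins of the given width and that categorical values are compared exactly. It ignores n_col_intersect, skip_categorical and df_meta, so every column listed in distances must match.

```python
from typing import Dict, Optional

import numpy as np
import pandas as pd


def find_unique_sketch(
    df_synth: pd.DataFrame,
    df_orig: pd.DataFrame,
    distances: Dict[str, Optional[float]],
) -> pd.Series:
    """Hypothetical helper: True where a synthetic row has no binned match in df_orig."""

    def to_bins(df: pd.DataFrame) -> pd.DataFrame:
        binned = pd.DataFrame(index=df.index)
        for col, width in distances.items():
            if width is None:
                # Categorical column: compare values exactly.
                binned[col] = df[col].astype(str)
            else:
                # Continuous column: values in the same bin count as equal.
                binned[col] = np.floor(df[col] / width).astype(int)
        return binned

    # Collect the binned original rows as plain tuples for fast lookup.
    orig_keys = set(to_bins(df_orig).itertuples(index=False, name=None))
    synth_rows = to_bins(df_synth).itertuples(index=False, name=None)
    return pd.Series([row not in orig_keys for row in synth_rows], index=df_synth.index)


# Made-up example data, for illustration only.
df_orig = pd.DataFrame({"age": [25, 40], "city": ["London", "Paris"]})
df_synth = pd.DataFrame({"age": [26, 40, 33], "city": ["London", "Berlin", "Paris"]})

mask = find_unique_sketch(df_synth, df_orig, {"age": 5.0, "city": None})
print(mask.tolist())  # [False, True, True]
```

With a bin width of 5 for age, the first synthetic row (26, London) lands in the same bin as the original row (25, London) and is therefore not flagged as unique, while the other two rows have no binned counterpart in df_orig.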