ConditionalSampler.alter_distributions
ConditionalSampler.alter_distributions(df, num_rows, produce_nans=False, explicit_marginals=None, association_rules=None, expression_rules=None, generic_rules=None, progress_callback=None)
Given a DataFrame, drop and/or generate new samples so that the column distributions match the user-specified marginal distributions. Unlike the ConditionalSampler.synthesize() method, this keeps some of the original data, so the output is not purely synthetic.
- Parameters
df (pd.DataFrame) – DataFrame of original data to modify.
num_rows (int) – The number of rows to generate.
produce_nans (bool) – Whether to produce NaNs. Defaults to False.
explicit_marginals (List[Dict[str, Dict[Union[str, int, float], float]]]) – Desired marginal distributions per column, defined as probability density per category or bin.
association_rules (List[Association]) – A list of association rules to apply to the data.
expression_rules (List[Expression]) – A list of expression rules to apply to the data.
generic_rules (List[GenericRule]) – A list of generic rules to apply to the data.
progress_callback (Callable, optional) – Progress bar callback. Defaults to None.
- Return type
DataFrame
- Returns
The generated data.
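As a rough illustration of the explicit_marginals argument, the sketch below builds a value that matches the documented type and passes it to alter_distributions. The column names, categories, and the helper functions check_marginals and rebalance are hypothetical. Two points are assumptions rather than documented behaviour: that the probabilities for each column should sum to 1, and that columns may be grouped in a single dictionary inside the list. The sampler argument stands for an already constructed ConditionalSampler; its construction is not shown here.

```python
from typing import Any, Dict, List, Union

# Type of explicit_marginals, copied from the parameter list above.
Marginals = List[Dict[str, Dict[Union[str, int, float], float]]]

# Hypothetical columns and categories, for illustration only.
explicit_marginals: Marginals = [
    {
        "gender": {"F": 0.5, "M": 0.5},
        "smoker": {"yes": 0.2, "no": 0.8},
    }
]


def check_marginals(marginals: Marginals, tol: float = 1e-9) -> None:
    """Raise ValueError unless each column's probabilities are
    non-negative and sum to 1.

    Assumption: normalised probabilities are expected; the reference
    above does not state this explicitly.
    """
    for entry in marginals:
        for column, dist in entry.items():
            if any(p < 0 for p in dist.values()):
                raise ValueError(f"negative probability in column {column!r}")
            total = sum(dist.values())
            if abs(total - 1.0) > tol:
                raise ValueError(
                    f"probabilities for column {column!r} sum to {total}, not 1"
                )


def rebalance(sampler: Any, df: Any, num_rows: int) -> Any:
    """Call alter_distributions on an existing ConditionalSampler.

    Only parameters listed in the signature above are used.
    """
    check_marginals(explicit_marginals)
    return sampler.alter_distributions(
        df,
        num_rows=num_rows,
        produce_nans=False,
        explicit_marginals=explicit_marginals,
    )
```

Validating the marginals before the call is a design choice of this sketch, intended to surface typos in the probabilities early; it is not something the method is documented to require.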
See also
ConditionalSampler.synthesize(): Generate synthetic data from user-specified marginal distributions.