ConditionalSampler.alter_distributions

ConditionalSampler.alter_distributions(df, num_rows, produce_nans=False, explicit_marginals=None, association_rules=None, expression_rules=None, generic_rules=None, progress_callback=None)

Given a DataFrame, drop existing rows and/or generate new samples so that the column distributions match user-specified marginal distributions. Unlike the ConditionalSampler.synthesize() method, this keeps some of the original data, so the output is not purely synthetic.
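The keep/drop/generate idea can be illustrated with a small counting sketch. This is not the library's algorithm, only a hypothetical helper (`plan_adjustment` is an invented name) that shows, for one categorical column, how many original rows could be kept, how many would be dropped, and how many new rows would have to be generated to reach a target marginal.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def plan_adjustment(
    values: Iterable[Hashable],
    target: Dict[Hashable, float],
    num_rows: int,
) -> Dict[Hashable, Dict[str, int]]:
    """Hypothetical helper, not part of ConditionalSampler.

    For each category in `target`, compare the current row count with the
    count implied by the target probability and report how many rows to
    keep, drop, and generate. Categories absent from `target` are ignored
    here, and rounding may make the totals differ slightly from num_rows.
    """
    current = Counter(values)
    plan = {}
    for category, prob in target.items():
        wanted = round(prob * num_rows)   # rows desired for this category
        have = current.get(category, 0)   # rows present in the original data
        keep = min(have, wanted)          # original rows that can be reused
        plan[category] = {
            "keep": keep,
            "drop": have - keep,          # surplus original rows
            "generate": wanted - keep,    # rows that must be newly created
        }
    return plan


# Original column is 80% "a" and 20% "b"; the target is an even split.
values = ["a"] * 8 + ["b"] * 2
plan = plan_adjustment(values, {"a": 0.5, "b": 0.5}, num_rows=10)
```

In this example three surplus "a" rows would be dropped and three "b" rows would need to be generated, while seven original rows survive, which is why the result is only partly synthetic.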

Parameters
  • df (pd.DataFrame) – DataFrame of original data to modify.

  • num_rows (int) – The number of rows to generate.

  • produce_nans (bool) – Whether to produce NaNs. Defaults to False.

  • explicit_marginals (List[Dict[str, Dict[Union[str, int, float], float]]]) – Desired marginal distributions per column, defined as probability density per category or bin.

  • association_rules (List[Association]) – A list of association rules to apply to the data.

  • expression_rules (List[Expression]) – A list of expression rules to apply to the data.

  • generic_rules (List[GenericRule]) – A list of generic rules to apply to the data.

  • progress_callback (Callable, optional) – Progress bar callback. Defaults to None.
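As a minimal sketch of the shape the explicit_marginals type hint suggests, the snippet below builds a list of single-column dictionaries and checks them before use. The column names and categories are invented for illustration, reading the outer key as a column name is an assumption based on the type hint, and whether the library itself requires probabilities to sum to 1 is not stated here; `check_marginals` is a hypothetical helper, not part of the API.

```python
import math
from typing import Dict, List, Union

Category = Union[str, int, float]
Marginals = List[Dict[str, Dict[Category, float]]]


def check_marginals(marginals: Marginals, tol: float = 1e-9) -> None:
    """Hypothetical sanity check: each column's probabilities must be
    non-negative and sum to 1 (within a small tolerance)."""
    for entry in marginals:
        for column, dist in entry.items():
            if any(p < 0 for p in dist.values()):
                raise ValueError(f"negative probability in column {column!r}")
            total = sum(dist.values())
            if not math.isclose(total, 1.0, abs_tol=tol):
                raise ValueError(
                    f"probabilities for column {column!r} sum to {total}, not 1"
                )


# Assumed layout: one dict per column, mapping category (or bin) to probability.
explicit_marginals: Marginals = [
    {"gender": {"female": 0.5, "male": 0.5}},
    {"region": {"north": 0.2, "south": 0.3, "east": 0.25, "west": 0.25}},
]

check_marginals(explicit_marginals)  # raises ValueError if a column is malformed
```

A structure like this would then be passed as the explicit_marginals argument alongside df and num_rows.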

Return type

DataFrame

Returns

The generated data.

See also

ConditionalSampler.synthesize() : Generate synthetic data from user-specified marginal distributions.