Models

A Model describes how data of a given Meta type is modelled and generated.

Overriding Default Models

The HighDimSynthesizer automatically determines how each column is modelled. The default behaviour can be overridden by passing Models through the type_overrides argument.

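from synthesized import HighDimSynthesizer
from synthesized.model.models import Histogram

# df_meta is assumed to be the meta already extracted from the DataFrame.
# Model "categorical_col" with a Histogram instead of the default model.
categorical_model = Histogram(meta=df_meta["categorical_col"])
synth = HighDimSynthesizer(df_meta, type_overrides=[categorical_model])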

The usage of each Model implementation in the SDK is listed below:

Enumeration

Used to generate data from a minimum value up to a maximum value in discrete, constant steps.

Python

from synthesized.model.models import Enumeration

enum_model = Enumeration(
    meta=df_meta["colA"],
    start=1,
    step=1
)

Properties

meta: The meta of the column that is being modelled.
start (optional): Value to start the enumeration from. If not provided, the minimum will be inferred from the meta.
step (optional): Step size of the enumeration. If not provided, it is inferred from the meta.

YAML

enumeration:
  - name: "colA"
    start: 1
    step: 1

Properties

name: The name of the meta that is being modelled.
start (optional): Value to start the enumeration from. If not provided, the minimum will be inferred from the meta.
step (optional): Step size of the enumeration. If not provided, it is inferred from the meta.
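
To illustrate what start and step mean, here is a minimal sketch in plain Python. It does not use the SDK and is not how Enumeration is implemented; the helper enumerate_values and its parameter n are hypothetical names introduced only for this example.

```python
from itertools import count, islice


def enumerate_values(start=1, step=1, n=5):
    """Return n values beginning at start and increasing by a constant step."""
    # count() yields start, start + step, start + 2 * step, ...
    return list(islice(count(start, step), n))


values = enumerate_values(start=1, step=1, n=5)
print(values)  # [1, 2, 3, 4, 5]
```

A column generated this way behaves like a row counter or an auto-incrementing identifier.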

KernelDensityEstimate

Used to model numeric data types, including datetimes, in a continuous manner.

Python

from synthesized.model.models import KernelDensityEstimate

kde_model = KernelDensityEstimate(
    meta=df_meta["colA"],
)

Properties

meta: The meta of the column that is being modelled.

YAML

kernel_density_estimate:
  - name: "colA"

Properties

name: The name of the meta that is being modelled.
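
For readers unfamiliar with kernel density estimation, the sketch below shows the general idea of drawing continuous values from a Gaussian KDE using only the standard library. It is a conceptual illustration under stated assumptions, not the SDK's implementation: the helpers fit_bandwidth and sample_kde are hypothetical, and the normal-reference rule of thumb for the bandwidth is chosen here only for simplicity.

```python
import random
import statistics


def fit_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * std * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def sample_kde(samples, n, seed=0):
    """Draw n values from a Gaussian KDE fitted to samples."""
    rng = random.Random(seed)
    h = fit_bandwidth(samples)
    # Sampling from a Gaussian KDE: pick an observed point uniformly at
    # random, then perturb it with Gaussian noise of scale h.
    return [rng.choice(samples) + rng.gauss(0.0, h) for _ in range(n)]


observed = [1.0, 1.5, 2.0, 2.2, 3.1, 4.0]
synthetic = sample_kde(observed, n=3)
print(synthetic)
```

Datetimes could be handled the same way after converting them to numeric timestamps, which is an assumption of this sketch rather than a statement about the SDK.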

Histogram

Used to model discrete or categorical variables of any data type.

Python

from synthesized.model.models import Histogram

hist_model = Histogram(
    meta=df_meta["colA"],
)

Properties

meta: The meta of the column that is being modelled.
probabilities (optional): Probability distribution of categories. Empty dict until fit is called.

YAML

histogram:
  - name: "colA"

Properties

name: The name of the meta that is being modelled.
probabilities (optional): Probability distribution of categories. Empty dict until fit is called.
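
The probabilities property can be pictured as a mapping from each category to its relative frequency. The following plain-Python sketch shows that idea; it does not use the SDK, and fit_probabilities and sample_categories are hypothetical helpers that only approximate what fitting and sampling a histogram involve.

```python
import random
from collections import Counter


def fit_probabilities(values):
    """Map each category to its relative frequency in values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: c / total for category, c in counts.items()}


def sample_categories(probabilities, n, seed=0):
    """Draw n categories according to the fitted probabilities."""
    rng = random.Random(seed)
    categories = list(probabilities)
    weights = [probabilities[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


probabilities = fit_probabilities(["a", "b", "a", "c", "a", "b"])
samples = sample_categories(probabilities, n=4)
print(probabilities)
print(samples)
```

Before fitting, such a mapping would simply be empty, which matches the "empty dict until fit is called" behaviour described above.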