Models
For data with a given Meta type, Models describe how we choose to model and generate data.
Overriding Default Models
The HighDimSynthesizer automatically determines how each column is modelled. The default behaviour can be overridden.
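from synthesized import HighDimSynthesizer
from synthesized.model.models import Histogram

# df_meta is assumed to be the metadata already extracted from the source DataFrame.
# Model "categorical_col" with a Histogram instead of the automatically chosen model.
categorical_model = Histogram(meta=df_meta["categorical_col"])
synth = HighDimSynthesizer(df_meta, type_overrides=[categorical_model])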
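The idea of "automatic defaults, replaced by explicit overrides" can be illustrated with a small standalone sketch. It does not use or reproduce the SDK's internals: `DEFAULT_MODEL_BY_KIND`, `resolve_models`, and the column names are hypothetical, and models are represented only by their names.

```python
from typing import Dict

# Hypothetical defaults: the kind of model assumed for each kind of column.
DEFAULT_MODEL_BY_KIND: Dict[str, str] = {
    "categorical": "Histogram",
    "continuous": "KernelDensityEstimate",
}


def resolve_models(column_kinds: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Pick a default model per column, then let explicit overrides win."""
    resolved = {name: DEFAULT_MODEL_BY_KIND[kind] for name, kind in column_kinds.items()}
    unknown = set(overrides) - set(column_kinds)
    if unknown:
        raise KeyError(f"overrides refer to unknown columns: {sorted(unknown)}")
    resolved.update(overrides)
    return resolved


# Illustrative columns: "postcode" is numeric, but we want it treated as categorical.
column_kinds = {"age": "continuous", "postcode": "continuous", "gender": "categorical"}
models = resolve_models(column_kinds, overrides={"postcode": "Histogram"})
```

Only the overridden column changes; every other column keeps the model that would have been chosen for it by default.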
The usage of each Model implementation in the SDK is listed below:
Enumeration
Used to generate data from a minimum value up to a maximum in discrete, constant steps.
Python
from synthesized.model.model import Enumeration

enum_model = Enumeration(
    meta=df_meta["colA"],
    start=1,
    step=1,
)
Properties
- meta: The meta of the column that is being modelled.
- start (optional): Value to start the enumeration from. If not provided, the minimum is inferred from the meta.
- step (optional): Step size of the enumeration. If not provided, it is inferred from the meta.
YAML
enumeration:
  - name: "colA"
    start: 1
    step: 1
Properties
- name: The name of the meta that is being modelled.
- start (optional): Value to start the enumeration from. If not provided, the minimum is inferred from the meta.
- step (optional): Step size of the enumeration. If not provided, it is inferred from the meta.
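To make the Enumeration behaviour concrete, here is a minimal pure-Python sketch of generating values from a start in constant steps. It is not the SDK's implementation. The helper names are hypothetical, and inferring the step as the smallest gap between observed values is an assumption made purely for illustration.

```python
from itertools import count, islice
from typing import List, Sequence, Tuple


def infer_start_and_step(observed: Sequence[int]) -> Tuple[int, int]:
    """Assumed inference: start at the minimum, step by the smallest gap."""
    unique = sorted(set(observed))
    start = unique[0]
    gaps = [later - earlier for earlier, later in zip(unique, unique[1:])]
    step = min(gaps) if gaps else 1  # fall back to 1 for a single distinct value
    return start, step


def enumerate_column(n_rows: int, start: int, step: int) -> List[int]:
    """Return start, start + step, start + 2 * step, ... for n_rows rows."""
    return list(islice(count(start, step), n_rows))


# Explicit start and step, mirroring start=1, step=1 in the configuration above.
explicit = enumerate_column(5, start=1, step=1)

# Start and step inferred from illustrative observed values.
inferred_start, inferred_step = infer_start_and_step([30, 10, 20, 10])
inferred = enumerate_column(4, inferred_start, inferred_step)
```

Because each value depends only on the row position, the generated column is deterministic and free of duplicates, which is what makes this kind of model a natural fit for identifier-like columns.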
KernelDensityEstimate
Used to model numeric data types, including datetimes, in a continuous manner.
Python
from synthesized.model.model import KernelDensityEstimate

kde_model = KernelDensityEstimate(
    meta=df_meta["colA"],
)
Properties
- meta: The meta of the column that is being modelled.
YAML
kernel_density_estimate:
  - name: "colA"
Properties
- name: The name of the meta that is being modelled.
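As background on what modelling a column "in a continuous manner" can look like, the sketch below builds a Gaussian kernel density estimate with the standard library only. It is a generic textbook construction, not the SDK's implementation. The function names, the sample data, and the normal-reference bandwidth rule (1.06 · σ · n^(-1/5)) are assumptions chosen for illustration.

```python
import math
import random
import statistics
from typing import List, Sequence


def rule_of_thumb_bandwidth(data: Sequence[float]) -> float:
    """Normal-reference bandwidth: 1.06 * sample std dev * n ** (-1/5)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)


def kde_density(x: float, data: Sequence[float], bandwidth: float) -> float:
    """Evaluate the Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


def kde_sample(data: Sequence[float], bandwidth: float, n: int, rng: random.Random) -> List[float]:
    """Draw n values: pick an observed point, then add Gaussian noise."""
    return [rng.gauss(rng.choice(data), bandwidth) for _ in range(n)]


observed = [1.0, 1.5, 2.0, 2.5, 3.0, 8.0, 8.5, 9.0]  # illustrative data
bandwidth = rule_of_thumb_bandwidth(observed)
synthetic = kde_sample(observed, bandwidth, 1000, random.Random(0))
```

Sampling from the estimate produces values that were never observed but that concentrate where the observed values are dense, so `kde_density` is higher near the cluster around 2.0 than in the gap around 5.5.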
Histogram
Used to model discrete/categorical variables, of any data type.
Python
from synthesized.model.model import Histogram

hist_model = Histogram(
    meta=df_meta["colA"],
)
Properties
- meta: The meta of the column that is being modelled.
- probabilities (optional): Probability distribution of categories. Empty dict until fit is called.
YAML
histogram:
  - name: "colA"
Properties
- name: The name of the meta that is being modelled.
- probabilities (optional): Probability distribution of categories. Empty dict until fit is called.
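The role of the probabilities property can be illustrated with a small standalone sketch: fitting turns observed categories into relative frequencies, and generation samples categories according to those frequencies. This is a conceptual example, not the SDK's implementation, and `fit_probabilities` and `sample_categories` are hypothetical names.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def fit_probabilities(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Map each observed category to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def sample_categories(probabilities: Dict[Hashable, float], n: int, rng: random.Random) -> List[Hashable]:
    """Draw n categories with the fitted probabilities as weights."""
    categories = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(categories, weights=weights, k=n)


observed = ["red", "blue", "red", "green", "red", "blue"]  # illustrative data
probabilities = fit_probabilities(observed)  # red: 1/2, blue: 1/3, green: 1/6
synthetic = sample_categories(probabilities, 1000, random.Random(0))
```

Before fitting there is nothing to sample from, which matches the description of probabilities as an empty dict until fit is called. The categories can be of any hashable type, so the same approach works for strings, integers, or dates.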