Changelog

Version `3.1`

26 January 2024

feature YAML config auto-generation

It is now possible to automatically generate YAML config files for datasets.

feature YAML schema and hinting

A YAML schema for config files can now be set in IDEs to enable type hinting, making YAML config files easier to write. For example, users can now press the Tab key while writing a YAML config file to see the configuration options available in the SDK.

feature Spark `DateType` native support

Native support for training on and synthesizing Spark `DateType` columns was added, in addition to the `TimestampType` and `TimestampNTZType` data types already supported.

enhancement Faster Spark Meta Extraction

Extraction of Spark dataset meta information is now 2x faster, thanks to various performance optimisations.

enhancement Automatic Sampling

Columns with very high cardinality are now detected automatically and modelled with `SamplingModel`. This matches the behaviour of SDK 2.9, so code converted from that version needs minimal changes.
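To illustrate what "very high cardinality" can mean in practice, here is a minimal sketch of one possible heuristic: flag a column when nearly all of its non-null values are distinct. This is not the SDK's detection logic; the function name `is_high_cardinality` and the `max_unique_ratio` and `min_rows` thresholds are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def is_high_cardinality(
    values: Sequence[Hashable],
    max_unique_ratio: float = 0.9,  # hypothetical threshold, not an SDK default
    min_rows: int = 20,             # hypothetical threshold, not an SDK default
) -> bool:
    """Return True when nearly every non-null value in the column is distinct."""
    non_null = [v for v in values if v is not None]
    if len(non_null) < min_rows:
        return False  # too few rows to judge cardinality reliably
    unique_ratio = len(set(non_null)) / len(non_null)
    return unique_ratio >= max_unique_ratio


# Every email is distinct, so the column is flagged.
emails = [f"user{i}@example.com" for i in range(100)]
# Only three distinct values across 120 rows, so the column is not flagged.
countries = ["UK", "US", "DE"] * 40

print(is_high_cardinality(emails))     # True
print(is_high_cardinality(countries))  # False
```

A ratio-based check like this scales with column length, whereas a fixed count of distinct values would not.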

enhancement Automatic Enumeration

Enumerated columns (i.e. columns whose values increase predictably, such as ID columns) are now detected automatically and modelled with `EnumerationModel`. This matches the behaviour of SDK 2.9, so code converted from that version needs minimal changes.
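As a rough illustration of what an enumerated column looks like, the sketch below checks whether consecutive integer values differ by a constant non-zero step. It is an assumption-laden example, not the SDK's detection logic; the function name `enumeration_step` is hypothetical.

```python
from typing import Optional, Sequence


def enumeration_step(values: Sequence[int]) -> Optional[int]:
    """Return the constant non-zero step between consecutive values, or None."""
    if len(values) < 3:
        return None  # too few values to call the pattern predictable
    step = values[1] - values[0]
    if step == 0:
        return None  # a constant column is not an enumeration
    for prev, cur in zip(values, values[1:]):
        if cur - prev != step:
            return None  # step changed, so values are not evenly spaced
    return step


print(enumeration_step([100, 101, 102, 103]))  # 1
print(enumeration_step([10, 20, 30]))          # 10
print(enumeration_step([1, 2, 4]))             # None
```

A column that passes such a check can be regenerated from a start value and a step, so no statistical model needs to be fitted to it.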

Version `3.0`

01 December 2023

enhancement Native Spark support

Synthesized’s SDK now natively supports Spark, allowing you to easily generate synthetic data for your Spark dataframes. It also supports distributed training of models on Spark clusters, allowing you to scale synthetic data generation to large datasets.