Changelog
Version 3.2
20 March 2024
enhancement Model training time improvements
Model training is now approximately 2x faster across a variety of datasets.
enhancement Synthetic data quality improvements
Synthesized’s synthetic data quality has been improved when using the Pandas
interface, with better support for complex data distributions resulting in more accurate
synthetic data generation.
enhancement Use configuration objects in place of a large number of arguments
Previously, a large number of arguments had to be provided when using the TableSynthesizer class methods from_data_interface and from_meta_collection. This has been considerably simplified through the use of a pydantic data transfer object: users can now provide a TrainConfig object when creating a TableSynthesizer instance using these methods.
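The general pattern of replacing many arguments with one configuration object can be sketched as below. This is a generic illustration only, not the SDK's API: TrainConfigSketch, its fields (epochs, batch_size, learning_rate), and both build functions are hypothetical names, and a standard-library dataclass stands in for the pydantic object so the sketch runs without extra dependencies.

```python
from dataclasses import asdict, dataclass


# Hypothetical stand-in for a TrainConfig-style object. The field names are
# invented for illustration and are NOT the SDK's real training options.
@dataclass(frozen=True)
class TrainConfigSketch:
    epochs: int = 10
    batch_size: int = 256
    learning_rate: float = 1e-3

    def __post_init__(self):
        # Validate once, in one place, instead of in every method.
        if self.epochs <= 0 or self.batch_size <= 0:
            raise ValueError("epochs and batch_size must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


# Before: every option is a separate keyword argument (hypothetical function).
def build_with_arguments(data, epochs=10, batch_size=256, learning_rate=1e-3):
    return {
        "rows": len(data),
        "epochs": epochs,
        "batch_size": batch_size,
        "learning_rate": learning_rate,
    }


# After: the options travel together in one validated object.
def build_with_config(data, config=None):
    config = config or TrainConfigSketch()
    return {"rows": len(data), **asdict(config)}


old_style = build_with_arguments([1, 2, 3], epochs=5)
new_style = build_with_config([1, 2, 3], TrainConfigSketch(epochs=5))
```

Grouping the options this way keeps method signatures short and lets the same configuration be reused across both construction paths.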
Version 3.1
26 January 2024
feature YAML config auto-generation
It is now possible to automatically generate YAML config files for datasets.
feature YAML schema and hinting
A YAML schema for config files can now be set in IDEs to enable type hinting, which makes YAML config files easier to write. For example, users can press the Tab key while editing a YAML config file to see the available configuration options for the SDK.
feature Spark DateType native support
Native support for training and synthesizing Spark DateType columns was added (in addition to the TimestampType and TimestampNTZType data types already supported).
enhancement Faster Spark Meta Extraction
2x faster extraction of Spark dataset meta information was achieved by implementing various performance optimisations.
enhancement Automatic Sampling
Very high cardinality columns are now detected automatically and modelled with the SamplingModel, matching the behaviour of SDK 2.9 to minimise code-conversion impact.
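One plausible way to flag a very high cardinality column is to compare the number of distinct values with the number of rows. The sketch below illustrates that idea only; the function name, the 0.9 ratio threshold, and the minimum row count are assumptions, and the SDK's actual detection rule is not described here.

```python
def is_high_cardinality(values, ratio_threshold=0.9, min_rows=20):
    """Return True when most non-null values in a column are distinct.

    ratio_threshold and min_rows are illustrative defaults, not SDK settings.
    """
    non_null = [v for v in values if v is not None]
    if len(non_null) < min_rows:
        # Too few rows to judge cardinality reliably.
        return False
    distinct_ratio = len(set(non_null)) / len(non_null)
    return distinct_ratio >= ratio_threshold


# Free-text-like column: every value is unique.
usernames = [f"user-{i}" for i in range(100)]
# Categorical column: three values repeated many times.
colours = ["red", "green", "blue"] * 40

usernames_flagged = is_high_cardinality(usernames)
colours_flagged = is_high_cardinality(colours)
```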
enhancement Automatic Enumeration
Enumerated columns (columns whose values increase predictably, such as ID columns) are now detected automatically and modelled with the EnumerationModel, matching the behaviour of SDK 2.9 to minimise code-conversion impact.
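A column with "predictable increases" can be illustrated as one whose consecutive values differ by a constant, non-zero step. The check below is a minimal sketch under that assumption; the function name and the three-row minimum are hypothetical, and the SDK may use a different or more tolerant rule.

```python
def is_enumerated(values):
    """Return True when consecutive values differ by one constant non-zero step."""
    if len(values) < 3:
        # Two points always share a single step, so require at least three.
        return False
    step = values[1] - values[0]
    if step == 0:
        # A constant column is not an enumeration.
        return False
    return all(b - a == step for a, b in zip(values, values[1:]))


order_ids = [100, 101, 102, 103]      # step of 1
batch_numbers = [10, 20, 30, 40]      # step of 10
measurements = [1, 2, 4, 8]           # uneven steps

order_ids_enumerated = is_enumerated(order_ids)
batch_numbers_enumerated = is_enumerated(batch_numbers)
measurements_enumerated = is_enumerated(measurements)
```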