Synthesized SDK

The SDK generates high-quality, privacy-preserving datasets for machine learning and data science use cases. It’s available on PyPI with a free 30-day trial. Install the SDK Now!

Version 2.21 released!

We’re excited to have released version 2.21 of the SDK.

A complete list of changes is available in the changelog.

See the Changelog

New in v2.19: Memory use warning for associations

Synthesized’s SDK now calculates how much memory will be needed to create an association and raises a warning or error if the amount is excessively large.

An allocated_memory parameter can be specified when defining an association to determine what counts as excessively large.

Summary

Internally, Synthesized handles data in three steps once the data has been loaded into the Python SDK from a data source:

Annotate and preprocess data: the software automatically infers data formats and types, and it can handle missing data and erroneous values.
Build a mathematical generative model of the data: the software builds a generative representation, a mathematical equation that encapsulates what the properties of the data should look like. Internally, this equation takes pure noise (a sequence of standard normal random variables) and transforms it into output data that has the properties of the original data.
Synthesize a new dataset from the generative model: once the generative model is trained, it can be used to generate new samples of data on demand. The software also supports data manipulation, which can rebalance some of the variables so that the output data has the desired properties.
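
The three steps above can be illustrated with a deliberately small toy example. This is a sketch of the general idea only, not the SDK's implementation or API: the function names preprocess, fit and synthesize are hypothetical, the "model" is assumed to be a plain two-column Gaussian, and only the Python standard library is used. It shows how standard normal noise can be transformed into rows that share the mean, spread and correlation of the original data.

```python
import math
import random
import statistics


def preprocess(rows):
    """Step 1 (toy): keep only rows whose values are all valid numbers."""
    clean = []
    for row in rows:
        if all(isinstance(v, (int, float)) and not math.isnan(v) for v in row):
            clean.append(tuple(float(v) for v in row))
    return clean


def fit(rows):
    """Step 2 (toy): fit a two-column Gaussian model.

    Returns the column means and the lower-triangular Cholesky factor
    (l11, l21, l22) of the sample covariance matrix.
    """
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    rho = cov / (sx * sy)
    l11 = sx
    l21 = rho * sy
    l22 = sy * math.sqrt(max(0.0, 1.0 - rho * rho))
    return (mx, my), (l11, l21, l22)


def synthesize(model, n, rng):
    """Step 3 (toy): map standard normal noise to n synthetic rows."""
    (mx, my), (l11, l21, l22) = model
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # pure noise
        out.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return out
```

Calling preprocess on the raw rows, passing the result to fit, and then calling synthesize(model, n, random.Random(seed)) yields n new rows. A real generative model is far more expressive than a single Gaussian, but the flow (clean, fit, then transform noise into samples) is the same shape as the steps described above.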

Reference

Explore the Python and YAML API reference for some of the core components of the SDK.

Browse

Tutorials

View comprehensive tutorials and download example IPython notebooks that guide you through various scenarios.

Browse

Supported Dataset Formats

Synthesized allows you to synthesize various forms of data. Below are some guides showing how to get started with each.

Tabular

Use the High Dimensional Synthesizer to synthesize high-quality, time-independent tabular data.

Learn More

Time-series

Synthesize regularly spaced time-series data with the TimeSeriesSynthesizer.

Learn More

Event data

Create synthetic event-based data using the EventSynthesizer.

Learn More

User Guides

The user guide provides installation instructions and tutorials that demonstrate how to use the key features of the Synthesized SDK.

Improving Fraud Detection

See how you can improve the performance of your fraud detection models with rebalanced synthetic data.

Data Rebalancing

Better Customer Segmentation

Improve the quality of your customer segmentation by bootstrapping with synthetic data.

Privacy-Preserving Customer Data

Locally generate highly representative data that preserves the privacy of your customers.