Command Line Interface
While the SDK is written in Python and can easily be used in a notebook environment, it can also be used to synthesize data directly from the command line by running:
synthesize input.csv
where input.csv is the input data used to train the model.
A variety of options can be used to configure and control the output of the
synthesize command. These can be listed by passing the
-h flag, as shown below:
synthesize -h
usage: synthesize [-h] [-c config.yaml] [-n N] [-s steps] [-o out_file] file

Create a synthetic copy of a given csv file.

positional arguments:
  file                  The path to the original csv file.

optional arguments:
  -h, --help            Show this help message and exit
  -g generate.yaml, --generate generate.yaml
                        Generate an optional yaml configuration file
  -c config.yaml, --config config.yaml
                        Path to an optional yaml config file
  -n N                  The number of rows to synthesize (default: The same number as the original data)
  -s steps              The number of training steps (default: Use learning manager instead)
  -o out_file, --output out_file
                        The destination path for the synthesized data (default: outputs to stdout)
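To make the mapping between flags and parsed values concrete, the interface described by the help text can be approximated with Python's standard argparse module. This is a hypothetical reconstruction, not the SDK's own source: the function name build_parser, the dest name steps, the integer types for -n and -s, and the use of None to mean "use the documented default" are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse approximation of the documented synthesize CLI."""
    parser = argparse.ArgumentParser(
        prog="synthesize",
        description="Create a synthetic copy of a given csv file.",
    )
    # Positional argument: the original data used to train the model.
    parser.add_argument("file", help="The path to the original csv file.")
    parser.add_argument("-g", "--generate", metavar="generate.yaml",
                        help="Generate an optional yaml configuration file")
    parser.add_argument("-c", "--config", metavar="config.yaml",
                        help="Path to an optional yaml config file")
    # Assumption: None stands for "same number of rows as the original data".
    parser.add_argument("-n", type=int, metavar="N", default=None,
                        help="The number of rows to synthesize")
    # Assumption: None stands for "use the learning manager instead".
    parser.add_argument("-s", dest="steps", type=int, metavar="steps", default=None,
                        help="The number of training steps")
    # Assumption: None stands for "write to stdout".
    parser.add_argument("-o", "--output", metavar="out_file", default=None,
                        help="The destination path for the synthesized data")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["input.csv", "-n", "1000", "-o", "output.csv"])
    print(args.file, args.n, args.steps, args.output)
```

Parsing the arguments `input.csv -n 1000 -o output.csv` with this sketch yields file="input.csv", n=1000 and output="output.csv", while steps and config stay None, so the documented defaults would apply.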
For example, to generate 1000 rows of synthetic data and write them to a file called
output.csv in the same
directory as the original, the command would be:
synthesize input.csv -n 1000 -o output.csv
By passing a YAML configuration file with the
-c option, the default behaviour of the
HighDimSynthesizer can be tuned
for a particular purpose. This is covered in YAML Configuration.
If a parameter is specified both in the YAML config and as a command line argument, the command line argument takes priority.
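The priority rule can be sketched as a dictionary merge in which any option actually given on the command line overwrites the corresponding YAML value. This is an illustration under stated assumptions, not the SDK's implementation: the function name merge_settings and the keys num_rows and steps are hypothetical (the real configuration keys are described in YAML Configuration), and a value of None is assumed to mean "flag not given".

```python
from typing import Any, Dict, Mapping, Optional


def merge_settings(
    yaml_config: Mapping[str, Any],
    cli_args: Mapping[str, Optional[Any]],
) -> Dict[str, Any]:
    """Hypothetical merge: command line values take priority over YAML values.

    A CLI value of None is treated as "not specified", so the YAML value
    (if any) is kept for that key.
    """
    merged = dict(yaml_config)  # start from the YAML configuration
    for key, value in cli_args.items():
        if value is not None:  # only explicitly given CLI options override
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical keys for illustration only.
    from_yaml = {"num_rows": 500, "steps": 2000}
    from_cli = {"num_rows": 1000, "steps": None}
    print(merge_settings(from_yaml, from_cli))
```

Here num_rows comes from the command line (1000) because it was given there, while steps keeps the YAML value (2000) because no -s flag was passed.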
The -g option can be used to autogenerate a YAML configuration file based on the default behaviour of the SDK. This file
can then be modified to tune the behaviour of the SDK for a given purpose.