Command Line Interface
While the SDK is written in Python and can easily be used in a notebook environment, it can also be used to synthesize data directly from the command line by running:
synthesize input.csv
where input.csv is the input data used to train the model.
Options
A variety of options can be used to configure and control the output of the synthesize command. These can be listed using the -h flag:
synthesize -h
usage: synthesize [-h] [-c config.yaml] [-n N] [-s steps] [-o out_file] file

Create a synthetic copy of a given csv file.

positional arguments:
  file                  The path to the original csv file.

optional arguments:
  -h, --help            Show this help message and exit
  -g generate.yaml, --generate generate.yaml
                        Generate an optional yaml configuration file
  -c config.yaml, --config config.yaml
                        Path to an optional yaml config file
  -n N                  The number of rows to synthesize
                        (default: The same number as the original data)
  -s steps              The number of training steps
                        (default: Use learning manager instead)
  -o out_file, --output out_file
                        The destination path for the synthesized data
                        (default: outputs to stdout)
For example, to generate 1000 rows of synthetic data and write them to a file called output.csv in the same directory as the original, the command would be:
synthesize input.csv -n 1000 -o output.csv
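Because the SDK is often used from a notebook, it can be convenient to assemble and run the same invocation from Python. The following is a minimal sketch that assumes the synthesize entry point is on PATH. The helpers build_synthesize_command and run_synthesize are hypothetical and are not part of the SDK. Only the -c, -n, -s and -o flags listed by synthesize -h are used.

```python
import shutil
import subprocess
from typing import List, Optional


def build_synthesize_command(
    input_csv: str,
    n_rows: Optional[int] = None,
    steps: Optional[int] = None,
    config: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble a `synthesize` call from the documented flags."""
    cmd = ["synthesize", input_csv]
    if config is not None:
        cmd += ["-c", config]  # optional YAML config file
    if n_rows is not None:
        cmd += ["-n", str(n_rows)]  # rows to synthesize (default: same as original)
    if steps is not None:
        cmd += ["-s", str(steps)]  # training steps (default: learning manager)
    if output is not None:
        cmd += ["-o", output]  # destination file (default: stdout)
    return cmd


def run_synthesize(cmd: List[str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run the command, failing early if the CLI is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status;
    # capture_output=True collects stdout/stderr as text.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, run_synthesize(build_synthesize_command("input.csv", n_rows=1000, output="output.csv")) runs the command shown above. If output is omitted, the synthesized CSV goes to stdout and can be read from the returned object's stdout attribute.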
By specifying a YAML configuration file with the -c option, the default behaviour of the HighDimSynthesizer can be tuned for a particular purpose. This is covered in YAML Configuration.
If a parameter is specified both in the YAML config and as a command line argument, the command line argument takes priority.
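One way to picture this priority rule is as a dictionary merge, where any value actually given on the command line overwrites the value from the YAML file. This is a minimal sketch of the rule only, not the SDK's implementation. The function merge_settings and the keys num_rows and steps are hypothetical; the real configuration keys are described in YAML Configuration. To keep the example dependency-free, a parsed YAML file is represented as a plain dict.

```python
from typing import Any, Dict, Mapping, Optional


def merge_settings(
    yaml_config: Mapping[str, Any],
    cli_args: Mapping[str, Optional[Any]],
) -> Dict[str, Any]:
    """Hypothetical illustration: command-line values override YAML values.

    A CLI value of None means "not given on the command line", so the
    YAML value for that key (if any) is kept.
    """
    merged = dict(yaml_config)  # copy so the caller's config is not mutated
    for key, value in cli_args.items():
        if value is not None:
            merged[key] = value  # the command line takes priority
    return merged


# Hypothetical keys: num_rows set in both places, steps only in YAML.
settings = merge_settings({"num_rows": 500, "steps": 2000}, {"num_rows": 1000, "steps": None})
# settings == {"num_rows": 1000, "steps": 2000}
```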
The -g option can be used to autogenerate a YAML configuration file based on the default behaviour of the SDK. This file can then be modified appropriately to tune the behaviour of the SDK for a given purpose.
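The resulting workflow has three steps: generate a default config, edit it, then pass it back with -c. The sketch below builds the two commands for that workflow. It rests on assumptions: that -g PATH writes the default configuration to PATH, and that the positional input file is still needed, since the usage line marks it as required. The function generate_then_synthesize is hypothetical.

```python
from typing import List


def generate_then_synthesize(input_csv: str, config_path: str, output_csv: str) -> List[List[str]]:
    """Hypothetical helper: commands for a generate -> edit -> synthesize workflow.

    Assumption: `-g PATH` writes the default YAML configuration to PATH.
    The config file is meant to be edited between the two commands.
    """
    generate_cmd = ["synthesize", input_csv, "-g", config_path]
    synth_cmd = ["synthesize", input_csv, "-c", config_path, "-o", output_csv]
    return [generate_cmd, synth_cmd]
```

Each returned list can be passed to subprocess.run after the configuration file has been edited.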