synthesized.config.HighDimConfig

class HighDimConfig(gender_female_regex='^(f|female)$', gender_male_regex='^(m|male)$', gender_non_binary_regex='^(n|non\\Wbinary|u|undefined|NA)$', title_female_regex='^(ms|mrs|miss|dr)\\.?$', title_male_regex='^(mr|dr)\\.?$', title_non_binary_regex='^(ind|per|m|mx)\\.?$', genders=('F', 'M'), person_locale='en', dict_cache_size=10000, mobile_number_format='07xxxxxxxx', home_number_format='02xxxxxxxx', work_number_format='07xxxxxxxx', pwd_length=(8, 12), postcode_regex='([A-Za-z]{1,2})([0-9]+[A-Za-z]?)( *[0-9]+[A-Za-z]{2})', postcode_level=0, address_locale='en_GB', addresses_file='~/.synthesized/addresses.jsonl.gz', learn_postcodes=False, categorical_threshold_log_multiplier=2.5, min_num_unique=10, check_frequency=100, use_checkpointing=True, checkpoint_path=None, n_checks_no_improvement=10, max_to_keep=3, patience=750, tol=0.0001, must_reach_metric=None, good_enough_metric=None, stop_metric_name=None, sample_size=10000, use_engine_loss=True, max_training_time=None, custom_stop_metric=None, continuous_weight=5.0, low_freq_weight=1.0, high_freq_weight=1.0, capacity=128, nan_weight=1.0, categorical_weight=3.5, temperature=1.0, moving_average=True, epsilon=1.0, delta=None, noise_multiplier=1.0, num_microbatches=1, l2_norm_clip=1.0, latent_size=32, network='resnet', num_layers=2, residual_depths=6, batch_norm=True, activation='relu', optimizer='adam', learning_rate=0.003, decay_steps=None, decay_rate=None, initial_boost=0, clip_gradients=1.0, beta=1.0, weight_decay=0.001, differential_privacy=False, distribution='normal', batch_size=64, increase_batch_size_every=500, max_batch_size=1024, synthesis_batch_size=16384, learning_manager=True)

Configuration for synthesized.HighDimSynthesizer.
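
To see which raw values the default gender patterns in the signature accept, the sketch below applies them with Python's standard re module. The classify_gender helper is hypothetical and not part of synthesized. Case-insensitive matching and whitespace stripping are assumptions; this page does not say how the library applies the patterns.

```python
import re

# Default patterns copied from the HighDimConfig signature.
GENDER_PATTERNS = {
    "female": r"^(f|female)$",
    "male": r"^(m|male)$",
    "non_binary": r"^(n|non\Wbinary|u|undefined|NA)$",
}


def classify_gender(value, patterns=GENDER_PATTERNS):
    """Hypothetical helper: return the first label whose pattern matches."""
    # Assumption: values are stripped and matched case-insensitively.
    cleaned = value.strip()
    for label, pattern in patterns.items():
        if re.match(pattern, cleaned, flags=re.IGNORECASE):
            return label
    return None


for raw in ["Female", "m", "non-binary", "unknown"]:
    print(raw, "->", classify_gender(raw))
```

Values that match none of the patterns, such as "unknown", return None here. What the library does with such values is not stated on this page.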

Methods

__init__([gender_female_regex, ...])

Initialize self.
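
A minimal sketch of building a configuration with keyword overrides. Every parameter name comes from the class signature above and the values are illustrative only. The import is guarded so the snippet still runs where the synthesized package is not installed, in which case config is left as None.

```python
# Keyword overrides; each name appears in the HighDimConfig signature.
overrides = {
    "differential_privacy": True,  # default is False
    "epsilon": 1.0,
    "noise_multiplier": 1.0,
    "l2_norm_clip": 1.0,
    "num_microbatches": 1,
    "batch_size": 64,
}

try:
    # Module path taken from the page title: synthesized.config.HighDimConfig
    from synthesized.config import HighDimConfig
except ImportError:
    config = None  # package not available in this environment
else:
    config = HighDimConfig(**overrides)

print(config)
```

Parameters that are not overridden keep the defaults listed in the signature.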

Attributes

address_locale

Locale for address synthesis.

addresses_file

Path to file with pre-generated addresses.

delta

The delta in (epsilon, delta)-differential privacy.

differential_privacy

Enable differential privacy.

epsilon

Abort model training when this value of epsilon is reached.

home_number_format

Format of home telephone number.

l2_norm_clip

Maximum L2 norm of each microbatch gradient.

learn_postcodes

Whether to learn postcodes from the original data or synthesize new examples.

learning_manager

Control learning with the LearningManager.

mobile_number_format

Format of mobile telephone number.

noise_multiplier

Ratio that determines the amount of noise added to each sample during training.

num_microbatches

Number of microbatches on which average gradient is calculated for clipping.
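
The attributes l2_norm_clip, noise_multiplier and num_microbatches read like the usual parameters of differentially private SGD. The sketch below shows how these three quantities typically interact in that scheme: average gradients per microbatch, clip each average, sum, add Gaussian noise, then divide by the number of microbatches. It is a generic pure-Python illustration under that assumption, not the library's implementation, and the function names are hypothetical.

```python
import math
import random


def clip_by_l2_norm(vector, max_norm):
    """Scale vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vector]


def private_gradient(per_example_grads, l2_norm_clip=1.0,
                     noise_multiplier=1.0, num_microbatches=1, rng=None):
    """Generic DP-SGD style gradient aggregation (illustrative only)."""
    rng = rng or random.Random()
    n = len(per_example_grads)
    if n % num_microbatches != 0:
        raise ValueError("batch size must be divisible by num_microbatches")
    size = n // num_microbatches
    dim = len(per_example_grads[0])

    total = [0.0] * dim
    for i in range(num_microbatches):
        batch = per_example_grads[i * size:(i + 1) * size]
        # Average the gradients inside the microbatch, then clip the average.
        mean = [sum(g[d] for g in batch) / size for d in range(dim)]
        clipped = clip_by_l2_norm(mean, l2_norm_clip)
        total = [t + c for t, c in zip(total, clipped)]

    # Gaussian noise with standard deviation l2_norm_clip * noise_multiplier.
    stddev = l2_norm_clip * noise_multiplier
    noisy = [t + rng.gauss(0.0, stddev) for t in total]
    return [v / num_microbatches for v in noisy]


grads = [[3.0, 4.0], [0.0, 0.0]]
print(private_gradient(grads, num_microbatches=2, rng=random.Random(0)))
```

With noise_multiplier=0.0 the result is deterministic, which makes the effect of clipping easy to inspect in isolation.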

person_locale

Locale for name synthesis.

postcode_level

Level of the postcode to learn.

postcode_regex

Regular expression for postcode synthesis.
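
The default postcode_regex has three capture groups. The sketch below applies it to UK-style postcodes with Python's re module to show what each group captures. The split_postcode helper and the labels area, district and inward are illustrative names, not library terms, and this page does not state how postcode_level selects among the groups.

```python
import re

# Default pattern copied from the HighDimConfig signature.
POSTCODE_REGEX = r"([A-Za-z]{1,2})([0-9]+[A-Za-z]?)( *[0-9]+[A-Za-z]{2})"


def split_postcode(postcode):
    """Hypothetical helper: return the three captured parts, or None."""
    match = re.fullmatch(POSTCODE_REGEX, postcode.strip())
    if match is None:
        return None
    area, district, inward = match.groups()
    # The third group keeps any leading spaces, so strip them here.
    return area, district, inward.strip()


for code in ["EC1A 1BB", "M1 1AE", "SW1A2AA", "12345"]:
    print(code, "->", split_postcode(code))
```

Strings that do not fit the pattern, such as a purely numeric code, return None in this sketch.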

privacy_config

pwd_length

Length of password.

work_number_format

Format of work telephone number.
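
The default number formats, such as '07xxxxxxxx', look like templates with a fixed prefix and placeholder characters. The sketch below assumes that each lowercase x stands for one random digit and that every other character is kept literally. That reading is an assumption, and fill_number_format is a hypothetical helper rather than a library function.

```python
import random


def fill_number_format(number_format, rng=None):
    """Replace each 'x' with a random digit; keep other characters as-is."""
    rng = rng or random.Random()
    return "".join(
        str(rng.randrange(10)) if ch == "x" else ch
        for ch in number_format
    )


rng = random.Random(42)  # seeded only to make the sketch reproducible
print(fill_number_format("07xxxxxxxx", rng))  # mobile and work default
print(fill_number_format("02xxxxxxxx", rng))  # home default
```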

learning_manager: bool = True

Control learning with the LearningManager.