How Generation Works

Understand how the platform’s generation engine creates synthetic data from scratch while maintaining referential integrity and realistic distributions.

Overview

Data generation creates entirely new data based on your database schema and existing data. Unlike masking, which transforms existing data, generation produces new records. It statistically analyzes the existing data so that the output stays as close to production as possible, while still letting you customize the data for edge cases and specific scenarios.
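The idea can be illustrated with a minimal Python sketch. It is not the platform's actual implementation: the table names (customers, orders), the column names, and the helper functions are all hypothetical, and a simple frequency count stands in for whatever statistical analysis the engine performs. It shows two things: sampling a column from the distribution observed in existing data, and generating child rows whose foreign keys point only at generated parent rows.

```python
import random
from collections import Counter


def fit_categorical(values):
    """Estimate category frequencies from existing column values."""
    counts = Counter(values)
    total = sum(counts.values())
    categories = sorted(counts)
    weights = [counts[c] / total for c in categories]
    return categories, weights


def generate_customers(n, existing_countries, rng):
    """Create n new parent rows; 'country' follows the observed frequencies."""
    categories, weights = fit_categorical(existing_countries)
    return [
        {"id": i + 1, "country": rng.choices(categories, weights=weights, k=1)[0]}
        for i in range(n)
    ]


def generate_orders(n, customers, rng):
    """Create n child rows whose customer_id always references a generated customer."""
    customer_ids = [c["id"] for c in customers]
    return [
        {"id": i + 1, "customer_id": rng.choice(customer_ids)}
        for i in range(n)
    ]


# Hypothetical "existing data": only the value distribution is used,
# no existing row is copied into the output.
rng = random.Random(42)
existing_countries = ["US"] * 70 + ["DE"] * 20 + ["JP"] * 10

customers = generate_customers(1000, existing_countries, rng)
orders = generate_orders(5000, customers, rng)
```

Because the row counts are plain arguments, the same sketch scales from a handful of rows to a large volume, and every order is guaranteed to reference a customer that exists in the generated set.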

Generated vs Real Data Comparison

Table 1. Quality Comparison

Characteristic        | Real Production Data                  | Generated Synthetic Data
----------------------|---------------------------------------|---------------------------------------
Realism               | 100% real patterns and distributions  | 95%+ realistic, statistically similar
Privacy Compliance    | Contains real PII/PHI (risky)         | Zero real PII/PHI (safe)
Referential Integrity | All foreign keys valid                | All foreign keys valid
Data Volumes          | Limited by production size            | Configurable (1 to billions)
Unusual Edge Cases    | Contains real-world anomalies         | Includes scenarios not in production
Testing Value         | Reflects real scenarios               | Great for testing different scenarios
Refresh Speed         | Slow (requires production dump)       | Fast (generate on-demand)

Best Practice: Use generation for the following:

  • Performance testing - Generate millions of rows quickly

  • Privacy-sensitive environments - No real data exposure

  • New features - Create test data before production data exists

  • Training/demos - Clean, predictable datasets