How Generation Works
Understand how the platform’s generation engine creates synthetic data from scratch while maintaining referential integrity and realistic distributions.
Overview
Data generation creates entirely new data based on your database schema and existing data. Unlike masking, which transforms existing records, generation produces new records. It statistically analyzes the existing data so that the output stays as close to production as possible, while still letting you customize data for edge cases and specific scenarios.
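The idea above can be illustrated with a minimal sketch. It is not the platform's engine: the table shapes, the function names (`fit_categorical`, `generate_customers`, `generate_orders`), and the sample values are all hypothetical, chosen only to show the three ingredients described here: learning a distribution from existing data, generating child rows whose foreign keys point at generated parents, and appending hand-written edge cases.

```python
import random
from collections import Counter


def fit_categorical(values):
    """Summarize existing values as (categories, weights) for sampling."""
    counts = Counter(values)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return categories, weights


def generate_customers(n, source_countries, rng):
    """Create n new customer rows whose 'country' follows the source distribution."""
    categories, weights = fit_categorical(source_countries)
    return [
        {"id": i + 1, "country": rng.choices(categories, weights=weights, k=1)[0]}
        for i in range(n)
    ]


def generate_orders(n, customers, rng, edge_cases=()):
    """Create n order rows plus optional edge cases, all referencing valid customers."""
    customer_ids = [c["id"] for c in customers]
    orders = [
        {
            "id": i + 1,
            # Foreign key is drawn only from ids that were actually generated.
            "customer_id": rng.choice(customer_ids),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(n)
    ]
    # Edge cases override generated fields but keep a valid foreign key.
    for case in edge_cases:
        orders.append(
            {"id": len(orders) + 1, "customer_id": rng.choice(customer_ids), **case}
        )
    return orders


rng = random.Random(42)  # fixed seed so runs are repeatable
production_countries = ["US", "US", "US", "DE", "DE", "FR"]  # stand-in for real data
customers = generate_customers(1000, production_countries, rng)
orders = generate_orders(
    5000, customers, rng, edge_cases=[{"amount": 0.0}, {"amount": -12.5}]
)
```

Generating parent rows first and sampling foreign keys from the generated ids is one simple way to keep every reference valid; the output volume is just the `n` argument, independent of how much source data exists.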
Generated vs Real Data Comparison
| Characteristic | Real Production Data | Generated Synthetic Data |
|---|---|---|
| Realism | 100% real patterns and distributions | 95%+ realistic, statistically similar |
| Privacy Compliance | Contains real PII/PHI (risky) | Zero real PII/PHI (safe) |
| Referential Integrity | All foreign keys valid | All foreign keys valid |
| Data Volumes | Limited by production size | Configurable (1 to billions) |
| Unusual Edge Cases | Contains real-world anomalies | Includes scenarios not in production |
| Testing Value | Reflects real scenarios | Great for testing different scenarios |
| Refresh Speed | Slow (requires production dump) | Fast (generate on-demand) |

Best Practice: Use generation for:
Related Pages
- Generating Data Guide - Comprehensive guide
- How Masking Works - Comparison with masking
- All Transformers - Complete transformer reference
- Generation Basics - Step-by-step tutorial
- Architecture Overview - System components