At Synthesized we have spent a lot of time thinking about how to evaluate synthetic data. We appreciate that users will want to do the same, so we created a template evaluation framework as a starter for ten. It covers the essential features users are likely to want to evaluate, works through them in a thorough, logical order, and explains how to assess each one. See the Product Evaluation Framework and the Data Quality Evaluation Framework.

The Synthesized SDK includes several methods to assess the privacy, quality and utility of generated data. These can be used to answer related questions such as:

  • Privacy: how much sensitive and private information from the original data can be extracted from the generated data?

  • Statistical Quality: does the generated data closely resemble the original data and preserve its statistical properties and correlations?

  • Predictive Utility: does the generated data preserve predictive performance on an ML classification or regression task?
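To make the first two questions more concrete, here is a rough, library-agnostic sketch in plain Python. It does not use the Synthesized SDK: the helpers `ks_distance` and `distance_to_closest_record` are hypothetical names chosen for illustration, and the two techniques (a per-column two-sample Kolmogorov-Smirnov statistic for statistical quality, and distance to the closest original record as a simple privacy red flag) are common generic checks, not necessarily the ones the SDK implements. The toy data is made up and purely numeric.

```python
import math
from bisect import bisect_right


def ks_distance(original, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column.

    Returns the largest gap between the two empirical CDFs:
    0.0 means identical empirical distributions, 1.0 means no overlap.
    """
    a, b = sorted(original), sorted(synthetic)
    # Both empirical CDFs are step functions that only change at observed
    # values, so the largest gap is reached at one of those values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def distance_to_closest_record(original_rows, synthetic_rows):
    """For each synthetic row, the Euclidean distance to its nearest original row.

    A distance of 0.0 means a synthetic row is an exact copy of a real one,
    which is a simple red flag for privacy leakage.
    """
    return [min(math.dist(s, o) for o in original_rows) for s in synthetic_rows]


if __name__ == "__main__":
    # Hypothetical toy data; a real evaluation would use the full datasets.
    original = {"age": [23, 35, 47, 51, 62], "income": [28.0, 41.5, 55.0, 60.2, 48.3]}
    synthetic = {"age": [25, 33, 49, 50, 60], "income": [30.1, 40.0, 52.7, 61.0, 47.9]}

    # Statistical quality: per-column distribution similarity.
    for column in original:
        print(column, round(ks_distance(original[column], synthetic[column]), 2))

    # Privacy: how close does each synthetic row sit to a real row?
    original_rows = list(zip(original["age"], original["income"]))
    synthetic_rows = list(zip(synthetic["age"], synthetic["income"]))
    print(distance_to_closest_record(original_rows, synthetic_rows))
```

In practice the columns would usually be scaled before computing row distances, and correlations and predictive utility (for example, training a model on generated data and testing it on held-out original data) need separate checks.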