Type 1: Cross-sectional Data
Shown below is the synthesis of a standard cross-sectional (time-independent) dataset utilising the HighDimSynthesizer.
import pandas as pd
from synthesized import HighDimSynthesizer, MetaExtractor
df = pd.read_csv("claim_prediction.csv")
print(df)
      age  sex     bmi  children  smoker  region      charges  insuranceclaim
0      19    0  27.900         0       1       3  16884.92400               1
1      18    1  33.770         1       0       2   1725.55230               1
2      28    1  33.000         3       0       2   4449.46200               0
3      33    1  22.705         0       0       1  21984.47061               0
4      32    1  28.880         0       0       1   3866.85520               1
...   ...  ...     ...       ...     ...     ...          ...             ...
1333   50    1  30.970         3       0       1  10600.54830               0
1334   18    0  31.920         0       0       0   2205.98080               1
1335   18    0  36.850         0       0       2   1629.83350               1
1336   21    0  25.800         0       0       3   2007.94500               0
1337   61    0  29.070         0       1       1  29141.36030               1

[1338 rows x 8 columns]
df_meta = MetaExtractor.extract(df)
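Metadata extraction decides, among other things, how each column should be modelled. As a rough illustration of that idea only, the sketch below labels columns as categorical or continuous using a simple distinct-value threshold. The function `infer_column_kinds` and its `max_categories` rule are hypothetical and are not how `MetaExtractor` actually works; the sample values are the first five rows of the dataset printed above.

```python
from typing import Dict, Sequence


def infer_column_kinds(
    columns: Dict[str, Sequence[float]], max_categories: int = 10
) -> Dict[str, str]:
    """Label each column 'categorical' or 'continuous'.

    Hypothetical heuristic: a column with at most `max_categories` distinct
    values is treated as categorical. This is NOT MetaExtractor's rule; it
    only illustrates the idea of inferring column types from the data.
    """
    kinds: Dict[str, str] = {}
    for name, values in columns.items():
        n_distinct = len(set(values))
        kinds[name] = "categorical" if n_distinct <= max_categories else "continuous"
    return kinds


# First five rows of three columns from the claim dataset shown above.
sample = {
    "sex": [0, 1, 1, 1, 1],
    "region": [3, 2, 2, 1, 1],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
}
# With only five rows, a low threshold is needed for the split to be meaningful.
print(infer_column_kinds(sample, max_categories=3))
```

A fixed threshold is easy to reason about but fragile: an integer column such as `children` can look categorical or continuous depending on the cut-off, which is one reason a library would use richer rules.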
from synthesized.model import DataFrameModel
DataFrameModel(df_meta).fit(df).plot();
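The plot shows a fitted model for each column of the data. As a library-independent sketch of what fitting a simple per-column (marginal) model can mean, the code below estimates category frequencies for a categorical column and equal-width histogram fractions for a continuous one. Both helpers (`fit_categorical`, `fit_histogram`) are hypothetical and are an assumption about the general idea, not a description of the models `DataFrameModel` actually fits.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fit_categorical(values: Sequence[int]) -> Dict[int, float]:
    """Empirical probability of each category (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in sorted(counts.items())}


def fit_histogram(values: Sequence[float], bins: int = 4) -> List[float]:
    """Fraction of values in each of `bins` equal-width bins (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


# First five rows of the 'smoker' and 'age' columns shown above.
real_smoker = [1, 0, 0, 0, 0]
real_age = [19, 18, 28, 33, 32]
print(fit_categorical(real_smoker))
print(fit_histogram(real_age, bins=3))
```

Modelling each column independently like this captures the marginal distributions but none of the relationships between columns (for example between `smoker` and `charges`), which is what a synthesizer trained on the whole table is meant to preserve.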

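# Create a synthesizer from the extracted metadata
synth = HighDimSynthesizer(df_meta)
# Train the synthesizer on the original data
synth.learn(df_train=df)
# Generate as many synthetic rows as there are in the original dataset
df_synth = synth.synthesize(num_rows=len(df))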
print(df_synth)
      age  sex        bmi  children  smoker  region       charges  insuranceclaim
0      41    1  24.320000         2       1       2  19682.501953               1
1      63    0  25.840000         0       0       2  17583.591797               0
2      46    0  26.410000         0       0       3   7333.937500               0
3      58    0  35.725399         3       1       2  47455.164062               1
4      20    0  21.469999         0       0       3   1656.546021               0
...   ...  ...        ...       ...     ...     ...           ...             ...
1333   40    0  36.067009         3       0       1   7745.080078               0
1334   42    0  36.443886         5       1       3  22296.542969               1
1335   34    1  25.741701         2       0       0   6357.533691               0
1336   44    0  34.099998         1       0       3   7381.229980               1
1337   51    1  29.196758         0       0       0   8529.495117               1

[1338 rows x 8 columns]
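One generic way to sanity-check a continuous column of the synthetic output against the original is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distribution functions. The sketch below is written from scratch and is not part of the synthesized API; in practice `scipy.stats.ks_2samp` computes the same statistic along with a p-value. The five-value samples are the first rows of the `age` column in the two printouts above and are far too small to say anything about synthesis quality; they only demonstrate the call.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(real: Sequence[float], synth: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synth)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so evaluating at every observed value finds the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of real values <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of synthetic values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# First five 'age' values from the original and synthetic DataFrames above.
real_age = [19, 18, 28, 33, 32]
synth_age = [41, 63, 46, 58, 20]
print(ks_statistic(real_age, synth_age))
```

On the full columns (`df["age"]` and `df_synth["age"]`), a statistic close to 0 suggests the synthetic marginal distribution closely follows the original; it says nothing about whether relationships between columns were preserved.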