Type 1: Cross-sectional Data

The example below synthesizes a standard cross-sectional (time-independent) dataset using the HighDimSynthesizer.

Possible actions:

  • Synthesize new panel members

  • Synthesize the same panel members

import pandas as pd
from synthesized import HighDimSynthesizer, MetaExtractor

# Load the original dataset and inspect it
df = pd.read_csv("claim_prediction.csv")
print(df)
      age  sex     bmi  children  smoker  region      charges  insuranceclaim
0      19    0  27.900         0       1       3  16884.92400               1
1      18    1  33.770         1       0       2   1725.55230               1
2      28    1  33.000         3       0       2   4449.46200               0
3      33    1  22.705         0       0       1  21984.47061               0
4      32    1  28.880         0       0       1   3866.85520               1
      ...  ...     ...       ...     ...     ...          ...             ...
1333   50    1  30.970         3       0       1  10600.54830               0
1334   18    0  31.920         0       0       0   2205.98080               1
1335   18    0  36.850         0       0       2   1629.83350               1
1336   21    0  25.800         0       0       3   2007.94500               0
1337   61    0  29.070         0       1       1  29141.36030               1

[1338 rows x 8 columns]
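# Extract the metadata that describes the columns of the DataFrame
df_meta = MetaExtractor.extract(df)
from synthesized.model import DataFrameModel
# Fit a model to the original data and plot it
DataFrameModel(df_meta).fit(df).plot();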
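# Train the synthesizer on the original data
synth = HighDimSynthesizer(df_meta)
synth.learn(df_train=df)
# Generate as many synthetic rows as the original dataset contains
df_synth = synth.synthesize(num_rows=len(df))
print(df_synth)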
      age  sex        bmi  children  smoker  region       charges  insuranceclaim
0      41    1  24.320000         2       1       2  19682.501953               1
1      63    0  25.840000         0       0       2  17583.591797               0
2      46    0  26.410000         0       0       3   7333.937500               0
3      58    0  35.725399         3       1       2  47455.164062               1
4      20    0  21.469999         0       0       3   1656.546021               0
      ...  ...        ...       ...     ...     ...           ...             ...
1333   40    0  36.067009         3       0       1   7745.080078               0
1334   42    0  36.443886         5       1       3  22296.542969               1
1335   34    1  25.741701         2       0       0   6357.533691               0
1336   44    0  34.099998         1       0       3   7381.229980               1
1337   51    1  29.196758         0       0       0   8529.495117               1

[1338 rows x 8 columns]
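
A quick sanity check after synthesis is to compare summary statistics of the original and synthetic frames. The sketch below uses only pandas. The helper name `compare_summary` and the toy frames are hypothetical and are not part of the synthesized package; the toy values are copied from the first five rows of the `age` and `bmi` columns printed above so that the snippet runs without the CSV file. In practice you would pass `df` and `df_synth` instead.

```python
import pandas as pd


def compare_summary(df_orig: pd.DataFrame, df_synth: pd.DataFrame) -> pd.DataFrame:
    """Return per-column mean and standard deviation of both frames, side by side.

    Hypothetical helper for illustration; assumes all shared columns are numeric.
    """
    # Only compare columns present in both frames, keeping the original order.
    shared = [c for c in df_orig.columns if c in df_synth.columns]
    summary = pd.DataFrame({
        "orig_mean": df_orig[shared].mean(),
        "synth_mean": df_synth[shared].mean(),
        "orig_std": df_orig[shared].std(),
        "synth_std": df_synth[shared].std(),
    })
    # Absolute gap between the means: a rough indicator, not a formal test.
    summary["mean_abs_diff"] = (summary["orig_mean"] - summary["synth_mean"]).abs()
    return summary


# Toy stand-ins for df and df_synth (first five rows of two columns shown above).
toy_orig = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
})
toy_synth = pd.DataFrame({
    "age": [41, 63, 46, 58, 20],
    "bmi": [24.32, 25.84, 26.41, 35.725399, 21.469999],
})

summary = compare_summary(toy_orig, toy_synth)
print(summary)
```

Five rows are far too few to judge quality, so large gaps in the toy output are expected. On the full 1338-row frames, means and standard deviations that sit close together suggest the marginal distributions were captured, although this check says nothing about correlations between columns.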