MetaExtractor.extract

static MetaExtractor.extract(df, config=None, annotations=None, type_overrides=None, id_index=None, time_index=None)

Instantiate and extract the DataFrameMeta that describes a data frame.

Parameters
  • df (pd.DataFrame) – The data frame from which to extract the DataFrameMeta.

  • config (MetaFactoryConfig, optional) – Custom configuration parameters to MetaFactory. Defaults to None.

  • annotations (List[Union[Address, Bank, Person]], optional) – Annotations for the dataframe. Defaults to None.

  • type_overrides (List[ValueMeta], optional) – ValueMeta instances that override the inferred Meta for particular columns of the DataFrame. Defaults to None.

  • id_index (str, optional) – The name of the column representing the id index. Defaults to None.

  • time_index (str, optional) – The name of the column representing the time index. Defaults to None.

Return type

DataFrameMeta

Returns

The DataFrameMeta instance for the given data.

Raises
  • UnsupportedDtypeError – The data type of a column in the pandas data frame is not supported.

  • TypeError – An error occurred during instantiation of a ValueMeta.

Examples

Extract the DataFrameMeta from a DataFrame:

>>> import pandas as pd
>>> df = pd.read_csv(...)
>>> df_meta = MetaExtractor.extract(df)

Annotate a DataFrame with a Person annotation to generate fake gender, first name and last name PII for each person:

>>> from synthesized.config import PersonLabels
>>> from synthesized.metadata.value import Person
>>> person_labels = PersonLabels(gender_label='gender', firstname_label='first_name', lastname_label='last_name')
>>> person = Person(name='person', labels=person_labels)
>>> df_meta = MetaExtractor.extract(df, annotations=[person])

Override a DateTime column with a BusDateTime column to enforce business days only:

>>> from synthesized.metadata.value import BusDateTime
>>> business_dates = BusDateTime(name="transaction_date")
>>> df_meta = MetaExtractor.extract(df, type_overrides=[business_dates])
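Pass id_index and time_index when the data has an id column and a time column. The sketch below is an illustration only and is not part of the library: the helper index_kwargs and the column names are hypothetical. It checks that the named columns exist before the keyword arguments are handed to MetaExtractor.extract. The final call is left as a comment because it needs the synthesized package and a loaded data frame.

```python
from typing import Dict, Iterable, Optional


def index_kwargs(columns: Iterable[str],
                 id_index: Optional[str] = None,
                 time_index: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build the id_index/time_index keyword arguments.

    Raises ValueError if a named column is not among `columns`.
    """
    available = set(columns)
    kwargs: Dict[str, str] = {}
    for key, name in (("id_index", id_index), ("time_index", time_index)):
        if name is None:
            continue  # leave the argument at its default of None
        if name not in available:
            raise ValueError(f"{key}={name!r} is not a column of the data frame")
        kwargs[key] = name
    return kwargs


# Hypothetical column names standing in for list(df.columns).
columns = ["account_id", "transaction_date", "amount"]
kwargs = index_kwargs(columns, id_index="account_id",
                      time_index="transaction_date")

# With the synthesized package installed and df loaded:
# df_meta = MetaExtractor.extract(df, **kwargs)
```

Validating the names first gives a clear error message that points at the offending argument, instead of relying on whatever error the extraction itself would raise for a missing column.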