MetaExtractor

synthesized.MetaExtractor

class MetaExtractor(config=None)

Extract the synthesized.DataFrameMeta from a data frame.

Methods

create_meta(x[, name, annotations, ...])

Instantiate a Meta object from a pandas series or data frame.

default_config()

Return type: MetaFactoryConfig

extract(df[, config, annotations, ...])

Instantiate and extract the DataFrameMeta that describes a data frame.

create_meta(x, name='df', annotations=None, type_overrides=None, id_index=None, time_index=None)

Instantiate a Meta object from a pandas series or data frame.

The underlying numpy dtype kind (e.g. ‘i’, ‘M’, ‘f’) of a series determines which derived Meta object is created for it.
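To make the dtype kind concrete, the following minimal sketch prints the kind code pandas reports for each column of a small data frame. The data and column names are hypothetical and chosen only for illustration. The sketch uses pandas alone and does not show which Meta class the extractor derives from each kind.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "count": [1, 2, 3],
        "price": [9.99, 14.5, 3.0],
        "created": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
        "active": [True, False, True],
    }
)

# Map each column name to the one-character numpy dtype kind of its series.
kinds = {column: df[column].dtype.kind for column in df.columns}
print(kinds)
```

Here the integer column reports ‘i’, the float column ‘f’, the datetime column ‘M’ and the boolean column ‘b’.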

Parameters
  • x (Union[Series, DataFrame]) – a pandas series or data frame for which to create the Meta instance

  • name (Optional[str]) – (Optional) The name of the instantiated DataFrameMeta if x is a data frame.

  • annotations (Optional[List[ValueMeta]]) – Any metas that should be applied on a DataFrame and incorporated into the meta hierarchy.

  • type_overrides (List[ValueMeta], optional) – Override the Meta for particular columns of the DataFrame.

  • id_index (Optional[str]) – (Optional) The name of the column representing the id index.

  • time_index (Optional[str]) – (Optional) The name of the column representing the time index.

Return type

Union[ValueMeta, DataFrameMeta]

Returns

A derived ValueMeta instance or DataFrameMeta instance if x is a pd.Series or pd.DataFrame, respectively.

Raises
  • UnsupportedDtypeError – The data type of the pandas series is not supported.

  • TypeError – An error occurred during instantiation of a ValueMeta.

static extract(df, config=None, annotations=None, type_overrides=None, id_index=None, time_index=None)

Instantiate and extract the DataFrameMeta that describes a data frame.

Parameters
  • df (pd.DataFrame) – The data frame from which to instantiate and extract the DataFrameMeta.

  • config (MetaFactoryConfig, optional) – Custom configuration parameters to MetaFactory. Defaults to None.

  • annotations (List[Union[Address, Bank, Person]], optional) – Annotations for the dataframe. Defaults to None.

  • type_overrides (List[ValueMeta], optional) – Override the Meta for particular columns of the DataFrame.

  • id_index (Optional[str]) – (Optional) The name of the column representing the id index.

  • time_index (Optional[str]) – (Optional) The name of the column representing the time index.

Return type

DataFrameMeta

Returns

The DataFrameMeta instance for the given data.

Raises
  • UnsupportedDtypeError – The data type of a column in the pandas data frame is not supported.

  • TypeError – An error occurred during instantiation of a ValueMeta.

Examples

Extract the DataFrameMeta from a DataFrame:

>>> import pandas as pd
>>> from synthesized import MetaExtractor
>>> df = pd.read_csv(...)
>>> df_meta = MetaExtractor.extract(df)

Annotate a DataFrame with a Person annotation to generate fake gender, first name and last name PII for each person:

>>> from synthesized.config import PersonLabels
>>> from synthesized.metadata.value import Person
>>> person_labels = PersonLabels(gender_label='gender', firstname_label='first_name', lastname_label='last_name')
>>> person = Person(name='person', labels=person_labels)
>>> df_meta = MetaExtractor.extract(df, annotations=[person])

Override a DateTime column with a BusDateTime column to enforce business days only:

>>> from synthesized.metadata.value import BusDateTime
>>> business_dates = BusDateTime(name="transaction_date")
>>> df_meta = MetaExtractor.extract(df, type_overrides=[business_dates])