MetaExtractor
synthesized.MetaExtractor
- class MetaExtractor(config=None)
Extract the synthesized.DataFrameMeta from a data frame.
Methods
create_meta(x[, name, annotations, ...]) – Instantiate a Meta object from a pandas series or data frame.
default_config() – Return type: MetaFactoryConfig.
extract(df[, config, annotations, ...]) – Instantiate and extract the DataFrameMeta that describes a data frame.
- create_meta(x, name='df', annotations=None, type_overrides=None, id_index=None, time_index=None)
Instantiate a Meta object from a pandas series or data frame.
The underlying numpy dtype kind of a series (e.g. ‘i’, ‘M’, ‘f’) determines which derived Meta object is created for it.
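As a rough illustration of dispatching on the dtype kind, the sketch below looks up the numpy kind code of an array-like in a table. The mapping, the placeholder class names, and the local `UnsupportedDtypeError` are assumptions made for this example; the Meta classes synthesized actually selects for each kind are not documented on this page.

```python
import numpy as np

# Hypothetical kind-code -> Meta-name table; the real synthesized mapping is not shown here.
KIND_TO_META = {
    "b": "Bool",       # boolean
    "i": "Integer",    # signed integer
    "u": "Integer",    # unsigned integer
    "f": "Float",      # floating point
    "M": "DateTime",   # datetime64
    "m": "TimeDelta",  # timedelta64
    "O": "String",     # Python objects, e.g. strings in an object-dtype pandas column
    "U": "String",     # fixed-width numpy unicode strings
}


class UnsupportedDtypeError(Exception):
    """Local stand-in for the library's UnsupportedDtypeError."""


def meta_name_for(values) -> str:
    """Return the placeholder Meta name chosen from the values' numpy dtype kind."""
    kind = np.asarray(values).dtype.kind  # works for lists, numpy arrays and pandas Series
    if kind not in KIND_TO_META:
        raise UnsupportedDtypeError(f"dtype kind {kind!r} is not supported")
    return KIND_TO_META[kind]
```

With this table, `meta_name_for([1, 2, 3])` yields `"Integer"`, while complex numbers (kind `'c'`) have no entry and raise `UnsupportedDtypeError`, mirroring the documented error for unsupported series dtypes.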
- Parameters
x (Union[Series, DataFrame]) – A pandas series or data frame for which to create the Meta instance.
name (Optional[str]) – Optional; the name of the instantiated DataFrameMeta if x is a data frame. Defaults to 'df'.
annotations (Optional[List[ValueMeta]]) – Any metas that should be applied to a DataFrame and incorporated into the meta hierarchy.
type_overrides (List[ValueMeta], optional) – Override the Meta for particular columns of the DataFrame.
id_index (Optional[str]) – (Optional) The name of the column representing the id index.
time_index (Optional[str]) – (Optional) The name of the column representing the time index.
- Return type
Union[ValueMeta, DataFrameMeta]
- Returns
A derived ValueMeta instance or DataFrameMeta instance if x is a pd.Series or pd.DataFrame, respectively.
- Raises
UnsupportedDtypeError – The data type of the pandas series is not supported.
TypeError – An error occurred during instantiation of a ValueMeta.
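The series-versus-data-frame dispatch and the type_overrides rule described for create_meta can be sketched with plain dataclasses standing in for the real Meta classes. `ValueMetaSketch`, `DataFrameMetaSketch`, and `create_meta_sketch` are hypothetical names. The sketch labels each column with its raw dtype kind instead of building a real ValueMeta, omits annotations, id_index and time_index, and assumes an override applies to the column whose name matches its own (as `BusDateTime(name="transaction_date")` in the examples suggests).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

import pandas as pd


@dataclass
class ValueMetaSketch:
    """Hypothetical stand-in for a ValueMeta: a name plus a type label."""
    name: str
    kind: str


@dataclass
class DataFrameMetaSketch:
    """Hypothetical stand-in for a DataFrameMeta holding one child meta per column."""
    name: str
    children: Dict[str, ValueMetaSketch] = field(default_factory=dict)


def create_meta_sketch(
    x: Union[pd.Series, pd.DataFrame],
    name: str = "df",
    type_overrides: Optional[List[ValueMetaSketch]] = None,
) -> Union[ValueMetaSketch, DataFrameMetaSketch]:
    """Mirror the documented series/data-frame dispatch of create_meta."""
    if isinstance(x, pd.Series):
        # A series becomes a single value meta labelled with its numpy dtype kind.
        return ValueMetaSketch(name=str(x.name), kind=x.dtype.kind)
    if isinstance(x, pd.DataFrame):
        # Assumption: an override applies to the column whose name matches its own.
        overrides = {meta.name: meta for meta in (type_overrides or [])}
        df_meta = DataFrameMetaSketch(name=name)
        for column in x.columns:
            if column in overrides:
                df_meta.children[column] = overrides[column]
            else:
                df_meta.children[column] = create_meta_sketch(x[column])
        return df_meta
    # This sketch rejects anything that is not a pandas object.
    raise TypeError(f"expected a pandas Series or DataFrame, got {type(x).__name__}")
```

For a frame with a float `amount` column and an override named `transaction_date`, the inferred kind `'f'` is kept for `amount`, while the override replaces whatever would have been inferred for `transaction_date`.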
- static extract(df, config=None, annotations=None, type_overrides=None, id_index=None, time_index=None)
Instantiate and extract the DataFrameMeta that describes a data frame.
- Parameters
df (pd.DataFrame) – Dataset from which to extract the DataFrameMeta.
config (MetaFactoryConfig, optional) – Custom configuration parameters for MetaFactory. Defaults to None.
annotations (List[Union[Address, Bank, Person]], optional) – Annotations for the dataframe. Defaults to None.
type_overrides (List[ValueMeta], optional) – Override the Meta for particular columns of the DataFrame.
id_index (Optional[str]) – (Optional) The name of the column representing the id index.
time_index (Optional[str]) – (Optional) The name of the column representing the time index.
- Return type
DataFrameMeta
- Returns
The DataFrameMeta instance for the given data.
- Raises
UnsupportedDtypeError – The data type of a column in the pandas data frame is not supported.
TypeError – An error occurred during instantiation of a ValueMeta.
Examples
Extract the DataFrameMeta from DataFrame:
>>> import pandas as pd
>>> from synthesized import MetaExtractor
>>> df = pd.read_csv(...)
>>> df_meta = MetaExtractor.extract(df)
Annotate a DataFrame with a Person annotation to generate fake gender, first-name and last-name PII for each person:
>>> from synthesized.config import PersonLabels
>>> from synthesized.metadata.value import Person
>>> person_labels = PersonLabels(gender_label='gender', firstname_label='first_name', lastname_label='last_name')
>>> person = Person(name='person', labels=person_labels)
>>> df_meta = MetaExtractor.extract(df, annotations=[person])
Override a DateTime column with a BusDateTime column to enforce business days only:
>>> from synthesized.metadata.value import BusDateTime
>>> business_dates = BusDateTime(name="transaction_date")
>>> df_meta = MetaExtractor.extract(df, type_overrides=[business_dates])
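The Person example relies on annotations being incorporated into the meta hierarchy. One plausible reading, sketched below without the synthesized package, is that an annotation claims the columns named in its labels while the remaining columns stay as independent entries. `AnnotationSketch` and `group_annotated_columns` are hypothetical names, the unknown-column check is a choice made for this sketch, and the way the real DataFrameMeta nests annotated columns may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in for an annotation such as Person: a name plus the columns it covers."""
    name: str
    columns: Sequence[str]


def group_annotated_columns(
    columns: Sequence[str], annotations: Sequence[AnnotationSketch]
) -> Dict[str, List[str]]:
    """Nest annotated columns under their annotation; other columns stay at the top level."""
    hierarchy: Dict[str, List[str]] = {}
    claimed = set()
    for annotation in annotations:
        # This sketch refuses annotations that point at columns the frame lacks.
        missing = [c for c in annotation.columns if c not in columns]
        if missing:
            raise ValueError(f"{annotation.name!r} refers to unknown columns {missing}")
        hierarchy[annotation.name] = list(annotation.columns)
        claimed.update(annotation.columns)
    # Every column not claimed by an annotation becomes its own top-level entry.
    for column in columns:
        if column not in claimed:
            hierarchy[column] = [column]
    return hierarchy
```

Under this reading, the `person` annotation from the example would group the `gender`, `first_name` and `last_name` columns into a single entry, so they are generated together rather than as three unrelated columns.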