Meta Overrides
The previous section showed that a meta object, df_meta, stores the inferred data types that will be used during model training in its children attribute. Using the example dataset from Overrides, calling df_meta.children gives:
>>> [<Ring[i8]: IntegerBool(name=SeriousDlqin2yrs)>,
... <Ring[f8]: Float(name=RevolvingUtilizationOfUnsecuredLines)>,
... <Scale[i8]: Integer(name=age)>,
... <Scale[i8]: Integer(name=NumberOfTime30-59DaysPastDueNotWorse)>,
... <Ring[f8]: Float(name=DebtRatio)>]
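Each printed entry follows the pattern <Abstract[dtype]: Concrete(name=column)>. As a rough illustration, the sketch below pulls those parts out of the printed strings with a regular expression. It works on the text only and does not call the SDK; parse_entry and ENTRY_PATTERN are hypothetical names, and reading i8 and f8 as NumPy-style codes for 8-byte integers and floats is an assumption.

```python
import re

# Entries copied from the printed df_meta.children output above.
printed_entries = [
    "<Ring[i8]: IntegerBool(name=SeriousDlqin2yrs)>",
    "<Ring[f8]: Float(name=RevolvingUtilizationOfUnsecuredLines)>",
    "<Scale[i8]: Integer(name=age)>",
    "<Scale[i8]: Integer(name=NumberOfTime30-59DaysPastDueNotWorse)>",
    "<Ring[f8]: Float(name=DebtRatio)>",
]

# Hypothetical pattern: abstract type, dtype code, concrete type, column name.
ENTRY_PATTERN = re.compile(
    r"<(?P<abstract>\w+)\[(?P<dtype>\w+)\]: (?P<concrete>\w+)\(name=(?P<name>.+)\)>"
)


def parse_entry(text):
    """Split one printed entry into its four parts (text-only helper)."""
    match = ENTRY_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    return match.groupdict()


parsed = [parse_entry(entry) for entry in printed_entries]
for row in parsed:
    print(f"{row['name']}: {row['concrete']} ({row['abstract']}, {row['dtype']})")
```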
Each entry in df_meta.children has three important features:
- Name: The column name associated with the given df_meta entry.
- Abstract data type: In the above example, Ring and Scale are both abstract data types. All concrete data types are implementations of these abstract types, which describe generic properties of the data as well as the kinds of operations that can be performed on data values. Five abstract data types are implemented in the SDK, in a hierarchical structure:
  - Nominal: Categorical data
  - Ordinal: Categorical data that can be sorted
  - Affine: Continuous data where there is a notion of distance between two points
  - Scale: Continuous data where, as well as subtraction, there exists the notion of addition
  - Ring: Continuous data where multiplication and division are defined
- Concrete data type: These are the concrete implementations of the above abstract classes, such as IntegerBool, Float and Integer. A full list of concrete data types is given below, including the abstract class each belongs to:
Abstract | Concrete
---|---
Nominal |
Ordinal |
Affine |
Scale |
Ring |
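The hierarchy of abstract types can be pictured as a chain of classes in which each level keeps the operations of the level before it and adds its own. The sketch below is a standalone illustration and not the SDK's implementation: the classes, the operations attribute and the supports helper are all hypothetical, and it assumes each abstract type extends the one listed before it.

```python
# Hypothetical stand-ins for the five abstract types; not the SDK classes.
class Nominal:
    """Categorical data: values can only be compared for equality."""
    operations = frozenset({"equality"})


class Ordinal(Nominal):
    """Categorical data that can be sorted."""
    operations = Nominal.operations | {"ordering"}


class Affine(Ordinal):
    """Continuous data with a notion of distance (subtraction)."""
    operations = Ordinal.operations | {"subtraction"}


class Scale(Affine):
    """Continuous data where addition is also defined."""
    operations = Affine.operations | {"addition"}


class Ring(Scale):
    """Continuous data where multiplication and division are defined."""
    operations = Scale.operations | {"multiplication", "division"}


def supports(abstract_type, operation):
    """Return True if the given abstract type allows the operation."""
    return operation in abstract_type.operations


print(supports(Ring, "addition"))         # inherited from Scale
print(supports(Scale, "multiplication"))  # only available from Ring upwards
```

Under this reading, a Ring column supports everything a Scale column does, while the reverse does not hold.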
Type overrides
To override the default behaviour, type_overrides can be passed to the MetaExtractor.extract() method. The arguments should be meta value objects, as detailed in the table above. For instance, using the example dataset from Overrides, to have age interpreted as a float rather than an integer:
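from synthesized.metadata.value import Float
# MetaExtractor and df are assumed to be defined as in the previous section.
age_float = Float('age')  # meta value object that marks 'age' as a float
df_meta = MetaExtractor.extract(df, type_overrides=[age_float])
print(df_meta.children)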
>>> [<Ring[i8]: IntegerBool(name=SeriousDlqin2yrs)>,
... <Ring[f8]: Float(name=RevolvingUtilizationOfUnsecuredLines)>,
... <Ring[f8]: Float(name=age)>,
... <Scale[i8]: Integer(name=NumberOfTime30-59DaysPastDueNotWorse)>,
... <Ring[f8]: Float(name=DebtRatio)>]
Note that type_overrides is provided as a list, since multiple overrides can be specified at once.
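Because type_overrides takes a list, several columns can be overridden in a single call. The following is a minimal sketch of building such a list from a mapping of column names to value classes. build_overrides is a hypothetical helper, and StandInFloat is a stand-in for synthesized.metadata.value.Float so that the example runs without the SDK; the only SDK behaviour assumed is the Float(name) constructor shown above.

```python
from dataclasses import dataclass


@dataclass
class StandInFloat:
    """Hypothetical stand-in for synthesized.metadata.value.Float."""
    name: str


def build_overrides(column_types):
    """Turn {column_name: value_class} into a list of meta value objects."""
    return [value_class(name) for name, value_class in column_types.items()]


# Two columns from the example dataset, both requested as floats.
overrides = build_overrides({
    "age": StandInFloat,
    "NumberOfTime30-59DaysPastDueNotWorse": StandInFloat,
})
print([override.name for override in overrides])
```

With the SDK installed, passing Float in place of StandInFloat and handing the result to MetaExtractor.extract(df, type_overrides=overrides) should apply both overrides at once, assuming the SDK accepts a float override for each chosen column.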