Annotations
Sometimes multiple columns of a tabular dataset describe a single entity. Annotations link such columns together so that higher-quality data can be synthesized. See the annotations guide for more details.
Address
Python:

    from synthesized.config import AddressLabels
    from synthesized.metadata.value import Address

    address = Address(
        name="annotated_columns",
        labels=AddressLabels(
            postcode="col_postcode",
            county="col_country",
            city="col_city",
            district="col_district",
            street="col_street",
            house_number="col_house"
        )
    )

YAML:

    address:
      - name: annotated_columns
        labels:
          postcode: col_postcode
          county: col_country
          city: col_city
          district: col_district
          street: col_street
          house_number: col_house

Properties
- name: Name for the annotation. Cannot be the same as any existing column name.
- labels (optional): Mapping of labels to column names.
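
The naming rule above can be checked before an annotation is built. The following is a minimal sketch in plain Python and is not part of the Synthesized SDK: `check_annotation` is a hypothetical helper, and the column list is assumed for illustration. It also assumes that every label should point at a column that exists in the dataset, which the properties list does not state explicitly.

```python
def check_annotation(name, labels, columns):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration only, not a Synthesized API.
    """
    problems = []
    # Rule from the properties list: the annotation name must not collide
    # with an existing column name.
    if name in columns:
        problems.append(f"annotation name {name!r} clashes with a column")
    # Assumption: each label should refer to a column present in the dataset.
    for label, column in labels.items():
        if column not in columns:
            problems.append(f"label {label!r} points at missing column {column!r}")
    return problems


# Assumed column names, matching the Address example above.
columns = ["col_postcode", "col_country", "col_city",
           "col_district", "col_street", "col_house"]
labels = {"postcode": "col_postcode", "city": "col_city"}

ok = check_annotation("annotated_columns", labels, columns)
clash = check_annotation("col_city", labels, columns)
```

Here `ok` is an empty list, while `clash` reports that `col_city` is already a column name and so cannot be reused as the annotation name.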
Bank
Python:

    from synthesized.config import BankLabels
    from synthesized.metadata.value import Bank

    bank = Bank(
        name="annotated_columns",
        labels=BankLabels(
            sort_code="col_sc",
            account="col_acc"
        )
    )

YAML:

    bank:
      - name: annotated_columns
        labels:
          sort_code: col_sc
          account: col_acc

Properties
- name: Name for the annotation. Cannot be the same as any existing column name.
- labels (optional): Mapping of labels to column names.
Formatted String
Python:

    from synthesized.metadata.value import FormattedString

    regex = "^(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}$"
    p1 = FormattedString(
        name="colA",
        pattern=regex
    )

YAML:

    formatted_string:
      - name: colA
        pattern: "^(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}$"

Properties
- name: Name for the annotation. Cannot be the same as any existing column name.
- pattern: A regex string describing the format of the strings in the column.
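
To see what the example pattern accepts, it can be tried against a few strings with Python's standard `re` module. This sketch is independent of the Synthesized SDK, and the sample values are made up for illustration. It writes the pattern as a raw string, which is equivalent to the doubled backslashes used in the example above.

```python
import re

# Same pattern as in the example, written as a raw string.
PATTERN = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")


def matches_format(value):
    """Return True if the string fits the pattern."""
    return PATTERN.match(value) is not None


# Made-up sample values: one valid, four that break a different rule each.
samples = ["123-45-6789", "666-45-6789", "123-00-6789", "123-45-0000", "123456789"]
results = {s: matches_format(s) for s in samples}
```

Only the first sample fits. The next three are rejected by the negative lookaheads, and the last by the missing hyphens.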
Person
Python:

    from synthesized.config import PersonLabels
    from synthesized.metadata.value import Person

    p1 = Person(
        name="annotated_columns",
        labels=PersonLabels(
            first_name="col_first",
            last_name="col_last",
            gender="col_gender",
            email="col_email"
        )
    )

YAML:

    person:
      - name: annotated_columns
        labels:
          first_name: col_first
          last_name: col_last
          gender: col_gender
          email: col_email

Properties
- name: Name for the annotation. Cannot be the same as any existing column name.
- labels (optional): Mapping of labels to column names.