Annotations

Sometimes multiple columns of a tabular dataset describe a single entity. Annotations link such columns together so that they are synthesized as one unit, which produces higher-quality data. See the annotations guide for more details.

Address

Python:

from synthesized.config import AddressLabels
from synthesized.metadata.value import Address

address = Address(
    name="annotated_columns",
    labels=AddressLabels(
        postcode="col_postcode",
        county="col_county",
        city="col_city",
        district="col_district",
        street="col_street",
        house_number="col_house"
    )
)

YAML:

address:
  - name: annotated_columns
    labels:
      postcode: col_postcode
      county: col_county
      city: col_city
      district: col_district
      street: col_street
      house_number: col_house

Properties

  • name: Name for the annotation. Cannot be the same as any existing column name.

  • labels (optional): Mapping of labels to column names.
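The two properties above suggest a simple consistency check before an annotation is built. The sketch below is a hypothetical pre-flight helper in plain Python, not part of the Synthesized SDK: the function name `check_annotation`, the sample column list, and the assumption that every label must point at an existing column are all illustrative.

```python
from typing import Dict, Iterable, List


def check_annotation(name: str, labels: Dict[str, str], columns: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the annotation looks consistent.

    Hypothetical helper for illustration only, not an SDK function.
    """
    column_set = set(columns)
    problems = []
    # The annotation name must not clash with an existing column name.
    if name in column_set:
        problems.append(f"annotation name '{name}' is already a column name")
    # Assumption: every label should refer to a column present in the dataset.
    for label, column in labels.items():
        if column not in column_set:
            problems.append(f"label '{label}' refers to missing column '{column}'")
    return problems


# Illustrative column names, matching the style of the example above.
columns = ["col_postcode", "col_city", "col_street", "col_house"]
labels = {"postcode": "col_postcode", "city": "col_city", "street": "col_street"}

ok = check_annotation("annotated_columns", labels, columns)
clash = check_annotation("col_city", labels, columns)
missing = check_annotation("annotated_columns", {"postcode": "col_zip"}, columns)
```

Here `ok` is empty, `clash` reports the name collision with `col_city`, and `missing` reports the unknown column `col_zip`. The same check would apply unchanged to the other label-based annotations.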

Bank

Python:

from synthesized.config import BankLabels
from synthesized.metadata.value import Bank

bank = Bank(
    name="annotated_columns",
    labels=BankLabels(
        sort_code="col_sc",
        account="col_acc"
    )
)

YAML:

bank:
  - name: annotated_columns
    labels:
      sort_code: col_sc
      account: col_acc

Properties

  • name: Name for the annotation. Cannot be the same as any existing column name.

  • labels (optional): Mapping of labels to column names.

Formatted String

Python:

from synthesized.metadata.value import FormattedString

regex = "^(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}$"
p1 = FormattedString(
    name="colA",
    pattern=regex
)

YAML:

formatted_string:
  - name: colA
    pattern: "^(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}$"

Properties

  • name: Name for the annotation. Cannot be the same as any existing column name.

  • pattern: A regex string describing the format of the strings in the column.
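To see what the pattern in the example above accepts, it can be tried against a few strings with Python's standard `re` module. This is only a sketch of the regex's behaviour, independent of the SDK; the sample values are made up, and the pattern appears to describe strings in the style of US Social Security numbers.

```python
import re

# Same pattern as in the example above, written as a raw string so that
# the backslashes do not need to be doubled.
pattern = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")

# Made-up sample values: one valid string and one per rejected case.
candidates = [
    "123-45-6789",  # well-formed
    "666-12-3456",  # first group may not be 666
    "900-12-3456",  # first group may not start with 9
    "123-00-4567",  # second group may not be 00
    "123-45-0000",  # third group may not be 0000
    "123456789",    # hyphens are required
]

results = {value: bool(pattern.match(value)) for value in candidates}
for value, matched in results.items():
    print(f"{value}: {'matches' if matched else 'does not match'}")
```

Only the first candidate matches. The negative lookaheads `(?!...)` reject the reserved groups without consuming any characters.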

Person

Python:

from synthesized.config import PersonLabels
from synthesized.metadata.value import Person

p1 = Person(
    name="annotated_columns",
    labels=PersonLabels(
        first_name="col_first",
        last_name="col_last",
        gender="col_gender",
        email="col_email"
    )
)

YAML:

person:
  - name: annotated_columns
    labels:
      first_name: col_first
      last_name: col_last
      gender: col_gender
      email: col_email

Properties

  • name: Name for the annotation. Cannot be the same as any existing column name.

  • labels (optional): Mapping of labels to column names.