Time-Series Synthesis

As well as capturing correlations between columns in tabular data, Synthesized can model correlations between sequences of events, i.e. time-series data.

Specifically, Synthesized uses deep learning models to generate two different types of time-series data:

  • Regular time-series, using the TimeSeriesSynthesizer, where measurements are recorded at regular intervals, e.g. stock market prices.

  • Event-based data, using the EventSynthesizer, where each data entry corresponds to a single event and the time-interval between events is not constant, e.g. bank account transactions.
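The distinction between the two types can be checked on a dataset before choosing a synthesizer. The following is a minimal sketch using pandas only; the helper `is_regular` and the sample data are hypothetical and are not part of the Synthesized SDK. It treats a dataset as regular when each entity's timestamps are evenly spaced, so calendar gaps such as weekends would count as irregular under this simple test.

```python
import pandas as pd


def is_regular(df: pd.DataFrame, id_col: str, time_col: str) -> bool:
    """Return True if every entity's timestamps are evenly spaced.

    Hypothetical helper for illustration; not a Synthesized SDK function.
    """
    for _, group in df.groupby(id_col):
        # Gaps between consecutive timestamps of one entity.
        gaps = group[time_col].sort_values().diff().dropna()
        if gaps.nunique() > 1:
            return False
    return True


# Daily measurements: one row per company per day.
prices = pd.DataFrame({
    "Company": ["AAPL"] * 3 + ["AMZN"] * 3,
    "Date": pd.to_datetime(["2020-03-02", "2020-03-03", "2020-03-04"] * 2),
})

# Events: transactions occur at arbitrary times (made-up sample values).
transactions = pd.DataFrame({
    "Account": ["A", "A", "A", "B", "B"],
    "Time": pd.to_datetime([
        "2020-03-02 09:15", "2020-03-02 17:40", "2020-03-05 11:02",
        "2020-03-03 08:00", "2020-03-09 13:30",
    ]),
})

print(is_regular(prices, "Company", "Date"))
print(is_regular(transactions, "Account", "Time"))
```

Under this check, the daily prices would suit the TimeSeriesSynthesizer and the transactions would suit the EventSynthesizer.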

Both the TimeSeriesSynthesizer and the EventSynthesizer can model time-series data for multiple variables from many independent entities. Both synthesizer classes can therefore model panel data, i.e. datasets that contain measurements of multiple "panel members" (unique entities) over multiple time periods. In other words, panel datasets are tables with multiple indices. Panel data can be presented in either long form or wide form, as shown in the figure below.

Long and wide forms of panel data

Both synthesizer classes accept tabular data in long format, as this format provides the most compact way to represent multiple time series in the same dataset.
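If a dataset is held in wide form, it can be reshaped into long form before it is passed to either synthesizer. The sketch below uses pandas `melt` and assumes a wide layout with one column per company; that layout and the column names are illustrative assumptions, not a format required by Synthesized.

```python
import pandas as pd

# Assumed wide layout: one row per date, one column per company.
wide = pd.DataFrame({
    "Date": ["02/03/20", "03/03/20", "04/03/20"],
    "AAPL": [470.89, 471.03, 471.02],
    "AMZN": [1981.03, 1981.10, 1981.11],
})
wide["Date"] = pd.to_datetime(wide["Date"], format="%d/%m/%y")

# Long form: one row per (date, company) pair.
long = wide.melt(id_vars="Date", var_name="Company", value_name="Value ($)")
long = long.sort_values(["Company", "Date"]).reset_index(drop=True)

print(long)
```

Each company column becomes a block of rows, so adding a further entity adds rows rather than columns.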

An example of such a dataset, where there are multiple time series for many independent entities, is shown below:

Table 1. Share prices of companies in March 2020
Date      Company  Industry    Value ($)  Volume (M)  DOW
02/03/20  AAPL     Tech        470.89     10.5        Monday
03/03/20  AAPL     Tech        471.03     11.2        Tuesday
04/03/20  AAPL     Tech        471.02     9.4         Wednesday
02/03/20  AMZN     Tech        1981.03    2.3         Monday
03/03/20  AMZN     Tech        1981.10    1.9         Tuesday
04/03/20  AMZN     Tech        1981.11    2.2         Wednesday
02/03/20  CLDR     Software    8.21       35.1        Monday
03/03/20  CLDR     Software    8.21       37.7        Tuesday
04/03/20  CLDR     Software    8.36       29.8        Wednesday
02/03/20  JNJ      Healthcare  140.02     11.5        Monday
03/03/20  JNJ      Healthcare  135.59     13.6        Tuesday
04/03/20  JNJ      Healthcare  143.48     10.5        Wednesday

Time-Series Column Specification

Time-series data is generally composed of the following types of columns:

  • Unique identifiers: This is typically a single column that can be used to differentiate between independent entities, each with their own set of time-series data. In the example above, this would be Company.

  • Timestamps: This is the time at which an event occurred or a variable was measured. This is the Date column in the above dataset.

  • Measurements/event data: This is the data tracked over the given time period, for each timestamp. In the above example, the Value ($) and Volume (M) columns are continuous measurements, but this data could be more complex. For instance, event data could be strings describing a patient's progress through a particular treatment plan.

  • Constant features for each ID: These columns are properties of each unique identifier that are constant for the full span of the time series. For example, an individual's date of birth, their bank account number or, in the above example, the Industry a company operates within.

  • Exogenous variables: Exogenous variables are independent variables that cannot be affected by the other variables in the dataset. For time-series data, they are parallel time series that are not directly modelled but provide context. For example, when generating synthetic time-series data for house prices, it may help to treat the year as an exogenous variable, because the model can then learn information about market crashes. In the above example, DOW could be provided as an exogenous variable.
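The column roles above imply some properties that can be checked before training. The sketch below uses pandas on a subset of Table 1 and verifies two of them: a constant feature takes exactly one value per identifier, and each identifier has at most one row per timestamp. The helper `constant_per_id` is hypothetical and is not part of the Synthesized SDK.

```python
import pandas as pd

# Subset of Table 1 (two companies, three days each).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["02/03/20", "03/03/20", "04/03/20"] * 2, format="%d/%m/%y"
    ),
    "Company": ["AAPL"] * 3 + ["JNJ"] * 3,
    "Industry": ["Tech"] * 3 + ["Healthcare"] * 3,
    "Value ($)": [470.89, 471.03, 471.02, 140.02, 135.59, 143.48],
})


def constant_per_id(df: pd.DataFrame, id_col: str, col: str) -> bool:
    """True if `col` takes exactly one value for every identifier.

    Hypothetical helper for illustration only.
    """
    return bool((df.groupby(id_col)[col].nunique(dropna=False) == 1).all())


# Industry never changes for a company, so it qualifies as a constant feature.
industry_is_constant = constant_per_id(df, "Company", "Industry")

# Value ($) changes over time, so it is a measurement, not a constant.
value_is_constant = constant_per_id(df, "Company", "Value ($)")

# Each (identifier, timestamp) pair should appear at most once.
has_duplicate_rows = bool(df.duplicated(["Company", "Date"]).any())

print(industry_is_constant, value_is_constant, has_duplicate_rows)
```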

The TimeSeriesSynthesizer and EventSynthesizer use the keywords in the table below to reference these columns:

Table 2. Mapping between column type and argument
Column Type          Keyword
Unique Identifier    id_idx
Timestamps           time_idx
Measurements         event_cols
Constants            const_cols
Exogenous variables  exog_cols
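For the share price dataset in Table 1, the keywords could be filled in as in the sketch below. The keyword names come from Table 2. Everything else is an assumption for illustration: the plain dictionary, the use of a string for single-column keywords and a list for multi-column ones, and the helper `check_spec`. How the specification is actually passed to a synthesizer is covered in the tutorials and is not shown here.

```python
from collections import Counter

# Columns of Table 1.
TABLE_COLUMNS = ["Date", "Company", "Industry", "Value ($)", "Volume (M)", "DOW"]

# Assumed representation: keyword -> column name(s).
column_spec = {
    "id_idx": "Company",
    "time_idx": "Date",
    "event_cols": ["Value ($)", "Volume (M)"],
    "const_cols": ["Industry"],
    "exog_cols": ["DOW"],
}


def check_spec(spec: dict, columns: list) -> dict:
    """Report columns that are unknown, unassigned, or assigned twice.

    Hypothetical helper; not part of the Synthesized SDK.
    """
    names = []
    for value in spec.values():
        # Accept either a single column name or a list of names.
        names.extend([value] if isinstance(value, str) else value)
    counts = Counter(names)
    return {
        "unknown": sorted(set(names) - set(columns)),
        "unassigned": sorted(set(columns) - set(names)),
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
    }


print(check_spec(column_spec, TABLE_COLUMNS))
```

An empty list under every key means each column in the table has been given exactly one role.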

For more information on the TimeSeriesSynthesizer and the EventSynthesizer see our tutorials.