Time-Series Synthesis
As well as capturing correlations between columns in tabular data, Synthesized can model correlations across sequences of events, i.e. time-series data.
Specifically, Synthesized uses deep learning models to generate two different types of time-series data:
- Regular time-series, using the `TimeSeriesSynthesizer`, where measurements are recorded at regular intervals, e.g. stock market prices.
- Event-based data, using the `EventSynthesizer`, where each data entry corresponds to a single event and the time interval between events is not constant, e.g. bank account transactions.
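The distinction between the two data types comes down to whether the gap between consecutive timestamps is constant. The following is a minimal sketch of that check using pandas; it does not use the Synthesized SDK, and the helper name `is_regular` and the sample timestamps are illustrative assumptions rather than part of the product.

```python
import pandas as pd


def is_regular(timestamps: pd.Series) -> bool:
    """Return True if consecutive timestamps are all separated by the same interval."""
    gaps = timestamps.sort_values().diff().dropna()
    return gaps.nunique() <= 1


# Daily measurements with a constant one-day interval (regular time-series).
prices = pd.Series(pd.to_datetime(["2020-03-02", "2020-03-03", "2020-03-04"]))

# Transactions with varying gaps between events (event-based data).
transactions = pd.Series(
    pd.to_datetime(["2020-03-02 09:15", "2020-03-02 17:40", "2020-03-05 11:02"])
)

print(is_regular(prices))        # True
print(is_regular(transactions))  # False
```

For a dataset with many entities, the same check would be applied to each entity's timestamps separately.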
Both the `TimeSeriesSynthesizer` and the `EventSynthesizer` are capable of modelling time-series data for multiple variables, from many independent entities. Both synthesizer classes are therefore capable of modelling panel data, i.e. datasets which contain measurements of multiple “panel members”/unique entities over multiple time-periods. In other words, they are datasets/tables with multiple indices. Panel data can be presented in either long form or wide form, as shown in the figure below.
Both synthesizer classes accept tabular data in long format, as this format provides the most compact way to represent multiple time series in the same dataset.
An example of such a dataset, where there are multiple time series for many independent entities, is shown below:
Date | Company | Industry | Value ($) | Volume (M) | DOW |
---|---|---|---|---|---|
02/03/20 | AAPL | Tech | 470.89 | 10.5 | Monday |
03/03/20 | AAPL | Tech | 471.03 | 11.2 | Tuesday |
04/03/20 | AAPL | Tech | 471.02 | 9.4 | Wednesday |
02/03/20 | AMZN | Tech | 1981.03 | 2.3 | Monday |
03/03/20 | AMZN | Tech | 1981.10 | 1.9 | Tuesday |
04/03/20 | AMZN | Tech | 1981.11 | 2.2 | Wednesday |
02/03/20 | CLDR | Software | 8.21 | 35.1 | Monday |
03/03/20 | CLDR | Software | 8.21 | 37.7 | Tuesday |
04/03/20 | CLDR | Software | 8.36 | 29.8 | Wednesday |
02/03/20 | JNJ | Healthcare | 140.02 | 11.5 | Monday |
03/03/20 | JNJ | Healthcare | 135.59 | 13.6 | Tuesday |
04/03/20 | JNJ | Healthcare | 143.48 | 10.5 | Wednesday |
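To make the long/wide distinction concrete, the sketch below takes a few rows from the example dataset and converts them between the two layouts with pandas. It is only an illustration of the data shapes and does not use the Synthesized SDK; the variable names are arbitrary.

```python
import pandas as pd

# Long form: one row per (date, company) pair.
long_df = pd.DataFrame({
    "Date": ["02/03/20", "03/03/20", "02/03/20", "03/03/20"],
    "Company": ["AAPL", "AAPL", "AMZN", "AMZN"],
    "Value ($)": [470.89, 471.03, 1981.03, 1981.10],
})

# Wide form: one row per date, one column per company.
wide_df = long_df.pivot(index="Date", columns="Company", values="Value ($)")

# Back to long form, as expected by both synthesizer classes.
round_trip = wide_df.reset_index().melt(
    id_vars="Date", var_name="Company", value_name="Value ($)"
)

print(wide_df)
print(round_trip)
```

Note that the wide layout needs a separate block of columns for every measured variable, which is why the long layout is the more compact choice once several variables are tracked per entity.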
Time-Series Column Specification
Time-series data is generally composed of the following types of columns:
- Unique identifiers: This is typically a single column that can be used to differentiate between independent entities, each with their own set of time-series data. In the example above, this would be Company.
- Timestamps: This is the time at which an event occurred or a variable was measured. This is the Date column in the above dataset.
- Measurements/event data: This is the data tracked over the given time period, for each timestamp. In the above example the Value ($) and Volume (M) columns are examples of this in the form of continuous data, but this data could be more complex; for instance, event data could be strings describing a patient’s progress through a particular treatment plan.
- Constant features for each ID: These columns are properties of each unique identifier that are constant for the full span of the time series. For example, an individual’s date of birth, their bank account number or, in the above example, the Industry a company operates within.
- Exogenous variables: Exogenous variables are variables that are independent and cannot be affected by the other variables in the dataset. In the particular case of time-series data, exogenous variables are parallel time-series that are not directly modelled but provide context. For example, when generating synthetic time-series data for house prices it may be helpful to treat the year as an exogenous variable, so that the model can learn information regarding market crashes. In the above example DOW could be provided as an exogenous variable.
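One of these roles can be checked mechanically: a constant feature takes exactly one value within each unique identifier. The sketch below, assuming pandas and a hypothetical helper named `constant_columns`, finds such columns in a few rows of the example dataset. It is an illustration of the definition only, not how Synthesized itself identifies column types.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["02/03/20", "03/03/20", "02/03/20", "03/03/20"],
    "Company": ["AAPL", "AAPL", "CLDR", "CLDR"],
    "Industry": ["Tech", "Tech", "Software", "Software"],
    "Value ($)": [470.89, 471.03, 8.21, 8.21],
    "DOW": ["Monday", "Tuesday", "Monday", "Tuesday"],
})


def constant_columns(frame: pd.DataFrame, id_col: str) -> list:
    """Columns (other than id_col) that take a single value within every entity."""
    per_entity = frame.groupby(id_col).nunique()
    return [col for col in per_entity.columns if (per_entity[col] == 1).all()]


print(constant_columns(df, "Company"))  # ['Industry']
```

Value ($) is not reported even though it happens to be unchanged for CLDR in these rows, because a constant feature must be constant for every entity.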
The `TimeSeriesSynthesizer` and `EventSynthesizer` use the keywords specified in the table below to reference these columns:
Column Type | Keyword |
---|---|
Unique Identifier | |
Timestamps | |
Measurements | |
Constants | |
Exogenous variables | |
For more information on the `TimeSeriesSynthesizer` and the `EventSynthesizer`, see our tutorials.