Event Based Data
Event-based time-series data, sometimes called irregular time-series data, is data in which the time interval between successive events or measurements is not constant. An example is a set of bank transactions made by many users over a given time period:
import pandas as pd
df = pd.read_csv("simple_fraud.csv")
df
customer age gender merchant amount fraud date
0 C1021896897 46 - 55 MALE M17379832 559.85 1 2022-03-10 00:00:00
1 C1021896897 46 - 55 MALE M480139044 138.93 0 2022-03-10 13:18:10
2 C1021896897 46 - 55 MALE M855959430 187.65 1 2022-03-11 02:36:20
3 C1021896897 46 - 55 MALE M547558035 119.71 0 2022-03-11 15:54:30
4 C1021896897 46 - 55 MALE M480139044 1510.34 1 2022-03-12 05:12:41
... ... ... ... ... ... ... ...
7035 C989321907 46 - 55 MALE M85975013 28.11 0 2022-04-12 19:28:27
7036 C989321907 46 - 55 MALE M1823072687 37.82 0 2022-05-03 07:40:48
7037 C989321907 46 - 55 MALE M1823072687 3.26 0 2022-05-04 23:35:18
7038 C989321907 46 - 55 MALE M1823072687 25.60 0 2022-06-03 21:56:34
7039 C989321907 46 - 55 MALE M85975013 75.15 0 2022-06-14 10:41:49
[7040 rows x 7 columns]
The above dataset shows transactions completed by individual customers, each identified by the customer column. Every row records the transaction amount, the merchant involved in the transaction, and whether the transaction was fraudulent.
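To make the irregular spacing concrete, the following minimal sketch computes the time gap between consecutive events for each customer. It uses a small made-up frame rather than simple_fraud.csv, so the names toy, gap, gaps_a and is_irregular are illustrative assumptions and not part of the dataset or the SDK.

```python
import pandas as pd

# Toy stand-in for the transactions table; the values are illustrative only.
toy = pd.DataFrame(
    {
        "customer": ["A", "A", "A", "B", "B"],
        "date": pd.to_datetime(
            [
                "2022-03-10 00:00:00",
                "2022-03-10 13:18:10",
                "2022-03-12 05:12:41",
                "2022-04-12 19:28:27",
                "2022-05-03 07:40:48",
            ]
        ),
    }
)

# Sort within each customer, then take the gap to the previous event.
toy = toy.sort_values(["customer", "date"])
toy["gap"] = toy.groupby("customer")["date"].diff()

# A regular series has a single distinct gap; event data usually has several.
gaps_a = toy.loc[toy["customer"] == "A", "gap"].dropna()
is_irregular = gaps_a.nunique() > 1

print(toy)
print("irregular:", is_irregular)
```

The first event of each customer has no predecessor, so its gap is missing (NaT) and is dropped before counting the distinct gaps.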
The plot below shows the transaction events over time for a particular customer, with each transaction flagged as fraudulent or not.

As with the TimeSeriesSynthesizer, we configure the model to train over a maximum number of time steps, corresponding to the largest number of transactions made by any single customer:
from synthesized import DeepStateConfig
config = DeepStateConfig()
value_counts = df["customer"].value_counts()
config.max_time_steps = max(value_counts)
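As an illustration of what this setting captures, the sketch below counts the events per customer on a small made-up frame. The toy data and the names lengths, max_steps and min_steps are assumptions for the example and do not come from the SDK.

```python
import pandas as pd

# Toy stand-in: one row per transaction; the values are illustrative only.
toy = pd.DataFrame({"customer": ["A", "A", "A", "B", "B", "C"]})

# Number of events (time steps) recorded for each customer.
lengths = toy["customer"].value_counts()

# The longest sequence gives the upper bound on time steps,
# the shortest shows how much sequence lengths can vary.
max_steps = int(lengths.max())
min_steps = int(lengths.min())

print(lengths.to_dict(), max_steps, min_steps)
```

Grouping by the ID column and taking the group sizes, as in toy.groupby("customer").size(), yields the same counts.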
We then instantiate the EventSynthesizer using the DeepStateConfig and provide the specification for the columns:
from synthesized import EventSynthesizer
synth = EventSynthesizer(
    df,
    id_idx="customer",
    time_idx="date",
    event_cols=["merchant", "amount", "fraud"],
    const_cols=["gender", "age"],
    config=config,
)
synth.learn()
Synthesising data is then possible through nearly the same process as generating regular time-series data, with some small differences regarding the arguments that should be specified:
- n: Number of new time steps to synthesize.
- df_exogenous: Optional exogenous variables linked to the time series. Must have the same number of rows as n.
- id: Optional unique ID of the sequence. If provided, it must correspond to an ID in the raw dataset used during training. If it is not specified, a random ID is sampled and time-series data is generated for it.
- df_const: Constant values linked to the given id. Note that the EventSynthesizer considers the initial timestamp for a given unique identity as a constant, referring to it as f"{time_idx}_0". The remaining columns of the DataFrame should be those provided in const_cols on instantiation. If id is provided, this argument should also be specified.
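The description of df_const can be sketched with pandas alone. The example below builds such a frame for one customer from a small made-up table. The single-row layout, the column order and the final commented-out call are assumptions drawn from the argument list above, not confirmed SDK behaviour, and the toy values are illustrative only.

```python
import pandas as pd

# Toy stand-in for the training data; the values are illustrative only.
toy = pd.DataFrame(
    {
        "customer": ["A", "A", "B"],
        "age": ["46 - 55", "46 - 55", "26 - 35"],
        "gender": ["MALE", "MALE", "FEMALE"],
        "date": pd.to_datetime(["2022-03-11", "2022-03-10", "2022-04-01"]),
    }
)

id_idx, time_idx = "customer", "date"
const_cols = ["gender", "age"]
customer_id = "A"  # must be an ID that was present during training

rows = toy[toy[id_idx] == customer_id]

# Assumed layout: one row holding the constant columns plus the
# customer's first timestamp under the name f"{time_idx}_0".
df_const = rows[const_cols].iloc[[0]].reset_index(drop=True)
df_const[f"{time_idx}_0"] = rows[time_idx].min()

print(df_const)

# Hypothetical call, assuming the keyword names match the list above:
# df_synth = synth.synthesize(n=10, id=customer_id, df_const=df_const)
```

Taking the minimum of the time column, rather than the first row, keeps the initial timestamp correct even when the rows are not sorted by time.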
from random import randint
n = randint(min(value_counts), max(value_counts))
df_synth = synth.synthesize(n=n)

Since we haven’t specified the id, we have generated an entirely new customer with their own transaction history.
Note that the synthetic data contains fraudulent transactions, similarly to the original data.
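One simple way to check this is to compare the share of fraudulent rows in the two frames. The sketch below is a minimal illustration on made-up data. The helper fraud_rate and the frames original and synthetic are hypothetical names for the example, standing in for df and df_synth.

```python
import pandas as pd


def fraud_rate(frame: pd.DataFrame, label_col: str = "fraud") -> float:
    """Return the fraction of rows whose label column equals 1."""
    return float((frame[label_col] == 1).mean())


# Toy stand-ins for the original and synthetic frames; values are illustrative only.
original = pd.DataFrame({"fraud": [1, 0, 1, 0, 0, 0, 0, 0]})
synthetic = pd.DataFrame({"fraud": [0, 1, 0, 0]})

print(fraud_rate(original), fraud_rate(synthetic))
```

A single synthetic customer has few events, so a rate computed this way is noisy. Comparing rates over many generated sequences would give a fairer picture.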