TDK Agents

What are TDK Agents and why should we use them?

Synthesized TDK is the core database-processing engine of Governor. It connects to both the input and output databases and performs the data generation, masking, and subsetting operations. A TDK Agent is a piece of software that bundles TDK and connects to a Governor instance over the Internet. The machine hosting a TDK Agent must therefore be able to reach both the database being transformed and Governor. TDK Agents visible in Governor can be managed through the Governor UI.

TDK Agents are recommended for the following reasons:

Performance and Horizontal Scaling

TDK processes all of the transformed or generated data. For best performance, a TDK instance should run as close to the database as possible, ideally on the same machine or at least within the same local network. This minimizes SQL query latency and overall execution time.

To run multiple workflows on several databases simultaneously, deploy multiple TDK Agent instances.

Security

There are both practical and security reasons for installing TDK on the same machine as the database server, or at least in the same network. It is not always feasible or safe to give a remote host Internet access to a database containing sensitive data.

TDK Agents do not transmit any data stored in the database to Governor, so all sensitive information remains with the Agent and is never sent over the Internet.

Setting up the system

Setting up Governor

  1. To enable Agents in the Governor frontend, set the TDK_AGENTS=true environment variable on the Governor frontend container.

  2. To change the gRPC port that agents connect to, set the AGENT_GRPC_PORT environment variable on the Governor API (backend) container. The default port is 50055.
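As a rough sketch, both settings can be passed with `docker run -e`. The image references below are hypothetical placeholders stored in shell variables, and any other options your Governor containers require are omitted. Publishing the port with `-p` is an assumption for the case where agents connect to the API container directly, without an SSL proxy in front of it.

```shell
# Hypothetical image references: replace with the images your Governor
# deployment actually uses. Other required options are omitted here.
GOVERNOR_FRONTEND_IMAGE="your-registry/governor-frontend:tag"
GOVERNOR_API_IMAGE="your-registry/governor-api:tag"

# 1. Enable the Agents section in the Governor frontend.
docker run \
  -e TDK_AGENTS=true \
  "$GOVERNOR_FRONTEND_IMAGE"

# 2. Set the gRPC port agents connect to (50055 is the default) and,
#    assuming agents reach the API container directly, publish it.
docker run \
  -e AGENT_GRPC_PORT=50055 \
  -p 50055:50055 \
  "$GOVERNOR_API_IMAGE"
```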

Setting up the agent

To deploy a TDK agent locally, use the following Docker command:

docker run \
  -e AGENT_SERVERHOST=[server host] \ (1)
  -e AGENT_SERVERPORT=[server port] \ (2)
  -e AGENT_TAGS=[agent tags] \ (3)
  -e AGENT_DATASOURCES=[data source tags] \ (4)
  eu.gcr.io/synthesized-cloud-275014/testing-suite-agent

  1. Server host is the host name of the Governor gRPC endpoint.

  2. Server port is the gRPC port on Governor. The standard SSL port is 443; however, if you set up Governor on a local network without SSL and connect to it directly, the port is typically 50055.

  3. Agent tags are a list of tags used to identify the agent.

  4. Data source tags are a list of network zone IDs that identify the data sources located near this agent. For a workflow to be selected for execution on the agent, its data source in Governor must have a matching network zone ID.

Use a plain comma-separated format for listing tags, e.g. tag1,tag2.
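For illustration, the following sketch starts two agents, one per network zone, each on a host close to its own databases and both connecting to a Governor instance served over SSL on port 443. The host name `governor.example.com`, the agent tags, and the zone IDs `zone-eu` and `zone-us` are hypothetical values; only the image name comes from the command above.

```shell
# Hypothetical values: governor.example.com, the agent tags, and the
# zone IDs zone-eu / zone-us are illustrative only.

# Run on a host in the EU network, next to the EU databases:
docker run \
  -e AGENT_SERVERHOST=governor.example.com \
  -e AGENT_SERVERPORT=443 \
  -e AGENT_TAGS=team-a,eu \
  -e AGENT_DATASOURCES=zone-eu \
  eu.gcr.io/synthesized-cloud-275014/testing-suite-agent

# Run on a host in the US network, next to the US databases:
docker run \
  -e AGENT_SERVERHOST=governor.example.com \
  -e AGENT_SERVERPORT=443 \
  -e AGENT_TAGS=team-a,us \
  -e AGENT_DATASOURCES=zone-us \
  eu.gcr.io/synthesized-cloud-275014/testing-suite-agent
```

With this setup, a workflow whose data source has the network zone ID zone-eu can only be picked up by the first agent.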

Running a workflow in an Agent

  • The Agents section in Governor displays a list of connected agents, their statuses, and tags:

[Screenshot: Agents section listing connected agents]
  • Assign the appropriate network zone ID to each available data source. An agent picks up a task only if at least one of its data source tags matches the network zone ID of the respective data source.

[Screenshot: network zone ID setting for a data source]
  • In the workflow run confirmation dialog, select the "Use agents" checkbox to execute the workflow within an agent:

[Screenshot: "Use agents" checkbox in the workflow run dialog]
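The matching rule above can be expressed as a small shell check, which may help when working out why an agent does not pick up a workflow. This is an illustrative sketch of the documented rule (at least one of the agent's comma-separated data source tags equals the data source's network zone ID), not Governor's actual implementation. The function name `agent_can_run` is hypothetical, and the sketch assumes exact, case-sensitive matching with no spaces around commas.

```shell
# Hypothetical helper (not part of Governor or the TDK Agent): returns
# status 0 if the zone ID equals one of the comma-separated data source tags.
# Assumes exact, case-sensitive matching and no spaces around commas.
#   $1: the agent's data source tags, e.g. "zone-eu,zone-dmz"
#   $2: the data source's network zone ID, e.g. "zone-eu"
agent_can_run() {
  # Wrapping both values in commas makes the pattern match whole tags only,
  # so "zone" does not match "zone-eu". The quoted "$2" is matched literally.
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage example
if agent_can_run "zone-eu,zone-dmz" "zone-eu"; then
  echo "zone-eu: the agent can pick up the workflow"
fi
if ! agent_can_run "zone-eu,zone-dmz" "zone-us"; then
  echo "zone-us: no matching data source tag"
fi
```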