Multi-Worker Deployment

Running multiple Governor workers allows workflows to execute concurrently. Each worker independently polls for queued tasks over gRPC, so adding workers increases throughput in environments with many simultaneous workflows.

Docker Compose

Scaling Workers

The base docker-compose.yml defines a single worker service. Use the docker-compose.multi-worker.yml overlay with --scale to run multiple identical workers in parallel:

docker compose \
  -f docker-compose.yml \
  -f docker-compose.override.yml \
  -f docker-compose.multi-worker.yml \
  up --scale worker=3 --detach

The overlay resets the static host port binding on the worker service. Without it, --scale would fail because all replicas attempt to bind port 8083 on the host simultaneously. Port 8083 remains accessible within the Docker network.
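
As a rough illustration of what such a port reset can look like, the fragment below uses the `!reset` YAML tag that recent Docker Compose v2 releases support when merging files. This is a sketch under that assumption, not the contents of the shipped docker-compose.multi-worker.yml, which may do more or use a different mechanism.

```yaml
# Illustrative overlay fragment (assumption: not the shipped file).
services:
  worker:
    # Discard the host port mapping inherited from docker-compose.yml so
    # that several replicas can start without competing for host port 8083.
    # The container port stays reachable from other services on the
    # Docker network.
    ports: !reset []
```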

On first start, the overlay also provisions the following automatically for the demo stack:

| What | How |
| --- | --- |
| Second output database (output_source_2) | Created via initdb/create_output_db2.sql on container initialisation |
| Second output connection in Governor | Seeded via initdb/create_governor_data_multi_worker.sql |
| Masking workflow pre-wired to second output | Same seed file updates the workflow’s output connection assignment (release demo stack only) |

The initdb scripts only run on a fresh volume. If the stack was previously started, output_source_2 and the second Governor connection will not be created automatically. Either recreate volumes with docker compose down -v (this deletes all data) and restart, or create the output database and connection manually via the Governor UI.

Once the stack is running, all workers register in the Governor UI under Workers. The Data source tags column shows which datasource group each worker is assigned to — * means the worker handles any group:

[Figure: Governor Workers page showing scaled workers with wildcard data source tags]

Datasource Assignment

All replicas started with --scale share the same AGENT_DATASOURCES and AGENT_TAGS values. Workers are interchangeable — any replica can claim any task that matches the shared configuration.

--scale is the correct pattern for homogeneous fleets where all workers have identical database access and routing requirements. It does not support per-replica differentiation:

  • All replicas register under the same AGENT_TAGS value and appear identical in the Governor Workers page — only their UUIDs differ

  • Workflows cannot be pinned to a specific replica by tag; any replica competes for tasks matching the shared tag

  • All replicas share the same AGENT_DATASOURCES scope

If workers need different AGENT_TAGS (to route specific workflows to specific workers) or different AGENT_DATASOURCES (to restrict each worker to a datasource group), define named worker services manually instead of using --scale.

To assign a data source to a specific worker group, open the data source in Governor and set the Worker group field to match the worker’s AGENT_DATASOURCES value.
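
A minimal sketch of the named-services approach might look like the following. The file name, the service names, and the `finance` and `hr` values are hypothetical, and the image is left as a placeholder; the remaining settings would need to be copied from the base worker service, without its static host port binding.

```yaml
# docker-compose.named-workers.yml (hypothetical file name)
services:
  worker-finance:                   # hypothetical service name
    image: "<same image as the base worker service>"   # placeholder
    environment:
      AGENT_TAGS: finance           # hypothetical tag used for routing
      AGENT_DATASOURCES: finance    # hypothetical datasource group
    # ...remaining settings copied from the base worker service

  worker-hr:                        # hypothetical service name
    image: "<same image as the base worker service>"   # placeholder
    environment:
      AGENT_TAGS: hr
      AGENT_DATASOURCES: hr
    # ...remaining settings copied from the base worker service
```

Under these assumptions, a data source whose Worker group field is set to finance would be served by worker-finance only, while worker-hr would ignore it.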

Kubernetes (Helm)

Scaling Workers

Worker replicas are controlled by worker.replicaCount in the Helm values. Set it to the desired number of concurrent workers and apply with helm upgrade:

worker:
  replicaCount: 2
  container:
    config:
      WORKER_DATASOURCES: '*'

Connection settings (WORKER_SERVERHOST, WORKER_SERVERPORT, WORKER_USEPLAINTEXT) are managed by the chart under the default mTLS configuration and should not be set manually. Manually supplied values only take effect when tlsInternal.enabled is set to false.

helm upgrade governor oci://synthesizedio.jfrog.io/helm/governor \
  -f values.yaml \
  --reuse-values

Each replica registers independently with a unique UUID and polls the governor-api gRPC endpoint every 2 seconds. No additional coordination configuration is required.

Datasource Assignment

| Mode | Environment variable | Effect |
| --- | --- | --- |
| Wildcard (default) | WORKER_DATASOURCES: '*' | Worker handles tasks for any datasource group |
| Group-scoped | WORKER_DATASOURCES: '<group-name>' | Worker only handles tasks whose input connection’s Worker group matches <group-name> |
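
For the group-scoped mode, a values fragment could look like the sketch below. It reuses the key layout from the scaling example earlier in this section; the group name `finance` is a hypothetical placeholder.

```yaml
# values.yaml fragment (group name is a hypothetical example)
worker:
  replicaCount: 2
  container:
    config:
      # Must match the Worker group field set on the input connections
      # that these workers are meant to serve.
      WORKER_DATASOURCES: 'finance'
```

Because replicaCount replicates a single worker definition, all replicas in the release would presumably share this group, much like replicas started with --scale in Docker Compose.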

Resource Limits for Database Pods

When running multiple concurrent workflows, unbounded database pod memory can cause OOM errors and connection failures under load. Set explicit limits and cap the buffer pool:

# Database pod spec
env:
  # Applies to SQL Server; use the equivalent parameter for your
  # database engine.
  - name: MSSQL_MEMORY_LIMIT_MB
    value: "2048"
resources:
  requests:
    memory: "2Gi"
    cpu: "1"
  limits:
    memory: "3Gi"
    cpu: "2"

Output Database Requirement

Regardless of the deployment method, each workflow running concurrently must target a distinct output database. Two workflows writing to the same output database will conflict — the second workflow fails immediately with:

Failed to claim sink. TDK is already running. Sink is already claimed by [<user>] at <timestamp>.

Both output databases can reside on the same database server instance; only the database name must differ.

In Docker Compose, the demo stack handles this automatically — the docker-compose.multi-worker.yml overlay provisions output_source_2 and pre-wires the Masking workflow to it.

In Kubernetes, create a dedicated output database for each concurrent workflow on the target server and assign the corresponding output connection to each workflow via the Governor UI.

After the Docker Compose multi-worker demo stack starts, the Data sources page in Governor shows all three connections (input, output, and output-2) ready for use, with no manual setup required:

[Figure: Data sources page showing input]

Deployment Comparison

| Aspect | Docker Compose | Kubernetes (Helm) |
| --- | --- | --- |
| Environment variable prefix | AGENT_* | WORKER_* |
| Datasource variable | AGENT_DATASOURCES | WORKER_DATASOURCES |
| Scaling mechanism | --scale worker=N via docker-compose.multi-worker.yml overlay | worker.replicaCount: N in values |
| Port conflicts when scaling | Resolved by overlay, which resets the static host port binding | No; ClusterIP handles routing |
| Second output DB setup | Automatic (demo overlay) | Manual via Governor UI |