PII Detection & Protection

Automatically detect and protect personally identifiable information (PII) using the platform’s built-in PII scanning capabilities.

Overview

The platform can scan database columns for PII before masking. The scanner combines regular-expression pattern matching with validation algorithms (the Luhn checksum) to identify sensitive data across multiple categories.
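To illustrate how pattern matching and checksum validation can work together, here is a minimal sketch in Python. It is not the platform's implementation: the regex `CARD_PATTERN` and the helper names `luhn_valid` and `find_card_numbers` are assumptions made for this example. The regex proposes candidates, and the Luhn check filters out digit runs that only look like card numbers.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. The platform's real patterns may differ.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates in `text` that also pass the Luhn check."""
    return [
        match.group(0)
        for match in CARD_PATTERN.finditer(text)
        if luhn_valid(match.group(0))
    ]
```

Running the regex alone would flag any 16-digit order number; adding the checksum step reduces such false positives.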

Running PII Scan

You can run PII scanning and apply transformers from the platform UI by following these steps:

  1. Open the workflow you want to scan

  2. Open "Sensitive data scan" from the left menu

  3. Click the "Run scanning" button

  4. Review the list of database columns flagged as sensitive, which is displayed once scanning completes. To override a result, select the columns and click the update sensitivity button to mark them as sensitive or non-sensitive.

  5. Select all columns and click the "Apply suggested transformers" button; this will update your YAML configuration to include the suggested transformations.

  6. Check the "Configuration" tab to see which transformers were applied

Custom PII Scanning Rules

The platform allows you to define custom PII detection rules to match your organization’s specific requirements. Custom rules use regex patterns to identify sensitive data that may not be covered by the built-in global patterns.

Creating Custom Rules

Organization Level (Administration → Settings):

  1. Navigate to Administration → Settings → PII Rules

  2. Click "Create rule"

  3. Select the sensitive data type (Email, SSN, Credit Card, etc.)

  4. Enter a description for your rule

  5. Add regex patterns (one per line)

  6. Save the rule

Project Level (Project Settings):

  1. Open your project settings

  2. Navigate to PII Rules

  3. Click "Create rule"

  4. Select the sensitive data type

  5. Enter patterns specific to this project

  6. Click "Override for this project"

Supported Sensitive Data Types

The platform supports custom rules for 28 PII categories:

Identity Data: Email, Phone Number, SSN/National Insurance, Passport Number, Date of Birth, Person Name, Gender, Nationality

Financial Data: Credit Card, IBAN, BIC, Salary, Tax Identification Number (TIN), Legal Entity Identifier (LEI)

Location Data: Address, Postcode, Geolocation, IP Address (IPv4/IPv6), MAC Address, License Plate

Organizational Data: Company Name, Domain Name, Fax Number

Sensitive Categories: Race, Religion, Sexual Orientation

Security Data: Password, GitHub Token

Custom Rule Examples

Custom Email Patterns:

.*@internal\.company\.com
.*\.contractor@.*
.*@subsidiary\.com

Custom Phone Formats:

\d{3}-\d{3}-\d{4}
\(\d{3}\) \d{3}-\d{4}
\+44 \d{4} \d{6}

Custom SSN Variations:

\d{3}-\d{2}-\d{4}
\d{9}
[A-Z]{2}\d{6}[A-Z]

Custom Internal IDs:

EMP-\d{6}
CUST-[A-Z]{3}-\d{5}
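Before saving a rule, it can help to try its patterns against sample values. The sketch below does this in Python using some of the patterns above. The `CUSTOM_PATTERNS` grouping, the `INTERNAL_ID` key, and the `matching_types` helper are illustrative names, and the use of `re.fullmatch` assumes patterns are matched against the whole column value; the platform's actual matching mode is not documented here.

```python
import re

# Patterns copied from the examples above. The dictionary keys are
# illustrative labels, not confirmed platform identifiers.
CUSTOM_PATTERNS = {
    "EMAIL": [
        r".*@internal\.company\.com",
        r".*\.contractor@.*",
        r".*@subsidiary\.com",
    ],
    "PHONE": [
        r"\d{3}-\d{3}-\d{4}",
        r"\(\d{3}\) \d{3}-\d{4}",
        r"\+44 \d{4} \d{6}",
    ],
    "INTERNAL_ID": [
        r"EMP-\d{6}",
        r"CUST-[A-Z]{3}-\d{5}",
    ],
}


def matching_types(value: str) -> list[str]:
    """Return the labels whose patterns match the whole value."""
    return [
        data_type
        for data_type, patterns in CUSTOM_PATTERNS.items()
        # Assumption: a pattern must match the entire value.
        if any(re.fullmatch(pattern, value) for pattern in patterns)
    ]
```

A check like this also exposes overly broad patterns: `\d{9}` from the SSN examples matches any nine-digit value, so it may flag unrelated numeric columns.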

Rule Override Behavior

When you create a custom rule for a data type that already has a rule at a broader scope, your custom rule completely replaces the broader rule for that data type.

Example:

Global EMAIL rule:     .*@.*                     (matches all emails)
Organization rule:     .*@company\.com           (overrides global)
Project rule:          .*@project\.company\.com  (overrides organization)

In this example, the project scans ONLY for *@project.company.com addresses; the global and organization patterns are not applied.
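The replacement behavior can be modeled as "narrowest scope wins, per data type". The following Python sketch is only an illustration of that rule, not the platform's code; `SCOPE_PRECEDENCE`, `effective_patterns`, and the nested dictionary layout are assumptions made for the example.

```python
# Scopes ordered from narrowest to broadest.
SCOPE_PRECEDENCE = ("project", "organization", "global")


def effective_patterns(
    rules_by_scope: dict[str, dict[str, list[str]]],
    data_type: str,
) -> list[str]:
    """Return the patterns of the narrowest scope that defines `data_type`.

    Patterns are replaced, never merged: once a scope defines a rule
    for the data type, broader scopes are ignored for that type.
    """
    for scope in SCOPE_PRECEDENCE:
        patterns = rules_by_scope.get(scope, {}).get(data_type)
        if patterns:
            return list(patterns)
    return []


rules = {
    "global": {"EMAIL": [r".*@.*"], "PHONE": [r"\d{3}-\d{3}-\d{4}"]},
    "organization": {"EMAIL": [r".*@company\.com"]},
    "project": {"EMAIL": [r".*@project\.company\.com"]},
}
```

With these rules, EMAIL resolves to the project pattern alone, while PHONE falls back to the global pattern because no narrower scope overrides it.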

API Access

Custom rules can also be managed via the REST API:

Organization Level:

# Get effective rules
GET /api/v1/organizations/pii-detection-rules/effective

# Create/update rules
PUT /api/v1/organizations/pii-detection-rules
Content-Type: application/json

[
  {
    "name": "Custom Email Patterns",
    "sensitive_data_type": "EMAIL",
    "patterns": [
      ".*@internal\\.company\\.com"
    ]
  }
]

# Delete rule
DELETE /api/v1/organizations/pii-detection-rules?sensitiveDataType=EMAIL

Project Level:

# Get effective rules for the project
GET /api/v1/projects/{projectId}/pii-detection-rules/effective

# Create/update project rules
PUT /api/v1/projects/{projectId}/pii-detection-rules
Content-Type: application/json

[
  {
    "name": "Project Email Rules",
    "sensitive_data_type": "EMAIL",
    "patterns": [
      ".*@project\\.company\\.com"
    ]
  }
]
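As a rough sketch, the PUT requests above could be prepared from Python with the standard library. `BASE_URL` is a placeholder host and `build_rules_request` is a hypothetical helper; authentication is not described in this section, so any required headers are passed in by the caller rather than assumed.

```python
import json
import urllib.request

# Placeholder host: replace with your platform's address.
BASE_URL = "https://platform.example.com"


def build_rules_request(rules, project_id=None, headers=None):
    """Build (but do not send) a PUT request for custom PII rules.

    With `project_id` the project-level endpoint is used; otherwise
    the organization-level endpoint is used.
    """
    scope = "organizations" if project_id is None else f"projects/{project_id}"
    url = f"{BASE_URL}/api/v1/{scope}/pii-detection-rules"
    request = urllib.request.Request(
        url,
        data=json.dumps(rules).encode("utf-8"),
        method="PUT",
    )
    request.add_header("Content-Type", "application/json")
    # Caller-supplied headers, e.g. whatever authentication is required.
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


project_rules = [
    {
        "name": "Project Email Rules",
        "sensitive_data_type": "EMAIL",
        "patterns": [".*@project\\.company\\.com"],
    }
]
```

A request built this way can be sent with `urllib.request.urlopen(request)`. Keeping construction separate from sending makes it easy to inspect the URL and payload first.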