Policy Frameworks

Aquilon DLP includes built-in compliance policy frameworks that automatically classify findings and generate violations according to regulatory requirements. You can also create custom policies using TOML configuration.

Built-in Compliance Frameworks

Overview

| Framework | Standard | Key Controls | Edition |
|---|---|---|---|
| GDPR | EU General Data Protection Regulation | Articles 5, 32, 33 | All |
| CCPA | California Consumer Privacy Act | Sections 1798.100-199 | All |
| HIPAA | Health Insurance Portability and Accountability Act | Sections 164.306, 164.312 | Enterprise |
| PCI DSS | Payment Card Industry Data Security Standard | Requirements 3, 4, 12 | Enterprise |
| SOX | Sarbanes-Oxley Act | Sections 302, 404, 409 | Enterprise |
| ISO 27001 | Information Security Management | Controls A.8.12, A.5.12, A.8.11 | Enterprise |
| CUI | Controlled Unclassified Information | NIST SP 800-171 | Enterprise |
| CMMC | Cybersecurity Maturity Model Certification | DFARS 252.204-7012 | Enterprise |
| FedRAMP | Federal Risk and Authorization Management | NIST SP 800-53 | Enterprise |
| FISMA | Federal Information Security Modernization Act | FIPS 199, NIST SP 800-53 | Enterprise |

GDPR (General Data Protection Regulation)

The GDPR policy detects EU personal data subject to data protection regulations.

Detected Data Types:

  • Personal identifiers (names, addresses, phone numbers)
  • Email addresses
  • National identification numbers
  • Financial account data
  • Health information

Configuration:

[policies]
enabled_policies = ["gdpr"]

[policies.policy_configs.gdpr]
enabled = true
settings = { confidence_threshold = "0.7", requires_cc_context = "true" }

Context-Aware Credit Card Detection:

By default, the GDPR policy requires payment context keywords before it reports a credit card number. This reduces false positives from Luhn-valid numbers that appear in non-payment contexts (JSON logs, test files, etc.).

| Setting | Default | Effect |
|---|---|---|
| requires_cc_context | "true" | CC findings require payment context keywords |

Payment context keywords: payment, card, merchant, transaction, billing, invoice

To restore legacy behavior (alert on all Luhn-valid credit cards regardless of context):

settings = { requires_cc_context = "false" }
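
The context check described above can be pictured with a short Python sketch. This is illustrative only and not Aquilon DLP's implementation: the function names and the 100-character window are assumptions, and only the keyword list comes from this guide.

```python
import re

# Keywords listed in this guide; the 100-character window is an assumed value.
PAYMENT_KEYWORDS = ("payment", "card", "merchant", "transaction", "billing", "invoice")

def has_payment_context(text: str, start: int, end: int, window: int = 100) -> bool:
    """Return True if any payment keyword appears within `window` characters of the match."""
    nearby = text[max(0, start - window):end + window].lower()
    return any(keyword in nearby for keyword in PAYMENT_KEYWORDS)

def report_card(text: str, match: re.Match, requires_cc_context: bool = True) -> bool:
    """Decide whether a Luhn-valid candidate should become a finding (hypothetical helper)."""
    if not requires_cc_context:
        return True  # legacy behavior: report regardless of context
    return has_payment_context(text, match.start(), match.end())

pattern = re.compile(r"[0-9]{16}")
log_line = '{"request_id": 4111111111111111, "status": "ok"}'
receipt = "Billing: card 4111111111111111 charged"
```

With these inputs, the candidate in `log_line` is suppressed when `requires_cc_context` is true, while the same number in `receipt` is reported because "billing" and "card" appear next to it.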

CCPA (California Consumer Privacy Act)

The CCPA policy detects California consumer personal information.

Detected Data Types:

  • Personal identifiers
  • Social Security numbers
  • Driver’s license numbers
  • Financial information
  • Geolocation data
  • Biometric information

Configuration:

[policies]
enabled_policies = ["ccpa"]

[policies.policy_configs.ccpa]
enabled = true
settings = { confidence_threshold = "0.7" }

HIPAA (Health Insurance Portability and Accountability Act)

Enterprise Edition Only

The HIPAA policy detects Protected Health Information (PHI).

Detected Data Types:

  • Medical record numbers
  • Health plan beneficiary numbers
  • Social Security numbers
  • Names with medical details
  • Dates of service
  • Provider information

Configuration:

[policies]
enabled_policies = ["hipaa"]

[policies.policy_configs.hipaa]
enabled = true
settings = { confidence_threshold = "0.8" }

PCI DSS (Payment Card Industry Data Security Standard)

Enterprise Edition Only

The PCI DSS policy detects payment card data.

Detected Data Types:

  • Credit card numbers (validated with Luhn algorithm)
  • Card security codes (CVV/CVC)
  • Cardholder names
  • Expiration dates
  • Magnetic stripe data

Configuration:

[policies]
enabled_policies = ["pci_dss"]

[policies.policy_configs.pci_dss]
enabled = true
settings = { alert_on_test_data = "false", requires_cc_context = "true" }

Context-Aware Credit Card Detection:

By default, the PCI DSS policy requires payment context keywords before it reports a credit card number. This reduces false positives from Luhn-valid numbers that appear in non-payment contexts (JSON logs, test files, etc.).

| Setting | Default | Effect |
|---|---|---|
| requires_cc_context | "true" | CC findings require payment context keywords |

Payment context keywords: payment, card, merchant, transaction, billing, invoice

To restore legacy behavior (alert on all Luhn-valid credit cards regardless of context):

settings = { requires_cc_context = "false" }

SOX (Sarbanes-Oxley Act)

Enterprise Edition Only

The SOX policy detects financial data subject to internal controls.

Detected Data Types:

  • Financial statements
  • Account numbers
  • Transaction identifiers
  • Audit information
  • Executive communications

Configuration:

[policies]
enabled_policies = ["sox"]

[policies.policy_configs.sox]
enabled = true
settings = { confidence_threshold = "0.85" }

ISO 27001:2022

Enterprise Edition Only

The ISO 27001:2022 policy implements information security management controls, in particular Control A.8.12 (Data leakage prevention), which explicitly mandates DLP capabilities.

Features:

  • 4-level data classification: Restricted, Confidential, Internal, Public
  • Automatic classification of all 33 scanners by sensitivity
  • Configurable controls for data masking, encryption, access

Detected Data Types:

  • All categories classified by sensitivity level
  • Automatic assignment based on scanner type

Configuration:

[policies]
enabled_policies = ["iso27001"]

[policies.policy_configs.iso27001]
enabled = true
settings = { confidence_threshold = "0.7", enforce_data_masking = "true" }

Enabling Multiple Policies

You can enable multiple policies simultaneously:

[policies]
enabled_policies = ["gdpr", "hipaa", "pci_dss", "sox", "ccpa", "iso27001"]

Each policy evaluates scan findings independently and generates violations according to its regulatory framework. A single file might trigger alerts from multiple policies if it contains different types of sensitive data.

Custom Policies

Aquilon DLP supports custom policies and scanners to detect company-specific data patterns without writing code.

Creating Custom Scanners

Define scanners for proprietary identifiers:

[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.85
description = "ACME Corp employee IDs"
context_signals = ["hr", "confidential", "personnel"]

[scanners.confidence_boost]
keywords = ["employee", "personnel", "payroll", "badge"]
boost_amount = 0.10
proximity = 200

Scanner Fields:

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier (alphanumeric + underscore) |
| regex | Yes | Pattern to match (must be bounded) |
| redaction_pattern | Yes | Template for redacting matches |
| base_confidence | Yes | Base confidence score (0.0 - 1.0) |
| description | No | Human-readable description |
| context_signals | No | Keywords for classification |
| confidence_boost | No | Boost confidence when keywords found nearby |

Pattern Safety

All regex patterns must be bounded to prevent performance issues:

# SAFE - bounded patterns
[[scanners]]
name = "fixed_length"
regex = "EMP-([0-9]{6})"           # Fixed length

Unsafe patterns (unbounded) will be rejected: \d+, .*, [A-Z]+
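
To pre-check patterns before adding them to the configuration, a rough heuristic along the following lines may help. It is a sketch under stated assumptions, not the validator Aquilon DLP uses: `is_bounded` is a hypothetical helper that flags an unescaped `*` or `+` and open-ended `{n,}` repetitions, and it can misjudge quantifier characters inside character classes or after an escaped backslash.

```python
import re

# Heuristic: an unescaped '*' or '+', or an open-ended '{n,}' repetition.
_UNBOUNDED = re.compile(r"(?<!\\)(?:[*+]|\{\d+,\})")

def is_bounded(pattern: str) -> bool:
    """Return True if the pattern contains no obviously unbounded quantifier."""
    return _UNBOUNDED.search(pattern) is None

for candidate in ["EMP-([0-9]{6})", r"\d+", ".*", "[A-Z]+", "[0-9]{2,5}"]:
    print(candidate, "bounded" if is_bounded(candidate) else "unbounded")
```

The three unsafe examples listed above are flagged, while fixed-length and range-limited repetitions such as `{6}` or `{2,5}` pass.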

Dictionary Scanners

Dictionary scanners detect words and phrases from configurable inline lists using the Aho-Corasick algorithm for efficient O(n) multi-pattern matching.

When to Use Dictionary Scanners

  • Detect lists of keywords or terms (medical terms, project codes, product names)
  • Match multi-word phrases (e.g., “social security number”, “patient record”)
  • Domain-specific vocabulary that doesn’t follow a regex pattern

Basic Configuration

[[dictionary_scanners]]
name = "medical_terms"
words = [
    "diagnosis",
    "prescription",
    "patient record",
    "medical history"
]
case_sensitive = false
match_whole_words = true
base_confidence = 0.85

Configuration Fields

| Field | Type | Default | Description |
|---|---|---|---|
| name | String | Required | Unique scanner identifier (alphanumeric + underscore) |
| words | Array | Required | Words and phrases to detect |
| case_sensitive | Boolean | false | Case-sensitive matching |
| match_whole_words | Boolean | true | Match only at word boundaries |
| base_confidence | Float | 0.8 | Base confidence score (0.0-1.0) |
| min_matches | Integer | None | Minimum matches required to report |
| match_proximity | Integer | None | Maximum bytes between matches |
| description | String | None | Human-readable description |
| context_signals | Array | None | Keywords for classification |

Advanced: Match Constraints

Use min_matches and match_proximity to reduce false positives by requiring multiple terms to appear together:

[[dictionary_scanners]]
name = "hipaa_terms"
words = [
    "protected health information",
    "PHI",
    "patient",
    "medical record",
    "diagnosis",
    "treatment"
]
base_confidence = 0.75
min_matches = 2
match_proximity = 500

This configuration only reports findings when at least 2 terms appear within 500 bytes of each other.
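
One way to reason about how min_matches and match_proximity interact is the sliding-window sketch below. It assumes that "within N bytes of each other" means the first and last match offsets of a group are at most N bytes apart; the product's exact rule may differ, and `meets_constraints` is a hypothetical name.

```python
def meets_constraints(offsets, min_matches, proximity):
    """Return True if at least `min_matches` offsets fall within `proximity` bytes."""
    ordered = sorted(offsets)
    # Slide a window of `min_matches` consecutive offsets over the sorted list.
    for i in range(len(ordered) - min_matches + 1):
        if ordered[i + min_matches - 1] - ordered[i] <= proximity:
            return True
    return False

print(meets_constraints([10, 400], min_matches=2, proximity=500))   # True
print(meets_constraints([10, 900], min_matches=2, proximity=500))   # False
print(meets_constraints([10], min_matches=2, proximity=500))        # False
```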

Advanced: Confidence Adjustments

Boost or reduce confidence based on nearby keywords:

[[dictionary_scanners]]
name = "project_codenames"
words = ["Project Alpha", "Operation Gamma", "Initiative Delta"]
base_confidence = 0.70
boost_keywords = ["confidential", "restricted", "internal only"]
boost_amount = 0.20
reduce_keywords = ["example", "test", "demo", "sample"]
reduce_amount = 0.30

When “confidential” appears nearby, confidence increases from 0.70 to 0.90. When “test” appears nearby, confidence decreases from 0.70 to 0.40.

Referencing Dictionary Scanners in Policies

Dictionary scanners use the custom: prefix when referenced in policies:

[[custom_policies]]
name = "healthcare_data"
enabled = true
required_scanners = ["custom:medical_terms", "ssn", "email"]

[[custom_policies.rules]]
id = "phi_exposure"
severity = "high"

[custom_policies.rules.composition]
operator = "AND"
proximity = 500

[[custom_policies.rules.composition.conditions]]
scanner = "custom:medical_terms"
min_confidence = 0.70

[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75

Built-in Validators

Validators provide checksum or format validation for regex matches, significantly reducing false positives by verifying that detected patterns are mathematically valid.

Available Validators

| Validator | Algorithm | Use Case |
|---|---|---|
| luhn | Luhn (mod 10) | Credit cards, IMEI numbers |
| mod10 | Modulo 10 | Various identifiers with check digits |
| mod11 | Modulo 11 | ISBN-10, some national IDs |
| iban | IBAN checksum | International Bank Account Numbers |

Using Validators in Custom Scanners

Add a validator to filter out matches that fail checksum validation:

[[scanners]]
name = "company_account"
regex = "ACCT-([0-9]{10})"
redaction_pattern = "ACCT-XXXXXXXXXX"
base_confidence = 0.85

[scanners.validation]
validator = "luhn"
min_confidence = 0.70
invalid_patterns = ["^0+$", "1234567890$"]

Validation Configuration Fields

| Field | Type | Description |
|---|---|---|
| validator | String | Checksum validator: luhn, mod10, mod11, iban |
| min_confidence | Float | Minimum confidence threshold (0.0-1.0) |
| invalid_patterns | Array | Regex patterns to reject (e.g., all zeros) |

Example: Credit Card with Luhn Validation

The built-in credit card scanner already uses Luhn validation internally. For custom patterns that should use Luhn:

[[scanners]]
name = "loyalty_card"
regex = "([0-9]{4})([0-9]{4})([0-9]{4})([0-9]{4})"
redaction_pattern = "XXXX-XXXX-XXXX-XXXX"
base_confidence = 0.80
description = "16-digit loyalty card numbers with Luhn check"

[scanners.validation]
validator = "luhn"
invalid_patterns = ["^0{16}$", "^1{16}$"]

This configuration:

  1. Matches any 16-digit number
  2. Validates that it passes the Luhn checksum
  3. Rejects all-zeros and all-ones patterns
  4. Reports only valid matches
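
The four steps above can be sketched in Python. The Luhn check itself is the standard algorithm; the surrounding pipeline (`valid_matches` and the order of the checks) is an illustrative assumption rather than Aquilon DLP's code.

```python
import re

# Same regex and invalid_patterns as the loyalty_card example.
CARD_RE = re.compile(r"([0-9]{4})([0-9]{4})([0-9]{4})([0-9]{4})")
INVALID_PATTERNS = [re.compile(r"^0{16}$"), re.compile(r"^1{16}$")]

def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def valid_matches(text: str) -> list:
    """Match, validate the checksum, reject invalid patterns, and return what is left."""
    results = []
    for m in CARD_RE.finditer(text):
        candidate = m.group(0)
        if not luhn_valid(candidate):
            continue  # step 2: checksum failure discards the match
        if any(p.search(candidate) for p in INVALID_PATTERNS):
            continue  # step 3: all-zeros / all-ones are rejected
        results.append(candidate)
    return results

print(valid_matches("a 4111111111111111 b 0000000000000000 c 1234567890123456"))
# ['4111111111111111']
```

Note that sixteen zeros pass the Luhn checksum on their own (the digit sum is 0), which is why an invalid_patterns entry is needed to reject them.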

Confidence Scoring

Aquilon DLP uses weighted confidence scoring to reduce false positives. Confidence can be boosted by nearby keywords or reduced by negative indicators.

How Confidence Works

Each scanner assigns a base_confidence score (0.0 to 1.0). This score can be adjusted based on:

  • Nearby positive keywords → Boost confidence (more likely a real match)
  • Nearby negative keywords → Reduce confidence (likely a false positive)
  • Validator success → Maintains or boosts confidence
  • Validator failure → Match is discarded

Boosting Confidence with Keywords

When specific keywords appear near a match, boost the confidence:

[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.75

[scanners.confidence_boost]
keywords = ["employee", "badge", "payroll", "personnel", "HR"]
boost_amount = 0.15
proximity = 200

If “employee” or “payroll” appears within 200 bytes, confidence increases from 0.75 to 0.90.

Reducing Confidence with Negative Keywords

When negative keywords appear near a match, reduce the confidence to suppress likely false positives:

[[scanners]]
name = "ssn_custom"
regex = "([0-9]{3})-([0-9]{2})-([0-9]{4})"
redaction_pattern = "XXX-XX-XXXX"
base_confidence = 0.80

[scanners.confidence_reduce]
keywords = ["example", "test", "fake", "sample", "xxx", "000-00-0000"]
boost_amount = 0.50    # amount subtracted from the confidence when a keyword is found
proximity = 100

If “example” or “test” appears within 100 bytes, confidence is reduced by 0.50 (from 0.80 to 0.30).

Combining Boost and Reduce

You can use both boost and reduce on the same scanner:

[[scanners]]
name = "account_number"
regex = "ACC-([0-9]{8})"
redaction_pattern = "ACC-XXXXXXXX"
base_confidence = 0.70

[scanners.confidence_boost]
keywords = ["account", "balance", "statement", "transaction"]
boost_amount = 0.20
proximity = 150

[scanners.confidence_reduce]
keywords = ["example", "test", "demo", "documentation"]
boost_amount = 0.40    # amount subtracted from the confidence when a keyword is found
proximity = 100

Confidence calculation:

  • Base: 0.70
  • With “account” nearby: 0.70 + 0.20 = 0.90
  • With “test” nearby: 0.70 - 0.40 = 0.30
  • With both: Boost and reduce are applied independently based on proximity
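
The arithmetic above can be reproduced with a small Python sketch. It assumes that the adjustments are additive, that both apply when both keyword sets are in range, and that the result is clamped to the 0.0 to 1.0 range; none of this is confirmed by the product documentation, and `adjusted_confidence` is a hypothetical helper.

```python
def adjusted_confidence(base, boost=0.0, reduce=0.0, boost_hit=False, reduce_hit=False):
    """Apply an optional boost and an optional reduction, then clamp to [0.0, 1.0]."""
    score = base
    if boost_hit:
        score += boost
    if reduce_hit:
        score -= reduce
    # Round to two decimals to hide binary floating-point noise such as 0.8999999.
    return round(min(1.0, max(0.0, score)), 2)

print(adjusted_confidence(0.70, 0.20, 0.40, boost_hit=True))                    # 0.9
print(adjusted_confidence(0.70, 0.20, 0.40, reduce_hit=True))                   # 0.3
print(adjusted_confidence(0.70, 0.20, 0.40, boost_hit=True, reduce_hit=True))   # 0.5
```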

Creating Custom Policies

Define policies to enforce business rules:

[[custom_policies]]
name = "employee_data_protection"
description = "Detects employee PII exposure"
enabled = true
required_scanners = ["ssn", "custom:employee_id", "email"]

[[custom_policies.rules]]
id = "employee_pii_leak"
severity = "high"
remediation = "Contact HR compliance - do not share file"

[custom_policies.rules.composition]
operator = "AND"
proximity = 500

[[custom_policies.rules.composition.conditions]]
scanner = "custom:employee_id"
min_confidence = 0.70

[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75

[custom_policies.rules.exclusions]
file_patterns = ["*/hr/authorized/*", "*/payroll/approved/*"]

Rule Types

Composition Rules (AND/OR Logic)

Alert when multiple data types appear together:

[custom_policies.rules.composition]
operator = "AND"              # All conditions must match
proximity = 500               # Within 500 characters

[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75

[[custom_policies.rules.composition.conditions]]
scanner = "email"
min_confidence = 0.70
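
A minimal Python sketch of how an AND composition with proximity could be evaluated follows. The data shapes (tuples of scanner name, offset, and confidence) and the helper name `and_rule_fires` are assumptions made for illustration; the product's evaluation logic is not documented here.

```python
from itertools import product

def and_rule_fires(findings, conditions, proximity):
    """findings: (scanner, offset, confidence) tuples; conditions: (scanner, min_confidence) pairs."""
    candidates = []
    for scanner, min_conf in conditions:
        offsets = [off for name, off, conf in findings
                   if name == scanner and conf >= min_conf]
        if not offsets:
            return False  # AND: every condition needs at least one qualifying finding
        candidates.append(offsets)
    # Fire if some choice of one finding per condition fits inside the proximity window.
    return any(max(combo) - min(combo) <= proximity for combo in product(*candidates))

conditions = [("ssn", 0.75), ("email", 0.70)]
print(and_rule_fires([("ssn", 100, 0.9), ("email", 450, 0.8)], conditions, 500))    # True
print(and_rule_fires([("ssn", 100, 0.9), ("email", 2000, 0.8)], conditions, 500))   # False
```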

Threshold Rules (Count-Based)

Alert when count exceeds threshold (bulk export detection):

[custom_policies.rules.threshold]
scanner = "custom:employee_id"
operator = "greater_equal"
count = 10

Supported comparisons: >, >=, <, <=, ==. The examples in this guide write >= as greater_equal.
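
The comparison performed by a threshold rule can be sketched in Python as follows. Only the name greater_equal appears in this guide; accepting the symbolic forms as keys, and the helper name `threshold_met`, are assumptions made for illustration.

```python
import operator

# "greater_equal" comes from the example above; the symbolic keys are assumed aliases.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "greater_equal": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

def threshold_met(finding_count: int, op: str, count: int) -> bool:
    """Compare the number of findings for a scanner against the rule's count."""
    return COMPARATORS[op](finding_count, count)

print(threshold_met(12, "greater_equal", 10))  # True
print(threshold_met(9, "greater_equal", 10))   # False
```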

Context Rules (Exclusions)

Control when rules fire based on context keywords and file location:

[custom_policies.rules.context]
requires_any = ["external", "public", "shared"]

[custom_policies.rules.exclusions]
file_patterns = ["*/hr/authorized/*"]
requires_context_signals = ["approved", "authorized"]
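
To preview which paths a file_patterns exclusion would cover, shell-style glob matching from Python's standard library is a reasonable approximation. Whether Aquilon DLP uses exactly these glob semantics is an assumption, and `is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def is_excluded(path: str, file_patterns: list) -> bool:
    """Return True if the path matches any exclusion glob (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in file_patterns)

patterns = ["*/hr/authorized/*"]
print(is_excluded("/srv/hr/authorized/roster.csv", patterns))  # True
print(is_excluded("/srv/hr/drafts/roster.csv", patterns))      # False
```

In `fnmatch`, `*` also matches path separators, so the pattern covers files at any depth below hr/authorized/.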

Scanner References

When referencing scanners in policies:

  • Built-in scanners: Use direct name (ssn, email, cc)
  • Custom scanners: Use custom: prefix (custom:employee_id)

# Scanner references in required_scanners
required_scanners = [
    "ssn",                      # Built-in
    "email",                    # Built-in
    "custom:employee_id",       # Custom
    "custom:project_code"       # Custom
]
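
A quick way to catch an "Unknown scanner" error before deployment is to check that every custom: reference has a matching definition. The sketch below works on an already-parsed configuration represented as a plain dictionary; the helper name and the assumption that both [[scanners]] and [[dictionary_scanners]] entries satisfy a custom: reference are illustrative.

```python
def unresolved_custom_refs(config: dict) -> list:
    """Return (policy_name, reference) pairs whose custom scanner is not defined."""
    defined = {s["name"] for s in config.get("scanners", [])}
    defined |= {s["name"] for s in config.get("dictionary_scanners", [])}
    missing = []
    for policy in config.get("custom_policies", []):
        for ref in policy.get("required_scanners", []):
            if ref.startswith("custom:") and ref[len("custom:"):] not in defined:
                missing.append((policy["name"], ref))
    return missing

config = {
    "scanners": [{"name": "employee_id"}],
    "custom_policies": [{
        "name": "employee_data_protection",
        "required_scanners": ["ssn", "custom:employee_id", "custom:project_code"],
    }],
}
print(unresolved_custom_refs(config))
# [('employee_data_protection', 'custom:project_code')]
```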

Adding Custom Policies

Custom scanners and policies are defined directly in your main configuration file using [[scanners]] and [[custom_policies]] sections:

# In aquilon_dlp_config.toml
[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"   # required field
base_confidence = 0.85

[[custom_policies]]
name = "employee_data_protection"
enabled = true
required_scanners = ["custom:employee_id", "ssn"]

Validate your configuration:

sudo aquilon-dlp --config /etc/aquilon/aquilon_dlp_config.toml --validate-config

Built-in Scanners

Aquilon DLP includes 50+ built-in scanner plugins across multiple categories:

| Category | Scanner Count | Examples |
|---|---|---|
| National IDs | 28 | EU, Americas, Asia-Pacific, Middle East national IDs |
| PII | 8 | SSN, email, phone, address, date of birth |
| Financial | 5 | Credit card, bank account, IBAN, CVV |
| Medical | 6 | MRN, NPI, MBI, medical device IDs |
| Government | 3 | Passport, driver’s license, vehicle identifier |
| Technical | 3 | API keys, database connections, crypto keys |
| Business | 5 | Executive communications, financial figures, audit docs |

All scanners integrate automatically with compliance policies.

National ID Scanners

Aquilon DLP includes comprehensive national ID detection with country-specific checksum validation:

Europe (14 scanners):

| Country | Scanner | Format | Validation |
|---|---|---|---|
| France | france_nir | 15 digits (NIR) | Mod 97 |
| Germany | germany_steurid | 11 digits (Steuer-ID) | Format rules |
| Italy | italy_cf | 16 chars (Codice Fiscale) | Mod 26 |
| Spain | spain_dni | 8-9 chars (DNI/NIE) | Mod 23 |
| Poland | poland_pesel | 11 digits (PESEL) | Weighted mod 10 |
| Netherlands | netherlands_bsn | 9 digits (BSN) | 11-proof |
| Belgium | belgium_nrn | 11 digits (NRN) | Mod 97 |
| UK | uk_nino | 9 chars (NINO) | Format rules |
| Sweden | sweden_personnummer | 10-12 digits | Luhn |
| Norway | norway_fodselsnummer | 11 digits | Dual mod-11 |
| Finland | finland_hetu | 11 chars (HETU) | Mod 31 |
| Portugal | portugal_nif | 9 digits (NIF) | Weighted mod 11 |
| Romania | romania_cnp | 13 digits (CNP) | Weighted mod 11 |
| Czech/Slovakia | czech_rodne_cislo | 9-10 digits | Mod 11 |

Americas (4 scanners):

| Country | Scanner | Format | Validation |
|---|---|---|---|
| Brazil | brazil_cpf | 11 digits (CPF) | Dual mod 11 |
| Canada | canada_sin | 9 digits (SIN) | Luhn |
| Chile | chile_rut | 8-9 chars (RUT) | Mod 11 |
| Argentina | argentina_cuit | 11 digits (CUIT/CUIL) | Weighted mod 11 |

Asia-Pacific (8 scanners):

| Country | Scanner | Format | Validation |
|---|---|---|---|
| Australia | australia_tfn | 9 digits (TFN) | Weighted mod 11 |
| India | india_aadhaar | 12 digits (Aadhaar) | Format rules |
| India | india_pan | 10 chars (PAN) | Format rules |
| South Korea | south_korea_rrn | 13 digits (RRN) | Weighted mod 11 |
| Japan | japan_my_number | 12 digits | Government checksum |
| China | china_resident_id | 18 chars | ISO 7064 MOD 11-2 |
| Taiwan | taiwan_national_id | 10 chars | Weighted mod 10 |
| New Zealand | new_zealand_ird | 8-9 digits (IRD) | Mod 11 |

Middle East & Africa (2 scanners):

| Country | Scanner | Format | Validation |
|---|---|---|---|
| Israel | israel_teudat_zehut | 9 digits | Luhn variant |
| Turkey | turkey_tc_kimlik | 11 digits (TC Kimlik) | Two-step checksum |
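
As an illustration of what country-specific checksum validation involves, the Spanish DNI check letter (the "Mod 23" entry in the Europe table) can be computed as shown below. This is the publicly documented DNI rule, not code taken from the spain_dni scanner, and the sketch covers only the 8-digit DNI form, not NIE numbers.

```python
import re

# Check-letter table for the Spanish DNI: index = number mod 23.
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"

def dni_valid(value: str) -> bool:
    """Return True if an 8-digit DNI carries the correct check letter."""
    m = re.fullmatch(r"([0-9]{8})([A-Z])", value.upper())
    if not m:
        return False
    return DNI_LETTERS[int(m.group(1)) % 23] == m.group(2)

print(dni_valid("12345678Z"))  # True  (12345678 mod 23 = 14, letter Z)
print(dni_valid("12345678A"))  # False
```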

Other Scanners

PII: ssn, email, phone, address, date_of_birth, biometric, facial_photo, ip_address

Financial: credit_card, cvv, bank_account, iban, account_number

Medical: mrn, medical_id, npi, mbi, medical_device, certificate_license

Government: passport, drivers_license, vehicle_identifier

Technical: api_key, crypto, database_connection

Business: business_ip, audit_docs, executive_comms, financial_figures, material_info

Web: web_url

Policy Metadata

Add metadata for compliance tracking:

[[custom_policies]]
name = "employee_data_protection"
enabled = true
required_scanners = ["ssn", "custom:employee_id"]

[custom_policies.metadata]
compliance_framework = "ACME_DATA_PROTECTION_2024"
owner = "hr-compliance@acme.com"
review_date = "2025-01-15"

Severity Levels

Policy violations can have severity levels:

| Severity | Description | Example |
|---|---|---|
| critical | Immediate action required | Bulk SSN export |
| high | Urgent investigation | PII with contact info |
| medium | Review required | Single finding in unexpected location |
| low | Informational | Context-appropriate finding |

Example: Complete Custom Configuration

# Custom scanner for employee IDs
[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.85
description = "ACME Corp employee IDs"
context_signals = ["hr", "personnel"]

[scanners.confidence_boost]
keywords = ["employee", "badge", "payroll"]
boost_amount = 0.10
proximity = 200

# Custom policy for employee protection
[[custom_policies]]
name = "employee_data_protection"
enabled = true
required_scanners = ["ssn", "custom:employee_id", "email"]

[custom_policies.metadata]
owner = "security@acme.com"
review_date = "2025-06-01"

# Rule 1: Employee ID with SSN
[[custom_policies.rules]]
id = "employee_pii_leak"
severity = "high"
remediation = "Contact HR compliance immediately"

[custom_policies.rules.composition]
operator = "AND"
proximity = 500

[[custom_policies.rules.composition.conditions]]
scanner = "custom:employee_id"
min_confidence = 0.70

[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75

[custom_policies.rules.exclusions]
file_patterns = ["*/hr/authorized/*"]

# Rule 2: Bulk employee export
[[custom_policies.rules]]
id = "bulk_employee_export"
severity = "critical"
remediation = "Investigate potential data breach"

[custom_policies.rules.threshold]
scanner = "custom:employee_id"
operator = "greater_equal"
count = 50

Troubleshooting Policies

Common Errors

“Unsafe regex pattern” - Pattern is unbounded. Add length limits.

“Reserved policy name” - Cannot use HIPAA, PCI-DSS, GDPR, etc. as custom policy names.

“Unknown scanner” - Check scanner name and custom: prefix.

No Alerts Appearing

  1. Verify policy is enabled (enabled = true)
  2. Check confidence thresholds aren’t too high
  3. Verify rule conditions are met
  4. Check exclusions aren’t blocking alerts

See the Configuration guide for applying changes and checking logs.