Policy Frameworks
Aquilon DLP includes built-in compliance policy frameworks that automatically classify findings and generate violations according to regulatory requirements. You can also create custom policies using TOML configuration.
Built-in Compliance Frameworks
Overview
| Framework | Standard | Key Controls | Edition |
|---|---|---|---|
| GDPR | EU General Data Protection Regulation | Articles 5, 32, 33 | All |
| CCPA | California Consumer Privacy Act | Sections 1798.100-199 | All |
| HIPAA | Health Insurance Portability and Accountability Act | Sections 164.306, 164.312 | Enterprise |
| PCI DSS | Payment Card Industry Data Security Standard | Requirements 3, 4, 12 | Enterprise |
| SOX | Sarbanes-Oxley Act | Sections 302, 404, 409 | Enterprise |
| ISO 27001 | Information Security Management | Controls A.8.12, A.5.12, A.8.11 | Enterprise |
| CUI | Controlled Unclassified Information | NIST SP 800-171 | Enterprise |
| CMMC | Cybersecurity Maturity Model Certification | DFARS 252.204-7012 | Enterprise |
| FedRAMP | Federal Risk and Authorization Management | NIST SP 800-53 | Enterprise |
| FISMA | Federal Information Security Modernization Act | FIPS 199, NIST SP 800-53 | Enterprise |
GDPR (General Data Protection Regulation)
The GDPR policy detects EU personal data subject to data protection regulations.
Detected Data Types:
- Personal identifiers (names, addresses, phone numbers)
- Email addresses
- National identification numbers
- Financial account data
- Health information
Configuration:
[policies]
enabled_policies = ["gdpr"]
[policies.policy_configs.gdpr]
enabled = true
settings = { confidence_threshold = "0.7", requires_cc_context = "true" }
Context-Aware Credit Card Detection:
By default, the GDPR policy reports a credit card number only when payment context keywords accompany the match. This reduces false positives from Luhn-valid numbers that appear in non-payment contexts (JSON logs, test files, etc.).
| Setting | Default | Effect |
|---|---|---|
| requires_cc_context | "true" | CC findings require payment context keywords |
Payment context keywords: payment, card, merchant, transaction, billing, invoice
To restore legacy behavior (alert on all Luhn-valid credit cards regardless of context):
settings = { requires_cc_context = "false" }
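The context gate described above can be sketched in Python. This is an illustration only, not Aquilon's implementation: the function names and the 100-character search window are assumptions, because the documentation does not state how far from a match a keyword may appear.

```python
# Keywords taken from the documentation's payment context list.
PAYMENT_KEYWORDS = ("payment", "card", "merchant", "transaction", "billing", "invoice")


def has_payment_context(text, start, end, window=100):
    """Return True if a payment keyword appears near text[start:end].

    `window` is a hypothetical radius in characters; the real value is not documented.
    """
    lo = max(0, start - window)
    nearby = text[lo:end + window].lower()
    return any(keyword in nearby for keyword in PAYMENT_KEYWORDS)


def should_report(text, start, end, requires_cc_context=True):
    """Decide whether a Luhn-valid match at text[start:end] becomes a finding."""
    if not requires_cc_context:
        return True  # legacy behavior: report every Luhn-valid number
    return has_payment_context(text, start, end)
```

With the gate on, a card-like number in a JSON trace log is suppressed, while the same number next to "billing" is reported.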
CCPA (California Consumer Privacy Act)
The CCPA policy detects California consumer personal information.
Detected Data Types:
- Personal identifiers
- Social Security numbers
- Driver’s license numbers
- Financial information
- Geolocation data
- Biometric information
Configuration:
[policies]
enabled_policies = ["ccpa"]
[policies.policy_configs.ccpa]
enabled = true
settings = { confidence_threshold = "0.7" }
HIPAA (Health Insurance Portability and Accountability Act)
Enterprise Edition Only
The HIPAA policy detects Protected Health Information (PHI).
Detected Data Types:
- Medical record numbers
- Health plan beneficiary numbers
- Social Security numbers
- Names with medical details
- Dates of service
- Provider information
Configuration:
[policies]
enabled_policies = ["hipaa"]
[policies.policy_configs.hipaa]
enabled = true
settings = { confidence_threshold = "0.8" }
PCI DSS (Payment Card Industry Data Security Standard)
Enterprise Edition Only
The PCI DSS policy detects payment card data.
Detected Data Types:
- Credit card numbers (validated with Luhn algorithm)
- Card security codes (CVV/CVC)
- Cardholder names
- Expiration dates
- Magnetic stripe data
Configuration:
[policies]
enabled_policies = ["pci_dss"]
[policies.policy_configs.pci_dss]
enabled = true
settings = { alert_on_test_data = "false", requires_cc_context = "true" }
Context-Aware Credit Card Detection:
By default, the PCI DSS policy reports a credit card number only when payment context keywords accompany the match. This reduces false positives from Luhn-valid numbers that appear in non-payment contexts (JSON logs, test files, etc.).
| Setting | Default | Effect |
|---|---|---|
| requires_cc_context | "true" | CC findings require payment context keywords |
Payment context keywords: payment, card, merchant, transaction, billing, invoice
To restore legacy behavior (alert on all Luhn-valid credit cards regardless of context):
settings = { requires_cc_context = "false" }
SOX (Sarbanes-Oxley Act)
Enterprise Edition Only
The SOX policy detects financial data subject to internal controls.
Detected Data Types:
- Financial statements
- Account numbers
- Transaction identifiers
- Audit information
- Executive communications
Configuration:
[policies]
enabled_policies = ["sox"]
[policies.policy_configs.sox]
enabled = true
settings = { confidence_threshold = "0.85" }
ISO 27001:2022
Enterprise Edition Only
The ISO 27001:2022 policy implements information security management controls, in particular Control A.8.12 (Data leakage prevention), which explicitly mandates DLP capabilities.
Features:
- 4-level data classification: Restricted, Confidential, Internal, Public
- Automatic classification of all 33 scanners by sensitivity
- Configurable controls for data masking, encryption, access
Detected Data Types:
- All categories classified by sensitivity level
- Automatic assignment based on scanner type
Configuration:
[policies]
enabled_policies = ["iso27001"]
[policies.policy_configs.iso27001]
enabled = true
settings = { confidence_threshold = "0.7", enforce_data_masking = "true" }
Enabling Multiple Policies
You can enable multiple policies simultaneously:
[policies]
enabled_policies = ["gdpr", "hipaa", "pci_dss", "sox", "ccpa", "iso27001"]
Each policy evaluates scan findings independently and generates violations according to its regulatory framework. A single file might trigger alerts from multiple policies if it contains different types of sensitive data.
Custom Policies
Aquilon DLP supports custom policies and scanners to detect company-specific data patterns without writing code.
Creating Custom Scanners
Define scanners for proprietary identifiers:
[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.85
description = "ACME Corp employee IDs"
context_signals = ["hr", "confidential", "personnel"]
[scanners.confidence_boost]
keywords = ["employee", "personnel", "payroll", "badge"]
boost_amount = 0.10
proximity = 200
Scanner Fields:
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier (alphanumeric + underscore) |
| regex | Yes | Pattern to match (must be bounded) |
| redaction_pattern | Yes | Template for redacting matches |
| base_confidence | Yes | Base confidence score (0.0 - 1.0) |
| description | No | Human-readable description |
| context_signals | No | Keywords for classification |
| confidence_boost | No | Boost confidence when keywords found nearby |
Pattern Safety
All regex patterns must be bounded to prevent performance issues:
# SAFE - bounded patterns
[[scanners]]
name = "fixed_length"
regex = "EMP-([0-9]{6})" # Fixed length
Unsafe patterns (unbounded) will be rejected:
\d+, .*, [A-Z]+
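To pre-check patterns before loading them, a rough heuristic along the following lines may help. It is a sketch under assumptions: the documentation does not describe how Aquilon decides that a pattern is unbounded, so `is_bounded` is a hypothetical helper that simply flags `*`, `+`, and open-ended `{n,}` quantifiers outside character classes.

```python
import re


def is_bounded(pattern: str) -> bool:
    """Heuristic check: False if the pattern uses *, +, or {n,} as a quantifier."""
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character (e.g. \d, \+)
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch in "*+":
            return False  # unbounded repetition
        elif ch == "{":
            close = pattern.find("}", i)
            # {n,} has a lower bound but no upper bound
            if close != -1 and re.fullmatch(r"\d+,", pattern[i + 1:close]):
                return False
        i += 1
    return True
```

Under this heuristic, the fixed-length `EMP-([0-9]{6})` pattern passes and all three unsafe examples above are flagged.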
Dictionary Scanners
Dictionary scanners detect words and phrases from configurable inline lists using the Aho-Corasick algorithm for efficient O(n) multi-pattern matching.
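For readers unfamiliar with Aho-Corasick, the following compact Python sketch shows the textbook algorithm: a trie of all words plus failure links, so the text is scanned once regardless of how many words are in the list. It illustrates the general technique only and is not Aquilon's code. The function names are illustrative, and options such as case_sensitive and match_whole_words are not modeled.

```python
from collections import deque


def build_automaton(words):
    """Build trie transitions, failure links, and output lists for the word list."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # Inherit words that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_offset, word) for every occurrence, in a single pass over text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

For example, `find_all("the patient record shows a diagnosis", build_automaton(["diagnosis", "patient record", "patient"]))` reports both "patient" and "patient record" at the same offset, plus "diagnosis" later in the text.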
When to Use Dictionary Scanners
- Detect lists of keywords or terms (medical terms, project codes, product names)
- Match multi-word phrases (e.g., “social security number”, “patient record”)
- Domain-specific vocabulary that doesn’t follow a regex pattern
Basic Configuration
[[dictionary_scanners]]
name = "medical_terms"
words = [
"diagnosis",
"prescription",
"patient record",
"medical history"
]
case_sensitive = false
match_whole_words = true
base_confidence = 0.85
Configuration Fields
| Field | Type | Default | Description |
|---|---|---|---|
| name | String | Required | Unique scanner identifier (alphanumeric + underscore) |
| words | Array | Required | Words and phrases to detect |
| case_sensitive | Boolean | false | Case-sensitive matching |
| match_whole_words | Boolean | true | Match only at word boundaries |
| base_confidence | Float | 0.8 | Base confidence score (0.0-1.0) |
| min_matches | Integer | None | Minimum matches required to report |
| match_proximity | Integer | None | Maximum bytes between matches |
| description | String | None | Human-readable description |
| context_signals | Array | None | Keywords for classification |
Advanced: Match Constraints
Use min_matches and match_proximity to reduce false positives by requiring multiple terms to appear together:
[[dictionary_scanners]]
name = "hipaa_terms"
words = [
"protected health information",
"PHI",
"patient",
"medical record",
"diagnosis",
"treatment"
]
base_confidence = 0.75
min_matches = 2
match_proximity = 500
This configuration only reports findings when at least 2 terms appear within 500 bytes of each other.
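One way to picture the constraint is a sliding window over sorted match offsets. The sketch below is an assumption about the semantics, not the product's code: it measures distance between match start offsets, and `meets_constraints` is a hypothetical name.

```python
def meets_constraints(offsets, min_matches=2, match_proximity=500):
    """Return True if at least `min_matches` offsets fall within `match_proximity` bytes.

    `offsets` are the byte positions where dictionary terms matched.
    """
    points = sorted(offsets)
    if len(points) < min_matches:
        return False
    # Slide a window of min_matches consecutive matches and check its span.
    for i in range(len(points) - min_matches + 1):
        if points[i + min_matches - 1] - points[i] <= match_proximity:
            return True
    return False
```

Two terms at offsets 120 and 480 satisfy the example configuration; two terms at offsets 120 and 9000 do not.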
Advanced: Confidence Adjustments
Boost or reduce confidence based on nearby keywords:
[[dictionary_scanners]]
name = "project_codenames"
words = ["Project Alpha", "Operation Gamma", "Initiative Delta"]
base_confidence = 0.70
boost_keywords = ["confidential", "restricted", "internal only"]
boost_amount = 0.20
reduce_keywords = ["example", "test", "demo", "sample"]
reduce_amount = 0.30
When “confidential” appears nearby, confidence increases from 0.70 to 0.90. When “test” appears nearby, confidence decreases from 0.70 to 0.40.
Referencing Dictionary Scanners in Policies
Dictionary scanners use the custom: prefix when referenced in policies:
[[custom_policies]]
name = "healthcare_data"
enabled = true
required_scanners = ["custom:medical_terms", "ssn", "email"]
[[custom_policies.rules]]
id = "phi_exposure"
severity = "high"
[custom_policies.rules.composition]
operator = "AND"
proximity = 500
[[custom_policies.rules.composition.conditions]]
scanner = "custom:medical_terms"
min_confidence = 0.70
[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75
Built-in Validators
Validators provide checksum or format validation for regex matches, significantly reducing false positives by verifying that detected patterns are mathematically valid.
Available Validators
| Validator | Algorithm | Use Case |
|---|---|---|
| luhn | Luhn (mod 10) | Credit cards, IMEI numbers |
| mod10 | Modulo 10 | Various identifiers with check digits |
| mod11 | Modulo 11 | ISBN-10, some national IDs |
| iban | IBAN checksum | International Bank Account Numbers |
Using Validators in Custom Scanners
Add a validator to filter out matches that fail checksum validation:
[[scanners]]
name = "company_account"
regex = "ACCT-([0-9]{10})"
redaction_pattern = "ACCT-XXXXXXXXXX"
base_confidence = 0.85
[scanners.validation]
validator = "luhn"
min_confidence = 0.70
invalid_patterns = ["^0+$", "1234567890$"]
Validation Configuration Fields
| Field | Type | Description |
|---|---|---|
| validator | String | Checksum validator: luhn, mod10, mod11, iban |
| min_confidence | Float | Minimum confidence threshold (0.0-1.0) |
| invalid_patterns | Array | Regex patterns to reject (e.g., all zeros) |
Example: Credit Card with Luhn Validation
The built-in credit card scanner already uses Luhn validation internally. For custom patterns that should use Luhn:
[[scanners]]
name = "loyalty_card"
regex = "([0-9]{4})([0-9]{4})([0-9]{4})([0-9]{4})"
redaction_pattern = "XXXX-XXXX-XXXX-XXXX"
base_confidence = 0.80
description = "16-digit loyalty card numbers with Luhn check"
[scanners.validation]
validator = "luhn"
invalid_patterns = ["^0{16}$", "^1{16}$"]
This configuration:
- Matches any 16-digit number
- Validates it passes the Luhn checksum
- Rejects all-zeros and all-ones patterns
- Reports only valid matches
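The four steps above can be sketched in Python as follows. The Luhn check is the standard public algorithm. The surrounding pipeline (`find_loyalty_cards`) is a hypothetical illustration of how the regex, invalid_patterns, and validator might combine, not Aquilon's implementation.

```python
import re

PATTERN = re.compile(r"([0-9]{4})([0-9]{4})([0-9]{4})([0-9]{4})")
INVALID_PATTERNS = [re.compile(r"^0{16}$"), re.compile(r"^1{16}$")]


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check: double every second digit from the right."""
    if not digits.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_loyalty_cards(text):
    """Return 16-digit matches that survive invalid_patterns and the Luhn check."""
    results = []
    for match in PATTERN.finditer(text):
        number = match.group(0)
        if any(p.search(number) for p in INVALID_PATTERNS):
            continue  # rejected by invalid_patterns (e.g. all zeros)
        if luhn_valid(number):
            results.append(number)
    return results
```

Note that sixteen zeros pass the Luhn check on their own (the sum is 0), which is why the invalid_patterns list matters in addition to the validator.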
Confidence Scoring
Aquilon DLP uses weighted confidence scoring to reduce false positives. Confidence can be boosted by nearby keywords or reduced by negative indicators.
How Confidence Works
Each scanner assigns a base_confidence score (0.0 to 1.0). This score can be adjusted based on:
- Nearby positive keywords → Boost confidence (more likely a real match)
- Nearby negative keywords → Reduce confidence (likely a false positive)
- Validator success → Maintains or boosts confidence
- Validator failure → Match is discarded
Boosting Confidence with Keywords
When specific keywords appear near a match, boost the confidence:
[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.75
[scanners.confidence_boost]
keywords = ["employee", "badge", "payroll", "personnel", "HR"]
boost_amount = 0.15
proximity = 200
If “employee” or “payroll” appears within 200 bytes, confidence increases from 0.75 to 0.90.
Reducing Confidence with Negative Keywords
When negative keywords appear near a match, reduce the confidence to suppress likely false positives:
[[scanners]]
name = "ssn_custom"
regex = "([0-9]{3})-([0-9]{2})-([0-9]{4})"
redaction_pattern = "XXX-XX-XXXX"
base_confidence = 0.80
[scanners.confidence_reduce]
keywords = ["example", "test", "fake", "sample", "xxx", "000-00-0000"]
boost_amount = 0.50
proximity = 100
If “example” or “test” appears within 100 bytes, confidence is reduced by 0.50 (from 0.80 to 0.30).
Combining Boost and Reduce
You can use both boost and reduce on the same scanner:
[[scanners]]
name = "account_number"
regex = "ACC-([0-9]{8})"
redaction_pattern = "ACC-XXXXXXXX"
base_confidence = 0.70
[scanners.confidence_boost]
keywords = ["account", "balance", "statement", "transaction"]
boost_amount = 0.20
proximity = 150
[scanners.confidence_reduce]
keywords = ["example", "test", "demo", "documentation"]
boost_amount = 0.40
proximity = 100
Confidence calculation:
- Base: 0.70
- With “account” nearby: 0.70 + 0.20 = 0.90
- With “test” nearby: 0.70 - 0.40 = 0.30
- With both: Boost and reduce are applied independently based on proximity
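The arithmetic above can be reproduced with a few lines of Python. This sketch makes two assumptions that the documentation does not confirm: when both keyword sets are nearby, the adjustments are simply added together, and the result is clamped to the 0.0 to 1.0 range. The function name is illustrative.

```python
def adjusted_confidence(base, boost_amount=0.0, reduce_amount=0.0,
                        boost_hit=False, reduce_hit=False):
    """Apply keyword adjustments to a base confidence score.

    boost_hit / reduce_hit indicate whether a keyword from the respective
    list was found within its configured proximity.
    """
    score = base
    if boost_hit:
        score += boost_amount
    if reduce_hit:
        score -= reduce_amount
    # Assumption: scores are clamped to [0.0, 1.0]; rounding hides float noise.
    return round(min(1.0, max(0.0, score)), 2)
```

For the account_number scanner, `adjusted_confidence(0.70, 0.20, 0.40, boost_hit=True)` gives 0.9 and `adjusted_confidence(0.70, 0.20, 0.40, reduce_hit=True)` gives 0.3, matching the figures listed above.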
Creating Custom Policies
Define policies to enforce business rules:
[[custom_policies]]
name = "employee_data_protection"
description = "Detects employee PII exposure"
enabled = true
required_scanners = ["ssn", "custom:employee_id", "email"]
[[custom_policies.rules]]
id = "employee_pii_leak"
severity = "high"
remediation = "Contact HR compliance - do not share file"
[custom_policies.rules.composition]
operator = "AND"
proximity = 500
[[custom_policies.rules.composition.conditions]]
scanner = "custom:employee_id"
min_confidence = 0.70
[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75
[custom_policies.rules.exclusions]
file_patterns = ["*/hr/authorized/*", "*/payroll/approved/*"]
Rule Types
Composition Rules (AND/OR Logic)
Alert when multiple data types appear together:
[custom_policies.rules.composition]
operator = "AND" # All conditions must match
proximity = 500 # Within 500 characters
[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75
[[custom_policies.rules.composition.conditions]]
scanner = "email"
min_confidence = 0.70
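As a mental model, a composition rule can be evaluated roughly as in the Python sketch below. The data shapes and the name `composition_matches` are assumptions made for illustration. In particular, AND is interpreted here as "one qualifying finding per condition, all within the proximity distance of each other", which the documentation implies but does not spell out.

```python
from itertools import product


def composition_matches(findings, conditions, operator="AND", proximity=500):
    """Evaluate a composition rule.

    findings:   list of (scanner_name, offset, confidence) tuples
    conditions: list of {"scanner": str, "min_confidence": float}
    """
    if not conditions:
        return False
    # Offsets of findings that satisfy each condition's confidence floor.
    per_condition = [
        [off for name, off, conf in findings
         if name == cond["scanner"] and conf >= cond["min_confidence"]]
        for cond in conditions
    ]
    if operator == "OR":
        return any(per_condition)
    if not all(per_condition):
        return False
    # AND: some combination of one finding per condition must fit in the window.
    return any(max(combo) - min(combo) <= proximity
               for combo in product(*per_condition))
```

With the ssn and email conditions above, an SSN at offset 100 and an email at offset 400 match, while the same pair 5,000 positions apart does not.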
Threshold Rules (Count-Based)
Alert when count exceeds threshold (bulk export detection):
[custom_policies.rules.threshold]
scanner = "custom:employee_id"
operator = "greater_equal"
count = 10
Supported comparisons: >, >=, <, <=, ==. The example above selects >= through the named operator greater_equal.
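A threshold rule reduces to counting findings and comparing the count. In the Python sketch below, only the greater_equal name comes from the example above. The symbolic keys and the function name are assumptions for illustration, and the configuration names for the other comparisons are not shown because the documentation does not list them.

```python
import operator

# "greater_equal" appears in the documentation; the symbolic keys are illustrative.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "greater_equal": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def threshold_met(finding_scanners, scanner, op, count):
    """Return True if the number of findings from `scanner` satisfies the comparison."""
    observed = sum(1 for name in finding_scanners if name == scanner)
    return COMPARATORS[op](observed, count)
```

A file with ten employee IDs satisfies `threshold_met(names, "custom:employee_id", "greater_equal", 10)`, while a file with nine does not.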
Context Rules (Exclusions)
Control when rules fire based on context signals and file location:
[custom_policies.rules.context]
requires_any = ["external", "public", "shared"]
[custom_policies.rules.exclusions]
file_patterns = ["*/hr/authorized/*"]
requires_context_signals = ["approved", "authorized"]
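To reason about which paths a file_patterns exclusion covers, it can help to test the globs offline. The sketch below assumes shell-style glob semantics in which `*` also matches path separators. The documentation does not specify the glob dialect, so treat this as an approximation, and `is_excluded` as a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_excluded(path, file_patterns):
    """Return True if `path` matches any exclusion glob (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in file_patterns)
```

Under these assumptions, `/srv/share/hr/authorized/roster.csv` is excluded by `*/hr/authorized/*`, while `/srv/share/hr/drafts/roster.csv` is not.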
Scanner References
When referencing scanners in policies:
- Built-in scanners: Use the scanner name directly (ssn, email, cc)
- Custom scanners: Use the custom: prefix (custom:employee_id)
# Scanner references in required_scanners
required_scanners = [
"ssn", # Built-in
"email", # Built-in
"custom:employee_id", # Custom
"custom:project_code" # Custom
]
Adding Custom Policies
Custom scanners and policies are defined directly in your main configuration file using [[scanners]] and [[custom_policies]] sections:
# In aquilon_dlp_config.toml
[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX" # Required field
base_confidence = 0.85
[[custom_policies]]
name = "employee_data_protection"
enabled = true
required_scanners = ["custom:employee_id", "ssn"]
Validate your configuration:
sudo aquilon-dlp --config /etc/aquilon/aquilon_dlp_config.toml --validate-config
Built-in Scanners
Aquilon DLP includes 50+ built-in scanner plugins across multiple categories:
| Category | Scanner Count | Examples |
|---|---|---|
| National IDs | 28 | EU, Americas, Asia-Pacific, Middle East national IDs |
| PII | 8 | SSN, email, phone, address, date of birth |
| Financial | 5 | Credit card, bank account, IBAN, CVV |
| Medical | 6 | MRN, NPI, MBI, medical device IDs |
| Government | 3 | Passport, driver’s license, vehicle identifier |
| Technical | 3 | API keys, database connections, crypto keys |
| Business | 5 | Executive communications, financial figures, audit docs |
All scanners integrate automatically with compliance policies.
National ID Scanners
Aquilon DLP includes comprehensive national ID detection with country-specific checksum validation:
Europe (14 scanners):
| Country | Scanner | Format | Validation |
|---|---|---|---|
| France | france_nir | 15 digits (NIR) | Mod 97 |
| Germany | germany_steurid | 11 digits (Steuer-ID) | Format rules |
| Italy | italy_cf | 16 chars (Codice Fiscale) | Mod 26 |
| Spain | spain_dni | 8-9 chars (DNI/NIE) | Mod 23 |
| Poland | poland_pesel | 11 digits (PESEL) | Weighted mod 10 |
| Netherlands | netherlands_bsn | 9 digits (BSN) | 11-proof |
| Belgium | belgium_nrn | 11 digits (NRN) | Mod 97 |
| UK | uk_nino | 9 chars (NINO) | Format rules |
| Sweden | sweden_personnummer | 10-12 digits | Luhn |
| Norway | norway_fodselsnummer | 11 digits | Dual mod-11 |
| Finland | finland_hetu | 11 chars (HETU) | Mod 31 |
| Portugal | portugal_nif | 9 digits (NIF) | Weighted mod 11 |
| Romania | romania_cnp | 13 digits (CNP) | Weighted mod 11 |
| Czech/Slovakia | czech_rodne_cislo | 9-10 digits | Mod 11 |
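As an example of what country-specific checksum validation involves, here is the publicly documented "11-proof" used for Dutch BSN numbers, sketched in Python. It shows the general rule only. Whether the netherlands_bsn scanner applies additional checks is not stated in this documentation, and the function name is illustrative.

```python
def bsn_valid(bsn: str) -> bool:
    """11-proof: weight digits 9,8,7,6,5,4,3,2,-1; the sum must be divisible by 11."""
    if len(bsn) != 9 or not bsn.isdigit():
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(w * int(d) for w, d in zip(weights, bsn))
    return total % 11 == 0
```

For instance, 111222333 passes (the weighted sum is 66), while 123456789 fails (the weighted sum is 147).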
Americas (4 scanners):
| Country | Scanner | Format | Validation |
|---|---|---|---|
| Brazil | brazil_cpf | 11 digits (CPF) | Dual mod 11 |
| Canada | canada_sin | 9 digits (SIN) | Luhn |
| Chile | chile_rut | 8-9 chars (RUT) | Mod 11 |
| Argentina | argentina_cuit | 11 digits (CUIT/CUIL) | Weighted mod 11 |
Asia-Pacific (8 scanners):
| Country | Scanner | Format | Validation |
|---|---|---|---|
| Australia | australia_tfn | 9 digits (TFN) | Weighted mod 11 |
| India | india_aadhaar | 12 digits (Aadhaar) | Format rules |
| India | india_pan | 10 chars (PAN) | Format rules |
| South Korea | south_korea_rrn | 13 digits (RRN) | Weighted mod 11 |
| Japan | japan_my_number | 12 digits | Government checksum |
| China | china_resident_id | 18 chars | ISO 7064 MOD 11-2 |
| Taiwan | taiwan_national_id | 10 chars | Weighted mod 10 |
| New Zealand | new_zealand_ird | 8-9 digits (IRD) | Mod 11 |
Middle East & Africa (2 scanners):
| Country | Scanner | Format | Validation |
|---|---|---|---|
| Israel | israel_teudat_zehut | 9 digits | Luhn variant |
| Turkey | turkey_tc_kimlik | 11 digits (TC Kimlik) | Two-step checksum |
Other Scanners
PII: ssn, email, phone, address, date_of_birth, biometric, facial_photo, ip_address
Financial: credit_card, cvv, bank_account, iban, account_number
Medical: mrn, medical_id, npi, mbi, medical_device, certificate_license
Government: passport, drivers_license, vehicle_identifier
Technical: api_key, crypto, database_connection
Business: business_ip, audit_docs, executive_comms, financial_figures, material_info
Web: web_url
Policy Metadata
Add metadata for compliance tracking:
[[custom_policies]]
name = "employee_data_protection"
enabled = true
required_scanners = ["ssn", "custom:employee_id"]
[custom_policies.metadata]
compliance_framework = "ACME_DATA_PROTECTION_2024"
owner = "hr-compliance@acme.com"
review_date = "2025-01-15"
Severity Levels
Policy violations can have severity levels:
| Severity | Description | Example |
|---|---|---|
| critical | Immediate action required | Bulk SSN export |
| high | Urgent investigation | PII with contact info |
| medium | Review required | Single finding in unexpected location |
| low | Informational | Context-appropriate finding |
Example: Complete Custom Configuration
# Custom scanner for employee IDs
[[scanners]]
name = "employee_id"
regex = "EMP-([0-9]{6})"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.85
description = "ACME Corp employee IDs"
context_signals = ["hr", "personnel"]
[scanners.confidence_boost]
keywords = ["employee", "badge", "payroll"]
boost_amount = 0.10
proximity = 200
# Custom policy for employee protection
[[custom_policies]]
name = "employee_data_protection"
enabled = true
required_scanners = ["ssn", "custom:employee_id", "email"]
[custom_policies.metadata]
owner = "security@acme.com"
review_date = "2025-06-01"
# Rule 1: Employee ID with SSN
[[custom_policies.rules]]
id = "employee_pii_leak"
severity = "high"
remediation = "Contact HR compliance immediately"
[custom_policies.rules.composition]
operator = "AND"
proximity = 500
[[custom_policies.rules.composition.conditions]]
scanner = "custom:employee_id"
min_confidence = 0.70
[[custom_policies.rules.composition.conditions]]
scanner = "ssn"
min_confidence = 0.75
[custom_policies.rules.exclusions]
file_patterns = ["*/hr/authorized/*"]
# Rule 2: Bulk employee export
[[custom_policies.rules]]
id = "bulk_employee_export"
severity = "critical"
remediation = "Investigate potential data breach"
[custom_policies.rules.threshold]
scanner = "custom:employee_id"
operator = "greater_equal"
count = 50
Troubleshooting Policies
Common Errors
“Unsafe regex pattern” - Pattern is unbounded. Add length limits.
“Reserved policy name” - Cannot use HIPAA, PCI-DSS, GDPR, etc. as custom policy names.
“Unknown scanner” - Check scanner name and custom: prefix.
No Alerts Appearing
- Verify the policy is enabled (enabled = true)
- Check confidence thresholds aren’t too high
- Verify rule conditions are met
- Check exclusions aren’t blocking alerts
See the Configuration guide for applying changes and checking logs.