Custom Scanner Integration
This guide covers integrating custom scanners with policies, SIEM systems, and fleet deployments. To create custom scanners, see Custom Scanners.
Combining Built-in and Custom Scanners
Custom scanners work alongside built-in scanners in policies. The key difference is the naming convention:
- Built-in scanners: Use the scanner name directly (e.g., ssn, email, iban)
- Custom scanners: Use the custom: prefix (e.g., custom:employee_id)
[policies]
enabled_policies = ["data_protection"]
[policies.policy_configs.data_protection]
enabled = true
settings = { confidence_threshold = "0.7" }
For advanced policy composition (AND/OR rules, thresholds), see Policy Frameworks.
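The naming convention can be illustrated with a short Python sketch. The helper names below (is_custom_scanner, split_scanner_names) are hypothetical and not part of the product; they only show how a scanner reference might be classified by its custom: prefix.

```python
CUSTOM_PREFIX = "custom:"

def is_custom_scanner(name: str) -> bool:
    """Return True when a scanner reference uses the custom: prefix."""
    return name.startswith(CUSTOM_PREFIX)

def split_scanner_names(names):
    """Partition scanner references into (built_in, custom) lists.

    Custom names are returned without the prefix.
    """
    built_in, custom = [], []
    for name in names:
        if is_custom_scanner(name):
            custom.append(name[len(CUSTOM_PREFIX):])
        else:
            built_in.append(name)
    return built_in, custom

built_in, custom = split_scanner_names(["ssn", "email", "iban", "custom:employee_id"])
print(built_in)  # ['ssn', 'email', 'iban']
print(custom)    # ['employee_id']
```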
Example: GDPR with Custom Identifiers
A common scenario is extending GDPR compliance with organization-specific identifiers. This example combines built-in EU data scanners with custom project codes:
# Custom scanner for internal project codes
[[scanners]]
name = "project_code"
regex = "PROJ-([A-Z]{2})-([0-9]{4})"
redaction_pattern = "PROJ-XX-XXXX"
base_confidence = 0.80
context_signals = ["internal", "confidential"]
[scanners.confidence_boost]
keywords = ["confidential", "restricted", "internal"]
boost_amount = 0.15
proximity = 150
[scanners.confidence_reduce]
keywords = ["example", "template", "documentation"]
boost_amount = 0.30
proximity = 100
[policies]
enabled_policies = ["gdpr_extended"]
[policies.policy_configs.gdpr_extended]
enabled = true
settings = { confidence_threshold = "0.7" }
For complete GDPR scanner mappings and compliance guidance, see GDPR Compliance.
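Before deploying a scanner like project_code, it can help to check the regex and redaction pattern against sample text. The following is a minimal sketch using Python's re module. It assumes the scanner's regex dialect behaves like Python's for this simple pattern, and it illustrates the intended effect rather than the engine's actual redaction logic.

```python
import re

# Same pattern and redaction string as the project_code scanner above.
PROJECT_CODE = re.compile(r"PROJ-([A-Z]{2})-([0-9]{4})")
REDACTION = "PROJ-XX-XXXX"

def redact_project_codes(text: str) -> str:
    """Replace every project code in text with the fixed redaction string."""
    return PROJECT_CODE.sub(REDACTION, text)

sample = "Internal: budget for PROJ-EU-2041 is confidential; proj-eu-2041 is not matched."
print(PROJECT_CODE.findall(sample))  # [('EU', '2041')]
print(redact_project_codes(sample))
# Internal: budget for PROJ-XX-XXXX is confidential; proj-eu-2041 is not matched.
```

The lowercase variant is left untouched because the pattern only accepts uppercase letters.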
Reducing False Positives from Test Files
Development environments often contain test fixtures with fake sensitive data. Use these strategies to reduce false positives.
Path-Based Exclusions
Exclude entire directories from scanning using global exclude_paths:
watch_paths = ["/home/%%", "/var/data/%%"]
exclude_paths = [
# Test directories
"/home/*/projects/*/tests/%%",
"/home/*/projects/*/test/%%",
"/home/*/projects/*/__tests__/%%",
# Test fixtures and mock data
"/home/*/projects/*/fixtures/%%",
"/home/*/projects/*/mock-data/%%",
"/home/*/projects/*/testdata/%%",
# Build artifacts
"/home/*/projects/*/node_modules/%%",
"/home/*/projects/*/target/%%",
"/home/*/projects/*/.git/%%"
]
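To sanity-check which files a set of exclude patterns would skip, a rough Python sketch follows. It rests on two assumptions that are not confirmed by this guide: that * matches within a single path segment, and that %% matches everything beneath a directory, including nested subdirectories. The functions pattern_to_regex and is_excluded are hypothetical helpers, and the product's own matcher may behave differently.

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an exclude pattern into a regex (assumed wildcard semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("%%", i):
            parts.append(".*")      # assumption: %% spans nested directories
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # assumption: * stays within one segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")

def is_excluded(path: str, exclude_patterns) -> bool:
    """Return True when path matches any of the exclude patterns."""
    return any(pattern_to_regex(p).match(path) for p in exclude_patterns)

patterns = ["/home/*/projects/*/tests/%%", "/home/*/projects/*/node_modules/%%"]
print(is_excluded("/home/alice/projects/app/tests/unit/test_ids.py", patterns))  # True
print(is_excluded("/home/alice/projects/app/src/main.py", patterns))             # False
```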
Keyword-Based Confidence Reduction
For files that can’t be excluded by path, use confidence_reduce to lower confidence when test-related keywords appear nearby:
[[scanners]]
name = "customer_id"
regex = "CUST-([0-9]{8})"
redaction_pattern = "CUST-XXXXXXXX"
base_confidence = 0.80
[scanners.confidence_reduce]
keywords = [
# Test indicators
"test", "spec", "mock", "fake", "dummy",
# Documentation indicators
"example", "sample", "demo", "placeholder",
# Development indicators
"fixture", "seed", "factory"
]
boost_amount = 0.50
proximity = 100
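Inside confidence_reduce, boost_amount is the amount subtracted. With base_confidence = 0.80 and boost_amount = 0.50, a match near a test keyword drops to 0.30 (0.80 - 0.50), which falls below typical policy thresholds such as the confidence_threshold of 0.7 used in the policy examples in this guide.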
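The arithmetic can be sketched in Python. This is an illustration under stated assumptions, not the engine's implementation: proximity is treated as a character window on each side of the match (the guide describes it in bytes), keywords are matched as case-insensitive substrings, and the result is clamped at 0.0. The helpers keyword_nearby and reduced_confidence are hypothetical.

```python
import re

def keyword_nearby(text: str, start: int, end: int, keywords, proximity: int) -> bool:
    """Check whether any keyword appears within `proximity` characters of the match."""
    window = text[max(0, start - proximity): end + proximity].lower()
    return any(k.lower() in window for k in keywords)

def reduced_confidence(base: float, amount: float, near_keyword: bool) -> float:
    """Subtract `amount` from `base` when a reduce keyword is nearby (floor 0.0)."""
    if not near_keyword:
        return base
    return max(0.0, round(base - amount, 2))

KEYWORDS = ["test", "spec", "mock", "fake", "dummy", "example", "sample",
            "demo", "placeholder", "fixture", "seed", "factory"]
CUSTOMER_ID = re.compile(r"CUST-([0-9]{8})")

for text in ["test fixture: CUST-12345678", "Invoice for CUST-87654321 issued"]:
    m = CUSTOMER_ID.search(text)
    near = keyword_nearby(text, m.start(), m.end(), KEYWORDS, proximity=100)
    print(reduced_confidence(0.80, 0.50, near))
# 0.3
# 0.8
```

Substring matching is deliberately naive here; for instance, "latest" would count as containing "test". Whether the real scanner matches whole words is not stated in this guide.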
For detailed confidence tuning patterns, see Custom Scanners - Confidence Tuning. For global configuration options, see Configuration.
SIEM Integration
Custom scanner findings flow to your SIEM through the OSQuery aquilon_dlp_alerts table.
How context_signals Flow to Alerts
The context_signals you define on custom scanners appear in alert metadata:
[[scanners]]
name = "patient_id"
regex = "PAT-([0-9]{8})"
redaction_pattern = "PAT-XXXXXXXX"
base_confidence = 0.85
context_signals = ["healthcare", "phi", "hipaa"] # These flow to alerts
Key Alert Fields for Custom Scanners
Query custom scanner alerts via OSQuery:
SELECT
timestamp,
path,
scanner,
confidence,
policy,
severity,
context
FROM aquilon_dlp_alerts
WHERE scanner LIKE 'custom:%'
ORDER BY timestamp DESC
LIMIT 100;
The context JSON field contains context_signals for SIEM filtering and routing.
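As a rough illustration of filtering on that field, the Python sketch below parses the context column of exported alert rows and selects those tagged with a given signal. The JSON shape is an assumption: the sketch expects an object with a context_signals key holding either a list of strings or a single string. Check the actual alert schema before relying on it. The function names are hypothetical.

```python
import json

def context_signals(row: dict) -> list:
    """Extract context_signals from an alert row's context JSON (assumed shape)."""
    raw = row.get("context") or "{}"
    try:
        context = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(context, dict):
        return []
    signals = context.get("context_signals", [])
    if isinstance(signals, str):
        return [signals]
    return list(signals)

def alerts_with_signal(rows, signal: str) -> list:
    """Return the rows whose context_signals include `signal`."""
    return [row for row in rows if signal in context_signals(row)]

rows = [
    {"scanner": "custom:patient_id",
     "context": '{"context_signals": ["healthcare", "phi", "hipaa"]}'},
    {"scanner": "custom:project_code",
     "context": '{"context_signals": ["internal", "confidential"]}'},
    {"scanner": "custom:customer_id", "context": ""},
]
print([r["scanner"] for r in alerts_with_signal(rows, "healthcare")])
# ['custom:patient_id']
```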
Splunk Integration Example
Schedule OSQuery to export alerts, then query them in Splunk:
index=osquery sourcetype=osquery:results name=aquilon_dlp_alerts
| spath input=columns.context
| search context_signals="*healthcare*"
| stats count by scanner, severity, policy
For complete SIEM integration including Elastic Stack, see Monitoring - SIEM Integration. For the full alert schema, see API Integration.
Fleet Deployment
Deploy custom scanner configurations across your fleet using MDM or configuration management tools.
Centralized Configuration
- Create a base configuration with your custom scanners and policies
- Deploy via MDM (Jamf, Intune, Kandji) to managed devices
- Verify deployment using OSQuery fleet queries
Example verification query to confirm custom scanners are active:
SELECT
name,
version,
status
FROM aquilon_dlp_status
WHERE status = 'running';
Deployment Resources
- MDM deployment guide: MDM Deployment - PPPC profiles, staged rollout
- Enterprise patterns: Enterprise Deployment - pilot groups, success metrics
- macOS requirements: macOS Installation - Full Disk Access, entitlements
Performance Considerations
Custom scanners add minimal overhead, but keep these guidelines in mind for large fleets:
Scanner Count
- 10-20 custom scanners: Negligible performance impact
- 20-50 custom scanners: Monitor scan latency metrics
- 50+ custom scanners: Consider splitting into multiple policies by use case
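These tiers can be turned into a simple pre-deployment check. The Python sketch below counts [[scanners]] table headers in a configuration file's text and maps the count to the guidance above. The helpers are hypothetical, the line-based count is a shortcut rather than a full TOML parse, and because the listed ranges overlap at 20 and 50, the sketch assumes 20 belongs to the lower tier and 50 to the middle one.

```python
def count_custom_scanners(config_text: str) -> int:
    """Count [[scanners]] table headers in a TOML configuration string."""
    return sum(1 for line in config_text.splitlines()
               if line.strip() == "[[scanners]]")

def scanner_count_guidance(count: int) -> str:
    """Map a custom scanner count to the guidance tiers (assumed boundaries)."""
    if count <= 20:
        return "negligible performance impact"
    if count <= 50:
        return "monitor scan latency metrics"
    return "consider splitting into multiple policies by use case"

config = """
[[scanners]]
name = "project_code"

[[scanners]]
name = "customer_id"
"""
count = count_custom_scanners(config)
print(count, "->", scanner_count_guidance(count))
# 2 -> negligible performance impact
```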
Proximity Search Tuning
Large proximity values in confidence_boost/confidence_reduce increase memory usage per scan:
[scanners.confidence_boost]
keywords = ["confidential"]
boost_amount = 0.20
proximity = 100 # Recommended: 100-200 bytes
# proximity = 1000 # Avoid: increases memory per match
Monitoring Performance
Track scanner performance via Prometheus metrics:
- aquilon_scan_duration_seconds - Per-file scan time
- aquilon_scanner_matches_total - Matches by scanner name
- aquilon_queue_depth - Work queue backlog
For metrics setup, see Monitoring.