Configuration
Aquilon DLP is configured through a TOML file that controls all aspects of operation including watch paths, policies, caching, and performance settings.
Configuration File
Location
| Platform | Default Location |
|---|---|
| macOS | /etc/aquilon/config.toml |
| Linux | /etc/aquilon/config.toml |
Initial Setup
After installation, copy the default configuration and customize:
sudo cp /etc/aquilon/config.toml.default /etc/aquilon/config.toml
sudo nano /etc/aquilon/config.toml
Core Configuration
Watch Paths
Define which directories Aquilon DLP monitors for sensitive data. Use %% to recursively watch all subdirectories:
watch_paths = [
"/home/%%",
"/var/data/%%",
"/srv/%%",
"/Users/%%"
]
Path Syntax:
- /path/to/dir/%% - Watch directory and all subdirectories recursively
- /path/to/dir - Watch only the directory itself (no recursion)
Best Practices:
- Include directories where users store documents
- Include shared drives and collaboration folders
- System directories are excluded by default, so you do not need to list them
- Exclude known-safe directories such as source code repositories
Exclusions
Exclude specific paths from monitoring:
watch_paths = ["/home/%%", "/var/data/%%"]
# Exclude specific directories
exclude_paths = [
"/home/*/.cache/%%",
"/home/*/Downloads/%%",
"/var/log/%%"
]
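To reason about which files a given pair of lists covers, the pattern syntax can be modeled in a few lines of Python. This is only a sketch under stated assumptions: it treats `%%` as "anything at any depth" and `*` as "anything within a single path component", which is how the examples above read, but the agent's actual matcher is not documented here and may differ. The helper names `pattern_to_regex` and `is_monitored` are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate a watch/exclude pattern into a regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("%%", i):
            parts.append(".*")       # assumption: %% spans any number of levels
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # assumption: * stays inside one component
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def is_monitored(path, watch_paths, exclude_paths):
    """True if path matches a watch pattern and no exclude pattern."""
    watched = any(pattern_to_regex(p).fullmatch(path) for p in watch_paths)
    excluded = any(pattern_to_regex(p).fullmatch(path) for p in exclude_paths)
    return watched and not excluded

watch = ["/home/%%", "/var/data/%%"]
exclude = ["/home/*/.cache/%%", "/home/*/Downloads/%%", "/var/log/%%"]
print(is_monitored("/home/alice/report.docx", watch, exclude))
print(is_monitored("/home/alice/.cache/thumb.png", watch, exclude))
```

The sketch does not model the non-recursive form (a pattern without `%%`), since the text above does not say whether files directly inside the directory count as matches.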
Policy Configuration
Enable Policies
Select which compliance frameworks to enable:
[policies]
enabled_policies = ["gdpr", "ccpa", "hipaa", "pci_dss", "sox", "iso27001"]
Available Policies:
| Policy | Description | Edition |
|---|---|---|
| gdpr | EU General Data Protection Regulation | All |
| ccpa | California Consumer Privacy Act | All |
| hipaa | Health Insurance Portability and Accountability | Enterprise |
| pci_dss | Payment Card Industry Data Security Standard | Enterprise |
| sox | Sarbanes-Oxley Act | Enterprise |
| iso27001 | Information Security Management | Enterprise |
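Before deploying, it can help to check an `enabled_policies` list against the edition column of the table above. The following Python sketch does that with a lookup copied from the table; the function name `unavailable_policies` is hypothetical, and how the agent itself reacts to a policy that the installed edition does not include is not covered here.

```python
# Edition required by each built-in policy, copied from the table above.
POLICY_EDITION = {
    "gdpr": "All",
    "ccpa": "All",
    "hipaa": "Enterprise",
    "pci_dss": "Enterprise",
    "sox": "Enterprise",
    "iso27001": "Enterprise",
}

def unavailable_policies(enabled_policies, edition):
    """Return names that are unknown or require an edition other than `edition`."""
    flagged = []
    for name in enabled_policies:
        required = POLICY_EDITION.get(name)
        if required is None:
            flagged.append(name)          # not a built-in policy name
        elif required != "All" and required != edition:
            flagged.append(name)          # needs a different edition
    return flagged

print(unavailable_policies(["gdpr", "ccpa", "hipaa"], "Basic"))
```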
Policy-Specific Settings
Configure individual policy behavior:
[policies.policy_configs.hipaa]
enabled = true
settings = { confidence_threshold = "0.8" }
[policies.policy_configs.pci_dss]
enabled = true
settings = { alert_on_test_data = "false" }
[policies.policy_configs.iso27001]
enabled = true
settings = { confidence_threshold = "0.7", enforce_data_masking = "true" }
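Note that the `settings` values in these examples are quoted strings, even where they hold numbers or booleans. When reviewing a configuration, a small Python helper can convert them to typed values so that range checks are possible. This is a sketch, not a description of how the agent parses the table; the helper name `typed_settings` and the 0.0 to 1.0 range for thresholds are assumptions.

```python
def typed_settings(raw):
    """Convert string-valued settings to bool/float where they look like one."""
    typed = {}
    for key, value in raw.items():
        if value in ("true", "false"):
            typed[key] = (value == "true")
        else:
            try:
                typed[key] = float(value)
            except ValueError:
                typed[key] = value        # leave anything else as a string
    return typed

# Mirrors the iso27001 settings table above.
iso27001 = {"confidence_threshold": "0.7", "enforce_data_masking": "true"}
settings = typed_settings(iso27001)

# Assumed range: confidence values elsewhere in this page are fractions of 1.
assert 0.0 <= settings["confidence_threshold"] <= 1.0
print(settings)
```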
Caching Configuration
Aquilon DLP uses a two-tier caching system to minimize redundant scanning:
[cache]
# Enable/disable caching (default: true)
enabled = true
# In-memory cache TTL in seconds (default: 0 = no expiry)
ttl_secs = 3600
# Database scan cache TTL in days (default: 7)
scan_cache_ttl_days = 7
Note: The database location is configured via the top-level database_path field:
# Linux: /var/lib/aquilon/aquilon.db
# macOS: /var/db/aquilon/aquilon.db
database_path = "/var/lib/aquilon/aquilon.db"
Platform Note: Default database paths differ by platform. The macOS PKG installer automatically configures the macOS path at /var/db/aquilon/aquilon.db.
Cache Performance
- Cache hit on clean file: <5ms p99
- Cache hit with alerts: <20ms p95
- Cache vs full scan: 10-100x faster
Cache Behavior
| File State | Cache Behavior |
|---|---|
| Clean (no findings) | Fully cached, subsequent scans skipped |
| Has alerts | Alert details cached as JSON (up to 25 alerts) |
| Modified | Cache entry invalidated, full rescan |
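The three rows of the table can be read as a small decision procedure. The Python sketch below models it with an in-memory dictionary. It is illustrative only: the use of a modification time as the change marker, and the names `CacheEntry`, `store`, and `lookup`, are assumptions rather than the agent's actual schema.

```python
import json
from dataclasses import dataclass

MAX_CACHED_ALERTS = 25  # cap stated in the table above

@dataclass
class CacheEntry:
    mtime: float       # hypothetical change marker; the real one is not documented
    alerts_json: str   # "[]" for a clean file

def store(cache, path, mtime, alerts):
    """Record a scan result, keeping at most MAX_CACHED_ALERTS alerts as JSON."""
    cache[path] = CacheEntry(mtime, json.dumps(alerts[:MAX_CACHED_ALERTS]))

def lookup(cache, path, current_mtime):
    """Return (action, alerts); action is 'scan', 'skip', or 'cached_alerts'."""
    entry = cache.get(path)
    if entry is None:
        return "scan", []                 # never seen: full scan
    if entry.mtime != current_mtime:
        del cache[path]                   # modified: invalidate, full rescan
        return "scan", []
    alerts = json.loads(entry.alerts_json)
    if not alerts:
        return "skip", []                 # clean: nothing to do
    return "cached_alerts", alerts        # reuse the stored alert details

cache = {}
store(cache, "/srv/report.csv", 100.0, [])
print(lookup(cache, "/srv/report.csv", 100.0))
```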
Removable Media
Configure automatic scanning of USB drives and external media:
[removable_media]
# Automatically scan removable media when mounted (default: false)
auto_scan_on_mount = true
Platform-Specific Behavior:
| Platform | Detection Method | Monitored Paths |
|---|---|---|
| macOS | Endpoint Security mount events | /Volumes/* (excluding system) |
| Linux | /proc/self/mounts polling | /media/*, /mnt/*, /run/media/* |
Use Cases:
- Data exfiltration detection
- Compliance monitoring for removable media
- Incident response device scanning
Performance Note: Scanning a large external drive (8 TB or more) that holds a lot of data can take a long time. Consider the resource impact before enabling auto_scan_on_mount.
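For the Linux row of the table, the detection method amounts to reading the mount table repeatedly and comparing snapshots. The Python sketch below shows that idea on sample text in the `/proc/self/mounts` format; it is not the agent's implementation, and the function names are hypothetical. The path prefixes come from the Monitored Paths column.

```python
REMOVABLE_PREFIXES = ("/media/", "/mnt/", "/run/media/")

def removable_mount_points(mounts_text):
    """Extract mount points under the monitored prefixes from mounts-format text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The kernel escapes spaces in mount points as the octal sequence \040.
        mount_point = fields[1].replace("\\040", " ")
        if mount_point.startswith(REMOVABLE_PREFIXES):
            points.append(mount_point)
    return points

def newly_mounted(previous, current):
    """Mount points present in the current snapshot but not the previous one."""
    return sorted(set(current) - set(previous))

SAMPLE = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /media/alice/USB\\040DRIVE vfat rw,nosuid 0 0\n"
)
before = removable_mount_points("/dev/sda1 / ext4 rw,relatime 0 0\n")
after = removable_mount_points(SAMPLE)
print(newly_mounted(before, after))
```

On a real Linux host the text would come from reading `/proc/self/mounts`; macOS uses Endpoint Security events instead and is not modeled here.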
Performance Tuning
Scan Settings
[scan]
# Maximum findings per scanner per file (default: 5)
max_findings_per_scanner = 5
# Maximum file size in MB to scan (default: 40)
max_scan_size_mb = 40
# Maximum recursion depth for nested archives (default: 5)
max_recursion_depth = 5
# Regex size limits
regex_size_limit_mb = 10
regex_dfa_size_limit_mb = 2
# File update cooldown in minutes (default: 30)
file_update_cooldown_mins = 30
# Event coalesce delay in seconds (default: 120)
event_coalesce_delay_secs = 120
Resource Limits
[resource_limits]
# Enable resource limiting (default: false)
enabled = true
# Maximum CPU usage percentage (default: 50.0)
max_cpu_percent = 50.0
# Maximum memory in MB (default: 512)
max_memory_mb = 512
# Maximum disk I/O in MB/s (default: 50.0)
max_disk_io_mbps = 50.0
# Process nice level (default: 10)
nice_level = 10
# Throttle delay between scans in ms (default: 10)
throttle_delay_ms = 10
Worker Configuration
[worker]
# Number of worker threads (default: 0 = auto-detect CPU cores)
num_workers = 4
# Timeout for receiving work items in ms (default: 1000)
recv_timeout_ms = 1000
[work_queue]
# Maximum queue size (default: 10000)
max_queue_size = 10000
# Submit timeout in seconds (default: 5)
submit_timeout_secs = 5
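The interaction between `max_queue_size` and `submit_timeout_secs` can be pictured as a bounded queue with a blocking, time-limited submit. The Python sketch below demonstrates that pattern with the standard library. It assumes a submit waits up to the timeout for free space and then gives up; what the agent does with a rejected item (drop, retry, log) is not stated in this page.

```python
import queue

def submit(work_queue, item, submit_timeout_secs):
    """Try to enqueue item, waiting at most submit_timeout_secs for free space."""
    try:
        work_queue.put(item, timeout=submit_timeout_secs)
        return True
    except queue.Full:
        return False

# A tiny queue stands in for max_queue_size = 10000 so the limit is easy to hit.
work_queue = queue.Queue(maxsize=2)
results = [submit(work_queue, path, 0.05) for path in ("a.txt", "b.txt", "c.txt")]
print(results)
```

With no worker draining the queue, the third submit times out, which is the situation a larger `max_queue_size` or more workers would relieve.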
Context Configuration
[context]
# Context window size in bytes for surrounding text capture (default: 200)
# Larger values provide more details but impact performance
window_size = 200
# Enable specific context profiles
# Available profiles:
# - healthcare: Medical terms (patient, diagnosis, HIPAA keywords)
# - payment: Financial transaction terms (credit card, payment, PCI keywords)
# - personal_data: PII identifiers (SSN, address, contact info)
# - employment: HR/payroll terms (employee, salary, W-2)
# - sox_financial: SOX compliance terms (revenue, earnings, 10-K, quarterly)
# - gdpr_phone: Personal vs business phone context (mobile, cell, office)
enabled_profiles = ["healthcare", "payment", "personal_data", "employment", "sox_financial", "gdpr_phone"]
Context Trace
Enable debug tracing for context enrichment decisions. When enabled, detailed JSON logs are emitted showing how each finding’s confidence was adjusted based on surrounding context.
Note: This feature generates verbose output and should only be enabled when debugging enrichment behavior (e.g., investigating false positives or negatives).
[context_trace]
# Enable context enrichment debug tracing (default: false)
# When enabled, emits JSON logs showing enrichment decisions:
# - Original confidence scores
# - Context profiles matched
# - Confidence adjustments applied
# - Final enriched confidence
enabled = false
See Troubleshooting: Debugging Enrichment for usage guidance.
CPU Debugging
Enable detailed performance metrics for troubleshooting:
[cpu_debugging]
# Enable CPU debugging features (default: true)
enabled = true
# Histogram buckets for latency tracking in ms (must be ascending)
histogram_buckets = [10, 50, 100, 500, 1000, 5000, 10000, 30000]
# Threshold for slow file warnings in ms (default: 1000)
slow_file_threshold_ms = 1000
# Maximum slow files to track (default: 10)
max_slow_files = 10
# Enable worker thread status tracking (default: true)
worker_tracking_enabled = true
# Enable performance alerting (default: false)
alerting_enabled = false
# Scanner processing time alert threshold in ms (default: 5000)
scanner_alert_threshold_ms = 5000
# Work queue pending items alert threshold (default: 1000)
queue_alert_threshold = 1000
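Because `histogram_buckets` must be ascending, a quick check before deploying a custom list can save a failed start. The Python sketch below validates the ordering and shows how a latency would fall into a bucket. It assumes each value is an inclusive upper bound in milliseconds, which is a common histogram convention but is not confirmed by this page; the function names are hypothetical.

```python
from bisect import bisect_left

BUCKETS_MS = [10, 50, 100, 500, 1000, 5000, 10000, 30000]

def is_strictly_ascending(buckets):
    """True if every bucket bound is larger than the one before it."""
    return all(lo < hi for lo, hi in zip(buckets, buckets[1:]))

def bucket_for(latency_ms, buckets):
    """Smallest bound >= latency_ms, or None if it exceeds the last bucket."""
    index = bisect_left(buckets, latency_ms)
    return buckets[index] if index < len(buckets) else None

assert is_strictly_ascending(BUCKETS_MS)
print(bucket_for(75, BUCKETS_MS))
print(bucket_for(45000, BUCKETS_MS))
```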
Database Maintenance
Aquilon DLP includes automatic database maintenance to manage disk usage and keep the local database cache healthy. The local database is designed as a cache; your SIEM should handle long-term retention.
⚠️ Compliance Warning
The default findings_max_age_days of 7 days is far shorter than common compliance retention requirements:
- HIPAA: 6 years (2190 days)
- SOX: 7 years (2555 days)
- PCI-DSS: 1 year (365 days)
Ensure your SIEM captures findings for long-term retention before enabling aggressive cleanup. The local database is intended as a cache, not permanent storage.
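To make the size of the gap concrete, the day counts above can be reproduced and compared with a given `findings_max_age_days`. The Python sketch below uses 365-day years, matching the figures listed; the function name `retention_gap_days` is hypothetical.

```python
# Retention periods in years, as listed above.
REQUIRED_RETENTION_YEARS = {"HIPAA": 6, "SOX": 7, "PCI-DSS": 1}

def retention_gap_days(findings_max_age_days):
    """Days of retention the SIEM must cover beyond the local database."""
    return {
        framework: years * 365 - findings_max_age_days
        for framework, years in REQUIRED_RETENTION_YEARS.items()
    }

print(retention_gap_days(7))
```

With the default of 7 days, the SIEM has to cover 2183 days for HIPAA, 2548 for SOX, and 358 for PCI-DSS.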
Basic Configuration
[maintenance]
# Enable background maintenance thread (default: true)
enabled = true
# Interval between maintenance runs in seconds (default: 3600 = 1 hour)
# Minimum: 60 seconds
interval_secs = 3600
Retention Settings
Configure how long data is retained before cleanup:
[maintenance.retention]
# Maximum age for findings before soft-delete (default: 7 days)
# Minimum: 1 day
findings_max_age_days = 7
# Maximum age for scan cache entries (default: 7 days)
# Minimum: 1 day
cache_max_age_days = 7
# Days to wait before hard-deleting soft-deleted findings (default: 1)
# Set to 0 for immediate hard delete (when SIEM has captured data)
hard_delete_grace_days = 1
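The two-stage deletion can be read as a timeline based on a finding's age. The Python sketch below models it under two assumptions that this page does not confirm: the grace period is counted from the point where the finding reaches `findings_max_age_days`, and cleanup happens instantly rather than at the next maintenance run. The function name `finding_state` is hypothetical.

```python
def finding_state(age_days, findings_max_age_days=7, hard_delete_grace_days=1):
    """Classify a finding by age as active, soft_deleted, or hard_deleted."""
    if age_days < findings_max_age_days:
        return "active"
    if age_days < findings_max_age_days + hard_delete_grace_days:
        return "soft_deleted"     # hidden but still recoverable
    return "hard_deleted"         # removed from the database

for age in (3, 7.5, 9):
    print(age, finding_state(age))

# With a grace period of 0, a finding is hard-deleted as soon as it ages out.
print(finding_state(7, hard_delete_grace_days=0))
```

In practice the transitions can lag by up to `interval_secs`, since they only happen when the maintenance thread runs.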
Vacuum Settings
Configure incremental vacuum to reclaim disk space:
[maintenance.vacuum]
# Pages to reclaim per incremental vacuum run (default: 1000)
# Each page is ~4KB, so 1000 pages = ~4MB per run
# Set to 0 to disable vacuum operations
incremental_pages = 1000
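The page arithmetic in the comments extends naturally to a daily figure. The Python sketch below works it through, assuming one vacuum pass per maintenance run (so `interval_secs` from the [maintenance] section sets the frequency) and the approximate 4 KB page size quoted above. The function names are hypothetical.

```python
PAGE_SIZE_KB = 4  # approximate page size quoted above

def reclaim_mb_per_run(incremental_pages):
    """Upper bound on space reclaimed by one vacuum pass, in MB."""
    return incremental_pages * PAGE_SIZE_KB / 1024

def reclaim_mb_per_day(incremental_pages, interval_secs):
    """Upper bound per day, assuming one vacuum pass per maintenance run."""
    runs_per_day = 86400 / interval_secs
    return reclaim_mb_per_run(incremental_pages) * runs_per_day

print(reclaim_mb_per_run(1000))
print(reclaim_mb_per_day(1000, 3600))
```

With the defaults this comes to about 3.9 MB per run and at most about 94 MB per day, which helps when deciding whether `incremental_pages` keeps up with the rate at which findings are deleted.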
Manual Maintenance
Run maintenance immediately without starting the daemon:
# Run maintenance once and exit
aquilon-dlp --maintenance-now --config /etc/aquilon/config.toml
# Output is JSON with counts and duration:
# {
# "soft_deleted": 42,
# "hard_deleted": 15,
# "cache_evicted": 128,
# "pages_vacuumed": 1000,
# "duration_ms": 234,
# "errors": []
# }
See Operations for additional database management commands.
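Since the manual run reports its results as JSON, a wrapper script can check them automatically. The Python sketch below parses a report with the fields shown in the sample output above and summarizes it. It assumes the JSON object is the only content on standard output; capturing that output (for example with `subprocess.run`) is left to the reader, and the function name `summarize` is hypothetical.

```python
import json

# Sample report with the same fields as the output shown above.
SAMPLE_OUTPUT = """
{
  "soft_deleted": 42,
  "hard_deleted": 15,
  "cache_evicted": 128,
  "pages_vacuumed": 1000,
  "duration_ms": 234,
  "errors": []
}
"""

def summarize(report_text):
    """Return (ok, rows_removed) for a maintenance report."""
    report = json.loads(report_text)
    ok = not report["errors"]
    rows_removed = (
        report["soft_deleted"] + report["hard_deleted"] + report["cache_evicted"]
    )
    return ok, rows_removed

print(summarize(SAMPLE_OUTPUT))
```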
Logging Configuration
Logging is configured via the RUST_LOG environment variable:
# Set log level
export RUST_LOG=info
# Set per-module log levels (debug for aquilon_dlp, warn for everything else)
export RUST_LOG=aquilon_dlp=debug,warn
# Available levels: error, warn, info, debug, trace
The application uses structured logging via the tracing crate. Logs are written to stdout/stderr and can be redirected as needed by your init system.
OSQuery Integration
Aquilon DLP exposes alerts via an OSQuery virtual table. Configure behavior:
[osquery]
# Maximum rows returned for alerts table without explicit LIMIT clause
# Prevents memory exhaustion from unbounded queries
# Default: 10000, set to 0 for unlimited (not recommended)
max_alert_rows = 10000
Note: When querying large alert sets, use WHERE clauses to filter results. Unbounded SELECT * FROM aquilon_dlp_alerts queries will be truncated at this limit.
Example Configurations
Healthcare Organization (HIPAA Focus)
watch_paths = ["/home/%%", "/var/data/%%", "/srv/%%", "/mnt/medical-records/%%"]
exclude_paths = ["/home/*/.cache/%%"]
database_path = "/var/lib/aquilon/aquilon.db"
[policies]
enabled_policies = ["hipaa", "gdpr", "pci_dss"]
[policies.policy_configs.hipaa]
enabled = true
settings = { confidence_threshold = "0.8" }
[removable_media]
auto_scan_on_mount = true
[cache]
enabled = true
ttl_secs = 3600
scan_cache_ttl_days = 7
[scan]
max_findings_per_scanner = 10
max_scan_size_mb = 100
[resource_limits]
enabled = true
max_cpu_percent = 75.0
max_memory_mb = 1024
Financial Services (PCI DSS/SOX Focus)
watch_paths = ["/home/%%", "/var/data/%%", "/srv/transactions/%%"]
exclude_paths = ["/home/*/Downloads/%%"]
# Linux: /var/lib/aquilon/aquilon.db
# macOS: /var/db/aquilon/aquilon.db
database_path = "/var/lib/aquilon/aquilon.db"
[policies]
enabled_policies = ["pci_dss", "sox", "gdpr", "ccpa"]
[policies.policy_configs.pci_dss]
enabled = true
settings = { alert_on_test_data = "false" }
[policies.policy_configs.sox]
enabled = true
settings = { confidence_threshold = "0.85" }
[removable_media]
auto_scan_on_mount = true
[cache]
enabled = true
ttl_secs = 7200
scan_cache_ttl_days = 14
[worker]
num_workers = 8
[resource_limits]
enabled = true
max_cpu_percent = 60.0
max_memory_mb = 1024
Small Business (Basic Edition)
watch_paths = ["/home/%%", "/var/data/%%"]
exclude_paths = ["/home/*/.cache/%%"]
# Linux: /var/lib/aquilon/aquilon.db
# macOS: /var/db/aquilon/aquilon.db
database_path = "/var/lib/aquilon/aquilon.db"
[policies]
enabled_policies = ["gdpr", "ccpa"]
[cache]
enabled = true
ttl_secs = 3600
scan_cache_ttl_days = 7
[scan]
max_findings_per_scanner = 5
max_scan_size_mb = 40
[worker]
num_workers = 2 # Conservative for small systems
[resource_limits]
enabled = true
max_cpu_percent = 30.0
max_memory_mb = 256
Complete Example Configurations
For complete, production-ready configuration examples, see:
- Basic Edition: docs/config-examples/aquilon_dlp_config_basic.toml
- Enterprise Edition: docs/config-examples/aquilon_dlp_config_enterprise.toml
Custom Scanners and Policies
Custom scanners and policies are defined directly in the main configuration file using [[scanners]] and [[custom_policies]] sections. See Policy Frameworks for creating custom policies.
Example custom scanner (add to your main config):
[[scanners]]
name = "employee_id"
description = "ACME Corp employee IDs (format: EMP-######)"
regex = "EMP-[0-9]{6}"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.85
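A scanner pattern is worth trying against sample text before it goes into the configuration. The Python sketch below exercises the `employee_id` regex and applies the redaction string as a literal replacement. Two assumptions apply: that this simple pattern behaves the same in Python's `re` module as in the agent's regex engine, and that `redaction_pattern` is substituted verbatim for each match.

```python
import re

SCANNER_REGEX = re.compile(r"EMP-[0-9]{6}")   # regex from the scanner above
REDACTION = "EMP-XXXXXX"                      # redaction_pattern from the scanner above

def redact(text):
    """Replace every match of the scanner regex with the redaction string."""
    return SCANNER_REGEX.sub(REDACTION, text)

print(redact("Badge EMP-123456 issued to new hire"))
print(SCANNER_REGEX.search("EMP-12345") is None)   # five digits: no match
print(redact("EMP-1234567"))                       # seven digits: still matches the first six
```

The last case shows that the pattern has no boundary on either side, so it also matches inside longer digit runs. If that causes false positives, tightening the regex is one option to evaluate.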
Validate your configuration:
sudo aquilon-dlp --config /etc/aquilon/config.toml --validate-config
Applying Configuration Changes
After modifying the configuration file, restart the osqueryd service:
macOS:
sudo launchctl unload /Library/LaunchDaemons/io.osquery.agent.plist
sudo launchctl load /Library/LaunchDaemons/io.osquery.agent.plist
Note: OSQuery 5.0.1+ uses io.osquery.agent.plist. Older versions use com.facebook.osqueryd.plist.
Linux:
sudo systemctl restart osqueryd
Validating Configuration
Check for configuration errors in the logs:
# macOS
tail -f /var/log/aquilon/aquilon-dlp.log | grep -i error
# Linux
journalctl -u osqueryd -f | grep -i aquilon
Common validation errors:
- Invalid TOML syntax
- Unknown policy names
- Invalid regex patterns in custom scanners
- Missing required fields
Environment Variables
Override configuration settings with environment variables using the AQUILON_DLP_ prefix:
# Worker configuration
export AQUILON_DLP_WORKER_NUM_WORKERS=8
# Resource limits
export AQUILON_DLP_RESOURCE_LIMITS_ENABLED=true
export AQUILON_DLP_RESOURCE_LIMITS_MAX_CPU_PERCENT=75.0
# Cache configuration
export AQUILON_DLP_CACHE_ENABLED=true
export AQUILON_DLP_CACHE_TTL_SECS=3600
# Watch paths (JSON array format)
export AQUILON_DLP_WATCH_PATHS='["/home/%%","/var/data/%%"]'
# Database path
export AQUILON_DLP_DATABASE_PATH=/var/lib/aquilon/aquilon.db
Environment variables override TOML configuration using underscore-separated paths. For example, [resource_limits] max_cpu_percent becomes AQUILON_DLP_RESOURCE_LIMITS_MAX_CPU_PERCENT.
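The naming rule can be expressed as a small Python helper, which is handy when generating environment files from an existing TOML configuration. This is a sketch of the rule as described above; the function names are hypothetical, and the JSON decoding of `AQUILON_DLP_WATCH_PATHS` follows the "JSON array format" comment in the example.

```python
import json

PREFIX = "AQUILON_DLP_"

def env_var_name(key, section=None):
    """Build the override variable name for a key, optionally inside a section."""
    parts = [section, key] if section else [key]
    return PREFIX + "_".join(parts).upper()

def parse_watch_paths(value):
    """Decode the JSON array used for AQUILON_DLP_WATCH_PATHS."""
    return json.loads(value)

print(env_var_name("max_cpu_percent", section="resource_limits"))
print(env_var_name("database_path"))
print(parse_watch_paths('["/home/%%","/var/data/%%"]'))
```

Because sections and multi-word keys are both joined with underscores, the variable name alone does not show where the section ends; the TOML layout remains the reference for that.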
Configuration Reference
For complete configuration reference and schema documentation, see the comments in the default configuration file:
cat /etc/aquilon/config.toml.default