Configuration

Aquilon DLP is configured through a TOML file that controls all aspects of operation including watch paths, policies, caching, and performance settings.

Configuration File

Location

| Platform | Default Location |
|----------|------------------|
| macOS | /etc/aquilon/config.toml |
| Linux | /etc/aquilon/config.toml |

Initial Setup

After installation, copy the default configuration and customize it:

sudo cp /etc/aquilon/config.toml.default /etc/aquilon/config.toml
sudo nano /etc/aquilon/config.toml

Core Configuration

Watch Paths

Define which directories Aquilon DLP monitors for sensitive data. Use %% to recursively watch all subdirectories:

watch_paths = [
    "/home/%%",
    "/var/data/%%",
    "/srv/%%",
    "/Users/%%"
]

Path Syntax:

  • /path/to/dir/%% - Watch directory and all subdirectories recursively
  • /path/to/dir - Watch only the directory itself (no recursion)
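
The two path forms can be illustrated with a minimal sketch. This is not Aquilon DLP's matcher: the function name is_watched is hypothetical, and it assumes that a non-recursive entry covers the directory and its direct children only.

```python
from pathlib import PurePosixPath

RECURSIVE_SUFFIX = "/%%"


def is_watched(path: str, pattern: str) -> bool:
    """Illustrative check of whether a watch_paths entry covers a path."""
    target = PurePosixPath(path)
    if pattern.endswith(RECURSIVE_SUFFIX):
        root = PurePosixPath(pattern[: -len(RECURSIVE_SUFFIX)])
        # Recursive form: the directory itself or anything below it.
        return target == root or root in target.parents
    root = PurePosixPath(pattern)
    # Non-recursive form (assumed): the directory or its direct children.
    return target == root or target.parent == root
```

Under these assumptions, /home/%% covers /home/alice/docs/report.txt, while /var/data covers /var/data/x.csv but not /var/data/sub/x.csv.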

Best Practices:

  • Include directories where users store documents
  • Include shared drives and collaboration folders
  • Exclude system directories (already excluded by default)
  • Exclude known safe directories like source code repos

Exclusions

Exclude specific paths from monitoring:

watch_paths = ["/home/%%", "/var/data/%%"]

# Exclude specific directories
exclude_paths = [
    "/home/*/.cache/%%",
    "/home/*/Downloads/%%",
    "/var/log/%%"
]

Policy Configuration

Enable Policies

Select which compliance frameworks to enable:

[policies]
enabled_policies = ["gdpr", "ccpa", "hipaa", "pci_dss", "sox", "iso27001"]

Available Policies:

| Policy | Description | Edition |
|--------|-------------|---------|
| gdpr | EU General Data Protection Regulation | All |
| ccpa | California Consumer Privacy Act | All |
| hipaa | Health Insurance Portability and Accountability Act | Enterprise |
| pci_dss | Payment Card Industry Data Security Standard | Enterprise |
| sox | Sarbanes-Oxley Act | Enterprise |
| iso27001 | Information Security Management | Enterprise |

Policy-Specific Settings

Configure individual policy behavior:

[policies.policy_configs.hipaa]
enabled = true
settings = { confidence_threshold = "0.8" }

[policies.policy_configs.pci_dss]
enabled = true
settings = { alert_on_test_data = "false" }

[policies.policy_configs.iso27001]
enabled = true
settings = { confidence_threshold = "0.7", enforce_data_masking = "true" }

Caching Configuration

Aquilon DLP uses a two-tier caching system to minimize redundant scanning:

[cache]
# Enable/disable caching (default: true)
enabled = true

# In-memory cache TTL in seconds (default: 0 = no expiry)
ttl_secs = 3600

# Database scan cache TTL in days (default: 7)
scan_cache_ttl_days = 7

Note: The database location is configured via the top-level database_path field:

# Linux: /var/lib/aquilon/aquilon.db
# macOS: /var/db/aquilon/aquilon.db
database_path = "/var/lib/aquilon/aquilon.db"

Platform Note: Default database paths differ by platform. The macOS PKG installer automatically configures the macOS path at /var/db/aquilon/aquilon.db.

Cache Performance

  • Cache hit on clean file: <5ms p99
  • Cache hit with alerts: <20ms p95
  • Cache vs full scan: 10-100x faster

Cache Behavior

| File State | Cache Behavior |
|------------|----------------|
| Clean (no findings) | Fully cached, subsequent scans skipped |
| Has alerts | Alert details cached as JSON (up to 25 alerts) |
| Modified | Cache entry invalidated, full rescan |
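
The three file states can be read as a small decision procedure. The sketch below is an illustration only. It assumes modification is detected by comparing a hypothetical (mtime, size) fingerprint, and it keeps alerts as a Python list rather than JSON. The names CacheEntry, lookup, and store are not part of the product.

```python
from dataclasses import dataclass, field

MAX_CACHED_ALERTS = 25  # cap stated in the cache behavior table


@dataclass
class CacheEntry:
    fingerprint: tuple  # hypothetical (mtime_ns, size) pair
    alerts: list = field(default_factory=list)


def lookup(cache: dict, path: str, fingerprint: tuple):
    """Return (decision, alerts); decision is skip, cached_alerts, or rescan."""
    entry = cache.get(path)
    if entry is None or entry.fingerprint != fingerprint:
        cache.pop(path, None)  # unknown or modified: invalidate and rescan
        return "rescan", []
    if not entry.alerts:
        return "skip", []  # clean file: no rescan needed
    return "cached_alerts", entry.alerts


def store(cache: dict, path: str, fingerprint: tuple, alerts: list) -> None:
    """Record a scan result, keeping at most MAX_CACHED_ALERTS alerts."""
    cache[path] = CacheEntry(fingerprint, list(alerts[:MAX_CACHED_ALERTS]))
```

A caller would run lookup before scanning and call store after every full scan, so a changed fingerprint always forces a rescan.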

Removable Media

Configure automatic scanning of USB drives and external media:

[removable_media]
# Automatically scan removable media when mounted (default: false)
auto_scan_on_mount = true

Platform-Specific Behavior:

| Platform | Detection Method | Monitored Paths |
|----------|------------------|-----------------|
| macOS | Endpoint Security mount events | /Volumes/* (excluding system) |
| Linux | /proc/self/mounts polling | /media/*, /mnt/*, /run/media/* |

Use Cases:

  • Data exfiltration detection
  • Compliance monitoring for removable media
  • Incident response device scanning

Performance Note: Large external drives (8 TB or more) holding a lot of data can take a long time to scan. Consider the resource impact before enabling auto_scan_on_mount.

Performance Tuning

Scan Settings

[scan]
# Maximum findings per scanner per file (default: 5)
max_findings_per_scanner = 5

# Maximum file size in MB to scan (default: 40)
max_scan_size_mb = 40

# Maximum recursion depth for nested archives (default: 5)
max_recursion_depth = 5

# Regex size limits
regex_size_limit_mb = 10
regex_dfa_size_limit_mb = 2

# File update cooldown in minutes (default: 30)
file_update_cooldown_mins = 30

# Event coalesce delay in seconds (default: 120)
event_coalesce_delay_secs = 120

Resource Limits

[resource_limits]
# Enable resource limiting (default: false)
enabled = true

# Maximum CPU usage percentage (default: 50.0)
max_cpu_percent = 50.0

# Maximum memory in MB (default: 512)
max_memory_mb = 512

# Maximum disk I/O in MB/s (default: 50.0)
max_disk_io_mbps = 50.0

# Process nice level (default: 10)
nice_level = 10

# Throttle delay between scans in ms (default: 10)
throttle_delay_ms = 10

Worker Configuration

[worker]
# Number of worker threads (default: 0 = auto-detect CPU cores)
num_workers = 4

# Timeout for receiving work items in ms (default: 1000)
recv_timeout_ms = 1000

[work_queue]
# Maximum queue size (default: 10000)
max_queue_size = 10000

# Submit timeout in seconds (default: 5)
submit_timeout_secs = 5

Context Configuration

[context]
# Context window size in bytes for surrounding text capture (default: 200)
# Larger values capture more surrounding text but slow down scanning
window_size = 200

# Enable specific context profiles
# Available profiles:
#   - healthcare: Medical terms (patient, diagnosis, HIPAA keywords)
#   - payment: Financial transaction terms (credit card, payment, PCI keywords)
#   - personal_data: PII identifiers (SSN, address, contact info)
#   - employment: HR/payroll terms (employee, salary, W-2)
#   - sox_financial: SOX compliance terms (revenue, earnings, 10-K, quarterly)
#   - gdpr_phone: Personal vs business phone context (mobile, cell, office)
enabled_profiles = ["healthcare", "payment", "personal_data", "employment", "sox_financial", "gdpr_phone"]
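
To make window_size concrete, the sketch below captures the text around a match. It assumes the window applies to each side of the match, which the configuration comments do not confirm. The function name context_window is hypothetical.

```python
def context_window(data: bytes, start: int, end: int, window_size: int = 200) -> str:
    """Return the match plus up to window_size bytes on each side (assumed)."""
    lo = max(0, start - window_size)
    hi = min(len(data), end + window_size)
    # A byte window can split a multi-byte character, so decode leniently.
    return data[lo:hi].decode("utf-8", errors="replace")
```

With the default of 200, an 11-byte match in the middle of a large file yields at most 411 bytes of context under this reading, which shows why larger windows add per-finding work.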

Context Trace

Enable debug tracing for context enrichment decisions. When enabled, detailed JSON logs are emitted showing how each finding’s confidence was adjusted based on surrounding context.

Note: This feature generates verbose output and should only be enabled when debugging enrichment behavior (e.g., investigating false positives or negatives).

[context_trace]
# Enable context enrichment debug tracing (default: false)
# When enabled, emits JSON logs showing enrichment decisions:
# - Original confidence scores
# - Context profiles matched
# - Confidence adjustments applied
# - Final enriched confidence
enabled = false

See Troubleshooting: Debugging Enrichment for usage guidance.

CPU Debugging

Enable detailed performance metrics for troubleshooting:

[cpu_debugging]
# Enable CPU debugging features (default: true)
enabled = true

# Histogram buckets for latency tracking in ms (must be ascending)
histogram_buckets = [10, 50, 100, 500, 1000, 5000, 10000, 30000]

# Threshold for slow file warnings in ms (default: 1000)
slow_file_threshold_ms = 1000

# Maximum slow files to track (default: 10)
max_slow_files = 10

# Enable worker thread status tracking (default: true)
worker_tracking_enabled = true

# Enable performance alerting (default: false)
alerting_enabled = false

# Scanner processing time alert threshold in ms (default: 5000)
scanner_alert_threshold_ms = 5000

# Work queue pending items alert threshold (default: 1000)
queue_alert_threshold = 1000
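
The ascending requirement on histogram_buckets can be checked before deployment. The sketch below validates the list and shows how a latency would fall into a bucket. It assumes strictly ascending values and inclusive upper bounds, neither of which is confirmed by the product. Both function names are hypothetical.

```python
from bisect import bisect_left

BUCKETS_MS = [10, 50, 100, 500, 1000, 5000, 10000, 30000]


def validate_buckets(buckets: list) -> None:
    """Raise ValueError unless the bucket bounds are strictly ascending."""
    for lower, upper in zip(buckets, buckets[1:]):
        if lower >= upper:
            raise ValueError(f"buckets must be ascending: {lower} >= {upper}")


def bucket_label(latency_ms: float, buckets: list) -> str:
    """Name the first bucket whose upper bound is >= latency_ms."""
    index = bisect_left(buckets, latency_ms)
    if index == len(buckets):
        return f">{buckets[-1]}ms"
    return f"<={buckets[index]}ms"
```

For example, a 1200 ms scan lands in the 5000 ms bucket, which also exceeds the default slow_file_threshold_ms of 1000.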

Database Maintenance

Aquilon DLP includes automatic database maintenance to manage disk usage and keep the local database cache healthy. The local database is designed as a cache; your SIEM should handle long-term retention.

⚠️ Compliance Warning

The default findings_max_age_days of 7 days is far shorter than common compliance retention periods:

  • HIPAA: 6 years (2190 days)
  • SOX: 7 years (2555 days)
  • PCI-DSS: 1 year (365 days)

Ensure your SIEM captures findings for long-term retention before enabling aggressive cleanup. The local database is intended as a cache, not permanent storage.
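
The day counts above are simple year-to-day conversions at 365 days per year. The sketch below reproduces them and shows how far the local default falls short of each period. The function name retention_gap_days is hypothetical, and the figures are the ones quoted in this section, not legal guidance.

```python
DAYS_PER_YEAR = 365  # the figures in this section ignore leap days

RETENTION_YEARS = {"HIPAA": 6, "SOX": 7, "PCI-DSS": 1}


def retention_gap_days(local_max_age_days: int) -> dict:
    """Days of retention the SIEM must cover beyond the local database."""
    return {
        name: years * DAYS_PER_YEAR - local_max_age_days
        for name, years in RETENTION_YEARS.items()
    }


gaps = retention_gap_days(7)  # 7 is the default findings_max_age_days
```

With the default of 7 days, the SIEM has to cover 2183 days for HIPAA, 2548 for SOX, and 358 for PCI-DSS.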

Basic Configuration

[maintenance]
# Enable background maintenance thread (default: true)
enabled = true

# Interval between maintenance runs in seconds (default: 3600 = 1 hour)
# Minimum: 60 seconds
interval_secs = 3600

Retention Settings

Configure how long data is retained before cleanup:

[maintenance.retention]
# Maximum age for findings before soft-delete (default: 7 days)
# Minimum: 1 day
findings_max_age_days = 7

# Maximum age for scan cache entries (default: 7 days)
# Minimum: 1 day
cache_max_age_days = 7

# Days to wait before hard-deleting soft-deleted findings (default: 1)
# Set to 0 for immediate hard delete (when SIEM has captured data)
hard_delete_grace_days = 1

Vacuum Settings

Configure incremental vacuum to reclaim disk space:

[maintenance.vacuum]
# Pages to reclaim per incremental vacuum run (default: 1000)
# Each page is ~4KB, so 1000 pages = ~4MB per run
# Set to 0 to disable vacuum operations
incremental_pages = 1000
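
The page arithmetic in the comment can be extended to a daily upper bound. The sketch assumes a 4096-byte page, one vacuum pass per maintenance interval, and that every pass reclaims its full quota. None of these is guaranteed, so treat the result as a ceiling. The function names are hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # "~4KB" per the comment; the real page size may differ
SECONDS_PER_DAY = 86_400


def reclaimed_mib_per_run(incremental_pages: int) -> float:
    """Space reclaimed by one incremental vacuum pass, in MiB."""
    return incremental_pages * PAGE_SIZE_BYTES / (1024 * 1024)


def reclaimed_mib_per_day(incremental_pages: int, interval_secs: int = 3600) -> float:
    """Upper bound on daily reclaim, assuming one pass per interval."""
    runs_per_day = SECONDS_PER_DAY / interval_secs
    return reclaimed_mib_per_run(incremental_pages) * runs_per_day
```

With the defaults (1000 pages, hourly maintenance) this gives about 3.9 MiB per run and at most roughly 94 MiB per day.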

Manual Maintenance

Run maintenance immediately without starting the daemon:

# Run maintenance once and exit
aquilon-dlp --maintenance-now --config /etc/aquilon/config.toml

# Output is JSON with counts and duration:
# {
#   "soft_deleted": 42,
#   "hard_deleted": 15,
#   "cache_evicted": 128,
#   "pages_vacuumed": 1000,
#   "duration_ms": 234,
#   "errors": []
# }
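
Because the output is JSON, it is straightforward to post-process in a cron job or monitoring script. The sketch below parses the sample report shown above. The function name summarize is hypothetical, and the field names are taken from that sample, so verify them against your installed version.

```python
import json

SAMPLE_OUTPUT = """
{
  "soft_deleted": 42,
  "hard_deleted": 15,
  "cache_evicted": 128,
  "pages_vacuumed": 1000,
  "duration_ms": 234,
  "errors": []
}
"""


def summarize(report_text: str) -> str:
    """Condense a maintenance report into a one-line status message."""
    report = json.loads(report_text)
    errors = report["errors"]
    status = "ok" if not errors else f"{len(errors)} error(s)"
    return (
        f"maintenance {status}: "
        f"{report['soft_deleted']} soft-deleted, "
        f"{report['hard_deleted']} hard-deleted, "
        f"{report['cache_evicted']} cache entries evicted, "
        f"{report['pages_vacuumed']} pages vacuumed "
        f"in {report['duration_ms']} ms"
    )
```

A wrapper could alert whenever the errors array is non-empty instead of only logging the summary line.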

See Operations for additional database management commands.

Logging Configuration

Logging is configured via the RUST_LOG environment variable:

# Set log level
export RUST_LOG=info

# Set per-module log levels (aquilon_dlp at debug, everything else at warn)
export RUST_LOG=aquilon_dlp=debug,warn

# Available levels: error, warn, info, debug, trace

The application uses structured logging via the tracing crate. Logs are written to stdout/stderr and can be redirected as needed by your init system.

OSQuery Integration

Aquilon DLP exposes alerts via an OSQuery virtual table. Configure behavior:

[osquery]
# Maximum rows returned for alerts table without explicit LIMIT clause
# Prevents memory exhaustion from unbounded queries
# Default: 10000, set to 0 for unlimited (not recommended)
max_alert_rows = 10000

Note: When querying large alert sets, use WHERE clauses to filter results. Unbounded SELECT * FROM aquilon_dlp_alerts queries will be truncated at this limit.

Example Configurations

Healthcare Organization (HIPAA Focus)

watch_paths = ["/home/%%", "/var/data/%%", "/srv/%%", "/mnt/medical-records/%%"]
exclude_paths = ["/home/*/.cache/%%"]
database_path = "/var/lib/aquilon/aquilon.db"

[policies]
enabled_policies = ["hipaa", "gdpr", "pci_dss"]

[policies.policy_configs.hipaa]
enabled = true
settings = { confidence_threshold = "0.8" }

[removable_media]
auto_scan_on_mount = true

[cache]
enabled = true
ttl_secs = 3600
scan_cache_ttl_days = 7

[scan]
max_findings_per_scanner = 10
max_scan_size_mb = 100

[resource_limits]
enabled = true
max_cpu_percent = 75.0
max_memory_mb = 1024

Financial Services (PCI DSS/SOX Focus)

watch_paths = ["/home/%%", "/var/data/%%", "/srv/transactions/%%"]
exclude_paths = ["/home/*/Downloads/%%"]
# Linux: /var/lib/aquilon/aquilon.db
# macOS: /var/db/aquilon/aquilon.db
database_path = "/var/lib/aquilon/aquilon.db"

[policies]
enabled_policies = ["pci_dss", "sox", "gdpr", "ccpa"]

[policies.policy_configs.pci_dss]
enabled = true
settings = { alert_on_test_data = "false" }

[policies.policy_configs.sox]
enabled = true
settings = { confidence_threshold = "0.85" }

[removable_media]
auto_scan_on_mount = true

[cache]
enabled = true
ttl_secs = 7200
scan_cache_ttl_days = 14

[worker]
num_workers = 8

[resource_limits]
enabled = true
max_cpu_percent = 60.0
max_memory_mb = 1024

Small Business (Basic Edition)

watch_paths = ["/home/%%", "/var/data/%%"]
exclude_paths = ["/home/*/.cache/%%"]
# Linux: /var/lib/aquilon/aquilon.db
# macOS: /var/db/aquilon/aquilon.db
database_path = "/var/lib/aquilon/aquilon.db"

[policies]
enabled_policies = ["gdpr", "ccpa"]

[cache]
enabled = true
ttl_secs = 3600
scan_cache_ttl_days = 7

[scan]
max_findings_per_scanner = 5
max_scan_size_mb = 40

[worker]
num_workers = 2  # Conservative for small systems

[resource_limits]
enabled = true
max_cpu_percent = 30.0
max_memory_mb = 256

Complete Example Configurations

For complete, production-ready configuration examples, see:

  • Basic Edition: docs/config-examples/aquilon_dlp_config_basic.toml
  • Enterprise Edition: docs/config-examples/aquilon_dlp_config_enterprise.toml

Custom Scanners and Policies

Custom scanners and policies are defined directly in the main configuration file using [[scanners]] and [[custom_policies]] sections. See Policy Frameworks for creating custom policies.

Example custom scanner (add to your main config):

[[scanners]]
name = "employee_id"
description = "ACME Corp employee IDs (format: EMP-######)"
regex = "EMP-[0-9]{6}"
redaction_pattern = "EMP-XXXXXX"
base_confidence = 0.85
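
Before adding a scanner, it helps to try the pattern against sample text. The sketch below uses Python's re module as a stand-in. Aquilon DLP's regex engine may differ in details, and the helper names find_ids and redact are hypothetical.

```python
import re

EMPLOYEE_ID = re.compile(r"EMP-[0-9]{6}")  # regex from the scanner above
REDACTION = "EMP-XXXXXX"                   # redaction_pattern from the scanner above


def find_ids(text: str) -> list:
    """Return every substring that matches the scanner's regex."""
    return EMPLOYEE_ID.findall(text)


def redact(text: str) -> str:
    """Replace each match with the redaction pattern."""
    return EMPLOYEE_ID.sub(REDACTION, text)


sample = "Badge EMP-004217 issued; ticket EMP-12 ignored."
matches = find_ids(sample)
redacted = redact(sample)
```

The pattern is unanchored, so a longer run such as EMP-1234567 still matches on its first six digits. If that is unwanted, word boundaries (for example \bEMP-[0-9]{6}\b) are one option, provided the scanner's regex engine supports them.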

Validate your configuration:

sudo aquilon-dlp --config /etc/aquilon/config.toml --validate-config

Applying Configuration Changes

After modifying the configuration file, restart the osqueryd service:

macOS:

sudo launchctl unload /Library/LaunchDaemons/io.osquery.agent.plist
sudo launchctl load /Library/LaunchDaemons/io.osquery.agent.plist

Note: OSQuery 5.0.1+ uses io.osquery.agent.plist. Older versions use com.facebook.osqueryd.plist.

Linux:

sudo systemctl restart osqueryd

Validating Configuration

Check for configuration errors in the logs:

# macOS
tail -f /var/log/aquilon/aquilon-dlp.log | grep -i error

# Linux
journalctl -u osqueryd -f | grep -i aquilon

Common validation errors:

  • Invalid TOML syntax
  • Unknown policy names
  • Invalid regex patterns in custom scanners
  • Missing required fields

Environment Variables

Override configuration settings with environment variables using the AQUILON_DLP_ prefix:

# Worker configuration
export AQUILON_DLP_WORKER_NUM_WORKERS=8

# Resource limits
export AQUILON_DLP_RESOURCE_LIMITS_ENABLED=true
export AQUILON_DLP_RESOURCE_LIMITS_MAX_CPU_PERCENT=75.0

# Cache configuration
export AQUILON_DLP_CACHE_ENABLED=true
export AQUILON_DLP_CACHE_TTL_SECS=3600

# Watch paths (JSON array format)
export AQUILON_DLP_WATCH_PATHS='["/home/%%","/var/data/%%"]'

# Database path
export AQUILON_DLP_DATABASE_PATH=/var/lib/aquilon/aquilon.db

Environment variables override TOML configuration using underscore-separated paths. For example, [resource_limits] max_cpu_percent becomes AQUILON_DLP_RESOURCE_LIMITS_MAX_CPU_PERCENT.
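
The naming rule can be expressed as a short helper, which is useful when generating environment files from a template. This sketch only mirrors the examples in this section. The function names env_var_name and parse_watch_paths are hypothetical, and nested tables deeper than one level are not covered.

```python
import json
from typing import Optional

PREFIX = "AQUILON_DLP_"


def env_var_name(key: str, section: Optional[str] = None) -> str:
    """Build the variable name for a key, optionally inside a [section]."""
    parts = [section, key] if section else [key]
    return PREFIX + "_".join(parts).upper()


def parse_watch_paths(raw: str) -> list:
    """Decode the JSON array format used for AQUILON_DLP_WATCH_PATHS."""
    paths = json.loads(raw)
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("expected a JSON array of strings")
    return paths
```

For instance, env_var_name("max_cpu_percent", "resource_limits") gives AQUILON_DLP_RESOURCE_LIMITS_MAX_CPU_PERCENT, and a top-level key such as database_path maps to AQUILON_DLP_DATABASE_PATH.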

Configuration Reference

For complete configuration reference and schema documentation, see the comments in the default configuration file:

cat /etc/aquilon/config.toml.default