Architecture
This page provides an architectural overview of Aquilon DLP: system context, component architecture, deployment topologies, data flow, performance characteristics, plugin interfaces, and security considerations.
System Context
Aquilon DLP operates within an enterprise security ecosystem, integrating with OSQuery for system monitoring and exposing findings to SIEM systems for alerting and compliance reporting.
graph TB
subgraph "Enterprise Environment"
SA[Security Analysts]
SysAdmin[System Administrators]
SIEM[SIEM/Alerting System]
subgraph "Monitored System"
FS[File System]
OSQ[OSQuery]
AquilonDLP[Aquilon DLP]
end
end
SA -->|Query Alerts| OSQ
SA -->|Review Findings| SIEM
SysAdmin -->|Configure| AquilonDLP
FS -->|File Events| AquilonDLP
AquilonDLP -->|Scan Results| OSQ
OSQ -->|Export| SIEM
style AquilonDLP fill:#4a90e2,stroke:#2e5c8a,stroke-width:3px,color:#fff
style OSQ fill:#7cb342,stroke:#558b2f,stroke-width:2px,color:#fff
style SIEM fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
Key Interactions:
- File System Monitoring: Aquilon DLP monitors directories for new/modified files
- Scan and Detect: Files are parsed, decompressed (if needed), and scanned for sensitive data
- OSQuery Integration: Findings exposed via aquilon_dlp_alerts and related tables
- SIEM Export: OSQuery exports alerts to enterprise SIEM systems
- Analyst Queries: Security analysts query findings via OSQuery or SIEM dashboards
Component Architecture
Aquilon DLP uses a plugin-based architecture with three primary layers: Scanner Engine, File Handler Layer, and Policy Engine.
graph TB
subgraph "Aquilon DLP Core"
FW[File Watcher]
FH[File Handler Layer]
SE[Scanner Engine]
PE[Policy Engine]
DB[(SQLite Cache)]
OSE[OSQuery Extension Interface]
end
subgraph "File Handlers (9 Formats)"
ZIP[ZIP Handler]
TAR[TAR Handler]
GZIP[GZIP Handler]
PDF[PDF Handler]
DOCX[DOCX Handler]
XLSX[XLSX Handler]
SEVEN[7-Zip Handler]
RAR[RAR Handler]
TEXT[Text Handler]
end
subgraph "Scanner Plugins (50+ Scanners)"
SSN[SSN Scanner]
CC[Credit Card Scanner]
EMAIL[Email Scanner]
PHONE[Phone Scanner]
PASSPORT[Passport Scanner]
NATID[National ID Scanners]
MORE[... 40+ more scanners]
end
subgraph "Policy Frameworks"
HIPAA[HIPAA Framework]
PCI[PCI DSS Framework]
GDPR[GDPR Framework]
CCPA[CCPA Framework]
SOX[SOX Framework]
ISO[ISO 27001 Framework]
end
FW -->|New/Modified Files| FH
FH --> ZIP
FH --> TAR
FH --> GZIP
FH --> PDF
FH --> DOCX
FH --> XLSX
FH --> SEVEN
FH --> RAR
FH --> TEXT
ZIP -->|Extracted Text| SE
TAR -->|Extracted Text| SE
GZIP -->|Decompressed Text| SE
PDF -->|Extracted Text| SE
DOCX -->|Extracted Text| SE
XLSX -->|Extracted Text| SE
SEVEN -->|Extracted Text| SE
RAR -->|Extracted Text| SE
TEXT -->|Raw Text| SE
SE --> SSN
SE --> CC
SE --> EMAIL
SE --> PHONE
SE --> PASSPORT
SE --> NATID
SE --> MORE
SSN -->|Findings| PE
CC -->|Findings| PE
EMAIL -->|Findings| PE
PHONE -->|Findings| PE
PASSPORT -->|Findings| PE
NATID -->|Findings| PE
MORE -->|Findings| PE
PE --> HIPAA
PE --> PCI
PE --> GDPR
PE --> CCPA
PE --> SOX
PE --> ISO
HIPAA -->|Violations| DB
PCI -->|Violations| DB
GDPR -->|Violations| DB
CCPA -->|Violations| DB
SOX -->|Violations| DB
ISO -->|Violations| DB
DB -->|Query Interface| OSE
style FW fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
style SE fill:#7cb342,stroke:#558b2f,stroke-width:2px,color:#fff
style PE fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
style DB fill:#9c27b0,stroke:#6a1b9a,stroke-width:2px,color:#fff
Layer Descriptions:
File Handler Layer (9 Handlers)
Processes various file formats and containers:
- Archive Handlers: ZIP, TAR, GZIP, 7-Zip, RAR (recursive extraction)
- Document Handlers: PDF, DOCX, XLSX (text extraction)
- Text Handler: Plain text, source code, config files
Key Feature: Recursive descent into nested archives (e.g., ZIP inside TAR inside GZIP)
Scanner Engine (50+ Plugins)
Detects sensitive data patterns:
- National ID Scanners: 28 country-specific national IDs (EU, Americas, Asia-Pacific, Middle East)
- Identity Scanners: SSN, passport, driver's license
- Financial Scanners: Credit cards, bank accounts, IBAN
- Healthcare Scanners: Medical record numbers, NPI, MBI
- Contact Scanners: Emails, phone numbers, physical addresses
- Credential Scanners: API keys, tokens, crypto keys, database connections
Key Feature: Stream-based scanning with O(1) memory usage (constant memory regardless of file size)
Policy Engine (6 Frameworks)
Maps findings to compliance requirements:
- 🏢 HIPAA: Healthcare PHI detection (Enterprise only)
- 🏢 PCI DSS: Payment card data (Enterprise only)
- 🏢 SOX: Financial data (Enterprise only)
- 🏢 ISO 27001: Information security (Enterprise only)
- GDPR: EU personal data (All editions)
- CCPA: California consumer data (All editions)
Key Feature: Multi-framework evaluation (single file can trigger multiple policy violations)
Database Cache (SQLite)
Stores scan results with:
- Hash-based deduplication: Skip rescanning unchanged files
- Metadata indexing: Fast lookups by path, policy, severity
- Retention policies: Configurable cleanup of old findings
Performance: 5.4M operations/sec query throughput
Deployment Topology
Aquilon DLP supports multiple deployment models depending on organizational needs.
graph TB
subgraph "Single-Node Deployment"
subgraph "Host System"
FS1[File System]
OSQ1[OSQuery]
AQD1[Aquilon DLP]
DB1[(SQLite Cache)]
end
FS1 --> AQD1
AQD1 --> DB1
DB1 --> OSQ1
OSQ1 -->|Export| SIEM1[SIEM/Alerting]
end
subgraph "Enterprise Deployment (Distributed)"
subgraph "Fleet (100s-1000s of hosts)"
subgraph "Host 1"
FS2[File System]
OSQ2[OSQuery]
AQD2[Aquilon DLP]
DB2[(Cache)]
end
subgraph "Host 2"
FS3[File System]
OSQ3[OSQuery]
AQD3[Aquilon DLP]
DB3[(Cache)]
end
subgraph "Host N"
FS4[File System]
OSQ4[OSQuery]
AQD4[Aquilon DLP]
DB4[(Cache)]
end
end
subgraph "Central Infrastructure"
FLEET[OSQuery Fleet Manager]
CENTRAL_SIEM[Central SIEM]
DASHBOARD[Compliance Dashboard]
end
OSQ2 --> FLEET
OSQ3 --> FLEET
OSQ4 --> FLEET
FLEET --> CENTRAL_SIEM
CENTRAL_SIEM --> DASHBOARD
end
subgraph "🍎 MDM Deployment (macOS)"
subgraph "MDM System"
JAMF[Jamf Pro / Intune / Kandji]
CONFIG[Configuration Profiles]
PKG[Aquilon DLP PKG]
end
subgraph "macOS Fleet"
MAC1[MacBook 1]
MAC2[MacBook 2]
MACN[MacBook N]
end
JAMF -->|Deploy PKG| MAC1
JAMF -->|Deploy PKG| MAC2
JAMF -->|Deploy PKG| MACN
CONFIG -->|Full Disk Access| MAC1
CONFIG -->|Full Disk Access| MAC2
CONFIG -->|Full Disk Access| MACN
end
style AQD1 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
style AQD2 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
style AQD3 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
style AQD4 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
Deployment Models:
Single-Node Deployment
Best for:
- Small teams (< 5 servers for Basic Edition)
- Development/staging environments
- Proof-of-concept deployments
Architecture:
- Aquilon DLP runs on each monitored system
- Local SQLite cache stores findings
- OSQuery exposes findings locally
- Optional SIEM export for centralized alerting
Setup Time: ~5 minutes per host
Enterprise Deployment (Distributed)
Best for:
- Large organizations (100s-1000s of hosts)
- Multi-site deployments
- Compliance-driven environments (healthcare, finance)
Architecture:
- Aquilon DLP deployed on every monitored host
- OSQuery Fleet Manager aggregates findings across fleet
- Central SIEM processes alerts and generates compliance reports
- Compliance Dashboard provides executive visibility
Key Features:
- Unlimited server licensing (Enterprise Edition)
- All policy frameworks (HIPAA, PCI DSS, SOX, ISO 27001, GDPR, CCPA)
- Enterprise support with 4-hour SLA for critical issues
🍎 MDM Deployment (macOS)
Best for:
- macOS fleet management (Enterprise Edition only)
- Organizations using Jamf Pro, Microsoft Intune, or Kandji
- Zero-touch deployment for new devices
Architecture:
- PKG installer deployed via MDM system
- Configuration profiles grant Full Disk Access
- Launch Daemon ensures Aquilon DLP starts on boot
- Integration with OSQuery for monitoring
Key Features:
- Automated deployment to 100s-1000s of Macs
- Centralized configuration management
- Native Endpoint Security integration
- User-transparent operation
Data Flow
Data moves through Aquilon DLP in five stages:
1. File Monitoring
macOS (Enterprise Edition):
- Native Endpoint Security API monitors file system events
- Events filtered by watch paths and exclusions
- New/modified files queued for scanning
Linux (All Editions):
- inotify-based file system monitoring
- Recursive directory watching with pattern matching
- Event deduplication to prevent scan storms
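The event deduplication step can be pictured as a per-path debounce. The following is a minimal sketch under that assumption, not Aquilon DLP's actual implementation; `EventDebouncer`, `should_queue`, and the `window` parameter are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Hypothetical debouncer: suppresses repeat events for the same path
/// that arrive within `window` of the last accepted event.
pub struct EventDebouncer {
    window: Duration,
    last_seen: HashMap<PathBuf, Instant>,
}

impl EventDebouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_seen: HashMap::new() }
    }

    /// Returns true if the event should be queued for scanning.
    /// `now` is passed in so the logic can be tested without sleeping.
    pub fn should_queue(&mut self, path: &Path, now: Instant) -> bool {
        let recent = self
            .last_seen
            .get(path)
            .map_or(false, |&prev| now.duration_since(prev) < self.window);
        if recent {
            return false; // duplicate inside the window: drop it
        }
        self.last_seen.insert(path.to_path_buf(), now);
        true
    }
}
```

With a 500 ms window, a burst of write events for one file would queue a single scan, while events for other paths pass through unaffected.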
2. File Processing
File Detected → File Handler Selection → Format Processing → Text Extraction
Handler Selection:
- Based on file extension and magic number detection
- Archive handlers recursively process nested containers
- Document handlers extract text from structured formats
- Text handler processes plain text files directly
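Selection by extension and magic number could look roughly like the sketch below. The byte signatures are the publicly documented ones for each format; the `Format` enum, `sniff_format`, and `select_handler` are hypothetical names, and the real selection logic may differ.

```rust
/// Formats named on this page; the enum itself is a hypothetical type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Format { Zip, Gzip, Pdf, SevenZip, Rar, Tar, Text }

/// Guess a format from the leading bytes of a file.
pub fn sniff_format(header: &[u8]) -> Format {
    if header.starts_with(b"PK\x03\x04") {
        Format::Zip // also the container format of DOCX and XLSX
    } else if header.starts_with(&[0x1F, 0x8B]) {
        Format::Gzip
    } else if header.starts_with(b"%PDF-") {
        Format::Pdf
    } else if header.starts_with(&[0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C]) {
        Format::SevenZip
    } else if header.starts_with(b"Rar!\x1A\x07") {
        Format::Rar
    } else if header.get(257..262) == Some(&b"ustar"[..]) {
        Format::Tar // POSIX tar stores its magic at offset 257
    } else {
        Format::Text // unknown content falls back to the text handler
    }
}

/// Combine the sniffed format with the file extension to pick a handler.
/// Returned names mirror the handlers listed in the component diagram.
pub fn select_handler(file_name: &str, header: &[u8]) -> &'static str {
    let ext = file_name.rsplit('.').next().unwrap_or("").to_ascii_lowercase();
    match (sniff_format(header), ext.as_str()) {
        (Format::Zip, "docx") => "DOCX Handler",
        (Format::Zip, "xlsx") => "XLSX Handler",
        (Format::Zip, _) => "ZIP Handler",
        (Format::Gzip, _) => "GZIP Handler",
        (Format::Pdf, _) => "PDF Handler",
        (Format::SevenZip, _) => "7-Zip Handler",
        (Format::Rar, _) => "RAR Handler",
        (Format::Tar, _) => "TAR Handler",
        (Format::Text, _) => "Text Handler",
    }
}
```

Checking content first and the extension second means a renamed archive is still routed to an archive handler, while the extension is only needed to tell Office documents apart from plain ZIP files.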
Example Flow (nested archive):
report.zip → ZIP Handler
├─ data.tar → TAR Handler
│  ├─ records.txt → Text Handler → Scanner Engine
│  └─ patient.pdf → PDF Handler → Scanner Engine
└─ summary.docx → DOCX Handler → Scanner Engine
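The nested flow above amounts to a depth-first walk over containers. The sketch below models it on an in-memory tree rather than real archives; `Entry`, `collect_text`, the `!/` path separator, and the `max_depth` guard are all assumptions made for illustration.

```rust
/// Hypothetical in-memory model of a container and its entries.
pub enum Entry {
    /// A file whose text has already been extracted by its handler.
    Leaf { name: String, text: String },
    /// An archive holding further entries (possibly other archives).
    Archive { name: String, children: Vec<Entry> },
}

/// Walk nested containers depth-first, collecting (path, text) pairs for
/// every leaf. Archives nested at `max_depth` or deeper are skipped, which
/// guards against pathological nesting.
pub fn collect_text(
    entry: &Entry,
    prefix: &str,
    depth: usize,
    max_depth: usize,
    out: &mut Vec<(String, String)>,
) {
    match entry {
        Entry::Leaf { name, text } => {
            out.push((format!("{}{}", prefix, name), text.clone()));
        }
        Entry::Archive { name, children } => {
            if depth >= max_depth {
                return; // too deep: do not open this archive
            }
            let nested = format!("{}{}!/", prefix, name);
            for child in children {
                collect_text(child, &nested, depth + 1, max_depth, out);
            }
        }
    }
}
```

For the `report.zip` example, a walk with a generous depth limit yields three leaves, each carrying a path such as `report.zip!/data.tar!/records.txt` that records where inside the container the text came from.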
3. Scanning
Stream-Based Processing:
- Text streamed to scanner plugins (not loaded into memory)
- All 50+ scanners run concurrently on same stream
- Constant O(1) memory usage regardless of file size
- 5.4M operations/sec throughput
Finding Generation:
- Each scanner reports matches with details (line number, surrounding text)
- Metadata captured: file path, scanner type, confidence score
- Findings passed to Policy Engine for evaluation
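As a rough illustration of stream-based scanning with line-numbered findings, the sketch below reads from any `BufRead` source and reuses a single line buffer, so memory tracks the longest line rather than the file size. It is not the product's scanner: `LineFinding`, `scan_stream`, and the naive `ddd-dd-dddd` matcher (no boundary or validity checks) are assumptions.

```rust
use std::io::{self, BufRead};

/// Hypothetical finding record: line number plus the matched text.
#[derive(Debug, PartialEq, Eq)]
pub struct LineFinding {
    pub line: usize,
    pub matched: String,
}

/// Returns true for the shape ddd-dd-dddd (an SSN-like pattern).
fn looks_like_ssn(window: &[u8]) -> bool {
    window.len() == 11
        && window.iter().enumerate().all(|(i, b)| match i {
            3 | 6 => *b == b'-',
            _ => b.is_ascii_digit(),
        })
}

/// Scan a stream line by line without loading the whole input.
/// Note: `read_line` returns an error on invalid UTF-8 input.
pub fn scan_stream<R: BufRead>(mut reader: R) -> io::Result<Vec<LineFinding>> {
    let mut findings = Vec::new();
    let mut line = String::new(); // reused buffer for every line
    let mut line_no = 0;
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of stream
        }
        line_no += 1;
        for window in line.as_bytes().windows(11) {
            if looks_like_ssn(window) {
                findings.push(LineFinding {
                    line: line_no,
                    matched: String::from_utf8_lossy(window).into_owned(),
                });
            }
        }
    }
    Ok(findings)
}
```

A production scanner would more likely work on fixed-size chunks with overlap, which bounds memory even for files with very long lines.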
4. Policy Evaluation
Framework Matching:
- Each finding evaluated against enabled policy frameworks
- HIPAA: Checks for PHI patterns (SSN + medical details)
- PCI DSS: Validates credit card numbers with checksums
- GDPR/CCPA: Identifies EU/CA personal data
- SOX: Detects financial records requiring retention
- ISO 27001: Flags sensitive information assets
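The checksum used for payment card numbers is conventionally the Luhn algorithm. Assuming that is what the PCI DSS check relies on, validation might be sketched as follows; `luhn_valid` is a hypothetical helper, and the 13 to 19 digit length bound is a common convention rather than something this page states.

```rust
/// Luhn checksum over a candidate card number.
/// Spaces and dashes are ignored; any other non-digit rejects the input.
pub fn luhn_valid(candidate: &str) -> bool {
    let mut digits = Vec::new();
    for c in candidate.chars() {
        match c {
            ' ' | '-' => continue,
            _ => match c.to_digit(10) {
                Some(d) => digits.push(d),
                None => return false,
            },
        }
    }
    if digits.len() < 13 || digits.len() > 19 {
        return false;
    }
    // Starting from the rightmost digit, double every second digit and
    // subtract 9 when the doubled value exceeds 9.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

Running the checksum after pattern matching filters out most random 16-digit strings, which is why it is a common way to reduce false positives for card data.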
Severity Assignment:
- Critical: SSN, credit cards, passport numbers
- High: Email addresses, phone numbers (in sensitive contexts)
- Medium: Generic PII without strong identifiers
- Low: Informational findings (email domains)
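The severity tiers can be expressed as a small lookup. In this sketch the scanner type strings, the `sensitive_context` flag, and the choice to rate an email or phone number outside a sensitive context as Medium are assumptions, not documented behavior.

```rust
/// Severity levels, declared in ascending order so they compare naturally.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// Map a (hypothetical) scanner type string to a severity tier.
pub fn assign_severity(scanner_type: &str, sensitive_context: bool) -> Severity {
    match scanner_type {
        "ssn" | "credit_card" | "passport" => Severity::Critical,
        "email" | "phone" if sensitive_context => Severity::High,
        "email_domain" => Severity::Low,
        // Assumption: everything else is treated as generic PII.
        _ => Severity::Medium,
    }
}
```

Deriving `Ord` on the enum lets callers filter with comparisons such as `severity >= Severity::High`.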
5. Storage and Exposure
SQLite Cache:
- Hash-based deduplication (skip unchanged files)
- Indexed by path, policy, severity, timestamp
- Configurable retention (default 90 days)
- Vacuum and optimization on schedule
OSQuery Tables:
- aquilon_dlp_alerts: Findings with policy violations, triage status, and metadata
SIEM Export:
- OSQuery scheduled queries export to SIEM
- JSON format with full details
- Configurable alert thresholds and grouping
- Integration with Splunk, Elasticsearch, QRadar, etc.
Performance Characteristics
Aquilon DLP is optimized for production workloads with minimal system impact:
Memory Usage
- O(1) Memory: Constant memory regardless of file size
- Stream Processing: Files scanned incrementally (no full load)
- Typical Usage: 50-150MB per process (depending on plugin count)
- Archive Handling: Temporary extraction cleaned up immediately
Throughput
- Scanner Engine: 5.4M operations/sec (single-threaded)
- File Processing: Limited by disk I/O (linear scaling)
- Concurrent Scanning: Configurable worker pool (default 4 workers)
- Archive Decompression: Streamed (no disk spooling for small files)
Latency
- Small Files (< 1MB): Sub-millisecond scan time
- Medium Files (1-100MB): Milliseconds to seconds
- Large Files (> 100MB): Seconds (configurable skip threshold)
- Archives: Proportional to number of contained files
Optimization Strategies
Cache Hit Rate:
- Hash-based deduplication: ~85-95% cache hits in typical environments
- Skip rescanning unchanged files
- Invalidation on modification timestamp change
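Hash-based deduplication can be sketched with a map from path to content digest. This is an illustration only: `ScanCache` and `needs_scan` are hypothetical names, the cache here lives in memory rather than SQLite, and `DefaultHasher` stands in for whatever digest the product actually uses (it is not a cryptographic hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical in-memory stand-in for the scan result cache.
pub struct ScanCache {
    seen: HashMap<PathBuf, u64>,
}

impl ScanCache {
    pub fn new() -> Self {
        Self { seen: HashMap::new() }
    }

    /// Records the digest of `content` for `path` and returns true when the
    /// path is new or its content differs from the previously recorded digest.
    pub fn needs_scan(&mut self, path: &Path, content: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        let digest = hasher.finish();
        // `insert` returns the previous digest, if any.
        self.seen.insert(path.to_path_buf(), digest) != Some(digest)
    }
}
```

A cache hit costs one hash computation and one map lookup, which is what makes skipping unchanged files cheaper than rescanning them.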
Exclusion Patterns:
- Exclude high-churn directories (caches, temp files)
- Skip binary-only files (executables, images)
- Configurable max file size (default 100MB)
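The three exclusion rules combine into a single pre-scan check. The sketch below assumes a hypothetical `ScanFilter` struct; the field names are illustrative and do not reflect the actual configuration keys.

```rust
use std::path::Path;

/// Hypothetical pre-scan filter built from the exclusion settings.
pub struct ScanFilter {
    /// Directory prefixes to skip, e.g. caches and temp directories.
    pub excluded_dirs: Vec<String>,
    /// Lowercase extensions of binary-only files to skip.
    pub skipped_extensions: Vec<String>,
    /// Files larger than this many bytes are not scanned.
    pub max_file_size: u64,
}

impl ScanFilter {
    /// Returns true when the file passes every exclusion rule.
    pub fn should_scan(&self, path: &Path, size_bytes: u64) -> bool {
        if size_bytes > self.max_file_size {
            return false;
        }
        // `Path::starts_with` compares whole components, so "/var/cache"
        // does not accidentally exclude "/var/cachefoo".
        if self.excluded_dirs.iter().any(|dir| path.starts_with(dir)) {
            return false;
        }
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| e.to_ascii_lowercase());
        match ext {
            Some(e) => !self.skipped_extensions.iter().any(|s| *s == e),
            None => true,
        }
    }
}
```

Ordering the checks from cheapest to most specific keeps the filter inexpensive on high-churn directories, where it runs most often.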
Worker Tuning:
- Adjust num_workers based on CPU cores
- Default 4 workers balances throughput and system impact
- Increase for I/O-bound workloads, decrease for CPU-constrained systems
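A worker pool sized by `num_workers` can be sketched with standard library threads and channels. This is one possible shape, not the actual scheduler; `scan_with_workers` and the `job` callback (which returns a finding count per path) are hypothetical.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Run `job` over `paths` using `num_workers` threads that pull from a
/// shared queue. Returns (path, result) pairs sorted by path.
pub fn scan_with_workers<F>(
    paths: Vec<String>,
    num_workers: usize,
    job: F,
) -> Vec<(String, usize)>
where
    F: Fn(&str) -> usize + Send + Sync + 'static,
{
    let (task_tx, task_rx) = mpsc::channel::<String>();
    let (result_tx, result_rx) = mpsc::channel::<(String, usize)>();
    let task_rx = Arc::new(Mutex::new(task_rx));
    let job = Arc::new(job);

    let handles: Vec<_> = (0..num_workers.max(1))
        .map(|_| {
            let task_rx = Arc::clone(&task_rx);
            let result_tx = result_tx.clone();
            let job = Arc::clone(&job);
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while scanning.
                let next = task_rx.lock().unwrap().recv();
                match next {
                    Ok(path) => {
                        let count = (*job)(path.as_str());
                        let _ = result_tx.send((path, count));
                    }
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for p in paths {
        task_tx.send(p).unwrap();
    }
    drop(task_tx); // closing the queue lets idle workers exit
    drop(result_tx); // only worker-held senders remain

    for h in handles {
        h.join().unwrap();
    }
    let mut results: Vec<_> = result_rx.iter().collect();
    results.sort();
    results
}
```

Because every worker pulls from the same queue, a slow file occupies one worker while the others keep draining the backlog, which is why raising the worker count helps I/O-bound workloads.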
Plugin Architecture
Aquilon DLP's extensibility comes from its plugin-based design:
Scanner Plugin Interface
All scanners implement the StreamScanner trait:
pub trait StreamScanner {
    /// Scans the given text content and returns the findings.
    fn scan(&self, content: &str) -> anyhow::Result<Vec<Finding>>;
    /// Returns the name of this scanner type.
    fn scanner_type(&self) -> &str;
}
Benefits:
- New scanners added without modifying core engine
- Independent testing and versioning
- Community contributions possible
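To show what adding a scanner might involve, here is a dependency-free sketch of a toy email scanner. The trait is restated locally with `ScanResult` standing in for `anyhow::Result`, and the `Finding` fields are guesses, since the real `Finding` type is not shown on this page; the matching rule is deliberately naive.

```rust
use std::error::Error;

/// Stand-in for `anyhow::Result`, so the sketch needs no external crates.
pub type ScanResult<T> = Result<T, Box<dyn Error>>;

/// Hypothetical shape of a finding (the real type is not documented here).
#[derive(Debug, PartialEq, Eq)]
pub struct Finding {
    pub scanner_type: String,
    pub line: usize,
    pub matched: String,
}

pub trait StreamScanner {
    fn scan(&self, content: &str) -> ScanResult<Vec<Finding>>;
    fn scanner_type(&self) -> &str;
}

/// Toy scanner: flags whitespace-separated tokens shaped like local@domain.tld.
pub struct EmailScanner;

impl StreamScanner for EmailScanner {
    fn scan(&self, content: &str) -> ScanResult<Vec<Finding>> {
        let mut findings = Vec::new();
        for (idx, line) in content.lines().enumerate() {
            for token in line.split_whitespace() {
                // Strip surrounding punctuation such as trailing commas.
                let token = token.trim_matches(|c: char| !c.is_alphanumeric());
                if let Some((local, domain)) = token.split_once('@') {
                    if !local.is_empty() && domain.contains('.') && !domain.contains('@') {
                        findings.push(Finding {
                            scanner_type: self.scanner_type().to_string(),
                            line: idx + 1, // 1-based line numbers
                            matched: token.to_string(),
                        });
                    }
                }
            }
        }
        Ok(findings)
    }

    fn scanner_type(&self) -> &str {
        "email"
    }
}
```

Since the engine only depends on the trait, a scanner like this could be held as a `Box<dyn StreamScanner>` alongside the built-in ones and tested in isolation.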
File Handler Plugin Interface
Handlers implement the FileHandler trait:
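use std::path::Path;

pub trait FileHandler {
    /// Returns true if this handler supports the file at `path`.
    fn can_handle(&self, path: &Path) -> bool;
    /// Processes the file and returns the extracted text.
    fn process(&self, path: &Path) -> anyhow::Result<Vec<String>>;
}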
Benefits:
- Support new formats without core changes
- Recursive container handling (archives in archives)
- Fallback to text handler if format unknown
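The fallback behavior can be sketched as a registry that tries each handler in order and otherwise returns the text handler. Everything here is illustrative: the trait is restated with `std::io::Result` in place of `anyhow::Result`, and `HandlerRegistry`, `TextHandler`, and the non-functional `PdfStub` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

pub trait FileHandler {
    fn can_handle(&self, path: &Path) -> bool;
    fn process(&self, path: &Path) -> io::Result<Vec<String>>;
}

/// Reads a file as UTF-8 text and returns its lines.
pub struct TextHandler;

impl FileHandler for TextHandler {
    fn can_handle(&self, path: &Path) -> bool {
        path.extension().map_or(false, |e| e == "txt")
    }
    fn process(&self, path: &Path) -> io::Result<Vec<String>> {
        Ok(fs::read_to_string(path)?.lines().map(String::from).collect())
    }
}

/// Stub only: claims .pdf files but does no real PDF parsing.
pub struct PdfStub;

impl FileHandler for PdfStub {
    fn can_handle(&self, path: &Path) -> bool {
        path.extension().map_or(false, |e| e == "pdf")
    }
    fn process(&self, _path: &Path) -> io::Result<Vec<String>> {
        Err(io::Error::new(io::ErrorKind::Unsupported, "PDF parsing not part of this sketch"))
    }
}

/// Tries registered handlers in order and falls back when none match.
pub struct HandlerRegistry {
    handlers: Vec<Box<dyn FileHandler>>,
    fallback: Box<dyn FileHandler>,
}

impl HandlerRegistry {
    pub fn new(fallback: Box<dyn FileHandler>) -> Self {
        Self { handlers: Vec::new(), fallback }
    }

    pub fn register(&mut self, handler: Box<dyn FileHandler>) {
        self.handlers.push(handler);
    }

    pub fn select(&self, path: &Path) -> &dyn FileHandler {
        for h in &self.handlers {
            if h.can_handle(path) {
                return &**h;
            }
        }
        &*self.fallback
    }
}
```

Supporting a new format then means registering one more boxed handler; the selection loop and the fallback stay unchanged.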
Policy Framework Interface
Frameworks implement the PolicyFramework trait:
pub trait PolicyFramework {
    /// Evaluates findings and returns the resulting policy violations.
    fn evaluate(&self, findings: &[Finding]) -> Vec<PolicyViolation>;
    /// Returns the name of this framework.
    fn framework_name(&self) -> &str;
}
Benefits:
- Custom compliance frameworks
- Org-specific rules via TOML policies
- Combine multiple frameworks (e.g., HIPAA + PCI DSS)
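Combining frameworks can be pictured as running every enabled framework over the same findings and concatenating the violations. In the sketch below the `Finding` and `PolicyViolation` fields, the table-driven `RuleFramework`, and `evaluate_all` are assumptions; the rule tables are illustrative and are not the actual HIPAA or PCI DSS rules.

```rust
/// Hypothetical finding and violation shapes (the real types are not shown).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub scanner_type: String,
    pub path: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PolicyViolation {
    pub framework: String,
    pub scanner_type: String,
    pub path: String,
}

pub trait PolicyFramework {
    fn evaluate(&self, findings: &[Finding]) -> Vec<PolicyViolation>;
    fn framework_name(&self) -> &str;
}

/// Table-driven framework: flags findings whose scanner type is listed.
pub struct RuleFramework {
    pub name: String,
    pub scanner_types: Vec<String>,
}

impl PolicyFramework for RuleFramework {
    fn evaluate(&self, findings: &[Finding]) -> Vec<PolicyViolation> {
        findings
            .iter()
            .filter(|f| self.scanner_types.contains(&f.scanner_type))
            .map(|f| PolicyViolation {
                framework: self.name.clone(),
                scanner_type: f.scanner_type.clone(),
                path: f.path.clone(),
            })
            .collect()
    }

    fn framework_name(&self) -> &str {
        &self.name
    }
}

/// Run every enabled framework over the same findings.
pub fn evaluate_all(
    frameworks: &[Box<dyn PolicyFramework>],
    findings: &[Finding],
) -> Vec<PolicyViolation> {
    frameworks.iter().flat_map(|f| f.evaluate(findings)).collect()
}
```

A file containing both an SSN and a card number would produce one violation per matching framework, which is the multi-framework behavior described for the Policy Engine.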
Security Considerations
Data Handling
- No External Transmission: All scanning happens locally
- Local Cache Only: Findings stored in local SQLite database
- Configurable Retention: Auto-delete old findings (compliance requirement)
- Access Control: Cache file permissions restrict to root/admin
macOS Endpoint Security
🍎 Enterprise Edition:
- Native Endpoint Security framework (requires entitlements)
- System Extension approval required
- Full Disk Access permission for comprehensive monitoring
- Code signed and notarized for enterprise deployment
Linux Security
- inotify Limits: Configurable watch limits (sysctl tuning)
- File Permissions: Respects existing file ACLs
- Systemd Integration: Runs as systemd service with restart policies
- SELinux Support: Compatible with enforcing mode (policy module available)
Next Steps
- User Guide: Learn how to configure policies
- Deployment: Explore deployment options for your environment
- API Integration: Query findings via OSQuery tables
- Compliance: Review compliance frameworks for your industry