Architecture

This page provides an architectural overview of Aquilon DLP, including its system context, component architecture, and deployment topologies.

System Context

Aquilon DLP operates within an enterprise security ecosystem, integrating with OSQuery for system monitoring and exposing findings to SIEM systems for alerting and compliance reporting.

graph TB
    subgraph "Enterprise Environment"
        SA[Security Analysts]
        SysAdmin[System Administrators]
        SIEM[SIEM/Alerting System]

        subgraph "Monitored System"
            FS[File System]
            OSQ[OSQuery]
            AquilonDLP[Aquilon DLP]
        end
    end

    SA -->|Query Alerts| OSQ
    SA -->|Review Findings| SIEM
    SysAdmin -->|Configure| AquilonDLP

    FS -->|File Events| AquilonDLP
    AquilonDLP -->|Scan Results| OSQ
    OSQ -->|Export| SIEM

    style AquilonDLP fill:#4a90e2,stroke:#2e5c8a,stroke-width:3px,color:#fff
    style OSQ fill:#7cb342,stroke:#558b2f,stroke-width:2px,color:#fff
    style SIEM fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff

Key Interactions:

  1. File System Monitoring: Aquilon DLP monitors directories for new/modified files
  2. Scan and Detect: Files are parsed, decompressed (if needed), and scanned for sensitive data
  3. OSQuery Integration: Findings exposed via aquilon_dlp_alerts and related tables
  4. SIEM Export: OSQuery exports alerts to enterprise SIEM systems
  5. Analyst Queries: Security analysts query findings via OSQuery or SIEM dashboards

Component Architecture

Aquilon DLP uses a plugin-based architecture with three primary layers: File Handler Layer, Scanner Engine, and Policy Engine.

graph TB
    subgraph "Aquilon DLP Core"
        FW[File Watcher]
        FH[File Handler Layer]
        SE[Scanner Engine]
        PE[Policy Engine]
        DB[(SQLite Cache)]
        OSE[OSQuery Extension Interface]
    end

    subgraph "File Handlers (9 Formats)"
        ZIP[ZIP Handler]
        TAR[TAR Handler]
        GZIP[GZIP Handler]
        PDF[PDF Handler]
        DOCX[DOCX Handler]
        XLSX[XLSX Handler]
        SEVEN[7-Zip Handler]
        RAR[RAR Handler]
        TEXT[Text Handler]
    end

    subgraph "Scanner Plugins (50+ Scanners)"
        SSN[SSN Scanner]
        CC[Credit Card Scanner]
        EMAIL[Email Scanner]
        PHONE[Phone Scanner]
        PASSPORT[Passport Scanner]
        NATID[National ID Scanners]
        MORE[... 40+ more scanners]
    end

    subgraph "Policy Frameworks"
        HIPAA[HIPAA Framework]
        PCI[PCI DSS Framework]
        GDPR[GDPR Framework]
        CCPA[CCPA Framework]
        SOX[SOX Framework]
        ISO[ISO 27001 Framework]
    end

    FW -->|New/Modified Files| FH
    FH --> ZIP
    FH --> TAR
    FH --> GZIP
    FH --> PDF
    FH --> DOCX
    FH --> XLSX
    FH --> SEVEN
    FH --> RAR
    FH --> TEXT

    ZIP -->|Extracted Text| SE
    TAR -->|Extracted Text| SE
    GZIP -->|Decompressed Text| SE
    PDF -->|Extracted Text| SE
    DOCX -->|Extracted Text| SE
    XLSX -->|Extracted Text| SE
    SEVEN -->|Extracted Text| SE
    RAR -->|Extracted Text| SE
    TEXT -->|Raw Text| SE

    SE --> SSN
    SE --> CC
    SE --> EMAIL
    SE --> PHONE
    SE --> PASSPORT
    SE --> NATID
    SE --> MORE

    SSN -->|Findings| PE
    CC -->|Findings| PE
    EMAIL -->|Findings| PE
    PHONE -->|Findings| PE
    PASSPORT -->|Findings| PE
    NATID -->|Findings| PE
    MORE -->|Findings| PE

    PE --> HIPAA
    PE --> PCI
    PE --> GDPR
    PE --> CCPA
    PE --> SOX
    PE --> ISO

    HIPAA -->|Violations| DB
    PCI -->|Violations| DB
    GDPR -->|Violations| DB
    CCPA -->|Violations| DB
    SOX -->|Violations| DB
    ISO -->|Violations| DB

    DB -->|Query Interface| OSE

    style FW fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
    style SE fill:#7cb342,stroke:#558b2f,stroke-width:2px,color:#fff
    style PE fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
    style DB fill:#9c27b0,stroke:#6a1b9a,stroke-width:2px,color:#fff

Layer Descriptions:

File Handler Layer (9 Handlers)

Processes various file formats and containers:

  • Archive Handlers: ZIP, TAR, GZIP, 7-Zip, RAR (recursive extraction)
  • Document Handlers: PDF, DOCX, XLSX (text extraction)
  • Text Handler: Plain text, source code, config files

Key Feature: Recursive descent into nested archives (e.g., ZIP inside TAR inside GZIP)

Scanner Engine (50+ Plugins)

Detects sensitive data patterns:

  • National ID Scanners: 28 country-specific national IDs (EU, Americas, Asia-Pacific, Middle East)
  • Identity Scanners: SSN, passport, driver’s license
  • Financial Scanners: Credit cards, bank accounts, IBAN
  • Healthcare Scanners: Medical record numbers, NPI, MBI
  • Contact Scanners: Emails, phone numbers, physical addresses
  • Credential Scanners: API keys, tokens, crypto keys, database connections

Key Feature: Stream-based scanning with O(1) memory usage (constant memory regardless of file size)
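A minimal sketch of line-by-line stream scanning, assuming detectors work on one line at a time (a match that spans a line break would need an overlap window, which this omits). A single reused `String` buffer keeps memory proportional to the longest line plus the findings collected, not to the file size. `scan_stream` and the SSN-shaped check `looks_like_ssn` are hypothetical illustrations, not Aquilon DLP's actual API.

```rust
use std::io::{self, BufRead};

/// Hypothetical detector for the `ddd-dd-dddd` shape only
/// (no validation of area, group, or serial ranges).
fn looks_like_ssn(token: &str) -> bool {
    let b = token.as_bytes();
    b.len() == 11
        && b.iter().enumerate().all(|(i, &c)| match i {
            3 | 6 => c == b'-',
            _ => c.is_ascii_digit(),
        })
}

/// Scans `reader` one line at a time, reusing one String buffer,
/// and returns (1-based line number, matched token) pairs.
fn scan_stream<R: BufRead>(mut reader: R) -> io::Result<Vec<(usize, String)>> {
    let mut line = String::new();
    let mut line_no = 0;
    let mut hits = Vec::new();
    loop {
        line.clear(); // keep the allocation, drop the previous line's text
        if reader.read_line(&mut line)? == 0 {
            break; // end of stream
        }
        line_no += 1;
        // Split on anything that cannot be part of an SSN-shaped token.
        for token in line.split(|c: char| !(c.is_ascii_digit() || c == '-')) {
            if looks_like_ssn(token) {
                hits.push((line_no, token.to_string()));
            }
        }
    }
    Ok(hits)
}
```

In practice the reader would wrap a file, e.g. `scan_stream(BufReader::new(File::open(path)?))`. Note that `read_line` returns an error on invalid UTF-8, so binary input would need lossy decoding first.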

Policy Engine (6 Frameworks)

Maps findings to compliance requirements:

  • 🏒 HIPAA: Healthcare PHI detection (Enterprise only)
  • 🏒 PCI DSS: Payment card data (Enterprise only)
  • 🏒 SOX: Financial data (Enterprise only)
  • 🏒 ISO 27001: Information security (Enterprise only)
  • GDPR: EU personal data (All editions)
  • CCPA: California consumer data (All editions)

Key Feature: Multi-framework evaluation (single file can trigger multiple policy violations)
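The sketch below shows how one file's findings can fan out to several frameworks at once. The mapping from scanner type to framework is a hypothetical simplification for illustration; real rules would come from the Policy Engine and its policy configuration. `Framework`, `frameworks_for`, and `evaluate` are assumed names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Framework { Hipaa, PciDss, Gdpr, Ccpa, Sox, Iso27001 }

/// Hypothetical mapping from a scanner type to the frameworks it concerns.
fn frameworks_for(scanner_type: &str) -> &'static [Framework] {
    use Framework::*;
    match scanner_type {
        "ssn" => &[Hipaa, Ccpa, Iso27001],
        "credit_card" => &[PciDss, Iso27001],
        "email" => &[Gdpr, Ccpa],
        "medical_record_number" => &[Hipaa],
        _ => &[Iso27001],
    }
}

/// Evaluates all findings in one file against every enabled framework and
/// returns the set of frameworks violated (one file can violate several).
fn evaluate(scanner_types: &[&str], enabled: &[Framework]) -> BTreeSet<Framework> {
    scanner_types
        .iter()
        .flat_map(|t| frameworks_for(t).iter().copied())
        .filter(|f| enabled.contains(f))
        .collect()
}
```

With HIPAA, PCI DSS, GDPR, and CCPA enabled, a file containing both an SSN and a card number is reported under HIPAA, PCI DSS, and CCPA. Collecting into a `BTreeSet` deduplicates frameworks when several findings map to the same one.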

Database Cache (SQLite)

Stores scan results with:

  • Hash-based deduplication: Skip rescanning unchanged files
  • Metadata indexing: Fast lookups by path, policy, severity
  • Retention policies: Configurable cleanup of old findings

Performance: 5.4M operations/sec query throughput
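A minimal in-memory sketch of hash-based deduplication, assuming each file's content hash is compared with the hash stored for the same path. It uses std's `DefaultHasher` only to stay dependency-free; that hasher is not collision-resistant, so a production cache would more likely use a cryptographic hash such as SHA-256. `ScanCache` and `needs_scan` are hypothetical names; the actual cache is SQLite-backed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical in-memory stand-in for the SQLite cache: path -> content hash.
#[derive(Default)]
struct ScanCache {
    hashes: HashMap<PathBuf, u64>,
}

/// Non-cryptographic 64-bit hash, used here only to avoid dependencies.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

impl ScanCache {
    /// Returns true if the file is new or its content changed since the last
    /// check, and records the new hash. Unchanged files return false (skip).
    /// A real cache would record the hash only after the scan succeeds.
    fn needs_scan(&mut self, path: &Path, bytes: &[u8]) -> bool {
        let hash = content_hash(bytes);
        match self.hashes.insert(path.to_path_buf(), hash) {
            Some(previous) => previous != hash,
            None => true,
        }
    }
}
```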


Deployment Topology

Aquilon DLP supports multiple deployment models depending on organizational needs.

graph TB
    subgraph "Single-Node Deployment"
        subgraph "Host System"
            FS1[File System]
            OSQ1[OSQuery]
            AQD1[Aquilon DLP]
            DB1[(SQLite Cache)]
        end

        FS1 --> AQD1
        AQD1 --> DB1
        DB1 --> OSQ1
        OSQ1 -->|Export| SIEM1[SIEM/Alerting]
    end

    subgraph "Enterprise Deployment (Distributed)"
        subgraph "Fleet (100s-1000s of hosts)"
            subgraph "Host 1"
                FS2[File System]
                OSQ2[OSQuery]
                AQD2[Aquilon DLP]
                DB2[(Cache)]
            end

            subgraph "Host 2"
                FS3[File System]
                OSQ3[OSQuery]
                AQD3[Aquilon DLP]
                DB3[(Cache)]
            end

            subgraph "Host N"
                FS4[File System]
                OSQ4[OSQuery]
                AQD4[Aquilon DLP]
                DB4[(Cache)]
            end
        end

        subgraph "Central Infrastructure"
            FLEET[OSQuery Fleet Manager]
            CENTRAL_SIEM[Central SIEM]
            DASHBOARD[Compliance Dashboard]
        end

        OSQ2 --> FLEET
        OSQ3 --> FLEET
        OSQ4 --> FLEET

        FLEET --> CENTRAL_SIEM
        CENTRAL_SIEM --> DASHBOARD
    end

    subgraph "🍎 MDM Deployment (macOS)"
        subgraph "MDM System"
            JAMF[Jamf Pro / Intune / Kandji]
            CONFIG[Configuration Profiles]
            PKG[Aquilon DLP PKG]
        end

        subgraph "macOS Fleet"
            MAC1[MacBook 1]
            MAC2[MacBook 2]
            MACN[MacBook N]
        end

        JAMF -->|Deploy PKG| MAC1
        JAMF -->|Deploy PKG| MAC2
        JAMF -->|Deploy PKG| MACN

        CONFIG -->|Full Disk Access| MAC1
        CONFIG -->|Full Disk Access| MAC2
        CONFIG -->|Full Disk Access| MACN
    end

    style AQD1 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
    style AQD2 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
    style AQD3 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff
    style AQD4 fill:#4a90e2,stroke:#2e5c8a,stroke-width:2px,color:#fff

Deployment Models:

Single-Node Deployment

Best for:

  • Small teams (< 5 servers for Basic Edition)
  • Development/staging environments
  • Proof-of-concept deployments

Architecture:

  • Aquilon DLP runs directly on the monitored host
  • Local SQLite cache stores findings
  • OSQuery exposes findings locally
  • Optional SIEM export for centralized alerting

Setup Time: ~5 minutes per host

Enterprise Deployment (Distributed)

Best for:

  • Large organizations (100s-1000s of hosts)
  • Multi-site deployments
  • Compliance-driven environments (healthcare, finance)

Architecture:

  • Aquilon DLP deployed on every monitored host
  • OSQuery Fleet Manager aggregates findings across fleet
  • Central SIEM processes alerts and generates compliance reports
  • Compliance Dashboard provides executive visibility

Key Features:

  • Unlimited server licensing (Enterprise Edition)
  • All policy frameworks (HIPAA, PCI DSS, SOX, ISO 27001, GDPR, CCPA)
  • Enterprise support with 4-hour SLA for critical issues

🍎 MDM Deployment (macOS)

Best for:

  • macOS fleet management (Enterprise Edition only)
  • Organizations using Jamf Pro, Microsoft Intune, or Kandji
  • Zero-touch deployment for new devices

Architecture:

  • PKG installer deployed via MDM system
  • Configuration profiles grant Full Disk Access
  • Launch Daemon ensures Aquilon DLP starts on boot
  • Integration with OSQuery for monitoring

Key Features:

  • Automated deployment to 100s-1000s of Macs
  • Centralized configuration management
  • Native Endpoint Security integration
  • User-transparent operation

Data Flow

This section traces how data moves through Aquilon DLP, from file detection to SIEM export:

1. File Monitoring

macOS (Enterprise Edition):

  • Native Endpoint Security API monitors file system events
  • Events filtered by watch paths and exclusions
  • New/modified files queued for scanning

Linux (All Editions):

  • inotify-based file system monitoring
  • Recursive directory watching with pattern matching
  • Event deduplication to prevent scan storms
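One way to implement event deduplication is to coalesce bursts of events for the same path and release the path only after a quiet window, so a file written in many small chunks is queued once. The sketch below is a hypothetical debouncer that takes explicit millisecond timestamps (keeping it deterministic); it does not talk to inotify itself, and `Debouncer` and `quiet_ms` are assumed names.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical debouncer: a path is released for scanning only after no new
/// event has arrived for it within `quiet_ms` milliseconds.
struct Debouncer {
    quiet_ms: u64,
    last_event: HashMap<PathBuf, u64>,
}

impl Debouncer {
    fn new(quiet_ms: u64) -> Self {
        Debouncer { quiet_ms, last_event: HashMap::new() }
    }

    /// Records a file event; repeated events for a path only refresh its timestamp.
    fn on_event(&mut self, path: PathBuf, now_ms: u64) {
        self.last_event.insert(path, now_ms);
    }

    /// Returns (and forgets) every path that has been quiet for at least `quiet_ms`.
    fn drain_ready(&mut self, now_ms: u64) -> Vec<PathBuf> {
        let quiet = self.quiet_ms;
        let mut ready: Vec<PathBuf> = self
            .last_event
            .iter()
            .filter(|&(_, &t)| now_ms.saturating_sub(t) >= quiet)
            .map(|(p, _)| p.clone())
            .collect();
        for p in &ready {
            self.last_event.remove(p);
        }
        ready.sort(); // deterministic order for callers
        ready
    }
}
```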

2. File Processing

File Detected → File Handler Selection → Format Processing → Text Extraction

Handler Selection:

  • Based on file extension and magic number detection
  • Archive handlers recursively process nested containers
  • Document handlers extract text from structured formats
  • Text handler processes plain text files directly
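A minimal sketch of handler selection by magic number with an extension fallback. The byte signatures are the standard ones for each format; DOCX and XLSX are ZIP containers, so they share ZIP's `PK\x03\x04` signature and are told apart here by extension (a fuller check would inspect the archive contents). The `Handler` enum and `select_handler` function are hypothetical; falling back to the text handler mirrors the behaviour described under the plugin interfaces.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Handler { Zip, Docx, Xlsx, Tar, Gzip, Pdf, SevenZip, Rar, Text }

/// Picks a handler from the first bytes of a file, falling back to the extension.
fn select_handler(path: &Path, head: &[u8]) -> Handler {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();

    if head.starts_with(b"PK\x03\x04") {
        // OOXML documents are ZIP archives; the extension distinguishes them.
        return match ext.as_str() {
            "docx" => Handler::Docx,
            "xlsx" => Handler::Xlsx,
            _ => Handler::Zip,
        };
    }
    if head.starts_with(&[0x1f, 0x8b]) { return Handler::Gzip; }
    if head.starts_with(b"%PDF-") { return Handler::Pdf; }
    if head.starts_with(&[0x37, 0x7a, 0xbc, 0xaf, 0x27, 0x1c]) { return Handler::SevenZip; }
    if head.starts_with(b"Rar!\x1a\x07") { return Handler::Rar; }
    // POSIX (ustar) tar stores "ustar" at byte offset 257 of the first header.
    if head.len() >= 262 && &head[257..262] == b"ustar" { return Handler::Tar; }

    match ext.as_str() {
        "tar" => Handler::Tar, // pre-POSIX tar has no magic string
        _ => Handler::Text,    // unknown formats fall back to the text handler
    }
}
```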

Example Flow (nested archive):

report.zip → ZIP Handler
  ├─ data.tar → TAR Handler
  │   ├─ records.txt → Text Handler → Scanner Engine
  │   └─ patient.pdf → PDF Handler → Scanner Engine
  └─ summary.docx → DOCX Handler → Scanner Engine

3. Scanning

Stream-Based Processing:

  • Text is streamed to scanner plugins rather than loaded fully into memory
  • All 50+ scanners run concurrently on the same stream
  • Constant (O(1)) memory usage regardless of file size
  • 5.4M operations/sec throughput

Finding Generation:

  • Each scanner reports matches with details (line number, surrounding text)
  • Metadata captured: file path, scanner type, confidence score
  • Findings passed to Policy Engine for evaluation

4. Policy Evaluation

Framework Matching:

  • Each finding evaluated against enabled policy frameworks
  • HIPAA: Checks for PHI patterns (SSN + medical details)
  • PCI DSS: Validates credit card numbers with checksums
  • GDPR/CCPA: Identifies EU (GDPR) and California (CCPA) personal data
  • SOX: Detects financial records requiring retention
  • ISO 27001: Flags sensitive information assets
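The page does not name the checksum used for PCI DSS matching; the standard one for payment card numbers is the Luhn (mod 10) algorithm, so this sketch assumes Luhn. It also assumes a 12 to 19 digit length window and skips issuer-prefix (BIN) checks. `luhn_valid` is a hypothetical name.

```rust
/// Hypothetical helper: true if `candidate` (digits, optionally separated by
/// spaces or dashes) has 12 to 19 digits and passes the Luhn mod-10 check.
fn luhn_valid(candidate: &str) -> bool {
    // Any character other than a digit or separator rejects the candidate.
    let digits: Vec<u32> = candidate
        .chars()
        .filter(|c| *c != ' ' && *c != '-')
        .map(|c| c.to_digit(10))
        .collect::<Option<Vec<u32>>>()
        .unwrap_or_default();
    if !(12..=19).contains(&digits.len()) {
        return false;
    }
    // From the rightmost digit, double every second digit; if doubling
    // gives a two-digit number, subtract 9 (same as summing its digits).
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

The checksum rejects roughly nine in ten random digit runs of card length, which is why it cuts false positives from order numbers and similar identifiers.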

Severity Assignment:

  • Critical: SSN, credit cards, passport numbers
  • High: Email addresses, phone numbers (in sensitive contexts)
  • Medium: Generic PII without strong identifiers
  • Low: Informational findings (email domains)
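The severity tiers above map naturally onto an ordered enum, so a file's overall severity is simply the maximum over its findings. The scanner-type strings and the `in_sensitive_context` flag below are hypothetical; how Aquilon DLP decides that a context is sensitive is not described on this page, and treating an email or phone number outside such a context as Medium is an assumption.

```rust
/// Ordered so that `max()` picks the most severe tier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity { Low, Medium, High, Critical }

/// Hypothetical mapping of the documented tiers onto scanner types.
fn assign_severity(scanner_type: &str, in_sensitive_context: bool) -> Severity {
    match scanner_type {
        "ssn" | "credit_card" | "passport" => Severity::Critical,
        "email" | "phone" if in_sensitive_context => Severity::High,
        "email_domain" => Severity::Low,
        _ => Severity::Medium, // generic PII without strong identifiers
    }
}

/// A file's overall severity is the highest severity among its findings.
fn file_severity(findings: &[(&str, bool)]) -> Option<Severity> {
    findings.iter().map(|&(t, ctx)| assign_severity(t, ctx)).max()
}
```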

5. Storage and Exposure

SQLite Cache:

  • Hash-based deduplication (skip unchanged files)
  • Indexed by path, policy, severity, timestamp
  • Configurable retention (default 90 days)
  • Vacuum and optimization on schedule

OSQuery Tables:

  • aquilon_dlp_alerts: Findings with policy violations, triage status, and metadata

SIEM Export:

  • OSQuery scheduled queries export to SIEM
  • JSON format with full details
  • Configurable alert thresholds and grouping
  • Integration with Splunk, Elasticsearch, QRadar, etc.

Performance Characteristics

Aquilon DLP is optimized for production workloads with minimal system impact:

Memory Usage

  • O(1) Memory: Constant memory regardless of file size
  • Stream Processing: Files scanned incrementally (no full load)
  • Typical Usage: 50-150MB per process (depending on plugin count)
  • Archive Handling: Temporary extraction cleaned up immediately

Throughput

  • Scanner Engine: 5.4M operations/sec (single-threaded)
  • File Processing: Limited by disk I/O (linear scaling)
  • Concurrent Scanning: Configurable worker pool (default 4 workers)
  • Archive Decompression: Streamed (no disk spooling for small files)

Latency

  • Small Files (< 1MB): Sub-millisecond scan time
  • Medium Files (1-100MB): Milliseconds to seconds
  • Large Files (> 100MB): Seconds (configurable skip threshold)
  • Archives: Proportional to number of contained files

Optimization Strategies

Cache Hit Rate:

  • Hash-based deduplication: ~85-95% cache hits in typical environments
  • Skip rescanning unchanged files
  • Invalidation on modification timestamp change

Exclusion Patterns:

  • Exclude high-churn directories (caches, temp files)
  • Skip binary-only files (executables, images)
  • Configurable max file size (default 100MB)
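A sketch of the pre-scan filter described above, assuming exclusions are matched against path-component names and binary-only files are recognised by extension. `ScanFilter`, its fields, and the example directory and extension lists are hypothetical; only the 100MB default comes from this page, and reading it as 100 MiB is an assumption.

```rust
use std::path::Path;

/// Hypothetical pre-scan filter; field names and lists are illustrative.
struct ScanFilter {
    excluded_dirs: Vec<&'static str>, // high-churn directories (caches, temp)
    binary_exts: Vec<&'static str>,   // binary-only files (executables, images)
    max_file_size: u64,               // bytes
}

impl Default for ScanFilter {
    fn default() -> Self {
        ScanFilter {
            excluded_dirs: vec![".cache", "tmp"],
            binary_exts: vec!["exe", "dll", "so", "png", "jpg", "gif"],
            max_file_size: 100 * 1024 * 1024, // "100MB", read here as MiB
        }
    }
}

impl ScanFilter {
    /// Returns false for oversized files, paths containing an excluded name
    /// as any component, and files whose extension marks them as binary-only.
    fn should_scan(&self, path: &Path, size: u64) -> bool {
        if size > self.max_file_size {
            return false;
        }
        let in_excluded_dir = path
            .components()
            .any(|c| self.excluded_dirs.iter().any(|d| c.as_os_str() == *d));
        if in_excluded_dir {
            return false;
        }
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .unwrap_or("")
            .to_ascii_lowercase();
        !self.binary_exts.iter().any(|b| *b == ext)
    }
}
```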

Worker Tuning:

  • Adjust num_workers based on CPU cores
  • Default 4 workers balances throughput and system impact
  • Increase for I/O-bound workloads, decrease for CPU-constrained systems
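A minimal fixed-size worker pool built only on `std::thread` and `std::sync::mpsc`, assuming each job is an independent file identified by name. Workers share one job queue behind a `Mutex` and send results back on a second channel. `scan_with_workers` and its `scan` callback are hypothetical; the real per-file work would be handler selection plus scanning.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Runs `scan` over every job on `num_workers` threads (at least one) and
/// returns (job, result) pairs sorted by job name.
fn scan_with_workers<F>(jobs: Vec<String>, num_workers: usize, scan: F) -> Vec<(String, usize)>
where
    F: Fn(&str) -> usize + Send + Sync + 'static,
{
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let job_rx = Arc::new(Mutex::new(job_rx)); // shared queue: workers pull jobs
    let (res_tx, res_rx) = mpsc::channel();
    let scan = Arc::new(scan);

    let handles: Vec<_> = (0..num_workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            let scan = Arc::clone(&scan);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so it is held only while taking one job, not while scanning.
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // queue closed and drained
                };
                let result = (*scan)(job.as_str());
                res_tx.send((job, result)).unwrap();
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // close the queue so idle workers exit
    drop(res_tx); // only the workers' clones remain

    let mut results: Vec<(String, usize)> = res_rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results.sort();
    results
}
```

Instead of a fixed default of 4, a caller could size the pool from `std::thread::available_parallelism()`, which returns an `io::Result<NonZeroUsize>`.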

Plugin Architecture

Aquilon DLP’s extensibility comes from its plugin-based design:

Scanner Plugin Interface

All scanners implement the StreamScanner trait:

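// `Finding` is Aquilon DLP's finding type, defined elsewhere in the crate.
pub trait StreamScanner {
    /// Scans extracted text and returns any matches as findings.
    fn scan(&self, content: &str) -> anyhow::Result<Vec<Finding>>;
    /// Identifier for this scanner, recorded in finding metadata.
    fn scanner_type(&self) -> &str;
}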

Benefits:

  • New scanners added without modifying core engine
  • Independent testing and versioning
  • Community contributions possible
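To show the shape of a scanner plugin, here is a hypothetical scanner written against the trait as declared above. To keep the sketch dependency-free it substitutes `Result<_, Box<dyn Error>>` for `anyhow::Result` and defines a minimal stand-in `Finding` struct; the fields of the real `Finding` type are not documented on this page. The email check is deliberately naive.

```rust
use std::error::Error;

/// Minimal stand-in for the crate's Finding type (fields are assumptions).
#[derive(Debug, PartialEq)]
struct Finding {
    scanner_type: String,
    line: usize,
    matched: String,
}

/// Same shape as StreamScanner, with std's error type in place of anyhow.
trait StreamScanner {
    fn scan(&self, content: &str) -> Result<Vec<Finding>, Box<dyn Error>>;
    fn scanner_type(&self) -> &str;
}

/// Hypothetical plugin: flags whitespace-separated tokens shaped like local@domain.tld.
struct NaiveEmailScanner;

impl StreamScanner for NaiveEmailScanner {
    fn scan(&self, content: &str) -> Result<Vec<Finding>, Box<dyn Error>> {
        let mut findings = Vec::new();
        for (idx, line) in content.lines().enumerate() {
            for token in line.split_whitespace() {
                // Strip surrounding punctuation such as "<a@b.com>," before checking.
                let token = token.trim_matches(|c: char| ",;:<>()\"'".contains(c));
                if let Some((local, domain)) = token.split_once('@') {
                    let domain_ok = domain.contains('.')
                        && !domain.starts_with('.')
                        && !domain.ends_with('.');
                    if !local.is_empty() && domain_ok {
                        findings.push(Finding {
                            scanner_type: self.scanner_type().to_string(),
                            line: idx + 1,
                            matched: token.to_string(),
                        });
                    }
                }
            }
        }
        Ok(findings)
    }

    fn scanner_type(&self) -> &str {
        "email"
    }
}
```

Because the trait is object-safe, the engine can hold plugins as `Vec<Box<dyn StreamScanner>>` and add new ones without touching the core loop.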

File Handler Plugin Interface

Handlers implement the FileHandler trait:

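use std::path::Path;

pub trait FileHandler {
    /// Returns true if this handler supports the file at `path`.
    fn can_handle(&self, path: &Path) -> bool;
    /// Extracts text from the file for the Scanner Engine.
    fn process(&self, path: &Path) -> anyhow::Result<Vec<String>>;
}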

Benefits:

  • Support new formats without core changes
  • Recursive container handling (archives in archives)
  • Fallback to text handler if format unknown

Policy Framework Interface

Frameworks implement the PolicyFramework trait:

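// `Finding` and `PolicyViolation` are Aquilon DLP types defined elsewhere in the crate.
pub trait PolicyFramework {
    /// Maps scanner findings to this framework's policy violations.
    fn evaluate(&self, findings: &[Finding]) -> Vec<PolicyViolation>;
    /// Framework name, e.g. "HIPAA" or "PCI DSS".
    fn framework_name(&self) -> &str;
}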

Benefits:

  • Custom compliance frameworks
  • Org-specific rules via TOML policies
  • Combine multiple frameworks (e.g., HIPAA + PCI DSS)

Security Considerations

Data Handling

  • No External Transmission: All scanning happens locally
  • Local Cache Only: Findings stored in local SQLite database
  • Configurable Retention: Auto-delete old findings (compliance requirement)
  • Access Control: Cache file permissions restrict to root/admin

macOS Endpoint Security

🍎 Enterprise Edition:

  • Native Endpoint Security framework (requires entitlements)
  • System Extension approval required
  • Full Disk Access permission for comprehensive monitoring
  • Code signed and notarized for enterprise deployment

Linux Security

  • inotify Limits: Watch limits are tunable via sysctl (e.g., fs.inotify.max_user_watches)
  • File Permissions: Respects existing file ACLs
  • Systemd Integration: Runs as systemd service with restart policies
  • SELinux Support: Compatible with enforcing mode (policy module available)

Next Steps