Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Disaster Recovery

Planning and procedures for recovering Aquilon DLP in disaster scenarios.

Recovery Planning

Recovery Objectives

MetricTargetDescription
RTO (Recovery Time Objective)1 hourTime to restore service
RPO (Recovery Point Objective)24 hoursMaximum data loss acceptable

Critical Components

ComponentRecovery PriorityNotes
ConfigurationP1Required for service start
Service binaryP1Application itself
DatabaseP2Historical findings
CacheP3Can be rebuilt

Disaster Scenarios

Scenario 1: Single Endpoint Failure

Symptoms: Service down on one machine

Recovery:

  1. Restore from backup (see Backup & Restore)
  2. Or reinstall and reconfigure
# Restore configuration
cp /backup/aquilon-dlp/latest/aquilon_dlp_config.toml /etc/aquilon-dlp/

# Start service
sudo systemctl start aquilon-dlp

# Verify
sudo systemctl status aquilon-dlp

Scenario 2: Database Corruption

Symptoms: Service fails to start with database errors

Recovery:

# Stop service
sudo systemctl stop aquilon-dlp

# Check corruption
sqlite3 /var/lib/aquilon-dlp/aquilon_dlp.db "PRAGMA integrity_check;"

# If corrupted, restore from backup
cp /backup/aquilon-dlp/latest/aquilon_dlp.db /var/lib/aquilon-dlp/

# If no backup, recreate (loses history)
rm /var/lib/aquilon-dlp/aquilon_dlp.db
sudo systemctl start aquilon-dlp  # Creates new database

Scenario 3: Configuration Loss

Symptoms: Invalid or missing configuration

Recovery:

# Restore from backup
cp /backup/aquilon-dlp/latest/aquilon_dlp_config.toml /etc/aquilon-dlp/

# Or download default
curl -o /etc/aquilon-dlp/aquilon_dlp_config.toml \
  https://raw.githubusercontent.com/aquilonsecurity/aquilon-dlp/main/docs/config-examples/aquilon_dlp_config_enterprise.toml

# Validate
aquilon-dlp --validate-config /etc/aquilon-dlp/aquilon_dlp_config.toml

# Restart
sudo systemctl start aquilon-dlp

Scenario 4: Fleet-Wide Outage

Symptoms: Multiple endpoints affected

Recovery:

  1. Identify root cause (bad update, configuration push, etc.)
  2. Prepare fix (rollback version, configuration fix)
  3. Deploy fix via MDM or configuration management
  4. Monitor recovery

Version Rollback

Download Previous Version

Download the previous version from the Aquilon Security portal and save to /tmp/aquilon-dlp-previous.

Rollback Procedure

# Stop current service
sudo systemctl stop aquilon-dlp

# Backup current binary
cp /usr/local/bin/aquilon-dlp-enterprise /usr/local/bin/aquilon-dlp-enterprise.bak

# Install previous version
cp /tmp/aquilon-dlp-previous /usr/local/bin/aquilon-dlp-enterprise
chmod +x /usr/local/bin/aquilon-dlp-enterprise

# Restart
sudo systemctl start aquilon-dlp

# Verify version
aquilon-dlp-enterprise --version

MDM Rollback

  1. Upload previous version to MDM
  2. Deploy to affected endpoints
  3. Monitor deployment status

Recovery Procedures

Minimal Recovery (Configuration Only)

Fastest recovery - loses historical data but restores monitoring:

  1. Download fresh binary from the Aquilon Security portal
  2. Install to /usr/local/bin/aquilon-dlp-enterprise
  3. Restore configuration from backup or use default
  4. Restart aquilon-dlp service
# Restore configuration from backup
cp /backup/aquilon-dlp/latest/aquilon_dlp_config.toml /etc/aquilon-dlp/

# Restart service
sudo systemctl restart aquilon-dlp

Full Recovery (With History)

Complete recovery with all historical data:

# 1. Install binary from Aquilon Security portal
# Save to: /usr/local/bin/aquilon-dlp-enterprise

# 2. Restore from backup
tar -xzf /backup/aquilon-dlp/latest.tar.gz -C /tmp/
cp /tmp/backup/aquilon_dlp_config.toml /etc/aquilon-dlp/
cp /tmp/backup/aquilon_dlp.db /var/lib/aquilon-dlp/

# 3. Verify integrity
sqlite3 /var/lib/aquilon-dlp/aquilon_dlp.db "PRAGMA integrity_check;"

# 4. Restart service
sudo systemctl restart aquilon-dlp

macOS Recovery

FDA Re-grant After Recovery

After recovery on macOS, FDA may need re-granting:

  1. Check profile:

    sudo profiles list | grep aquilon
    
  2. If missing, redeploy PPPC profile via MDM

  3. Reinstall app:

    sudo rm -rf /Library/Application\ Support/aquilon-dlp.app
    # MDM will reinstall on next check-in
    
  4. Verify FDA:

    sudo sqlite3 /Library/Application\ Support/com.apple.TCC/TCC.db \
      "SELECT auth_value FROM access WHERE client = 'dev.aquilon.dlp-plugin';"
    

Verification

Post-Recovery Checklist

  • Service running: systemctl status aquilon-dlp
  • Configuration valid: –validate-config
  • Database accessible: OSQuery tables return data
  • Findings generating: New alerts appearing
  • Monitoring active: Prometheus metrics available
  • macOS: FDA granted (if applicable)

Recovery Test Queries

-- Service health - verify table exists
SELECT COUNT(*) as total_alerts FROM aquilon_dlp_alerts;

-- Recent activity
SELECT COUNT(*) as alerts_24h
FROM aquilon_dlp_alerts
WHERE timestamp > (strftime('%s', 'now') - 86400);

-- Alert breakdown
SELECT severity, COUNT(*) as count
FROM aquilon_dlp_alerts
GROUP BY severity;

Automated Recovery

Systemd Auto-Restart

Configure in service file:

[Service]
Restart=on-failure
RestartSec=10s
StartLimitBurst=5
StartLimitIntervalSec=60s

Health Check Script

#!/bin/bash
# /usr/local/bin/aquilon-dlp-healthcheck.sh

if ! systemctl is-active --quiet aquilon-dlp; then
    echo "Service down, attempting restart"
    systemctl start aquilon-dlp
    sleep 10

    if ! systemctl is-active --quiet aquilon-dlp; then
        echo "CRITICAL: Service failed to start"
        # Send alert to monitoring system
        exit 1
    fi
fi

exit 0

Add to crontab:

*/5 * * * * /usr/local/bin/aquilon-dlp-healthcheck.sh

Communication Plan

During Outage

  1. Notify security team of reduced DLP coverage
  2. Update incident ticket
  3. Monitor recovery progress

Post-Recovery

  1. Verify all endpoints recovered
  2. Check for data gaps in findings
  3. Document root cause
  4. Update runbooks if needed

Prevention

Regular Testing

  • Monthly: Test restore from backup
  • Quarterly: Full DR drill
  • Annually: Review and update DR plan

Monitoring

Set up alerts for:

  • Service down
  • Database corruption
  • Configuration validation failures
  • Scan rate drops