Disaster Recovery
Planning and procedures for recovering Aquilon DLP in disaster scenarios.
Recovery Planning
Recovery Objectives
| Metric | Target | Description |
|---|---|---|
| RTO (Recovery Time Objective) | 1 hour | Time to restore service |
| RPO (Recovery Point Objective) | 24 hours | Maximum data loss acceptable |
Critical Components
| Component | Recovery Priority | Notes |
|---|---|---|
| Configuration | P1 | Required for service start |
| Service binary | P1 | Application itself |
| Database | P2 | Historical findings |
| Cache | P3 | Can be rebuilt |
Disaster Scenarios
Scenario 1: Single Endpoint Failure
Symptoms: Service down on one machine
Recovery:
- Restore from backup (see Backup & Restore)
- Or reinstall and reconfigure
```bash
# Restore configuration (the file may hold credentials: keep it unreadable to other local users)
sudo cp /backup/aquilon-dlp/latest/aquilon_dlp_config.toml /etc/aquilon-dlp/
sudo chmod o-rwx /etc/aquilon-dlp/aquilon_dlp_config.toml
# Start service
sudo systemctl start aquilon-dlp
# Verify
sudo systemctl status aquilon-dlp
```
Scenario 2: Database Corruption
Symptoms: Service fails to start with database errors
Recovery:
```bash
# Stop service so nothing writes to the database while it is replaced
sudo systemctl stop aquilon-dlp
# Check corruption (prints "ok" when the file is intact)
sudo sqlite3 /var/lib/aquilon-dlp/aquilon_dlp.db "PRAGMA integrity_check;"
# If corrupted, move the damaged file aside (keep it for later examination) and restore from backup
sudo mv /var/lib/aquilon-dlp/aquilon_dlp.db /var/lib/aquilon-dlp/aquilon_dlp.db.corrupt
sudo cp /backup/aquilon-dlp/latest/aquilon_dlp.db /var/lib/aquilon-dlp/
# Check the restored copy before starting the service
sudo sqlite3 /var/lib/aquilon-dlp/aquilon_dlp.db "PRAGMA integrity_check;"
# If no backup exists, start without a database (loses history); the service creates a new one
sudo systemctl start aquilon-dlp
```
Findings recorded after the backup was taken are lost either way; with daily backups that is up to the 24-hour RPO.
Scenario 3: Configuration Loss
Symptoms: Invalid or missing configuration
Recovery:
```bash
# Restore from backup
sudo cp /backup/aquilon-dlp/latest/aquilon_dlp_config.toml /etc/aquilon-dlp/
# Or download the example enterprise configuration
# (-f fails on an HTTP error instead of saving the error page as the config)
sudo curl -fsSL -o /etc/aquilon-dlp/aquilon_dlp_config.toml \
  https://raw.githubusercontent.com/aquilonsecurity/aquilon-dlp/main/docs/config-examples/aquilon_dlp_config_enterprise.toml
# The example carries default settings only; re-apply site-specific settings before starting
# Validate
aquilon-dlp --validate-config /etc/aquilon-dlp/aquilon_dlp_config.toml
# Restart
sudo systemctl start aquilon-dlp
```
Scenario 4: Fleet-Wide Outage
Symptoms: Multiple endpoints affected
Recovery:
- Identify root cause (bad update, configuration push, etc.)
- Prepare fix (rollback version, configuration fix)
- Deploy fix via MDM or configuration management
- Monitor recovery
Version Rollback
Download Previous Version
Download the previous version from the Aquilon Security portal and save it to /tmp/aquilon-dlp-previous. If the portal publishes a checksum for the release, compare it with `sha256sum /tmp/aquilon-dlp-previous` before installing: /tmp is writable by every local user, so the file should not be trusted on its path alone.
Rollback Procedure
```bash
# Stop current service
sudo systemctl stop aquilon-dlp
# Backup current binary and record its checksum so the rollback itself can be verified later
sudo cp /usr/local/bin/aquilon-dlp-enterprise /usr/local/bin/aquilon-dlp-enterprise.bak
sha256sum /usr/local/bin/aquilon-dlp-enterprise.bak
# Install previous version (install sets the mode explicitly and, via sudo, root ownership)
sudo install -m 755 /tmp/aquilon-dlp-previous /usr/local/bin/aquilon-dlp-enterprise
# Restart
sudo systemctl start aquilon-dlp
# Verify version
aquilon-dlp-enterprise --version
```
MDM Rollback
- Upload previous version to MDM
- Deploy to affected endpoints
- Monitor deployment status
Recovery Procedures
Minimal Recovery (Configuration Only)
Fastest recovery - loses historical data but restores monitoring:
- Download a fresh binary from the Aquilon Security portal
- Install it to /usr/local/bin/aquilon-dlp-enterprise
- Restore the configuration from backup or use the default
- Restart the aquilon-dlp service
```bash
# Restore configuration from backup
sudo cp /backup/aquilon-dlp/latest/aquilon_dlp_config.toml /etc/aquilon-dlp/
# Restart service
sudo systemctl restart aquilon-dlp
```
Full Recovery (With History)
Complete recovery with all historical data:
```bash
# 1. Install binary from Aquilon Security portal
# Save to: /usr/local/bin/aquilon-dlp-enterprise
# 2. Stop the service so it does not write to the database while it is replaced
sudo systemctl stop aquilon-dlp
# 3. Restore from backup (extract into a fresh private directory, not a predictable /tmp path)
WORKDIR=$(mktemp -d)
tar -xzf /backup/aquilon-dlp/latest.tar.gz -C "$WORKDIR"
sudo cp "$WORKDIR/backup/aquilon_dlp_config.toml" /etc/aquilon-dlp/
sudo cp "$WORKDIR/backup/aquilon_dlp.db" /var/lib/aquilon-dlp/
rm -rf "$WORKDIR"
# 4. Verify integrity (must print "ok")
sudo sqlite3 /var/lib/aquilon-dlp/aquilon_dlp.db "PRAGMA integrity_check;"
# 5. Restart service
sudo systemctl restart aquilon-dlp
```
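The scenarios above and the full-recovery procedure each copy files into place by hand. The helper below is an added example, not code from the Aquilon DLP project: it performs the same restore with the checks a practitioner wants around it (archive checksum, private extraction directory, integrity check of the backup copy before it goes live, explicit file modes, and the replaced file kept for examination). It assumes the archive unpacks to backup/aquilon_dlp_config.toml and backup/aquilon_dlp.db as in the tar step above, that a sha256 of the archive was recorded at backup time as latest.tar.gz.sha256 in sha256sum format, and an optional SERVICE_USER variable for a dedicated service account; none of these are documented by the product.

```sh
#!/bin/sh
# Added example, not part of the Aquilon DLP project: a restore helper that adds
# the checks Scenarios 1-3 and the full-recovery procedure above leave implicit.
# Assumptions: the archive unpacks to backup/aquilon_dlp_config.toml and
# backup/aquilon_dlp.db, and a sha256 of the archive was recorded when the
# backup was taken as latest.tar.gz.sha256 (sha256sum output format).
set -eu

CONFIG_DIR="${CONFIG_DIR:-/etc/aquilon-dlp}"
DATA_DIR="${DATA_DIR:-/var/lib/aquilon-dlp}"
SERVICE="${SERVICE:-aquilon-dlp}"

# Compare the archive with the checksum recorded at backup time. A mismatch
# means the backup was altered or truncated; it must not be restored.
verify_archive() {
    local archive="$1" sumfile="$2" expected actual
    expected=$(awk '{print $1}' "$sumfile")
    actual=$(sha256sum "$archive" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Unpack into a fresh directory readable only by this user, not a
# predictable path under /tmp that another local user could pre-create.
# Prints the directory path.
extract_archive() {
    local workdir
    workdir=$(mktemp -d)
    chmod 700 "$workdir"
    tar -xzf "$1" -C "$workdir"
    printf '%s\n' "$workdir"
}

# SQLite integrity check on the restored copy before it goes live.
db_integrity_ok() {
    [ "$(sqlite3 "$1" 'PRAGMA integrity_check;')" = "ok" ]
}

# Put a restored file in place with an explicit mode, keeping whatever was
# there as *.pre-restore so a damaged file can still be examined. Set
# SERVICE_USER (assumption) if the service runs as a dedicated account.
install_restored() {
    local src="$1" dst="$2" mode="$3"
    mkdir -p "$(dirname "$dst")"
    if [ -e "$dst" ]; then
        mv "$dst" "$dst.pre-restore"
    fi
    install -m "$mode" "$src" "$dst"
    if [ -n "${SERVICE_USER:-}" ]; then
        chown "$SERVICE_USER" "$dst"
    fi
}

# Full recovery with history: verify, unpack, check, stop, replace, start.
restore_all() {
    local archive="$1" sumfile="$2" workdir
    verify_archive "$archive" "$sumfile" || { echo "checksum mismatch, aborting" >&2; return 1; }
    workdir=$(extract_archive "$archive")
    db_integrity_ok "$workdir/backup/aquilon_dlp.db" || {
        echo "backup database fails integrity_check, aborting" >&2
        rm -rf "$workdir"
        return 1
    }
    systemctl stop "$SERVICE"
    install_restored "$workdir/backup/aquilon_dlp_config.toml" "$CONFIG_DIR/aquilon_dlp_config.toml" 640
    install_restored "$workdir/backup/aquilon_dlp.db" "$DATA_DIR/aquilon_dlp.db" 600
    rm -rf "$workdir"
    systemctl start "$SERVICE"
    systemctl is-active --quiet "$SERVICE"
}

# Usage (as root):
#   restore_all /backup/aquilon-dlp/latest.tar.gz /backup/aquilon-dlp/latest.tar.gz.sha256
```

The order matters: the backup is checked before the service is stopped, so a bad backup never causes downtime, and the old database is only moved aside, never deleted, so the corrupt copy stays available for root-cause analysis.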
macOS Recovery
FDA Re-grant After Recovery
After recovery on macOS, Full Disk Access (FDA) may need re-granting:
- Check profile: `sudo profiles list | grep aquilon`. If it is missing, redeploy the PPPC profile via MDM.
- Reinstall app: `sudo rm -rf "/Library/Application Support/aquilon-dlp.app"` (MDM will reinstall on next check-in).
- Verify FDA:
```bash
sudo sqlite3 "/Library/Application Support/com.apple.TCC/TCC.db" \
  "SELECT auth_value FROM access WHERE client = 'dev.aquilon.dlp-plugin' AND service = 'kTCCServiceSystemPolicyAllFiles';"
```
An auth_value of 2 means allowed; no row or 0 means FDA is not granted. The system TCC database is itself protected by TCC, so the terminal running this query must have Full Disk Access, otherwise sqlite3 reports that it cannot open the file even under sudo.
Verification
Post-Recovery Checklist
- Service running: systemctl status aquilon-dlp
- Configuration valid: --validate-config passes
- Database accessible: osquery tables return data
- Findings generating: new alerts appearing
- Monitoring active: Prometheus metrics available
- macOS: FDA granted (if applicable)
Recovery Test Queries
Run against the aquilon_dlp_alerts table (timestamp is compared as epoch seconds):
```sql
-- Service health - verify table exists
SELECT COUNT(*) as total_alerts FROM aquilon_dlp_alerts;
-- Recent activity
SELECT COUNT(*) as alerts_24h
FROM aquilon_dlp_alerts
WHERE timestamp > (strftime('%s', 'now') - 86400);
-- Alert breakdown
SELECT severity, COUNT(*) as count
FROM aquilon_dlp_alerts
GROUP BY severity;
```
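The queries above confirm that the table answers. The script below is an added example, not part of the Aquilon DLP project, that turns them into a post-recovery check: it compares the age of the newest finding with the 24-hour RPO from the table at the top of the page, and searches the findings timeline for silences longer than a threshold, which is how the "check for data gaps in findings" item of the post-recovery steps shows up in the data. It assumes, as the queries above do, that aquilon_dlp_alerts stores timestamp as integer epoch seconds; the table name and columns are the page's, everything else is the example's own.

```sh
#!/bin/sh
# Added example, not part of the Aquilon DLP project: scripted post-recovery
# checks against the findings database. Assumes the aquilon_dlp_alerts table
# stores `timestamp` as integer epoch seconds (as the queries above imply).
# The database is opened read-only so the check cannot interfere with the service.
set -eu

RPO_SECONDS=$((24 * 3600))   # the 24-hour RPO from the recovery objectives table

# Run one read-only query; columns are space separated, one row per line.
query() {
    sqlite3 -readonly -separator ' ' "$1" "$2"
}

# Seconds since the newest finding. Directly after a restore this is the
# history the backup lacked; more than the RPO means the backup used was
# older than the objective allows.
last_alert_age() {
    local last now
    last=$(query "$1" 'SELECT COALESCE(MAX(timestamp), 0) FROM aquilon_dlp_alerts;')
    now=$(date +%s)
    echo $((now - last))
}

# Silences in the findings timeline longer than $2 seconds, printed as
# "gap_start gap_end seconds". A long silence on a host that normally
# produces findings is how an agent outage shows up in the data; a quiet
# host produces gaps too, so treat a hit as a prompt to check service logs.
alert_gaps() {
    query "$1" "
        WITH ordered AS (
            SELECT timestamp,
                   LAG(timestamp) OVER (ORDER BY timestamp) AS prev_ts
            FROM aquilon_dlp_alerts
        )
        SELECT prev_ts, timestamp, timestamp - prev_ts
        FROM ordered
        WHERE prev_ts IS NOT NULL AND timestamp - prev_ts > $2
        ORDER BY timestamp;"
}

# Combined check: integrity, RPO, gaps. Returns 1 on a hard failure;
# gaps are reported as warnings because they need a human to interpret.
post_recovery_check() {
    local db="$1" gap_threshold="${2:-3600}" status=0 age gaps
    if [ "$(sqlite3 -readonly "$db" 'PRAGMA integrity_check;')" != "ok" ]; then
        echo "FAIL integrity_check"; status=1
    fi
    age=$(last_alert_age "$db")
    if [ "$age" -gt "$RPO_SECONDS" ]; then
        echo "FAIL newest finding is ${age}s old, beyond the ${RPO_SECONDS}s RPO"; status=1
    else
        echo "OK newest finding is ${age}s old"
    fi
    gaps=$(alert_gaps "$db" "$gap_threshold")
    if [ -n "$gaps" ]; then
        echo "WARN silences longer than ${gap_threshold}s (start end seconds):"
        echo "$gaps"
    fi
    return "$status"
}

# Usage: post_recovery_check /var/lib/aquilon-dlp/aquilon_dlp.db 3600
```

The LAG window function needs SQLite 3.25 or later; the gap threshold should be set from the endpoint's normal finding rate, since a host that legitimately produces one finding a day will otherwise report gaps all the time.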
Automated Recovery
Systemd Auto-Restart
Configure in the service file. Restart= and RestartSec= belong in [Service]; the start-rate limit settings are [Unit] options on systemd 230 and later (the old placement under [Service] is only accepted for compatibility):
```ini
[Unit]
StartLimitIntervalSec=60s
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=10s
```
Five failed starts within 60 seconds put the unit into a failed state in which further start requests are refused until `systemctl reset-failed aquilon-dlp` clears it; any scripted restart has to do that first.
Health Check Script
```bash
#!/bin/bash
# /usr/local/bin/aquilon-dlp-healthcheck.sh
# Run from root's crontab: restarts the service if it is down, alerts if that fails.
set -u
SERVICE=aquilon-dlp

if ! systemctl is-active --quiet "$SERVICE"; then
    echo "Service down, attempting restart"
    logger -t aquilon-dlp-healthcheck -p daemon.warning "$SERVICE down, attempting restart"
    # Clear a tripped start-rate limit (StartLimitBurst), otherwise the start request is refused
    systemctl reset-failed "$SERVICE"
    systemctl start "$SERVICE"
    sleep 10
    if ! systemctl is-active --quiet "$SERVICE"; then
        echo "CRITICAL: Service failed to start"
        logger -t aquilon-dlp-healthcheck -p daemon.crit "CRITICAL: $SERVICE failed to start"
        # Send alert to monitoring system
        exit 1
    fi
fi
exit 0
```
Add to root's crontab (the script needs root to start the service):
```
*/5 * * * * /usr/local/bin/aquilon-dlp-healthcheck.sh
```
Communication Plan
During Outage
- Notify security team of reduced DLP coverage
- Update incident ticket
- Monitor recovery progress
Post-Recovery
- Verify all endpoints recovered
- Check for data gaps in findings
- Document root cause
- Update runbooks if needed
Prevention
Regular Testing
- Monthly: Test restore from backup
- Quarterly: Full DR drill
- Annually: Review and update DR plan
Monitoring
Set up alerts for:
- Service down
- Database corruption
- Configuration validation failures
- Scan rate drops