Data Exfiltration Investigation
When monitoring detects large outbound data transfers, unusual cloud storage uploads, or archive file creation on sensitive systems, investigate for data exfiltration. The MOVEit Transfer vulnerability exploitation by the Cl0p ransomware group (May-June 2023) resulted in data theft from over 2,600 organizations affecting 77 million individuals, demonstrating that exfiltration can be massive, automated, and completed before detection. Identify what data was taken, how it left the network, and the full scope of exposure.
Overview
Data exfiltration is the unauthorized transfer of data out of an organization. In many breach scenarios it is the point at which the attacker achieves their objective, whether that is intellectual property theft, double-extortion ransomware, or espionage. The Cl0p MOVEit exploitation (2023) demonstrated mass-automated exfiltration: the group stole data from over 2,600 organizations and extorted victims without deploying ransomware at all.
APT groups like APT29 (SolarWinds) exfiltrate data slowly over months to avoid detection. Insider threats may copy files to personal cloud storage over weeks. This playbook covers detection and investigation of exfiltration regardless of method: network-based, cloud-based, physical media, or DNS tunneling.
When You See This
1. DLP alert for sensitive files being uploaded to personal cloud storage (Google Drive, Dropbox, OneDrive personal)
2. Firewall logs show unusually large outbound data transfers (GB+) to external IPs, especially during off-hours
3. Archive file creation (7zip, RAR, WinRAR) detected on servers containing sensitive data
4. DNS query logs show high-volume queries with encoded data in subdomains (DNS tunneling pattern)
5. Abnormal database query volumes or bulk export operations from data warehouses
Investigation Steps
1. Quantify the data transfer
Determine exactly how much data left the network, when it started, and where it went. Check firewall logs for total bytes transferred to each external destination. Compare against baselines for the source systems. Data exfiltration is often staged; the attacker collects data internally first, then transfers it in bulk.
Tools: Firewall, SIEM
index=firewall action=allowed direction=outbound | stats sum(bytes_out) as total_bytes by src_ip, dest_ip, dest_port | where total_bytes > 500000000 | eval GB=round(total_bytes/1073741824,2) | sort -total_bytes
index=firewall src_ip="suspect_host" direction=outbound | timechart span=1h sum(bytes_out) as bytes_out_hourly | where bytes_out_hourly > 100000000
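For teams working from exported firewall logs rather than a live SIEM, the aggregation in the first query can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: the record keys mirror the field names used in the query, while `summarize_outbound`, `exceeds_baseline`, and the 10x baseline multiplier are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

GIB = 1024 ** 3  # 1073741824 bytes, the same divisor the query uses for GB


def summarize_outbound(records, threshold_bytes=500_000_000):
    """Sum bytes_out per (src_ip, dest_ip, dest_port) and keep totals above the threshold."""
    totals = defaultdict(int)
    for rec in records:
        # Only count traffic that was allowed out of the network.
        if rec["action"] == "allowed" and rec["direction"] == "outbound":
            key = (rec["src_ip"], rec["dest_ip"], rec["dest_port"])
            totals[key] += rec["bytes_out"]
    flagged = [
        {"src_ip": src, "dest_ip": dst, "dest_port": port,
         "total_bytes": total, "gb": round(total / GIB, 2)}
        for (src, dst, port), total in totals.items()
        if total > threshold_bytes
    ]
    # Largest transfers first, like "sort -total_bytes".
    return sorted(flagged, key=lambda row: row["total_bytes"], reverse=True)


def exceeds_baseline(total_bytes, baseline_bytes, factor=10):
    """Assumed heuristic: flag when the total is more than `factor` times the host's baseline."""
    return total_bytes > baseline_bytes * factor
```

The baseline comparison is deliberately crude; in practice the multiplier would be tuned per system, since a backup server and a workstation have very different normal outbound volumes.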
2. Identify the exfiltration method
Determine HOW the data left. Common methods: direct HTTPS upload to cloud storage, FTP/SFTP to attacker infrastructure, DNS tunneling (encoded data in DNS queries), email with attachments, or physical USB media. Each method requires different log sources to investigate.
Tools: SIEM, Firewall
index=proxy dest_domain IN ("drive.google.com","dropbox.com","mega.nz","transfer.sh","pastebin.com") src_ip="suspect_host" | stats sum(bytes_out) as uploaded by dest_domain | sort -uploaded
index=dns query_type=TXT src_ip="suspect_host" | eval query_length=len(query) | stats count, avg(query_length) as avg_query_length by query_domain | where avg_query_length > 50 AND count > 100
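The DNS check in this step relies on query length and volume. A complementary signal is the entropy of the subdomain text, since encoded payloads look like random strings. The Python sketch below combines the three. It is an illustration only: the 50-character and 100-query cutoffs come from the query in this step, while the 3.5 bits-per-character entropy threshold, the function names, and the naive "last two labels" domain split (which ignores suffixes such as `co.uk`) are assumptions.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(text):
    """Bits per character; random-looking strings score higher."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def split_query(query, base_labels=2):
    """Naive split: treat the last `base_labels` labels as the registered domain."""
    labels = query.rstrip(".").lower().split(".")
    return "".join(labels[:-base_labels]), ".".join(labels[-base_labels:])


def suspicious_domains(queries, min_count=100, min_avg_len=50, min_entropy=3.5):
    """Flag domains with many long, high-entropy queries (assumed thresholds)."""
    per_domain = defaultdict(lambda: {"count": 0, "length": 0, "entropy": 0.0})
    for query in queries:
        subdomain, domain = split_query(query)
        stats = per_domain[domain]
        stats["count"] += 1
        stats["length"] += len(query)
        stats["entropy"] += shannon_entropy(subdomain)
    flagged = []
    for domain, stats in per_domain.items():
        avg_len = stats["length"] / stats["count"]
        avg_entropy = stats["entropy"] / stats["count"]
        if stats["count"] > min_count and avg_len > min_avg_len and avg_entropy >= min_entropy:
            flagged.append({"domain": domain, "count": stats["count"],
                            "avg_len": round(avg_len, 1),
                            "avg_entropy": round(avg_entropy, 2)})
    return flagged
```

Entropy alone produces false positives on CDN and telemetry hostnames, so it is best treated as one input alongside length and volume rather than a verdict.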
Decision Point
If: Data was sent to known cloud storage or file sharing services
Yes → Check if the destination is a personal account vs corporate. Attempt to identify or recover the data via cloud provider cooperation (legal hold/warrant may be needed).
No → Data went to attacker-controlled infrastructure. Focus on identifying what data was taken and assess regulatory impact.
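The decision point above can be expressed as a small lookup. The sketch below is a hypothetical helper, not part of any SIEM: the domain list is copied from the proxy query in this step, and the function and branch names are invented for illustration. It matches on domain suffix boundaries so that a host like `notdropbox.com` is not mistaken for Dropbox.

```python
# Domains taken from the proxy query in this step; extend for your environment.
CLOUD_STORAGE_DOMAINS = (
    "drive.google.com", "dropbox.com", "mega.nz", "transfer.sh", "pastebin.com",
)


def matches_domain(host, domain):
    """True if host is the domain itself or a subdomain of it."""
    host = host.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def decision_branch(dest_host):
    """Return which branch of the decision point applies to a destination host."""
    for domain in CLOUD_STORAGE_DOMAINS:
        if matches_domain(dest_host, domain):
            # "Yes" branch: check personal vs corporate account, consider provider cooperation.
            return "known_service"
    # "No" branch: treat as attacker-controlled infrastructure.
    return "other_destination"
```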
3. Determine what data was accessed and staged
Trace back from the exfiltration to determine what the attacker accessed. Check database query logs, file access audits, and archive creation events. In the MOVEit attacks, Cl0p used SQL injection to extract data directly from the database. Look for bulk file access patterns that differ from normal user behavior.
Tools: SIEM, XDR
index=endpoint dest_host="data_server" process_name IN ("7z.exe","rar.exe","zip.exe","tar.exe") | table _time, user, command_line, file_path | sort _time
index=database user="suspect_account" | stats count as queries, sum(rows_returned) as total_rows by table_name, query_type | where total_rows > 10000 | sort -total_rows
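Bulk file access that "differs from normal user behavior" needs a per-user baseline to be meaningful. As a minimal sketch, assuming file access audit events have been exported as records with `user` and `file_path` keys and that a typical daily file count per user is already known, the comparison might look like the following. The function name, the 5x multiplier, and the 100-file floor are illustrative assumptions.

```python
from collections import defaultdict


def bulk_access_users(events, baseline, factor=5, min_files=100):
    """Flag users who touched far more distinct files than their usual count.

    events:   iterable of dicts with "user" and "file_path" keys (assumed export format)
    baseline: dict mapping user -> typical number of distinct files per day
    """
    files_by_user = defaultdict(set)
    for event in events:
        files_by_user[event["user"]].add(event["file_path"])

    flagged = {}
    for user, files in files_by_user.items():
        typical = baseline.get(user, 0)
        # Require both an absolute floor and a large jump over the user's own norm.
        if len(files) >= min_files and len(files) > factor * typical:
            flagged[user] = len(files)
    return flagged
```

The absolute floor keeps low-activity accounts from being flagged for trivial jumps, and the per-user baseline keeps service accounts that legitimately read many files from drowning out real anomalies.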
4. Assess regulatory and legal impact
Determine if the exfiltrated data includes PII, PHI, financial records, or intellectual property. Different data types trigger different regulatory reporting requirements (GDPR 72-hour notification, HIPAA breach notification, SEC disclosure for public companies). Work with legal and compliance teams to assess obligations.
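Because the GDPR window is short, it helps to compute the deadline as soon as the breach is confirmed. The sketch below covers only the 72-hour GDPR figure mentioned above and assumes the clock starts when the organization becomes aware of the breach; the function names are hypothetical, and the actual start time and obligations should be confirmed with legal counsel.

```python
from datetime import datetime, timedelta, timezone

GDPR_WINDOW = timedelta(hours=72)


def gdpr_deadline(aware_at):
    """Latest notification time, counted from when the breach became known."""
    return aware_at + GDPR_WINDOW


def hours_remaining(aware_at, now):
    """Hours left before the deadline; negative once it has passed."""
    return (gdpr_deadline(aware_at) - now).total_seconds() / 3600


# Example: breach confirmed at 09:00 UTC on 1 June 2023.
aware = datetime(2023, 6, 1, 9, 0, tzinfo=timezone.utc)
deadline = gdpr_deadline(aware)
```

Using timezone-aware timestamps avoids off-by-hours mistakes when the SOC, legal team, and supervisory authority sit in different time zones.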
Tools: SIEM
5. Contain, preserve evidence, and notify
Block the exfiltration destination at the firewall. Preserve all logs and evidence for forensic analysis and potential legal proceedings. If this is a double-extortion ransomware scenario, the stolen data may be published on a leak site; prepare the communications response. Coordinate with legal on breach notification timelines.
Tools: Firewall, SIEM
Common Mistakes
1. Focusing only on the volume of data transferred without determining WHAT data was taken; 1 GB of customer PII has a very different impact than 1 GB of marketing materials
2. Not checking for data staging (archive creation) on internal systems before exfiltration; staging reveals the attacker's intent and scope
3. Assuming exfiltration stopped because you blocked one channel; sophisticated attackers maintain multiple exfiltration methods
4. Delaying legal/compliance notification; GDPR requires notifying the supervisory authority within 72 hours of becoming aware of a personal data breach
Escalation Criteria
Any confirmed unauthorized data transfer to external destinations
PII, PHI, or regulated data confirmed in the exfiltrated dataset
Evidence of database dumps or bulk file access preceding the exfiltration
Practice This Investigation
SOCSimulator provides hands-on training rooms where you work through real-world attack scenarios, including data exfiltration investigations with live SIEM alerts. Build analyst muscle memory with zero consequences. Free forever.
Frequently Asked Questions
- What was the MOVEit breach and how does it relate to exfiltration?
- In May-June 2023, the Cl0p ransomware group exploited a zero-day vulnerability in MOVEit Transfer (CVE-2023-34362) to mass-exfiltrate data from over 2,600 organizations affecting 77 million individuals. They automated SQL injection to extract data without deploying ransomware; pure data theft for extortion. It demonstrated that exfiltration can be industrial-scale and completed in hours.
- How can I detect DNS tunneling?
- DNS tunneling encodes data in DNS query subdomains. Look for: unusually long DNS queries (>50 characters), high query volume to a single domain, TXT record queries from non-mail systems, and domains with high entropy in subdomain names. Normal DNS queries are short and predictable; tunneling queries look like random strings.
- How do I practice exfiltration investigations?
- SOCSimulator includes multi-stage scenarios with data exfiltration phases. Practice correlating firewall, SIEM, and DLP alerts to trace data movement. Start free forever.