Hard difficulty | Exfiltration & Impact | 40-60 minutes
SIEM | XDR | Firewall

Data Exfiltration Investigation

When monitoring detects large outbound data transfers, unusual cloud storage uploads, or archive file creation on sensitive systems, investigate for data exfiltration. The MOVEit Transfer vulnerability exploitation by the Cl0p ransomware group (May-June 2023) resulted in data theft from over 2,600 organizations affecting 77 million individuals, demonstrating that exfiltration can be massive, automated, and completed before detection. Identify what data was taken, how it left the network, and the full scope of exposure.

Overview

Data exfiltration is the unauthorized transfer of data out of an organization. In many breaches it is the attacker's actual objective, whether the goal is intellectual property theft, double-extortion ransomware, or espionage. The Cl0p group's MOVEit exploitation (2023) demonstrated mass-automated exfiltration, stealing data from over 2,600 organizations without deploying ransomware at all.

APT groups like APT29 (SolarWinds) exfiltrate data slowly over months to avoid detection. Insider threats may copy files to personal cloud storage over weeks. This playbook covers detection and investigation of exfiltration regardless of method: network-based, cloud-based, physical media, or DNS tunneling.

When You See This

  1. DLP alert for sensitive files being uploaded to personal cloud storage (Google Drive, Dropbox, OneDrive personal)

  2. Firewall logs show unusually large outbound data transfers (GB+) to external IPs, especially during off-hours

  3. Archive file creation (7-Zip, RAR, WinRAR) detected on servers containing sensitive data

  4. DNS query logs show high-volume queries with encoded data in subdomains (DNS tunneling pattern)

  5. Abnormal database query volumes or bulk export operations from data warehouses

Investigation Steps

  1. Quantify the data transfer

    Determine exactly how much data left the network, when it started, and where it went. Check firewall logs for total bytes transferred to each external destination. Compare against baselines for the source systems. Data exfiltration is often staged; the attacker collects data internally first, then transfers it in bulk.

    Tools: Firewall, SIEM
    index=firewall action=allowed direction=outbound | stats sum(bytes_out) as total_bytes by src_ip, dest_ip, dest_port | where total_bytes > 500000000 | eval GB=round(total_bytes/1073741824,2) | sort -total_bytes
    index=firewall src_ip="suspect_host" direction=outbound | timechart span=1h sum(bytes_out) as bytes_out_hourly | where bytes_out_hourly > 100000000
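
The baseline comparison described in this step can be sketched in Python once the hourly totals have been exported from the SIEM. This is a minimal illustration, not part of the playbook's tooling: the function names, the 5x multiplier, and the use of a median baseline are all assumptions, and the 100,000,000-byte floor simply mirrors the hourly threshold in the query above.

```python
from statistics import median

BYTES_PER_GIB = 1073741824  # same divisor the firewall query uses for its GB column


def to_gib(num_bytes):
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(num_bytes / BYTES_PER_GIB, 2)


def flag_outbound_anomalies(hourly_bytes, baseline_bytes,
                            multiplier=5.0, floor_bytes=100_000_000):
    """Return (hour_index, bytes, ratio) for hours that exceed the baseline.

    hourly_bytes:   outbound byte totals per hour for the suspect host
    baseline_bytes: historical hourly totals for the same host (hypothetical export)
    An hour is flagged only if it is above the absolute floor AND at least
    `multiplier` times the median baseline hour. Both thresholds are assumptions.
    """
    baseline = median(baseline_bytes)
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    flagged = []
    for hour, observed in enumerate(hourly_bytes):
        ratio = observed / baseline
        if observed > floor_bytes and ratio >= multiplier:
            flagged.append((hour, observed, round(ratio, 1)))
    return flagged


if __name__ == "__main__":
    history = [20_000_000, 25_000_000, 30_000_000]
    suspect_window = [30_000_000, 150_000_000, 2_147_483_648]
    for hour, observed, ratio in flag_outbound_anomalies(suspect_window, history):
        print(f"hour {hour}: {to_gib(observed)} GiB, {ratio}x baseline")
```

Requiring both an absolute floor and a relative multiplier keeps quiet hosts with tiny baselines from generating alerts on trivial transfers, while still surfacing a busy server that suddenly sends several times its normal volume.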
  2. Identify the exfiltration method

    Determine HOW the data left. Common methods: direct HTTPS upload to cloud storage, FTP/SFTP to attacker infrastructure, DNS tunneling (encoded data in DNS queries), email with attachments, or physical USB media. Each method requires different log sources to investigate.

    Tools: SIEM, Firewall
    index=proxy dest_domain IN ("drive.google.com","dropbox.com","mega.nz","transfer.sh","pastebin.com") src_ip="suspect_host" | stats sum(bytes_out) as uploaded by dest_domain | sort -uploaded
    index=dns query_type=TXT src_ip="suspect_host" | eval query_length=len(query) | stats count, avg(query_length) as avg_query_length by query_domain | where avg_query_length > 50 AND count > 100

    Decision Point

    If: Data was sent to known cloud storage or file sharing services

    Yes → Check if the destination is a personal account vs corporate. Attempt to identify or recover the data via cloud provider cooperation (legal hold/warrant may be needed).

    No → Data went to attacker-controlled infrastructure. Focus on identifying what data was taken and assess regulatory impact.
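
Depending on how the proxy populates its destination field, an exact-match domain list may miss hosts such as `www.dropbox.com`. The sketch below shows one way to do suffix matching against the same five services named in the proxy query and to total uploaded bytes per service. The function names and the `(host, bytes_out)` record shape are hypothetical, chosen only for illustration.

```python
# Services taken from the proxy query in this step; extend for your environment.
FILE_SHARING_DOMAINS = (
    "drive.google.com",
    "dropbox.com",
    "mega.nz",
    "transfer.sh",
    "pastebin.com",
)


def match_file_sharing(host, watchlist=FILE_SHARING_DOMAINS):
    """Return the watchlist domain that `host` equals or is a subdomain of, else None."""
    host = host.strip().lower().rstrip(".")
    for domain in watchlist:
        # The leading dot prevents "notdropbox.com" from matching "dropbox.com".
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def uploaded_bytes_by_service(records):
    """Sum bytes_out per matched service, largest first.

    records: iterable of (host, bytes_out) tuples (assumed export format).
    """
    totals = {}
    for host, bytes_out in records:
        service = match_file_sharing(host)
        if service is not None:
            totals[service] = totals.get(service, 0) + bytes_out
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    sample = [("www.dropbox.com", 100), ("dropbox.com", 50),
              ("mega.nz", 500), ("example.com", 999)]
    print(uploaded_bytes_by_service(sample))
```

A match here only identifies the service. Whether the account is personal or corporate still has to be established from DLP, CASB, or provider records.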

  3. Determine what data was accessed and staged

    Trace back from the exfiltration to determine what the attacker accessed. Check database query logs, file access audits, and archive creation events. In the MOVEit attacks, Cl0p used SQL injection to extract data directly from the database. Look for bulk file access patterns that differ from normal user behavior.

    Tools: SIEM, XDR
    index=endpoint dest_host="data_server" process_name IN ("7z.exe","rar.exe","zip.exe","tar.exe") | table _time, user, command_line, file_path | sort _time
    index=database user="suspect_account" | stats count as queries, sum(rows_returned) as total_rows by table_name, query_type | where total_rows > 10000 | sort -total_rows
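
Tracing back from the transfer to the staging activity amounts to a time-window join between archive creation events and outbound transfers from the same host. The following is a rough sketch under stated assumptions: the tuple layouts, the function name, and the 24-hour window are all hypothetical, and real events would come from the endpoint and firewall queries above.

```python
from datetime import datetime, timedelta


def correlate_staging_with_transfers(archive_events, transfers,
                                     window=timedelta(hours=24)):
    """Pair each archive creation with outbound transfers that follow it.

    archive_events: list of (host, time, command_line)
    transfers:      list of (host, time, destination, bytes_out)
    A pair is reported when the transfer comes from the same host and starts
    within `window` after the archive was created (window size is an assumption).
    """
    matches = []
    for a_host, a_time, command_line in archive_events:
        for t_host, t_time, destination, bytes_out in transfers:
            if t_host == a_host and a_time <= t_time <= a_time + window:
                matches.append({
                    "host": a_host,
                    "archive_time": a_time,
                    "command_line": command_line,
                    "transfer_time": t_time,
                    "destination": destination,
                    "bytes_out": bytes_out,
                    "delay": t_time - a_time,
                })
    # Chronological order makes the staging-then-transfer sequence easy to read.
    return sorted(matches, key=lambda match: match["transfer_time"])


if __name__ == "__main__":
    staged_at = datetime(2023, 6, 1, 2, 0)
    archives = [("data_server", staged_at, "7z.exe a out.7z D:\\exports")]
    outbound = [("data_server", staged_at + timedelta(hours=3),
                 "203.0.113.10", 5_000_000_000)]
    for match in correlate_staging_with_transfers(archives, outbound):
        print(match["host"], match["destination"], match["delay"])
```

The nested loop is fine for the handful of events in a single investigation. For large exports, sorting both lists by time and sweeping once would be the more scalable choice.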
  4. Assess regulatory and legal impact

    Determine if the exfiltrated data includes PII, PHI, financial records, or intellectual property. Different data types trigger different regulatory reporting requirements (GDPR 72-hour notification, HIPAA breach notification, SEC disclosure for public companies). Work with legal and compliance teams to assess obligations.

    Tools: SIEM
  5. Contain, preserve evidence, and notify

    Block the exfiltration destination at the firewall. Preserve all logs and evidence for forensic analysis and potential legal proceedings. If this is a double-extortion ransomware scenario, the stolen data may be published on a leak site; prepare the communications response. Coordinate with legal on breach notification timelines.

    Tools: Firewall, SIEM

Common Mistakes

  1. Focusing only on the volume of data transferred without determining WHAT data was taken; 1 GB of customer PII has a very different impact than 1 GB of marketing materials

  2. Not checking for data staging (archive creation) on internal systems before exfiltration; this reveals the attacker's intent and scope

  3. Assuming exfiltration stopped because you blocked one channel; sophisticated attackers maintain multiple exfiltration methods

  4. Delaying legal/compliance notification; GDPR requires notifying the supervisory authority within 72 hours of becoming aware of a personal data breach

Escalation Criteria

  • Any confirmed unauthorized data transfer to external destinations

  • PII, PHI, or regulated data confirmed in the exfiltrated dataset

  • Evidence of database dumps or bulk file access preceding the exfiltration

Frequently Asked Questions

What was the MOVEit breach and how does it relate to exfiltration?
In May-June 2023, the Cl0p ransomware group exploited a zero-day vulnerability in MOVEit Transfer (CVE-2023-34362) to mass-exfiltrate data from over 2,600 organizations affecting 77 million individuals. They automated SQL injection to extract data without deploying ransomware; pure data theft for extortion. It demonstrated that exfiltration can be industrial-scale and completed in hours.
How can I detect DNS tunneling?
DNS tunneling encodes data in DNS query subdomains. Look for: unusually long DNS queries (>50 characters), high query volume to a single domain, TXT record queries from non-mail systems, and domains with high entropy in subdomain names. Normal DNS queries are short and predictable; tunneling queries look like random strings.
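
The entropy check mentioned in this answer can be illustrated with a short Python sketch. It computes the Shannon entropy of the subdomain portion of a query and combines it with the 50-character length heuristic from the text. The function names, the 3.5-bit entropy threshold, and the assumption that the registered domain is the last two labels are illustrative choices, not established cutoffs.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character (0.0 for empty input)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_tunneling(query, registered_labels=2,
                         min_length=50, min_entropy=3.5):
    """Heuristic: long query AND high-entropy subdomain.

    registered_labels=2 assumes the registered domain is the last two labels
    (for example "example.com"); multi-part suffixes such as "co.uk" need 3.
    min_entropy is an assumed threshold that should be tuned on local traffic.
    """
    labels = query.rstrip(".").lower().split(".")
    subdomain = "".join(labels[:max(len(labels) - registered_labels, 0)])
    return len(query) > min_length and shannon_entropy(subdomain) >= min_entropy


if __name__ == "__main__":
    samples = [
        "www.example.com",
        "abcdefghijklmnopqrstuvwxyz234567.0123456789abcdefghij.t.example.com",
    ]
    for name in samples:
        print(name, looks_like_tunneling(name))
```

Length and entropy together reduce false positives: CDN hostnames can be long but repetitive, and short random-looking labels are common in legitimate services. Query volume per domain, as in the DNS query earlier in the playbook, remains a useful third signal.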