Digital Forensics Regex Challenge

Use regular expressions to solve the following challenges and uncover evidence in this outsider threat scenario.

Scenario: The External Threat

Your organization has detected suspicious activity suggesting an outsider may be exfiltrating sensitive data.

As the forensic analyst, you need to investigate logs from April 8-9, 2025 to identify the culprit and understand what happened.

Your mission: Solve four progressively harder regex challenges to identify the culprit and reconstruct what they did.

Challenge 1: Identify the Intruder

The first step is to identify any suspicious external IPs attempting to access our systems.

  1. Examine the access-log.txt file
  2. Look for any IP addresses that aren't in our internal network range (192.168.x.x)
  3. Create a regex to find evidence of failed login attempts (HTTP status code 401)

HTTP status code 401 means "Unauthorized" - look for this number in the logs.

Try creating a regex that matches an IP address outside the internal network range, followed later on the same line by "401".

Account for irrelevant characters within a line using the pattern ".*", which matches any character except a newline, zero or more times.

Solution Regex:

^(?!192\.168\.)((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).*401

What This Reveals:

External IP address 198.51.100.72 made failed login attempts with 401 (Unauthorized) error codes.
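
To try the Challenge 1 pattern outside a regex tester, it can be run from Python. The sketch below is illustrative only: the three sample lines are invented stand-ins (the real access-log.txt layout may differ), and the names OCTET, EXTERNAL_401 and external_failed_logins are hypothetical. It also adds word boundaries around 401 so that a number such as 14010 is not mistaken for the status code, which is a small tightening of the solution regex rather than part of it.

```python
import re

# One IPv4 octet (0-255), written as a non-capturing group.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Line starts with an IP that is NOT in 192.168.x.x and later contains 401.
EXTERNAL_401 = re.compile(
    rf"^(?!192\.168\.)(?:{OCTET}\.){{3}}{OCTET}\b.*\b401\b"
)

# Invented sample lines, for illustration only.
sample_lines = [
    '192.168.1.15 - - "POST /login HTTP/1.1" 401 128',
    '198.51.100.72 - - "POST /login HTTP/1.1" 401 128',
    '198.51.100.72 - - "GET /index.html HTTP/1.1" 200 512',
]


def external_failed_logins(lines):
    """Return the lines where an external IP received a 401 response."""
    return [line for line in lines if EXTERNAL_401.search(line)]


hits = external_failed_logins(sample_lines)
for line in hits:
    print(line)
```

Only the second sample line is reported: the first is rejected by the negative lookahead, and the third has no 401.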

Challenge 2: When Did They Gain Access?

Now that we've identified a suspicious IP, we need to determine if and when they successfully accessed our systems.

  1. Examine the auth-log.txt file
  2. Create a regex to find successful login events for the suspicious IP
  3. Note the timestamp of any successful logins

Look for the word "Accepted" in combination with the suspicious IP address.

The auth logs record successful logins in the format "Accepted password for [username] from [IP address]".

Solution Regex:

Accepted password for admin from 198\.51\.100\.72

What This Reveals:

After three failed attempts, the intruder successfully logged in as "admin" at 23:19:45 on April 8, 2025. The log even flags this as "SUSPICIOUS_LOGIN_TIME" since it's after normal business hours.
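
The timestamp can also be pulled out programmatically. The following is a minimal Python sketch, assuming each auth-log line begins with a syslog-style timestamp such as "Apr 08 23:19:45"; the sample lines (including the sshd process ID and port) are invented for illustration, and ACCEPTED and successful_logins are hypothetical names.

```python
import re

# Capture the leading timestamp and the username on "Accepted password" lines
# that originate from the suspicious IP.
ACCEPTED = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}).*"
    r"Accepted password for (?P<user>\S+) from 198\.51\.100\.72\b"
)

# Invented sample lines; the real auth-log.txt layout is an assumption.
sample_lines = [
    "Apr 08 23:18:30 server sshd[4321]: Failed password for admin from 198.51.100.72 port 50022 ssh2",
    "Apr 08 23:19:45 server sshd[4321]: Accepted password for admin from 198.51.100.72 port 50022 ssh2",
]


def successful_logins(lines):
    """Return (timestamp, user) pairs for successful logins from the IP."""
    results = []
    for line in lines:
        match = ACCEPTED.search(line)
        if match:
            results.append((match.group("timestamp"), match.group("user")))
    return results


logins = successful_logins(sample_lines)
print(logins)
```

Escaping the dots in the IP address matters here: an unescaped "." would also match any other character in that position.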

Challenge 3: What Data Was Compromised?

Now we need to determine what sensitive data may have been compromised during this breach.

  1. Examine the auth-log.txt file contents
  2. Create a simple regex to find any command containing the word "customer" to identify what customer data was accessed

Look for the keyword "customer" in the command logs.

For a basic regex, you can simply use the word "customer" to find all related entries.

Solution Regex for auth-log.txt:

.*customer.*

What This Reveals:

This will match the line in auth-log.txt showing:

Apr 09 01:15:41 server sudo: admin : USER=root ; COMMAND=/bin/tar -czvf /tmp/secret_archive.tar.gz customer_database.xlsx

The intruder created an archive of our customer database file, which contains sensitive customer information.
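
As a hedged illustration, the same search can be scripted in Python to return just the command rather than the whole line. The first sample line is the sudo entry quoted above; the second is an invented non-matching line, and SUDO_COMMAND and customer_commands are hypothetical names. The "COMMAND=" anchor is an addition to the basic solution regex, assuming commands are always logged in that sudo format.

```python
import re

# Capture everything after "COMMAND=" when it mentions "customer".
SUDO_COMMAND = re.compile(r"COMMAND=(?P<command>.*customer.*)", re.IGNORECASE)

sample_lines = [
    # Line quoted in the challenge text.
    "Apr 09 01:15:41 server sudo: admin : USER=root ; "
    "COMMAND=/bin/tar -czvf /tmp/secret_archive.tar.gz customer_database.xlsx",
    # Invented line that should not match.
    "Apr 09 01:10:02 server sudo: admin : USER=root ; COMMAND=/bin/ls /tmp",
]


def customer_commands(lines):
    """Return the sudo commands that reference customer data."""
    commands = []
    for line in lines:
        match = SUDO_COMMAND.search(line)
        if match:
            commands.append(match.group("command"))
    return commands


commands = customer_commands(sample_lines)
print(commands)
```

Because re.search scans the whole line, the leading and trailing ".*" in the solution regex are only needed when the tool requires a full-line match.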

Challenge 4: How Was the Data Exfiltrated?

Finally, we need to determine how the intruder removed the data from our network.

  1. Examine the email-log.txt file
  2. Create a simple regex to find email addresses at external domains
  3. This will reveal where data was sent outside our organization

Look for email addresses containing "competitor.com" in the domain part.

Look for evidence of emails to external domains, uploads to external sites, or USB storage activity.

Solution Regex (for all logs):

@(?!company\.com).*\.com

What This Reveals:

This will match the line in email-log.txt showing:

Apr 09 01:23:32 mail-server postfix[7830]: to=<jane.doe@competitor.com>, relay=mail.competitor.com, status=sent

The intruder sent our customer database to an email address at our competitor, indicating corporate espionage.
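
To list the external recipients themselves rather than whole lines, a slightly stricter pattern helps. This Python sketch is an assumption-laden variant of the solution regex: it treats company.com as the only internal domain (as the solution regex does), it only looks at .com addresses, and the second sample line is invented. EXTERNAL_EMAIL and external_recipients are hypothetical names.

```python
import re

# An email address whose domain is not company.com and ends in .com.
EXTERNAL_EMAIL = re.compile(
    r"[\w.+-]+@(?!company\.com\b)[\w-]+(?:\.[\w-]+)*\.com\b"
)

sample_lines = [
    # Line quoted in the challenge text.
    "Apr 09 01:23:32 mail-server postfix[7830]: to=<jane.doe@competitor.com>, "
    "relay=mail.competitor.com, status=sent",
    # Invented internal message that should be ignored.
    "Apr 09 01:20:00 mail-server postfix[7829]: to=<bob@company.com>, "
    "relay=mail.company.com, status=sent",
]


def external_recipients(lines):
    """Return every external .com email address found in the lines."""
    found = []
    for line in lines:
        found.extend(EXTERNAL_EMAIL.findall(line))
    return found


external = external_recipients(sample_lines)
print(external)
```

Unlike the greedy ".*" in the solution regex, the character classes here stop at the end of the address, so the match does not run on into the relay hostname.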

CASE SOLVED!

Investigation Summary


Intruder: External actor from IP 198.51.100.72

Access Time: April 8, 2025 at 23:19:45 (after hours)

Compromised Data: Customer Database (Excel format)

Exfiltration Method:

  • Email to competitor (jane.doe@competitor.com)

Key Evidence: Multiple connections to competitor.com domain, suggesting corporate espionage.

Available Log Files