
Being Forensically Curious: Finding and Parsing

Our previous two blog posts about being forensically curious have covered how to discover important information about new and/or unsupported apps, and how to test your hypotheses about the way the apps store data. Relying on research by Magnet Forensics’ Jessica Hyde and Basis Technology’s Cesar Quezada, as well as commentary by forensic research experts Cheeky4n6Monkey and the SANS Institute’s Heather Mahalik, we now explore the final two parts of the methodology: finding and parsing.

A test image, especially of a smartphone, can be large; within it, you need to be able to find applicable test data. This isn’t as easy as looking in data/com for the app’s storage; associated files, such as attachment data, might be in media/0. Moreover, different apps store data in different paths, so evidence could be located in multiple places.

Look at the data in a SQLite or other viewer. Clues that can help you include:

  • Check the file path for the app vendor name, not just the app name.
  • Look at all possible files. “You’d need to have an idea of what the app does and what type of information is potentially stored,” says Cesar. “A photos app would save pictures, maybe have a running database that stored time, date, coordinates of when the pictures were taken.” From there, determine how the app uses those files.
  • Run a keyword search for “chat” or “message.” However, keep in mind this is no guarantee; every developer chooses filenames differently.
  • Look for applicable files and databases based on your test data.
  • Look for associated key files, which often have .dat extensions, for encrypted databases or fields. You might, for example, see files named Gamechat1.dat, Gamefriend2.dat, and so on.
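
Several of these clues can be turned into a quick first pass over an extracted file system. The following is a minimal Python sketch, not a replacement for manual review; the function name, the keyword list, and the vendor hint are all hypothetical and should be adapted to the app you are testing.

```python
import os

# Hypothetical hints; adjust them to the app and vendor under examination.
NAME_HINTS = ("chat", "message")
KEY_EXTENSIONS = (".dat",)


def find_candidate_files(root, vendor_hints=()):
    """Walk an extracted file system and yield (path, reasons) for files worth a manual look."""
    hints = tuple(h.lower() for h in vendor_hints)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Match against the path relative to the extraction root only,
            # so the name of the working folder itself cannot cause hits.
            rel = os.path.relpath(full, root).lower()
            lower_name = name.lower()
            reasons = []
            if any(h in rel for h in hints):
                reasons.append("vendor/app name in path")
            if any(k in lower_name for k in NAME_HINTS):
                reasons.append("keyword in filename")
            if lower_name.endswith(KEY_EXTENSIONS):
                reasons.append("possible key file")
            if reasons:
                yield full, reasons
```

Calling `find_candidate_files("/cases/test_image", ["examplevendor"])` (both values are placeholders) would list every file whose relative path or name matches a hint, along with the reason it was flagged. A file that is not flagged may still matter, since developers name files however they like.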

Cheeky4n6Monkey adds some points to consider:

  • For a field like a chat threadid, you might need to run more than one test session/test account to confirm/deny your theory about that field.
  • App timestamps are typically seconds/milliseconds since 1JAN1970 (Unix Epoch) for Android. For iOS, they can also be seconds since 1JAN2001 (CFAbsolute / OS X Epoch). Identifying timestamp fields may be helped by generating timestamps that fit the time period being investigated and searching the database fields for similar values. Analysts should also confirm if the time is referenced to a local time zone or [Greenwich Mean Time].
  • Database temporary files (journals/write ahead logs, e.g. messages not yet sent or messages deleted) may be a potential source of data.
  • Some app/OS log files may contain metadata about when the app was used or when messages were sent/received.
  • Check the app permissions to determine what other kinds of information may be stored but not necessarily displayed (e.g. location).
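
To illustrate the timestamp point, here is a minimal Python sketch that reads one raw number under each of the epochs mentioned above and keeps only the readings that fall inside the period being investigated. The function names are hypothetical, and all results are in UTC; whether the app actually stored UTC or local time still has to be confirmed through testing.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CF_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)  # CFAbsoluteTime reference date


def candidate_datetimes(raw):
    """Return each plausible UTC reading of a raw numeric timestamp."""
    readings = {}
    for label, epoch, divisor in (
        ("unix_seconds", UNIX_EPOCH, 1),
        ("unix_milliseconds", UNIX_EPOCH, 1000),
        ("cf_absolute_seconds", CF_EPOCH, 1),
    ):
        try:
            readings[label] = epoch + timedelta(seconds=raw / divisor)
        except OverflowError:
            pass  # the value is out of range for this interpretation
    return readings


def plausible_readings(raw, start, end):
    """Keep only the readings that fall within the investigated time window."""
    return {k: v for k, v in candidate_datetimes(raw).items() if start <= v <= end}
```

For example, with a window covering 2017, the value 1504742400000 only makes sense as Unix milliseconds, while 526435200 only makes sense as seconds since 1 January 2001; both decode to midnight UTC on September 7, 2017.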

Heather goes further: “I find that [database fields] vary per application, per application version, per phone and per OS on the device. Lots of moving parts here. Best practices would be to always verify. Take what you know and apply it to each application. I will always look at what is installed on the smartphone and then dive right into what is parsed by my tool. A good tool will tell you where it’s recovering data from within the device.

“I then go to the application directories and manually scan each database or file of interest to ensure nothing is overlooked,” Heather adds. “I have found major glitches in tools where an entire column of a database is not interpreted correctly and manual examination shows the data for analysis. It may be a lot of work, but it’s necessary when examining third-party applications.”

Once you’ve found all the data you need, it’s time to parse it—to decode, or break down, the data into human readable format.

A Few Things to Keep in Mind About App Data Parsing

Before you get deep into the databases you need, think about the following:

  • Metadata about the app may be intertwined with user data. Once you learn something about the way it stores content, look for correlating artifacts to solidify the hypothesis.
  • Sometimes you may see distractor content, such as cached images of contacts the subject never communicated with.
  • Data could be structured, like date and time stamps, or unstructured, like chat message threads.

Cesar explains correlating artifacts in this way: “Oftentimes, an app will require the user to give it access to different parts of your phone. Say your camera, your microphone, your contacts, etc. If the app is accessing those pieces of information, it’s possible the same contacts in your contacts app are also in the app you’re parsing. Maybe they’re all there, or maybe only a few you selected. Testing would help determine how much or little would bleed over.”

Heather adds another example: “If you believe that someone was in a specific location when an action took place on the phone, you can correlate WiFi or cell towers to the metadata of the ‘action’ to place the device in a location.

“An example scenario: a smartphone user is tagged in a photo on Facebook on September 7, 2017 by a friend in California. Some commercial forensic tools parse locations from Facebook, which may make it seem like the user was in California on September 7, 2017, but the WiFi and cell tower data will show otherwise. Additionally, any other artifacts that place a user in a location other than the tag can be used to disprove that finding.”

(What’s difficult, she observes: “Examiners trust what the tool tells them and instead of verifying the artifact, they simply trust it and report false information.”)

Once you find the correct databases, use a SQLite or other (for example, plist) viewer; you can identify the files by extension or by looking at the header in the hex code. Basic database fields for a typical chat message include:

  • Local party. This field is represented by the number 1 and appears in a local column.
  • Opposing party. This field is represented by a different number. Sometimes the opposing party appears in its own opposing column; other times only one column is used. In that case, you’ll see only the opposing party whether the message was sent or received; the field won’t show a unique identifier for the local party, and you’ll have to look at the Direction column to identify sender or receiver.
  • Time/date stamps. As with senders and receivers, sometimes two separate columns will appear to denote messages sent to and from the local party; other times, the stamps will appear only for the opposing party, and you must refer to the Direction column to match the messages.
  • Status. This flag or column indicates message status: Unsent, Read, Draft, Sent. It’s sometimes possible to determine message direction from this column.
  • Attachments. If a chat message refers to these, locate where they are stored, which may be in another directory entirely.
  • Message content. This contains the body of the message itself.
  • Location. Some newer apps allow users to share a PIN code to send and receive data with one another. From this, you may be able to obtain a pictorial reference of the map that was drawn. Other apps only allow the user to drop a pin to denote their current coordinates; others allow them to pin a map anywhere in the world, potentially confounding your investigation. Still other apps offer the ability to flag current vs. other locations, enabling you to prove that a particular location and time stamp correlate.
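
The single-column case described above is easy to misread, so it can help to make sender and receiver explicit. This is a minimal Python sketch under stated assumptions: the row keys `party`, `direction`, and `body`, and the values 1 (sent) and 0 (received), are hypothetical stand-ins for whatever your own testing shows the app uses.

```python
from dataclasses import dataclass

LOCAL_PARTY = "local user"  # placeholder label for the device owner


@dataclass
class Message:
    sender: str
    receiver: str
    body: str


def normalize(row, outgoing_value=1, incoming_value=0):
    """Turn an 'opposing party + direction' row into explicit sender and receiver.

    `row` is a dict with the hypothetical keys 'party', 'direction', and 'body'.
    The direction values must come from your own testing of the app.
    """
    if row["direction"] == outgoing_value:
        return Message(LOCAL_PARTY, row["party"], row["body"])
    if row["direction"] == incoming_value:
        return Message(row["party"], LOCAL_PARTY, row["body"])
    # An unexpected value is a finding in itself; do not guess at its meaning.
    raise ValueError(f"unrecognized direction value: {row['direction']!r}")
```

Raising an error on an unknown direction value is a deliberate choice here: a third value might mean draft, system message, or something else entirely, and silently treating it as "received" would put wrong information in a report.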

The upshot of all these variations: test all of an app’s functionality carefully and thoroughly, and document what you find. Heather recommends documenting the following steps, including full file paths: “I [first] look for which applications are installed on the device. From there, I use my tools to triage and see what they can parse. The tool should tell you where it’s pulling that data from (database/table, plist, xml, etc).

“I then navigate to those files that are being parsed and do a cursory scan to make sure nothing is overlooked. Sometimes a database will be supported by a tool, but only to parse specific tables. I want to make sure all relevant tables are parsed.

“After that, I go to the application directory and examine any other files of interest. For Android, you also have to consider the media and SD card directories. A keyword search within your tool will be helpful here.”
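
One way to support that manual pass is to check each file’s header and, for SQLite databases, list every table with its row count so you can compare the list against what your tool reported. The following Python sketch is illustrative only; the function names are hypothetical, and it should be run against a working copy, never the original evidence.

```python
import sqlite3
from pathlib import Path

# Well-known leading bytes for the file types mentioned in this post.
SIGNATURES = (
    (b"SQLite format 3\x00", "sqlite"),
    (b"bplist00", "binary plist"),
    (b"<?xml", "xml (possibly an XML plist)"),
)


def sniff(path):
    """Guess a file's type from its header rather than its extension."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def table_inventory(path):
    """Return {table_name: row_count} for a SQLite file opened read-only."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

If the inventory shows a populated table that your forensic tool never mentioned, that table is a good candidate for manual examination.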

Applying Your Method to Case Data

Depending on how the app developer chose to store data, you can sometimes use SQL queries to help with analysis, just as you can use a SQLite viewer to parse the data. You can also search online for an open source parser for that particular app, or check whether a parser for a different app made by the same developer works for the app you’re examining.
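
As a small illustration of SQL-assisted analysis, the Python sketch below summarizes a chat table per conversation partner: message count plus first and last message time. The `messages` table, its columns, and the assumption that `sent_ms` holds Unix milliseconds are all hypothetical; an in-memory database stands in for a working copy of the app’s real database.

```python
import sqlite3

# Hypothetical schema: messages(party, direction, sent_ms, body),
# with sent_ms assumed to be milliseconds since the Unix epoch.
SUMMARY_SQL = """
SELECT party,
       COUNT(*) AS message_count,
       datetime(MIN(sent_ms) / 1000, 'unixepoch') AS first_utc,
       datetime(MAX(sent_ms) / 1000, 'unixepoch') AS last_utc
FROM messages
GROUP BY party
ORDER BY party
"""


def conversation_summary(conn):
    """Return one row per opposing party with counts and first/last times in UTC."""
    return conn.execute(SUMMARY_SQL).fetchall()


def demo_connection():
    """Build a small in-memory database of made-up test messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (party TEXT, direction INTEGER, sent_ms INTEGER, body TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [
            ("alice", 1, 1504742400000, "test message one"),
            ("alice", 0, 1504742460000, "test reply"),
            ("bob", 1, 1504746000000, "second thread"),
        ],
    )
    return conn
```

Running `conversation_summary(demo_connection())` gives a compact timeline per party, which is often a quicker way to spot gaps or unexpected conversations than scrolling through rows in a viewer.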

If you followed the instructions in our last blog for creating known test data, here’s where it comes into play: working with the app’s .dat files, which may be in plain text or may be encoded with proprietary formatting. Because you know all the data you created, you’ll be able to tell which is which. As a result, you’ll be able to correlate one or more files, tables within one database, or multiple databases to each other, because the same known values appear in more than one of them.
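
Because the test data is known, one simple way to start correlating is to search every table and column of a database for a distinctive string you typed into the app. The Python sketch below assumes a SQLite database and a plain-text value; the function names are hypothetical, and encoded or encrypted values will not match.

```python
import sqlite3


def quote_ident(name):
    """Quote a table or column name for safe use in a SQL statement."""
    return '"' + name.replace('"', '""') + '"'


def find_known_value(conn, needle):
    """Return (table, column, matching_row_count) wherever `needle` appears as text."""
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        columns = [row[1] for row in conn.execute(
            f"PRAGMA table_info({quote_ident(table)})")]
        for column in columns:
            # instr() is case-sensitive; CAST lets numeric and blob columns be searched too.
            sql = (f"SELECT COUNT(*) FROM {quote_ident(table)} "
                   f"WHERE instr(CAST({quote_ident(column)} AS TEXT), ?) > 0")
            count = conn.execute(sql, (needle,)).fetchone()[0]
            if count:
                hits.append((table, column, count))
    return hits
```

A hit in more than one table, for instance a message body table and a search cache, suggests those tables are related and shows where the same content is duplicated. No hits at all for a value you know you entered is a hint that the field is encoded.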

Cheeky4n6Monkey adds, “If it’s an Android app, the analyst may be able to decompile and view the .apk source code to determine app behavior.” Compare your evidentiary data to the data content you created, noting how, for example, two simultaneous conversations, their time stamps, and their user IDs are stored.

This is easy when it’s stored in plain text. When it isn’t—when chat messages are encoded—you may not be able to get message content, but you can still obtain metadata about its sender, time/date, etc. Conversely, the metadata, or a certain field within the database, may be encoded, even if the message isn’t.

Keep this in mind: many apps advertise that they offer end-to-end encryption of content. While many do so for data in transit, they don’t for data at rest, so stored data remains unencrypted. Search for scripts or papers to help you with this aspect of parsing; additionally, some commercial vendors will help work on specific apps in a time-sensitive case.

This in-depth three-step forensic research methodology series started with our previous blog, “The Process of Discovery,” led to “The Process of Testing,” and will conclude with our final blog, “The Process of Scripting.”