Being Forensically Curious: The Process of Discovery
In previous blogs, we’ve talked about the importance of building the skills that will help you grow your forensic expertise, including research and testing your theories, and some ideas on how to make time to do so. In this part of our series, we’ll focus on the discovery process.
Magnet Director of Forensics Jessica Hyde, together with Cesar Quezada, senior mobile forensics examiner and team lead at Basis Technology, laid out this process as part of their 2016 presentation, “Mobile App Parsing: All about that Data.” In Jessica’s later webinar for Magnet Forensics, “Methods for Parsing New Applications,” she explained that the methodology is useful not only for new apps, but also for any forensic validation or other research.
With the recently launched Magnet Artifact Exchange actively seeking new scripts, we caught up with Jessica, Cesar, and other experts in the mobile forensics community to talk through some additional detail about the methodology you can use as a foundation for forensic work. Our next few blog posts will cover what we learned. In our first part: Discovery.
Why is Discovery Necessary?
Apps, and by extension the data within them, change constantly:
- New apps are introduced each month at a pace that commercial forensic tool vendors can’t possibly keep up with. In just one year, according to Statista, the Google Play Store added 800,000 new apps on top of the two million it listed the previous year. Meanwhile, two million apps are available in the Apple App Store.
- Existing apps are regularly updated, with new versions that can include new or enhanced features.
- All the while, nefarious actors seek new or updated apps and ways to use them—especially if they feature enhanced security functionality.
Commercial tools may parse all, or just some, of the data from new or updated apps, and that’s a chance you shouldn’t take when it comes to a big or high-profile case. It takes time for commercial vendors to build support for each new app or version. In the space of a single month or quarter, the vendor has to:
- Receive new data
- Recognize the app’s schema—it’s possible that the way it stores data has changed. Moreover, some apps don’t even store data in SQLite databases. (Cesar notes: SQLite is the most popular and probably the one most developers know, but there are other competing databases. Indeed, many mobile developers prefer other forms of databases for security reasons or optimization aspects.)
- Build test data and a parser
- Add to development cycle (whose turnaround time may not match usage patterns)
Five commercial mobile forensics tools support only about 400 popular apps (apart from new versions)—a tiny fraction of the four million. You’re still responsible for finding data, and that’s why you need a methodology you can use to find and validate data. Discovery is the first step in that methodology.
Forensic expert Cheeky4n6Monkey says that when his research and development team is asked to write a script to extract and present the data, it’s because an analyst has seen some relevant data on the device or in the physical dump. “Typically, the analyst is looking for communication artifacts, but we have also been asked to look for artifacts such as timestamps for picture thumbnails,” he says.
Note: it may not be possible to do this proactively. “While it would be ideal to have time to perform non-case related app research,” Cheeky4n6Monkey adds, “the increasing numbers of apps/devices and also the changing popularity of those apps means we have to focus on case-related research.” In other words, plan to build your playbook as you go.
Step 1: Create a Physical Image of the Device
A physical image—a full dump of block 0—is sometimes your first opportunity to be forensically curious. Why a physical image? Because logical and file system images depend on the structure provided to the forensic tool by the device’s operating system, these won’t help you find things that have never been seen before:
- A logical image offers everything that has already been interpreted, but will miss things that the forensic tool doesn’t know how to present and interpret.
- A file system image offers additional files and folders that can be parsed, but doesn’t give you access to unallocated space.
In other words, you have to be able to go underneath what the operating system presents. However, this may not be possible for one of three reasons:
- It may not be possible to create a full dump because of encryption or some other factor, like lack of support. This is true of non-jailbroken iOS devices.
- Advanced techniques like JTAG or chip-off aren’t allowed by your organization (and it should go without saying that you shouldn’t use methods on a test device that you wouldn’t be able to use with your evidence device).
- The image isn’t ingestible by all tools. This can happen when the device or its operating system or file system isn’t fully supported. When a full image isn’t supported—when the file system format chokes the forensic tool, for example—you can use other options to dump the user partition, which will still offer access to the relevant evidentiary data in the unallocated space within that partition.
(Backup files from a synced computer can contain a lot of valuable forensic data, so when a physical dump is unsupported, look for a backup.)
Part of discovery is, therefore, to determine what will work on a given device with a given operating system. You might end up using methods such as the “dd” command with USB debugging, and/or for Android, using a custom recovery tool like Team Win Recovery Project (TWRP, included within Magnet AXIOM and ACQUIRE) or ClockworkMod (for older devices) to flash a custom recovery image to a locked device.
Cesar notes that TWRP provides elevated privileges to access parts of the phone that aren’t accessible on a stock phone. Even so, doing this on a test device first is imperative. Load the wrong TWRP image and you can damage the device; even loading the right one risks a prompt to wipe all user data, defeating your forensic purpose. Whichever method turns out to work, be sure to document each step you take, why you chose that method, and what worked and didn’t work.
Once you’ve booted the device, use the Android Debug Bridge (ADB). It used to be necessary to install Android Studio, a developer tool, to get the ADB command line tools for the appropriate operating system. Google has since released those command line tools as a standalone package, which is a much smaller download than the whole studio.
From there, list the “/dev/block/platform/msm_sdcc.1/by-name” directory within the ADB shell (for example, with “ls -l”) to see the partition list and to look inside different partitions. The “msm_sdcc.1” component depends on the device’s storage controller, so the exact path varies between devices. This will help you to infer each partition’s use. (A hint: the largest partition is usually the one that contains user data, but the exact partition won’t be the same for every image.)
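As a rough illustration of the hint above, the following Python sketch ranks named partitions by size. It assumes you have saved two pieces of text from the ADB shell: an “ls -l” listing of the by-name directory and the contents of “/proc/partitions”. The sample text, partition labels, and function names are all made up for illustration; real output differs by device and Android version.

```python
import posixpath

def parse_by_name(listing):
    """Map partition label -> block device name from `ls -l` symlink output."""
    mapping = {}
    for line in listing.splitlines():
        if " -> " not in line:
            continue  # not a symlink line
        left, target = line.rsplit(" -> ", 1)
        label = left.split()[-1]  # last token before the arrow is the link name
        mapping[label] = posixpath.basename(target.strip())
    return mapping

def parse_proc_partitions(text):
    """Map block device name -> size in 1 KiB blocks from /proc/partitions."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes

def rank_partitions(by_name, sizes):
    """Return (label, device, blocks) tuples, largest partition first."""
    rows = [(label, dev, sizes.get(dev, 0)) for label, dev in by_name.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Illustrative sample text only, not taken from a real device.
SAMPLE_LS = """\
lrwxrwxrwx root root 1970-01-02 10:20 boot -> /dev/block/mmcblk0p19
lrwxrwxrwx root root 1970-01-02 10:20 system -> /dev/block/mmcblk0p25
lrwxrwxrwx root root 1970-01-02 10:20 userdata -> /dev/block/mmcblk0p28
"""
SAMPLE_PARTS = """\
major minor  #blocks  name

 179        0   15388672 mmcblk0
 179       19      22528 mmcblk0p19
 179       25    1048576 mmcblk0p25
 179       28   13583104 mmcblk0p28
"""

ranked = rank_partitions(parse_by_name(SAMPLE_LS),
                         parse_proc_partitions(SAMPLE_PARTS))
for label, device, blocks in ranked:
    print(f"{label}\t{device}\t{blocks} KiB")
```

The largest entry is only a candidate for the user data partition; confirm it by looking inside before relying on it.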
Step 2: Run the Image Through Your Commercial Tools
This step is to see what data commercial tools can parse from the physical image, to validate that what they parsed is correct, and to identify (using the file packages.list) installed apps that may be of interest, even if they aren’t fully supported.
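To illustrate the packages.list check, here is a minimal Python sketch that reads the file and flags installed apps your tool did not report. It assumes the common whitespace-separated layout in which the first four fields are package name, UID, debuggable flag, and data directory; later fields vary by Android version and are ignored. The package names and the “supported” set are hypothetical.

```python
from typing import NamedTuple

class InstalledPackage(NamedTuple):
    name: str
    uid: int
    debuggable: bool
    data_dir: str

def parse_packages_list(text):
    """Parse the leading fields of each packages.list line."""
    packages = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # skip blank or unexpected lines rather than guess
        packages.append(InstalledPackage(
            name=fields[0],
            uid=int(fields[1]),
            debuggable=(fields[2] == "1"),
            data_dir=fields[3],
        ))
    return packages

def not_parsed_by_tool(packages, supported_names):
    """Return installed packages that the commercial tool did not report."""
    return [pkg for pkg in packages if pkg.name not in supported_names]

# Hypothetical sample content and tool report, for illustration only.
SAMPLE = """\
com.example.chat 10057 0 /data/data/com.example.chat default 3003
com.example.game 10061 0 /data/data/com.example.game default 3003,1028
com.example.mail 10044 0 /data/data/com.example.mail default 3003
"""
supported = {"com.example.mail"}

installed = parse_packages_list(SAMPLE)
to_review = not_parsed_by_tool(installed, supported)
for pkg in to_review:
    print(pkg.name, pkg.data_dir)
```

The resulting list is a starting point for the manual review described next, not a verdict on which apps matter.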
If, for example, chat data isn’t parsed or has only been partially parsed for an app that you know has chat functionality, and you have reason to believe the device’s user has used it, take a screenshot showing that the tool failed to parse that app, and then find the data for yourself.
How do you know if an app is of interest, especially if it wasn’t parsed already by commercial tools? Consider that many social or gaming apps have private chat functionality; many apps additionally have access to contacts and location services. (However, beware screenshots or other images of apps, often found on their Google Play Store or iTunes Store pages, that show functionality which may or may not exist.)
User reviews can additionally offer insight into what the app does and why it might be relevant to your case. While you don’t want to delve into each app, taking the time to look at advertised features and reviews is an important part of due diligence.
Another good rule of thumb, says Cheeky4n6Monkey: “Start with what you know is being displayed on the app. If it’s displayed, then it’s likely stored.” From there, browse the data tables within a working (not the original!) copy of the app database. Look for user identifiers (userids, names, contact details), message content (text, binary content or references to files), and timestamps in various fields.
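Browsing tables by hand can be sped up with a small script. The sketch below, using Python’s built-in sqlite3 module, lists every table and flags columns whose names hint at identifiers, message content, or timestamps. The keyword lists are assumptions chosen for illustration, and the demo schema is invented; column names in real apps vary widely, so treat matches as leads to inspect rather than findings.

```python
import sqlite3

# Assumed keyword hints; extend these for the app you are researching.
HINTS = {
    "identifier": ("user", "name", "contact", "sender", "recipient"),
    "message": ("message", "body", "text", "content"),
    "timestamp": ("time", "date", "created", "sent"),
}

def survey_schema(conn):
    """Return (table, column, category) for columns matching a keyword hint."""
    findings = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: cid, name, type, notnull, dflt_value, pk
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            column_name = column[1]
            lowered = column_name.lower()
            for category, words in HINTS.items():
                if any(word in lowered for word in words):
                    findings.append((table, column_name, category))
    return findings

# For a real case, open a WORKING COPY read-only, for example:
#   conn = sqlite3.connect("file:working_copy.db?mode=ro", uri=True)
# (the file name above is hypothetical). The demo below uses an
# in-memory database with an invented schema.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE chat_messages ("
    "id INTEGER PRIMARY KEY, sender_id INTEGER, body TEXT, created_at INTEGER)"
)
demo.execute("CREATE TABLE settings (key TEXT, value TEXT)")

findings = survey_schema(demo)
for table, column, category in findings:
    print(f"{table}.{column}: possible {category}")
```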
Once you’ve identified the apps you want to dive deeper into, it’s time to get curious once more. Research the apps for the permissions they require to function, as well as the features listed in the Play or iTunes Store, including the most recent version and what was included in it. Then, look at the appropriate path to ensure the permissions on the device are consistent with what you found.
Remember, additionally, that the Google Play Store may identify apps within the “My Apps” directory that were once installed on the device, even if they are no longer. It may be possible to find fragments of data left by those apps within unallocated space in the user partition.
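One simple way to start looking for such fragments is to search a copy of the user partition image for known file signatures. The sketch below is an assumption about how you might do this, not a method described by the examiners quoted here: it scans a raw image for the 16-byte header string that begins every SQLite database file and reports the byte offsets. It reads in chunks and keeps a small overlap so that a header spanning two chunks is still found. A hit only marks where a database header once began; the remaining pages may be overwritten or fragmented.

```python
import os
import tempfile

# Every SQLite database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"

def find_offsets(path, magic=SQLITE_MAGIC, chunk_size=1024 * 1024):
    """Return absolute byte offsets of every occurrence of `magic` in a file."""
    offsets = []
    overlap = len(magic) - 1
    with open(path, "rb") as image:
        tail = b""
        base = 0  # absolute offset of the first byte of the current window
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = 0
            while True:
                hit = window.find(magic, start)
                if hit == -1:
                    break
                offsets.append(base + hit)
                start = hit + 1
            # Carry the last few bytes forward to catch boundary-spanning hits.
            tail = window[-overlap:] if overlap else b""
            base += len(window) - len(tail)
    return offsets

# Demo on a small synthetic file (not real evidence data). The tiny
# chunk size forces the second header to straddle a chunk boundary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 60 + SQLITE_MAGIC + b"\xff" * 50 + SQLITE_MAGIC)
    demo_path = tmp.name

demo_offsets = find_offsets(demo_path, chunk_size=64)
os.remove(demo_path)
print(demo_offsets)
```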
Step 3: Find a Parser for the App
Again, if an app is unsupported or only partially supported by your go-to commercial tool, then part of due diligence is to find a tool that will offer the parsing support you need. In some cases, another forensic examiner may have programmed a parsing tool for the particular app. You can find these on GitHub, on the examiner’s blog, or repositories such as Magnet’s Artifact Exchange.
Even if a parser for that exact app doesn’t exist, find out whether the app developer has created other apps, and see if parsers exist for those. For example, as Jessica and Cesar’s research shows, Cheeky4n6Monkey’s Words with Friends parser worked on Chess with Friends because they’re both Zynga games with the same database file.
Asked how common this was in the app world, Cheeky4n6Monkey explains, “Generally, chat apps will store a sender, receiver, message and timestamp. For modern mobile apps, data is usually stored via several tables in an SQLite database.
“So, while there is a similar data infrastructure for apps, the specific field names/types/tables usually vary even between apps with similar functionality. On occasion a developer may write separate apps but re-use code for specific functionality (eg chat). This appears to be the case with Words With Friends and Chess With Friends.
“Usually, chat apps are written by different developers with similar (but not exactly the same) schemas. This makes writing parsing scripts somewhat easier as the general structure of the script may not have to change significantly between chat apps—just the query and processing/presenting sections.”
As far as finding existing parsers, searching Google and GitHub is a good start. However, as most apps use SQLite databases, if you know some programming (e.g. Python) you can also adapt or re-purpose an existing script to extract the data you want and present the information as you need it (e.g. HTML or TSV output files). Re-purposing a script is more likely than finding one that extracts exactly what you want.
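The general script structure Cheeky4n6Monkey describes (a query section plus a processing and presenting section) might look like the following minimal Python sketch. The table name, column names, and the assumption that timestamps are Unix epoch seconds are all hypothetical; for a different app you would mainly change the query and the timestamp conversion.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Query section: the part that usually changes between apps.
# Table and column names here are invented for illustration.
QUERY = "SELECT sender, receiver, body, sent_at FROM messages ORDER BY sent_at"

def export_chat_tsv(conn, query, out):
    """Run the query and write sender/receiver/message/timestamp rows as TSV."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sender", "receiver", "message", "timestamp_utc"])
    count = 0
    for sender, receiver, body, sent_at in conn.execute(query):
        # Assumes Unix epoch seconds; other apps may use milliseconds
        # or a different epoch, which must be verified through testing.
        stamp = datetime.fromtimestamp(sent_at, tz=timezone.utc).isoformat()
        writer.writerow([sender, receiver, body, stamp])
        count += 1
    return count

# Demo with an in-memory database standing in for a working copy.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE messages (sender TEXT, receiver TEXT, body TEXT, sent_at INTEGER)"
)
demo.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [("alice", "bob", "hello", 1500000000),
     ("bob", "alice", "hi there", 1500000060)],
)

buffer = io.StringIO()
row_count = export_chat_tsv(demo, QUERY, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Writing to a file instead of a buffer only requires passing an open text file as the last argument.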
This in-depth three-step discovery process leads you to the next step in forensic research methodology: testing.