Pages

Saturday, June 25, 2016

Interpreting data from apps


Lots of apps means lots of data


All blog posts to date
In Android, just about all the data you care about will be app data.  Text messages?  App data from the SMS app.  Phone logs?  App data from the phone app.  Facebook chats?  App data from the Facebook app.

Now I think of myself as being a young guy.  At least, I'm not old, or too old.  I am a millennial.  So you could imagine the surprise when I heard the following comment from a college-aged family member:
Mark, you still text?  You are such a dinosaur.

Yes, I am a dinosaur for still texting.  Now I can remember literally laughing out loud when I first received a text from my father.  He had finally graduated from placing a phone call for even the simplest of messages to convey to sending a short message over SMS.   Fast forward a few years and I am a dinosaur for not graduating beyond texting.  But graduating to what?

Well, that college-aged family member said he and all his peers use apps to message each other.  Facebook, WhatsApp (which is now owned by Facebook), and I'm sure others.  C'mon, I asked him, what's wrong with texting?







And why use Facebook instead of texting?





Can you send photos over these apps the way I do with MMS?





And what exactly is WhatsApp?





Now beyond the young kids who dance on their elders' fossils for using a technology which my elders only recently began using (poorly), there are others who, for better or worse, use apps for communicating.  For a dramatic example, the terrorist organization ISIS famously uses the app Telegram for spreading propoganda.  Now you may think that the owners of Telegram may consider it a responsibility to shut down ISIS usage.  And Telegram has in fact made an effort to ban terrorist usage of their app, but that was only after laughing off the notion with the following line: I propose banning words. There’s evidence that they’re being used by terrorists to communicate.

Also sadly in the news was the app Kik  In the recent tragic murder and kidnapping at Virginia Tech, the perpetrators used the app Kik to lure in their prey.  The story is horrifying on so many levels.

Now what about other apps?  Games, sharing apps, social media, etc.  Might those apps store some significant data too?  This post is all about how to parse and interpret data from third-party apps.  Why is this significant?  If you get all the SMS and call logs and other traditional evidence, you may have missed the device owner's primary method of communication.

Now this post will not be a specific how-to-parse-this-app post.  Instead, this is a generic guide for parsing that may be of help.  I hope to convey methods apps store data and how to access and read this data.  I believe the challenge of parsing and interpreting app data is or will soon become more tedious than imaging devices.

Where do apps store data?
First, Android security prohibits users from accessing the userdata partition, which is where all apps store their data.  (Some apps may also store some data on the SD card, but this is "unprotected" data.  Not the "good stuff.")  You either need an image of the device (and you can create an image using my post on live imaging an Android device) or you need root access.  In this post, I am working from an image of a device.

Android by default stores user data in the /userdata partition in the directory /data.  The below screenshot is from a screenshot of FTK Imager looking at the data directory.



You'll see that within the data directory are directories containing package names.  The directory air.WatchESPN stores user data associated with the WatchESPN app.  The directory com.google.android.youtube stores data associated with the YouTube app.  The directory com.android.chrome will story web history and other data associated with the Chrome browser.

What kinds of data?
By default, most user data is stored in SQLite databases.  For a writeup on viewing SQLite data, check out my previous post on SQLite databases.

Most apps use SQLite in some fashion or another.  And if the app you are trying to parse stores all its data in SQLite with no encryption of any kind, you are good to go.

Other types of data can also be found.  The following is a screenshow of the YouTube app's storage:  



All of the directories above store data associated with the YouTube app.  The databases directory will contain SQLite databases.  The directory shared_prefs stores XML files which may be interesting or not.  Depending upon the app, the XML files may store data about the user, such as usernames or maybe even passwords if the developer has a poor grasp on security.  The XMLs can also be pretty much useless.  XMLs can be opened in any text editor of choice.

There also is a "files" directory.  This directory can store anything.  A developer can store images, videos, text, or even more databases in the files directory.

So if your app stores only databases in a nice SQLite format, some user information in XMLs in shared_prefs, and some images or other interesting files in the files directory, you can easily interpret all of this data.  But what about challenging data?  I will highlight two challenges to consider with app parsing.

Encryption
Many apps use encryption to store sensitive data.  In some cases, you open a SQLite database and are parsing through and you get to a table with nothing but random-looking junk stored in rows.  If you come across such a finding, you may have found some encryption.  This is an example of encrypted content within a database.  And if you find such content, you will need to find a way to decrypt the data to make any sense of it.  I am no crypto expert, so I would consult one if I came across such a finding.

And a side note.  If you come across a string that looks like the following:
TWFyaywgYXMgaW4gdGhhdCBuZXJkIHdobyB3cml0ZXMgaHR0cDovL2ZyZWVhbmRyb2lkZm9yZW5zaWNzLmJsb2dzcG90LmNvbSwgaXMgc3VjaCBhIGRpbm9zYXVyIQ==

you have come across Base64 encoded text.  This is not encryption and can easily be decoded.  Base64 encodes anything into a random set of uppercase, lowercase, numbers, and some symbols, and if padding is required, the string ends with equals signs.  Here is a Wikipedia page on Base64.  You can come across Base64 in databases, XML files, URLs, or just about anyplace.

Some apps also encrypt their databases entirely.  WhatsApp, for example, encrypts their entire database.  If you come across an app that you know is storing a good amount of data on the device and yet you cannot find a database but you find entire files of random-looking data, you may have found encrypted databases.  Again, consult a crypto expert.

Non-standard data storage
App developers are developers, and developers like to develop things.  What do I mean?  I mean that developers often get tired of using built-in functions, like SQLite, and so they choose to implement a different database format, or they make their own.  I have plenty of jokes to make at my engineer friends' expenses about over-engineering everything.

So while SQLite is a nice and easy format to parse, there are other database formats out there for Android.  Here is a list of ten known and available non-SQLite database formats.  If you come across an app with a non-SQLite database, you will need to find a way to interpret all the data.  If there is not a parser available, you can use a hex editor to simply view the database and make sense of it, or you can (or have a developer) write a parser so that next time you come across such a database, you will be ready.

Strategies
So big picture.  What do you do when you find an app on a device and you are unfamiliar with it?  Here is a list of steps you can do to make sure you get all the possible data.
  • Examine all the data files.  View SQLite databases.  Open XMLs in a text editor.  Open unknown files in hex editors.  View any media.  Make sure you view everything.
  • Check out the file /data/system/packages.xml.  This is a file which stores information about all apps installed on the device, including device permissions.  See any permissions that stand out?  If you see camera permissions, be on the lookout for photos associated with the app.  In a previous post, I detailed the file.
  • Reverse engineer the app.  Look at the source.  It may help you understand what the app does and how it stores data.  Here is a previous post on reverse engineering.
  • Once you have made sense of the data, report it in a standardized and readable format.
  • If you think you may come across this app again, consider writing a program to parse through data based on your findings so you can do this process automatically.
Resources available
There are plenty of resources available for interpreting data from the diverse apps out there.  I will list out a few.
  • Mobile forensic tool vendors.  I was recently at Mobile Forensics World in South Carolina.  There were many vendors presenting similar information to this post.  Everybody in the mobile forensics community is dealing with all these third party apps.  For example, the company Magnet Forensics sometimes releases findings of different third party apps.  Here is an excellent writeup on data within the app Skype.
  • dex2jar.  In my post on reverse engineering, I show how to use this program to reverse an app to a Java jar.
  • Java Jar decompilers.  With the Java jar from dex2jar, a decompiler can interpret the jar as Java source for reverse engineering.
  • Wireshark.  Especially with chatting or social apps, you may need to understand data coming over the air to the device.  Wireshark can help you capture data in transit.
  • JEB.  An app decompiler.  This is most definitely not free software.  If you need to decompile and debug an Android app to see from the "app's perspective" how data on the device is created, JEB can do the trick.

Summary

  • There are so many apps out there which store important data on the device.  If you only look at SMS and call logs, you may miss the most important conversations on the device.
  • Apps store data in the userdata partition.  You need either an image of the device or root access to get at it.
  • Data can be stored in challenging format.  If you come across encryption and you are not a crypto expert, you may need to call one in.
  • There are resources out there.  Everybody in the community is dealing with this challenge.
Questions, comments?  Any other dinosaurs out there?  Leave a comment below, or send me an email.