Wednesday, 4 January 2017

Monkey Plays (LAN) Turtle

OMG! Sooo Turtle-y!

The Hak5 LAN Turtle recently plodded across our desk so we decided to poke it with a stick and see how effective it is in capturing Windows (7) credentials.
From the LAN Turtle wiki:
The LAN Turtle is a covert Systems Administration and Penetration Testing tool providing stealth remote access, network intelligence gathering, and man-in-the-middle monitoring capabilities.
Housed within a generic "USB Ethernet Adapter" case, the LAN Turtle’s covert appearance allows it to blend into many IT environments.
It costs about U$50 and looks like this:




It consists of a System-on-Chip running an OpenWrt (Linux) based OS. Amongst other things, it can act as a network bridge/router between:
- a USB Ethernet interface which you plug into your target PC. This interface can also be ssh'd into via its static IP address 172.16.84.1 (for initial configuration and copying off creds).
- a 10/100 Mbps Ethernet port which you can use to connect the Turtle to the Internet (providing remote shell access and allowing the install of modules/updates from LANTurtle.com). It is not required to capture creds during normal operation.

It also has 16 MB on board Flash memory and can be configured to run a bunch of different modules via a Module Manager.

By using the Turtle's USB Ethernet interface to create a new network connection and then sending the appropriate responses, the Turtle is able to capture a logged in user's Windows credentials. Apparently Windows will send credentials over a network whether the screen is locked or not (a user must be logged in).

We will be using the QuickCreds module written by Darren Kitchen which was based on the research of Rob "Mubix" Fuller.
To send the appropriate network responses, QuickCreds calls Laurent Gaffie's Responder Python script and saves credentials (eg NTLMv2 for our Win 7 test case) to numbered directories in /root/loot. The amber Ethernet LED will blink rapidly while QuickCreds is running. When finished capturing (~30 secs to a few minutes), the amber LED is supposed to remain lit.

But wait - there's more! The turtle can also offer remote shell/netcat/meterpreter access, DNS spoofing, man-in-the-middle browser attacks, nmap scans and so much more via various downloadable modules. Alas, we only have enough time/sanity/Turtle food to look at the QuickCreds module.

Setup

We will be both configuring and testing the Turtle on a single laptop running Windows 7 Pro x64 with SP1. Realistically, you would configure it on one PC and then plug it into a separate target PC.
 
We begin setup by plugging the Turtle into the configuration PC and using PuTTY to ssh as root to 172.16.84.1. For proper menu display, be sure to adjust the PuTTY Configuration's Window, Translation, Remote character set to "Latin-1, Western Eur".

The default root Turtle password is sh3llz. Upon first login, the user is then prompted to change the root password.
Ensure an Internet providing Ethernet cable is plugged in to the Turtle's Ethernet port to provide access to LANTurtle.com updates.

Note: The Turtle may also require Windows to install the "Realtek USB FE Family Controller" Network Adapter driver before you can communicate with it.

Upon entering/confirming the new root password, you should see something like:

LAN Turtle Main Menu via PuTTY session


Under Modules, Module Manager, go to Configure, then Directory to select the QuickCreds module for download. You can select/check a module for download via the arrow/spacebar keys.

Return back to Modules, select the QuickCreds module, then Configure (this will take a few minutes to download/install/configure the dependencies from the Internet). Remember to have an Internet providing Ethernet cable plugged into the Turtle.

Select the QuickCreds Enable option so QuickCreds is launched whenever the Turtle is plugged into a USB port.
(Optional) You can also select the Start option to start the QuickCreds module now and it should collect your current Windows login creds.

We are now ready to remove the Turtle from our config PC and place it into a target PC's USB port.

If you're having issues getting the Turtle working, try to manually reset the Turtle following the "Manually Upgrading" wiki procedure at the bottom of this page.

There's also a Hak5 Turtle/QuickCreds demo and explanation video by Darren Kitchen and Shannon Morse that's well worth a view.

Capturing Creds

Insert the Turtle into the (locked) target PC and wait for the creds to be captured. Our Turtle's amber Ethernet light followed this pattern on insertion:
- ON/OFF
- OFF (10 secs)
- Blinking at 1 Hz (15 secs)
- OFF (1-2 secs)
- Rapid Blinking > 1 Hz (indefinitely, or until we launch PuTTY, at which point it remains ON)

From testing, once we see the rapid blinking, the creds have been captured.

If you have an Internet cable plugged in to the Turtle when capturing creds, you can also remotely SSH into the Turtle to retrieve the captured creds. This is outside the scope of this post however.

For our testing, we will keep it simple and use PuTTY's scp to retrieve the stored creds (eg capture creds, retrieve Turtle, take Turtle back to base for creds retrieval):
We remove the Turtle from the target PC and re-insert it into our config PC. For our testing on a single laptop this meant - we removed the Turtle, unlocked the laptop and then re-inserted the Turtle.
Note: Due to the auto enable, the Turtle will also capture the config PC's creds upon insertion.

Now PuTTY in to the Turtle, then choose Exit to get to the Turtle command prompt/shell (shell ... Get it? hyuk, hyuk).

To find the latest saved creds we can type something like:

ls -alt /root/loot

which shows us the latest creds (corresponding to our current config PC) are stored under /root/loot/12/

root@turtle:~# ls -alt /root/loot/
-rw-r--r--    1 root     root           319 Jan  2 11:14 responder.log
drwxr-xr-x    2 root     root             0 Jan  2 11:13 12
drwxr-xr-x   14 root     root             0 Jan  2 11:11 .
drwxr-xr-x    2 root     root             0 Jan  2 11:01 11
drwxr-xr-x    2 root     root             0 Jan  2 11:00 10
drwxr-xr-x    2 root     root             0 Jan  2 10:46 9
drwxr-xr-x    2 root     root             0 Jan  2 08:58 8
drwxr-xr-x    2 root     root             0 Jan  2 08:49 7
drwxr-xr-x    2 root     root             0 Jan  2 08:46 6
drwxr-xr-x    2 root     root             0 Jan  2 08:35 5
drwxr-xr-x    2 root     root             0 Jan  2 08:34 4
drwxr-xr-x    2 root     root             0 Jan  2 08:26 3
drwxr-xr-x    2 root     root             0 Jan  2 08:21 2
drwxr-xr-x    2 root     root             0 Jan  2 08:20 1
drwxr-xr-x    1 root     root             0 Jan  2 08:20 ..
root@turtle:~#

So looking further at /root/loot/11/ (ie the creds from when we plugged the Turtle into the locked laptop) shows us a few log files and a text file containing our captured creds (ie HTTP-NTLMv2-172.16.84.182.txt).

root@turtle:~# ls /root/loot/11/
Analyzer-Session.log           Poisoners-Session.log
Config-Responder.log           Responder-Session.log
HTTP-NTLMv2-172.16.84.182.txt
root@turtle:~#


Our creds should be stored in HTTP-NTLMv2-172.16.84.182.txt and we can use the following command to check that the file contents look OK:

more /root/loot/11/HTTP-NTLMv2-172.16.84.182.txt

which should return something like:

admin::N46iSNekpT:08ca45b7d7ea58ee:88dcbe4446168966a153a0064958dac6:5c7830315c7830310000000000000b45c67103d07d7b95acd12ffa11230e0000000052920b85f78d013c31cdb3b92f5d765c783030

Where admin is the login name and the field after the double colon (eg N46iSNekpT) corresponds to the domain.
Note: This is an NTLMv2 example sourced from hashcat.
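Since the captured NetNTLMv2 line is just colon-delimited, pulling out the individual fields for reporting only takes a couple of lines of Python. A minimal sketch using the hashcat sample hash above (the field layout shown in the comment follows hashcat's documented NetNTLMv2 example format):

```python
# Split a NetNTLMv2 line into its fields.
# Layout: user::domain:server_challenge:NT_proof_string:blob
line = ("admin::N46iSNekpT:08ca45b7d7ea58ee:88dcbe4446168966a153a0064958dac6:"
        "5c7830315c7830310000000000000b45c67103d07d7b95acd12ffa11230e"
        "0000000052920b85f78d013c31cdb3b92f5d765c783030")

# the field between the double colons is empty, hence the throwaway variable
user, _, domain, challenge, proof, blob = line.split(":")
print(user, domain, challenge)
```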

Once we have found the appropriate file containing the creds we want, we can use PuTTY's pscp.exe to copy the files from the Turtle to our config PC.
From our Windows config PC we can use something like:
pscp root@172.16.84.1:/root/loot/11/HTTP-NTLMv2-172.16.84.182.txt .

to copy out the creds file. Note the final . to copy the creds file into the current directory on the config PC.

We can then feed this (file or individual entries) into hashcat to crack the user password. This is an exercise left for the reader.

Turtle Artifacts?

Now that we have our creds, let's see if we can find any fresh Turtle scat er, artifacts.

Starting with the Turtle plugged in to an unlocked PC, we look under the Windows Device Manager and find the Network adapter driver for the Turtle - ie the "Realtek USB FE Family Controller"

Turtle Network Adapter Driver Properties


The Details Tab from the Properties screen yields a "Device Instance Path" of:
USB\VID_0BDA&PID_8152\00E04C36150A

Similarly, the "Hardware Ids" listed were "USB\VID_0BDA&PID_8152" and "USB\VID_0BDA&PID_8152&REV_2000".

The HardwareId string ("VID_0BDA&PID_8152") implies that the driver was communicating with a Realtek 8152 USB Ethernet controller. Note: 0BDA is the vendor id for Realtek Semiconductor (see https://usb-ids.gowdy.us/read/UD/0bda) and the Turtle Wiki specs confirm the Turtle uses a "USB Ethernet Port - Realtek RTL8152".

We then used FTK Imager (v3.4.2.2) to grab the Registry hives so we can check them for artifacts.

Searching the SYSTEM hive for part of the "Device Instance Path" string (ie "VID_0BDA&PID_8152") yields an entry in SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152

Potential First Turtle Insertion Time

The Last Written Time appears to match the first time the Turtle was inserted into the PC (21DEC2016 @ 21:15:54 UTC).

Another hit occurs in SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152\00E04C36150A

Potential Most Recent Turtle Insertion Time


The Last Written Time appears to match the most recent time the Turtle was inserted (2JAN2017 @ 11:45:01 UTC).


The Turtle's 172.16.84.1 address appears in the Windows SYSTEM Registry hive as a "DhcpServer" value under SYSTEM\ControlSet001\services\Tcpip\Parameters\Interfaces\{59C1F0C4-66A7-42C8-B25E-6007F3C40925}.

Turtle's DHCP Address and Timestamp

Additionally under that same key, we can see a "LeaseObtainedTime" value which appears to be in seconds since Unix epoch (1JAN1970).
Using DCode to translate gives us:

Turtle DHCP LeaseObtainedTime Conversion


ie 2 JAN 2017 @ 11:24:37
This time occurs between the first time the Turtle was inserted (21DEC2016) and the most recent time the Turtle was inserted (2JAN2017 @ 11:45:01). This is plausible as the Turtle was plugged in multiple times during testing on 2 JAN 2017. It is estimated that the Turtle was first plugged in on 2 JAN 2017 around the same time as the "LeaseObtainedTime".
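The DCode conversion can also be reproduced with a couple of lines of Python. The epoch value used below (1483356277) is simply the value that corresponds to the converted time shown above, included for illustration; the actual raw value came from the Registry screenshot.

```python
from datetime import datetime, timezone

# "LeaseObtainedTime" is seconds since the Unix epoch (1JAN1970), referenced to UTC
lease_obtained = 1483356277
print(datetime.fromtimestamp(lease_obtained, tz=timezone.utc))
# ie 2 JAN 2017 @ 11:24:37 UTC
```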

These timestamps potentially enable us to give a timeframe for Turtle use. We say potentially because it is possible that another device using the "Realtek USB FE Family Controller" driver may have also been used. However, the specific IP address (172.16.84.1) can help us point the flipper at a rogue Turtle.

The "Realtek USB FE Family Controller" string also appears in the "Description" value under the SOFTWARE hive:
SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkCards\17

NetworkCards entry potentially pointing to Turtle

Note: The NetworkCards number entry will vary (probably will not be 17 in all cases)

There are probably more artifacts to be found but these Registry entries were the ones that were the most obvious to find. The Windows Event logs did not seem to log anything Turtle-y definitive.

However, based on the artifacts above, we can only say that a Turtle was probably plugged in. We don't have enough (yet?) to state which modules (if any) were run.

Final Thoughts

Anecdotally from the Hak5 Turtle Forums, capturing Windows credentials with the LAN Turtle seems to be hit and miss.
From our testing, the Turtle QuickCreds module worked for a Win7 laptop but failed to capture creds for a Win10 VM running on the same laptop. Once the Turtle was plugged in to the laptop, it captured the creds for the host Win7 OS. However, upon connecting the Turtle to the Win10 VM via the "Removable Devices" VMware 12 Player menu, the amber LED remained solidly lit and the Win10 creds were not captured.
Interestingly, not all of the Win7 Registry artifacts listed previously were observed in the Win10 VM's Registry:
Both SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152 and SYSTEM\ControlSet001\Enum\USB\VID_0BDA&PID_8152\00E04C36150A were present in the Win10 SYSTEM registry.
However, no hits were observed for "172.16.84.1" in SYSTEM.
There were various hits for "Realtek USB FE Family Controller" in SYSTEM.
The "Realtek USB FE Family Controller" string also appears in the "Description" value under the Win10 SOFTWARE hive:
SOFTWARE\Microsoft\Windows NT\CurrentVersion\NetworkCards\5
The lack of Win10 Registry DHCP artifacts probably indicates that while the Realtek USB Ethernet driver was installed, the Turtle was unable to assign the 172.16.84.1 IP address within the Win10 VM (possibly because the Win7 host still had it reserved?).

Fortunately, Jackk has recorded a helpful YouTube video demonstrating the LAN Turtle running QuickCreds successfully against a Win10 laptop (not VM). So it is possible on Win10 ... Jackk also shows how to use the Turtle's sshfs module to copy off the cred files via a FileZilla client (instead of using pscp).

Any comments/suggestions are turtle-y welcome in the comments section below.

Saturday, 13 August 2016

Google S2 Mapping Scripts

Sorry Monkey - there is just no point to mapping jokes ...

Cindy Murphy's recent forensic forays into Pokemon Go (here and here)
 have inspired further monkey research into the Google S2 Mapping library. The S2 library is also used by Uber, Foursquare and Google (presumably) for mapping location data. So it would probably be useful to recognise and/or translate any S2 encoded location artifacts we might come across in our forensic travels eh? *repeats in whispered voice* travels ...
After a brief introduction to how S2 represents lat/long data, we will demonstrate a couple of multi-platform (Windows/Linux) Python conversion scripts for decoding/encoding S2 locations (using sidewalklabs' s2sphere library).

S2 Mapping (In Theory)

The main resources for this section were:
The TL;DR - it's possible to convert a latitude/longitude from a sphere into a 64 bit integer. The resultant 64 bit integer is known as a cellid.
But how do we calculate this 64 bit cellid? This is achieved by projecting our spherical point (lat, long) onto one of 6 faces of an enclosing cube and then using a Hilbert curve function to specify a grid location for a specified cell size. Points along the Hilbert curve that are close to each other in value, are also spatially close to one another. This diagram from Christian's blog post better illustrates the point:

Points close to each other on the Hilbert curve have similar values. Source: Christian Perone's Blog

In the above diagram, the Hilbert curve (darker gray line) goes from the bottom LHS to the bottom RHS (or vice versa). Each grid box contains one of those Y-shaped patterns. The scale at the bottom represents the curve if it was straightened out like a piece of string.
If you now imagine the grid becoming finer/smaller but still requiring one Y-shape per box, you can see how a smaller cell grid size requires a finer resolution for points on the curve. So the smaller the cell/grid size, the higher the number of bits required to store the position. For example, a level 2 cell size only needs 4 bits whereas the maximum level 30 requires 60 bits. Level 30 cell sizes are approximately 0.48-0.93 square centimetres in size depending on the lat/long.

Fun fact: Uber apparently uses a level 12 cell size (approx. 3.3 to 6.4 square kilometres per cell). 
Second fun fact: The Metric system has been around for over 100 years so stop whining about all the metric measurements already *looks over sternly at the last of the Imperial Guards in the United States, Myanmar and Liberia*.

Ahem ... so here's what a level 30 cellid looks like:

Level 30 cellid structure. Source: Octavian Procopiuc's GoogleDoc

The first 3 bits represent which face of the enclosing cube to use and the remaining 60 bits are used to store the position on the Hilbert curve. Note: The last bit is set to 1 to mark the end of the Hilbert positioning bits.

When cellids are converted into hexadecimal and have their least significant zeroes removed (if present), they are in their shortened "token" form.
eg1 cellid = 10743750136202470315 (decimal) has a token id = 0x951977D377E723AB
eg2 cellid = 9801614157982728192 (decimal) = 0x8806542540000000. The 16 hex digits can then be shortened to a token value of "880654254". To convert back to the original hex number, we keep adding least significant zeroes to "880654254" until it's 16 digits long (ie 64 bits).
Analysts should anticipate seeing either cellids or token ids. These might be in plaintext (eg JSON) or may be in an SQLite database.
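Because tokens are just the hex cellid with the least significant zeroes stripped, the cellid/token conversions can be done with plain Python (no S2 library required). Here's a quick sketch using the two example cellids above. The level calculation is derived from the cellid structure described earlier (3 face bits, 2 bits per level, then a trailing 1 bit), so treat it as this monkey's interpretation rather than official S2 code:

```python
def cellid_to_token(cellid):
    # 16 hex digits (ie 64 bits), then strip least significant zeroes
    return format(cellid, '016x').rstrip('0')

def token_to_cellid(token):
    # pad back out to 16 hex digits (ie 64 bits)
    return int(token.ljust(16, '0'), 16)

def cellid_level(cellid):
    # the trailing 1 marker bit sits below 2 position bits per level, so
    # level = 30 - (number of trailing zero bits) / 2
    trailing_zeroes = (cellid & -cellid).bit_length() - 1
    return 30 - trailing_zeroes // 2

print(cellid_to_token(10743750136202470315))  # eg1 from above
print(cellid_to_token(9801614157982728192))   # eg2 from above -> 880654254
```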

Note: Windows Calculator sucks at handling large unsigned 64 bit numbers. According to this, it's limited to between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. So a number like 10,743,750,136,202,470,315 made it return an incorrect hex representation after conversion.
This monkey spun his paws for a while trying to figure out why the token conversions didn't seem to make sense. The FFs Solution - use the Ubuntu Calculator for hex conversions of 64 bit integers instead.

The Scripts

Two Python 2.7+ scripts were written to handle S2 conversions and are available from my GitHub here. They have been tested on both Windows 7 running Python 2.7.12 and Ubuntu x64 14.04 running Python 2.7.6.

s2-latlong2cellid.py converts lat, long and cellid level to a 64 bit Google S2 cellid.

s2-cellid2latlong.py converts a 64 bit Google S2 cellid to a lat, long and S2 cellid level.

IMPORTANT: These scripts rely on the third-party s2sphere Python library. Users can install it via:
pip install s2sphere
(on Windows) and:
sudo pip install s2sphere
(on Ubuntu)

Here's the help text for s2-latlong2cellid.py:
python s2-latlong2cellid.py -h
Running s2-latlong2cellid.py v2016-08-12

usage: s2-latlong2cellid.py [-h] llat llong level

Converts lat, long and cellid level to a 64 bit Google S2 cellid

positional arguments:
  llat        Latitude in decimal degrees
  llong       Longitude in decimal degrees
  level       S2 cell level

optional arguments:
  -h, --help  show this help message and exit

Here's the help text for s2-cellid2latlong.py:
python s2-cellid2latlong.py -h
Running s2-cellid2latlong.py v2016-08-12

usage: s2-cellid2latlong.py [-h] cellid

Convert a 64 bit Google S2 cellid to a lat, long and S2 cellid level

positional arguments:
  cellid      Google S2 cellid

optional arguments:
  -h, --help  show this help message and exit


Testing

A handy online S2map visualizer was written by David Blackman - the Lead Geo Engineer at Foursquare (and formerly of Google ie he knows his maps). See also the Readme here. S2map also has several other mapping dropdown options besides the default "OSM Light". These include "Mapbox Satellite" which projects the cellid onto an aerial view.

We start our tests by going to GoogleMaps and noting the lat, long of an intersection in Las Vegas.

Pick a spot! Any spot!

We then specify that lat, long as input into the s2-latlong2cellid.py script (with level set to 24):
python s2-latlong2cellid.py 36.114574 -115.180628 24
Running s2-latlong2cellid.py v2016-08-12

S2 cellid = 9279882692622716928

We then put that cellid into s2map.com:

Level 24 test cellid plotted on s2map.com.


Note: The red arrow was added by monkey to better show the plotted cellid (it's tiny).
So we can see that our s2-latlong2cellid.py gets us pretty close to where we originally specified on GoogleMaps.

What happens if we keep the same lat, long coordinates but decrease the level of the cellid from 24 to 12?
python s2-latlong2cellid.py 36.114574 -115.180628 12
Running s2-latlong2cellid.py v2016-08-12

S2 cellid = 9279882742634381312

Obviously this is a different cellid because it's set at a different level, but just how far away is the plotted level 12 cellid now?

Level 12 test cellid plotted on s2map.com.

Whoa! The cell accuracy has just decreased a bunch. It appears the center of this cellid is completely different to the position we originally set in GoogleMaps. It is now centred on the Bellagio instead of the intersection. This is presumably because the cell size is now larger and the center point of the cell has moved accordingly.

To confirm these findings, we take our level 24 cellid 9279882692622716928 and use it with s2-cellid2latlong.py.

python s2-cellid2latlong.py 9279882692622716928
Running s2-cellid2latlong.py v2016-08-12

S2 Level = 24
36.114574473973924 , -115.18062802526205

We then plot those coordinates on GoogleMaps ...

Level 24 test cellid 9279882692622716928 plotted on GoogleMaps via s2-cellid2latlong.py


ie Our s2-cellid2latlong.py script seems to work OK for level 24.

Here's what it looks like when we use the level 12 cellid 9279882742634381312:
python s2-cellid2latlong.py 9279882742634381312
Running s2-cellid2latlong.py v2016-08-12

S2 Level = 12
36.11195989469266 , -115.17705862118852

Level 12 test cellid 9279882742634381312 plotted on GoogleMaps via s2-cellid2latlong.py


This seems to confirm the results from s2map.com. For the same lat, long, changing the cellid level can significantly affect the returned (centre) lat, long.

We also tested our scripts against s2map.com with a handful of other cellids and lat/long/levels and they seemed consistent. Obviously time constraints will not let us test every possible point.

Final Thoughts

Using the s2sphere library, we were able to create a Python script to convert a lat, long and level to an S2 cellid (s2-latlong2cellid.py). We also created another script to convert a S2 cellid to a lat, long and level (s2-cellid2latlong.py).
The higher the cellid level, the more accurate the location. You can find the cellid level by using the s2-cellid2latlong.py script.
Plotting a cellid with s2map.com is the easiest way of visualizing the cellid boundary on a map. Higher levels (>24) become effectively invisible however.

To locate potential S2 cellids we can use search terms like "cellid" or variations such as "cellid=". If it's stored in plaintext (eg JSON), those search terms should find it. No such luck if it's encrypted or stored as a binary integer though.

While there are other S2 Python libraries, this Monkey decided to use sidewalklabs' s2sphere library based on its available documentation and pain-free cross platform support (pip install is supported).

Other Google S2 Python libraries include:
https://github.com/micolous/s2-geometry-library
(As demonstrated in Christian's blog and also used in the Gillware Pokemon script. This seems to be Linux only)
and
https://github.com/qedus/sphere
(Has the comment: "Needs to be packaged properly for use with PIP")

Some other interesting background stuff ...
Interview article with David Blackman (Foursquare)

Matt Ranney's (Chief Systems Architect at Uber) video presentation on "Scaling Uber's Real-time Market Platform" (see 18:15 mark for S2 content)

Uber uses drivers phone as a backup data source (claims its encrypted)


In the end, creating the Python conversion scripts was surprisingly straightforward and only required a few lines of code.
It will be interesting to see how many apps leave Google S2 cellid artifacts behind (hopefully ones with a high cellid level). Hopefully these scripts will prove useful when looking for location artifacts. eg when an analyst finds a cellid/token and wants to map it to a lat/long or when an analyst wants to calculate a cellid/token for a specific lat, long, level.

Friday, 29 July 2016

A Timestamp Seeking Monkey Dives Into Android Gallery Imgcache

Are you sure?! Those waters look pretty turdy ...
UPDATE 4AUG2016: Added video thumbnail imgcache findings and modified version of script for binary timestamps.

Did you know that an Android device can cache images previously viewed with the stock Gallery3D app?
These cached images can occur in multiple locations throughout the cache files. Their apparent purpose is to speed up Gallery loading times.
If a user views an image and then deletes the original picture, an analyst may still be able to recover a copy of the viewed image from the cache. Admittedly, the cached images will not be as high a quality as the original, but they can still be surprisingly detailed. And if the pictures no longer exist elsewhere on the filesystem - "That'll do monkey, that'll do ..."

The WeAre4n6 blog has already documented their observations about Android imgcache here.
So why are we re-visiting this?
We were asked to see if there was any additional timestamp or ordering information in the cached pictures. If a device camera picture only exists in the Gallery cache, it won't have the typical YYYYMMDD_HHMMSS.JPG filename. Instead, it will be embedded in a cache file with a proprietary structure and will need to be carved out. These embedded cached JPGs do not have any embedded metadata (eg EXIF).
An unnamed commercial phone forensics tool will carve the cached pictures out but it currently does not extract any timestamp information.

Smells like an opportunity for some monkey style R&D eh?
Or was that just Papa Monkey's flatulence striking again? An all banana diet can be so bittersweet :D

Special Thanks to:
- Terry Olson for posting this question about the Gallery3D imgcache on the Forensic Focus Forum and then kindly sharing a research document detailing some imgcache structures.
- Jason Eddy and Jeremy Dupuis who Terry acknowledged as the source of the research document.
- LSB and Rob (@TheHexNinja) for their help and advice in researching the imgcache.
- Cindy Murphy (@CindyMurph) for sharing her recollections of a case involving imgcache and listening to this monkey crap on.
- JoAnn Gibb for her suggestions and also listening to this monkey crap on.

Our main test devices were a Samsung Galaxy Note 4 (SM-910G) and a Galaxy Note 4 Edge (SM-915G) both running Android 5.1.1.

Our initial focus was the following cache file:
userdata:/media/0/Android/data/com.sec.android.gallery3d/cache/imgcache.0

After an image is viewed fullscreen in the Gallery app, imgcache.0 appears to be populated with the viewed picture plus six (sometimes fewer) other images. It is suspected the other cached pictures are chosen based on the display order of the parent gallery and will be taken from before/after the viewed image. If a picture is found in this cache file, it is likely that the user would have seen it (either from the parent gallery view or when they viewed it fullscreen).
From our testing, this file contains the largest sized cached images. From the filesystem last modified times and file sizes, it is suspected that when the imgcache.0 file reaches a certain size, it gets renamed to imgcache.1 and newly viewed images then start populating imgcache.0. Due to time constraints, we did not test for this rollover behaviour. By default, the initial imgcache.0 and imgcache.1 files appear to be 4 bytes long.

Also in the directory were mini.0 and micro.0 cache files which contained smaller cached images. Similarly to imgcache.0, these files also had corresponding .1 files.

mini.0 contains the smallest sized, square clipped, thumbnail versions of the cached images. They appear to be similar to the images displayed from the Gallery preview list that is shown when the user long presses on a fullscreen viewed Gallery image.

micro.0 contains non-clipped images which are smaller versions of the images in imgcache.0 but larger in size than the images in mini.0. These appear to be populated when the user views a gallery of pictures. Launching the Gallery app can be enough to populate this cache (likely depends on the default Gallery app view setting).

imgcache.0 has been observed to contain a different number of images to mini.0 or micro.0. It is suspected this is due to how the images were viewed/previewed from within the Gallery app.

Other files were observed in the cache directory but their purpose remains unknown. eg imgcache.idx, micro.idx and mini.idx were all comprised mainly of zeroed data.

UPDATE 4AUG2016:
A device video was also created/saved on the test device and displayed via the Gallery app. A corresponding video thumbnail was consequently cached in the imgcache.0, mini.0 and micro.0 files. These video cache records were written in a slightly different format to the picture cache records.

The imgcache structure

Based on the supplied research document and test device observations, here's the record structure we observed for each Galaxy Note 4 “imgcache.0” picture record:
  • Record Size (4 Byte LE Integer) = Size in bytes from start of this field until just after the end of the JPG
  • Item Path String (UTF16-LE String) = eg /local/image/item/
  • Index Number (UTF16-LE String) =  eg 44
  • + separator (UTF16-LE String) = eg +
  • Unix Timestamp Seconds (UTF16-LE String) = eg 1469075274
  • + separator (UTF16-LE String) = eg +
  • Unknown Number String (UTF16-LE String) = eg 1
  • Cached JPG (Binary) = starts with 0xFFD8 ... ends with 0xFFD9
The cached JPG is a smaller version of the original picture.
The Unix Timestamp Seconds is referenced to UTC and should be adjusted for local time. We can use a program like DCode to translate it into a human readable format (eg 1469075274 = Thu, 21 July 2016 04:27:54 UTC).
The Index Number seems to increase for each new picture added to the cache and may help determine the order in which the picture was viewed.

There are typically 19 bytes between each imgcache.0 record. However, the first record in imgcache.0 usually has 20 bytes before the first record’s 4 byte Record Size.
The record structure shown above was also observed to be re-used in the “micro” and “mini” cache files.
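To make the record structure concrete, here's a minimal Python parsing sketch for a single Note 4 style picture record. The field handling follows the table above; the sample record bytes at the bottom are synthetic test data constructed for illustration, not bytes from a real device:

```python
import struct

def parse_record(buf, offset):
    # Record Size: 4 byte LE integer, counted from this field to just after the JPG
    size = struct.unpack_from('<I', buf, offset)[0]
    record = buf[offset:offset + size]
    jpg_start = record.find(b'\xff\xd8')            # cached JPG starts with 0xFFD8
    header = record[4:jpg_start].decode('utf-16-le')
    # header looks like /local/image/item/44+1469075274+1
    path_and_index, timestamp, unknown = header.split('+')
    return {'path': path_and_index,
            'unix_secs': int(timestamp),
            'jpg': record[jpg_start:]}              # ends with 0xFFD9

# Synthetic test record (not real device data)
header = '/local/image/item/44+1469075274+1'.encode('utf-16-le')
jpg = b'\xff\xd8JFIFdata\xff\xd9'
rec = struct.pack('<I', 4 + len(header) + len(jpg)) + header + jpg
print(parse_record(rec, 0))
```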

UPDATE 4AUG2016:
Here's the record structure we observed for each Galaxy Note 4 “imgcache.0” video thumbnail record:

  •     Record Size (4 Byte LE Integer) = Size in bytes from start of this field until just after the end of the JPG
  •     Item Path String (UTF16-LE String) = eg /local/video/item/
  •     Index Number (UTF16-LE String) =  eg 44
  •     + separator (UTF16-LE String) = eg +
  •     Unix Timestamp Milliseconds (UTF16-LE String) = eg 1469075274000
  •     + separator (UTF16-LE String) = eg +
  •     Unknown Number String (UTF16-LE String) = eg 1
  •     Cached JPG (Binary) = starts with 0xFFD8 ... ends with 0xFFD9

The Unix Timestamp Milliseconds is referenced to UTC and should be adjusted for local time. We can use a program like DCode to translate it into a human readable format (eg 1469075274000 = Thu, 21 July 2016 04:27:54 UTC).

The item path string format did not appear to vary for a picture/video saved to the SD card versus internal phone memory.

The Samsung Note 4 file format documented above was NOT identical to that of other sample test devices, including a Moto G (XT1033), a Samsung Galaxy Core Prime (SM-G360G) and a Samsung J1 (SM-J100Y).
The Moto G’s Gallery app cache record size did not include itself (ie it was 4 bytes smaller) and the Galaxy Core Prime / J1’s Gallery app cache record did not utilize a UTF16LE timestamp string. Instead, it used a LE 8 byte integer representing the Unix timestamp in milliseconds (for BOTH picture and video imgcache records). This was written between the end of the path string and the start of the cached JPG’s 0xFFD8.
These differences imply that a scripted solution will probably require modifications on a per device/per app basis.
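For the Galaxy Core Prime / J1 style binary timestamp, the decode is a simple struct.unpack. A sketch follows; the sample bytes are constructed for illustration rather than carved from an actual device:

```python
import struct
from datetime import datetime, timezone

# Core Prime / J1 style: LE 8 byte integer, Unix milliseconds, located
# between the end of the item path string and the JPG's 0xFFD8
raw = (1469075274000).to_bytes(8, 'little')   # synthetic sample bytes
ms = struct.unpack('<Q', raw)[0]
print(datetime.fromtimestamp(ms / 1000, tz=timezone.utc))
# ie Thu, 21 July 2016 04:27:54 UTC
```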

UPDATE 4AUG2016:
As a result of this testing, a second script (imgcache-parse-mod.py) was written to parse Galaxy S4 (GT-i9505)/ Galaxy Core Prime / J1 imgcache files which appear to share the same imgcache record structures. Please refer to the initial comments section of the imgcache-parse-mod.py script for a full description of that imgcache structure. This modified script will take the same input arguments as the original imgcache-parse.py script described in the next section.

Scripting

A Python 2 script (imgcache-parse.py) was written to extract JPGs from “imgcache”, “micro” and “mini” cache files to the same directory as the script.

UPDATE 4AUG2016:
The script searches the given cache file (eg imgcache.0) for the UTF16LE encoded "/local/image/item/" and/or “/local/video/item/” strings, finds the record size and then extracts the record's embedded JPG to a separate file. The script also outputs an HTML table containing the extracted JPGs and various metadata.
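The search/carve logic can be sketched roughly as follows (a hypothetical simplified helper, not the actual script — note the real script uses the Record Size field to bound the JPG, whereas this sketch simply scans for the next 0xFFD9 end-of-image marker):

```python
PATH_STRINGS = ("/local/image/item/", "/local/video/item/")

def carve_jpgs(data):
    """Yield (offset, jpg_bytes) for each cached JPG found after a path string."""
    for path in PATH_STRINGS:
        needle = path.encode("utf-16-le")   # cache path strings are UTF16-LE
        hit = data.find(needle)
        while hit != -1:
            soi = data.find(b"\xff\xd8", hit)        # JPG Start Of Image
            eoi = data.find(b"\xff\xd9", soi + 2)    # JPG End Of Image
            if soi != -1 and eoi != -1:
                yield soi, data[soi:eoi + 2]
            hit = data.find(needle, hit + 2)
```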

An example HTML output table looks like:

Example HTML output table for picture imgcache records

Example HTML output table entry for a video imgcache record


The extracted JPG filename is constructed as follows:

[Source-Cache-Filename]_pic_[Hex-Offset-of-JPG]_[Unix-Timestamp-sec]_[Human-Timestamp].jpg
OR
[Source-Cache-Filename]_vid_[Hex-Offset-of-JPG]_[Unix-Timestamp-ms]_[Human-Timestamp].jpg

The script also calculates the MD5 hash for each JPG (allowing for easier detection of duplicate images) and prints the filesize and the complete item path string.
Each HTML table record entry is printed in the same order as it appears in the input cache file. That is, the top row represents the first cache record and the bottom row represents the last cache record.
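The duplicate detection idea can be sketched like so (illustrative helper, assuming the JPGs have already been extracted into memory):

```python
import hashlib

def group_by_md5(jpgs):
    """Map MD5 hex digest -> list of filenames, keeping only hashes shared
    by more than one file, so identical cached images are easy to spot.

    jpgs: dict of {filename: jpg_bytes}
    """
    groups = {}
    for name, data in jpgs.items():
        groups.setdefault(hashlib.md5(data).hexdigest(), []).append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}
```

Any hash that maps to more than one filename indicates the same image was cached more than once (eg at multiple offsets within the same cache file).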

The script was validated with Android 5.1.1 and the Gallery3d app v2.0.8131802.
You can download it from my Github site here.

Here is the help for the script:
C:\Python27\python.exe imgcache-parse.py
Running imgcache-parse.py v2016-08-02

Usage:  imgcache-parse.py -f inputfile -o outputfile

Options:
  -h, --help   show this help message and exit
  -f FILENAME  imgcache file to be searched
  -o HTMLFILE  HTML table File
  -p           Parse cached picture only (do not use in conjunction with -v)
  -v           Parse cached video thumbnails only (do not use in conjunction with -p)

Here is an example of how to run the script (from Windows command line with the Python 2.7 default install). This will process/extract BOTH pictures and video cache records (default):

C:\Python27\python.exe imgcache-parse.py -f imgcache.0 -o opimg0.html
Running imgcache-parse.py v2016-08-02

Paths found = 14

/local/image/item/44+1469075274+1 from offset = 0X18
imgcache.0_pic_0X5A_1469075274_2016-07-21T04-27-54.jpg
JPG output size(bytes) = 28968 from offset = 0X5A

/local/image/item/43+1469073536+1 from offset = 0X7199
imgcache.0_pic_0X71DB_1469073536_2016-07-21T03-58-56.jpg
JPG output size(bytes) = 75324 from offset = 0X71DB

/local/image/item/41+1469054648+1 from offset = 0X1982E
imgcache.0_pic_0X19870_1469054648_2016-07-20T22-44-08.jpg
JPG output size(bytes) = 33245 from offset = 0X19870

/local/image/item/40+1469051675+1 from offset = 0X21A64
imgcache.0_pic_0X21AA6_1469051675_2016-07-20T21-54-35.jpg
JPG output size(bytes) = 40744 from offset = 0X21AA6

/local/image/item/39+1469051662+1 from offset = 0X2B9E5
imgcache.0_pic_0X2BA27_1469051662_2016-07-20T21-54-22.jpg
JPG output size(bytes) = 30698 from offset = 0X2BA27

/local/video/item/38+1469051577796+1 from offset = 0X33228
imgcache.0_vid_0X33270_1469051577796_2016-07-20T21-52-57.jpg
JPG output size(bytes) = 34931 from offset = 0X33270

/local/image/item/37+1469051566+1 from offset = 0X3BAFA
imgcache.0_pic_0X3BB3C_1469051566_2016-07-20T21-52-46.jpg
JPG output size(bytes) = 28460 from offset = 0X3BB3C

/local/image/item/27+1390351440+1 from offset = 0X42A7F
imgcache.0_pic_0X42AC1_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 97542 from offset = 0X42AC1

/local/image/item/28+1390351440+1 from offset = 0X5A7DE
imgcache.0_pic_0X5A820_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 122922 from offset = 0X5A820

/local/image/item/29+1390351440+1 from offset = 0X78861
imgcache.0_pic_0X788A3_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 127713 from offset = 0X788A3

/local/image/item/30+1390351440+1 from offset = 0X97B9B
imgcache.0_pic_0X97BDD_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 97100 from offset = 0X97BDD

/local/image/item/31+1390351440+1 from offset = 0XAF740
imgcache.0_pic_0XAF782_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 66576 from offset = 0XAF782

/local/image/item/32+1390351440+1 from offset = 0XBFBA9
imgcache.0_pic_0XBFBEB_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 34746 from offset = 0XBFBEB

/local/image/item/33+1390351440+1 from offset = 0XC83BC
imgcache.0_pic_0XC83FE_1390351440_2014-01-22T00-44-00.jpg
JPG output size(bytes) = 26865 from offset = 0XC83FE

Processed 14 cached pictures. Exiting ...

The above example output also printed the HTML table we saw previously.
Some further command line examples:
C:\Python27\python.exe imgcache-parse.py -f imgcache.0 -o output.html -p
(will parse/output picture cache items ONLY)

C:\Python27\python.exe imgcache-parse.py -f imgcache.0 -o output.html -v
(will parse/output video thumbnail cache items ONLY)

Testing

During testing of the Gallery app, device camera pictures, a screenshot and a picture saved from an Internet browser were viewed. Cached copies of these pictures were subsequently observed in the “imgcache.0”, “mini.0” and “micro.0” cache files.
From our testing, the Unix timestamp represents when the picture was taken/saved rather than the time it was browsed in the Gallery app.
This was tested by taking camera picture 1 on the device, waiting one minute, then taking picture 2. We then waited another minute before viewing picture 1 in the Gallery app, waiting one minute and then viewing picture 2.
Running the imgcache-parse.py script and viewing the resultant output HTML table confirmed that the timestamp strings reflect the original picture’s created time and not the Gallery viewed time. The HTML table also displayed the order of the imgcache.0 file - picture 1 was written first, then picture 2.
We then cleared the Gallery app cache and viewed picture 2 in the Gallery app followed by picture 1.
Running the imgcache-parse.py script again and viewing the resultant output HTML table displayed the order of the imgcache.0 file. Picture 2 was written first, then picture 1.

UPDATE 4AUG2016:
A device video was also created (20160802_155401.mp4), uploaded to Dropbox (via app v2.4.4.8) and then downloaded and viewed in the Gallery app. The imgcache.0 record timestamp for the created video (1470117241703 = 05:54:01) differed from the imgcache.0 timestamp for the downloaded video (1470117253000 = 05:54:13). This difference of approximately 12 seconds was slightly longer than the 11 second video duration.
It is suspected that the created video’s imgcache timestamp represents when the original video was first being written and the downloaded video’s imgcache timestamp represents when the original video was finalised to the filesystem.
The video thumbnails displayed in the Gallery app and imgcache for each video were also different. The downloaded video thumbnail appeared to be from approximately 1 second into the video. The created video thumbnail seemed to be the first frame of the video. The MD5 hashes of both video files were identical.

As per LSB's helpful suggestion, rather than take a full image of the test phone for each acquisition of cache files, we plugged our test device into a PC and used Windows Explorer to browse to the Phone\Android\data\com.sec.android.gallery3d\cache folder and copy the cache files to our PC. This saved a significant amount of imaging time. To minimize any synchronization issues, the phone should be unplugged/re-plugged between file copies.


Final Thoughts

Depending on the device, it may be possible to determine the created timestamp of a picture viewed and cached from the Android Gallery app. The Gallery cache may also include pictures which are no longer available elsewhere on the device.
A Python script (imgcache-parse.py) was created to extract various metadata and the cached images from a Samsung Note 4 Gallery app’s (imgcache, micro and mini) cache files.
UPDATE 4AUG2016: A modified version of this script (imgcache-parse-mod.py) was also created to handle binary timestamps as observed on Galaxy S4 / J1 / Core Prime sample devices.

It is STRONGLY recommended that analysts validate their own device/app version combinations before running these scripts. Your mileage will vary!
For example, take a picture using the device camera and validate its YYYYMMDD_HHMMSS.JPG filename/metadata against the timestamp in the item path (if it's there).
For case data, look for device images with date/time information in them (eg pictures of newspapers, receipts etc. or device screenshots) to increase the confidence level in extracted timestamps.

The Gallery app was not present in various Android 6.0 test devices that we looked at. It may have been usurped by the Google Photos app. However, we have seen the Gallery app on Android 5 and Android 4 devices which would still make up the majority of devices currently out there.

Monkey doesn't have the time/inclination but further areas of research could be:
- Decompiling the Gallery .apk and inspecting the Java code.
- Rollover functionality of the cache files (eg confirm how imgcache.1 gets populated).
- Why there can be multiple copies of the same image (with same MD5) appearing at multiple offsets within the same imgcache file.
- Determining how the cache record index number is being calculated.
- Determining the purpose of the “imgcache.idx”, “micro.idx” and “mini.idx” files.

Anyhoo, it would be great to hear from you in the comments section (or via email) if you end up using these scripts for an actual case. Or if you have any further observations to add (don't forget to state your Android version and device please).

Sorry, but for mental health reasons I will NOT recover your dick pics for you. ie Requests for personal image recovery will be ignored. If you Google for "JPG file carver", you should find some programs that can help you recover/re-live those glorious tumescent moments.

Can you tell how working in forensics has affected my world view? ;)

Monday, 4 July 2016

Panel Beaten Monkey



FYI: A "Panel Beater" = Auto body mechanic in Monkeytown-ese
This Monkey was recently invited to shit himself sit on a SANS DFIR Summit panel discussing Innovation in Mobile Forensics with an All-Star cast of Andrew Hoog, Heather Mahalik, Cindy Murphy and Chris Crowley. While it rated well with the audience, personally (because it's all about THIS monkey!) it seemed that whenever I thought of something relevant, another panel member chirped up with a similar idea and/or the discussion moved on to the next question.
I felt it was kinda difficult to contribute something meaningful yet concise in a 30 second sound bite. Especially for my first open question speaking gig.
Monkey might need to decrease his deferential politeness and/or increase his use of assertive poo flinging in future panel discussions. Alternative suggestions are also welcome in the comments :)

Here's the synopsis of the panel from the DFIR Summit Program ...
Puzzle Solving and Science: The Secret Sauce of Innovation in Mobile Forensics
In today’s world, technology (especially mobile device technology) moves at a much faster pace than any of us can keep up with, and available training and research doesn’t always address the problems we encounter. As forensic examiners we face the daily challenges of new apps, new, updated and obscure operating systems, malware, secure apps, pass code and password protected phones, encoding and encryption problems, new artifacts, and broken hardware in order to obtain the evidence we need in a legally defensible and forensically sound manner.  In this session, learn from consistent and experienced innovators in the mobile forensics field the tips, tricks, and mindset that they bring to bear on the toughest problems and how to move beyond cookie cutter forensics towards an approach that allows you to successfully solve and own problems others might consider too hard to even try.


Anyhoo, the initial concept was to have several one word themed slides and discuss how these traits can help with innovation in mobile forensics.
Due to a panel format change, the original slides didn't get much play time so monkey thought he'd run through them now and present his thoughts with a focus on advice for those newer to mobile forensics. Some of the points made here may have been mentioned during the panel by other speakers but at least here I have time to elaborate and present my point of view. Bonus huh?

Now let's meet the panel ... Can you tell that we went for a superhero introductory theme?

Heather Mahalik!

Cindy Murphy!

Chris Crowley!


Andrew Hoog!

Cheeky4n6monkey!
 And now onto the rest of the slides ...

Curiosity


This is what attracts most of us to forensics. How does "Stuff" work and given a set of resultant data, how can we reconstruct what happened?
Documenting your curiosity (via blog post, white paper, journal article) is a great way of both sharing knowledge with the community and demonstrating your ability to research and think independently.
In mobile forensics, curiosity will usually lead to hex diving especially when hunting for new artifacts.
Curiosity naturally leads to "Squirrel chasing" where one interesting artifact can lead you to many others. So you might start out with one focus and end up discovering a bunch of cool artifacts.

Creativity


Our ability to create solutions depends on our paint set. The wider array of skills you have as a mobile forensic examiner, the more creative you can be - especially as mobile devices are a combination of both hardware and software.
For inspiration, background knowledge and anticipating future trends, read research papers, blogs, books, patents, mobile device service manuals/schematics and industry standards (eg eMMC JEDEC standard). Knowing the background details today will help you analyze tomorrow's device.
Start with a popular make/model and learn how a device works. Go to ifixit.com and the FCC website for pictures of device breakdowns. Read up on how eMMC Flash memory devices work. You don't have to be able to MacGyver a mobile device on a desert island but familiarize yourself with the fundamental concepts (eg eMMC memory has a NAND Controller acting as the interface to the actual NAND memory).
Look at how an SQLite database is structured. Most apps rely on these types of databases to store their data. The official website is a great place to start.
Develop/practise skills in soldering, chipoff, network forensics, malware reverse engineering, scripting for artifacts.
Know how to find/make/use automated tools. Tools can be used as intended/documented (eg NetworkMiner to read .pcaps) or in more novel ways (eg use an Android emulator to create app artifacts and save on rooting test devices/acquisition time).

Scientific Method

As mobile devices change (use of devices, underlying hardware, encryption, new apps/OS artifacts) we need to be able to record our observations in a structured, repeatable way and be able to communicate our findings to others.
The best way is to create your own data on a test device using a documented set of known actions. As Adam Savage from Mythbusters says: "Remember, kids, the only difference between screwing around and science is writing it down".
Also, as Mari DeGrazia (and Meoware Kitty) showed us at the DFIR Summit, you should also "Trust But Verify" your tools.

Perseverance


Don't let failure discourage you if/when it comes.
You may need to use a different technique or change your assumptions. Or wait for new developments by someone else and revisit.
There may be more than one solution. Evaluate which is better or worse. The faster method is not always the most comprehensive.
You are not alone. Chances are someone else in the community may have the keys to your problem. Ask around Twitter, forensic forums and your professional network.

Teamwork/Collaboration


No one monkey knows ALL THE THINGS.
I find it helpful to email a trusted group of mobile forensic gurus and describe what I am seeing. Even if they are not able to help directly, it forces me to structure my thinking and helps me question my approach.
Having a trusted group you can bounce ideas/findings off helps both yourself and potentially everyone in the group who may not have the time to otherwise investigate. The increased pool of experience and potential access to more varied test data are added bonuses as well. There is also an inherent double checking of your analysis.
Communicate your ideas often. Even if you start feeling like a spam monkey, realize that people can come up with amazing ideas/suggestions when prompted with the right stimulus.
Share your innovation with the community - they may be able to help you improve it and/or adapt it for another purpose that you never would have thought of.

Choose your team wisely though. There are some "One way transaction" types who you can help and then never hear from again. Be aware that it is a small community and word does get around about potential time wasters/bullshitters. 
Alternatively, you might be contacted by some rude farker after some free advice/labour - eg "You seem like you know what you are doing. Here's my problem ..."
Realize that being polite/considerate goes a long way to building the required level of trust. Recognize that you are probably asking someone to give up their free time for your cause.
Give teammates a default "opt out" of receiving your spam. For example, "If you wish to keep receiving these types of emails, please let me know. Otherwise, thank you for your time." and if you don't hear back, stop sending shit. Most people in forensics will be keen to discover new artifacts/research but be sure to try to organize your thoughts before hitting send.

Manage people's expectations. If you don't know or are not sure - it is better to under promise and over deliver later. Don't feel bad about saying "I don't know" or "I'm currently working on other things and don't have the time right now".

Luck


I believe that you can make your own "Luck" through being prepared when the opportunity presents itself.
For example, I had difficulties landing a forensics job after finishing my graduate studies in Forensic Computing. The market here in Monkeytown was relatively small compared to the US.
Through personal research projects that I blogged about and multiple US internships, I was able to land a rare and Monkeytown based forensic research dream job for which I am still counting my blessings. Having a documented prior body of work helped make the recruitment process so much easier (it also helped that there were technical people in charge of the recruiting).
Pure forensic research jobs seem to be rare in this industry - most positions seem to require a significant element of case work/billable hours. So I really appreciate the ability to pick an area or device and "research the shit out of it".

On the other hand, occasionally in a case, you can have some plain old good fortune such as when Cindy Murphy and I were looking at a Windows Phone 8 device and we found an SMS stating "Da Code is ..." (which ended up being the PIN code for the phone).


Questions?



I just included this slide because I think it was one of my better 'toons in the slide deck :)

Final Thoughts

Physical fitness and rest are also important factors in staying creative. In the past, I've had some difficulties sleeping which obviously had an adverse effect on my work. A light regimen of regular exercise (15 minutes x 3 times per week) on the stationary bike has worked wonders on my tiredness levels and aerobic fitness. The paunch still remains a work in progress however ;)
For those interested, check out Dr Michael Mosley and Peta Bee's excellent research book on High Intensity Training (HIT) called FastExercise. It shows how you don't have to spend a huge amount of time at the gym to start seeing some immediate health benefits.

So long as you remain committed to learning, the innovation will come. Don't sweat about the non creative periods.

Learning to script is a good way of forcing you to understand how data is stored at the binary level. Python is a popular choice in forensics for its readability, many existing code libraries and large user base.

A library of "most likely to be encountered" test devices can help you to create before/after reference data sets to validate your research. These may be sourced privately from online (eg eBay) or from previous cases.

When public speaking, I have to learn to project my voice more. Elgin from the SANS AV crew kindly took the time after the panel to advise me to speak more from the diaphragm in the future. Concrete feedback like this is the best way to improve my speaking ability. Having said that, maybe monkey also needs to dose up on the caffeine before the next panel so he can react quicker/with more urgency. I'm guessing experience is the best teacher though.

The 2016 SANS DFIR Summit Presentation Slides are now available from here. Get them while they're hot!

Special Thanks to Jennifer Santiago (Director of Content Development & SANS Summit Speaker Wrangler) for her patience in dealing with this first time speaker/panellist.
Special Thanks also to my fellow panellists Andrew, Chris, Cindy and Heather for welcoming this monkey as a peer rather than a curiosity.

Not to get all heavy and philosophical on you but I found this quote that pretty much sums up my thoughts on innovation. It is from Nguyen Quyen who apparently was a Vietnamese Anti-French Colonist from the early part of the 20th Century. Ain't Google great?

"Successful innovation is not a single breakthrough. It is not a sprint. It is not an event for the solo runner. Successful innovation is a team sport, it's a relay race."

Good luck quoting that on a panel and not sounding like a complete wanker though ;)

If anyone has some suggestions for how I can improve my panel talking skills or would like to share some tips on innovation in mobile forensics, please leave a comment. Thanks!



Sunday, 15 May 2016

The Chimp That Pimps And An Introduction to e.MMC Flash Memory Forensics

Pimpin Ain't Easy?

SANS is offering an Amazon Echo smart speaker to each of the top 3 referrers to its DFIR Summit 2016 website.
As of 11 May 2016, this Chimpy McPimpy was number 5 on the list.
Chimpy would very much like to win an Echo (echo, echo) so he can take it apart and share what forensic artifacts are left on the device.

The Echo is a smart speaker that can listen out for voice commands, play music, search the Internet and control Internet of Shitty Things devices. Apparently, more than 3 million have been sold in the US since 2014.

Here's a (pretty meh) Superbowl commercial demonstrating some of the Echo's capabilities:



And here's the Wikipedia entry for the Amazon Echo just so monkey doesn't have to regurgitate any further (I already have enough body image issues).

The folks at Champlain College have also recently blogged about their Amazon Echo forensic research (here, here and here).
They have a report due out this month (May 2016).
From what this monkey can ascertain, their research focuses on network captures and the Amazon Echo Android App side of things. They also mentioned looking into "chipping off" the device but I'm not sure if this was a core part of their research as it wasn't mentioned in later posts.

So Monkey is proposing this - (if you haven't already) please follow this link to the SANS DFIR Summit website and if monkey manages to win an Amazon Echo, he will blog about getting to that sweet, sweet, echoey data from the internal Flash memory. See here and here for some background on Flash memory.

How do we know it uses Flash memory?
The awesome folks at ifixit.com have already performed a teardown which you can see here.

From ifixit.com's picture of the logic board (below), we notice the Flash memory component bearing the text SanDisk SDIN7DP2-4G (highlighted in yellow).

Amazon Echo's Logic board

Searching for the Flash storage component(s) on most devices (eg phones, tablets, GPS, answering machines, voice recorders) starts with Googling the various integrated circuit (IC) chip identifiers. The Flash memory component is normally located adjacent to the CPU (minimizes interference/timing issues).
In this case, the ifixit.com peeps have helpfully identified/provided a link to the 4 GB SanDisk Flash memory chip.
But if we didn't have that link, we would try Googling for "SanDisk SDIN7DP2-4G" and/or "SanDisk SDIN7DP2-4G +datasheet" to find out what type of IC it was.
According to this link - for the 4th quarter of 2015, Samsung's NAND revenue (33.8%) led Toshiba (18.6%), SanDisk (15.8%), Micron (13.9%), SK Hynix (10.1%) and Intel (8%). Other (smaller) manufacturers such as Phison, Sony, Spansion were not mentioned. Not sure how accurate these figures are but if you see one of these manufacturers' logos/names on a chip, you have probably found a NAND memory chip of some kind (eg Flash, RAM).

Anyhoo, from the link that ifixit.com provided we can see the following text:
SDIN7DP2-4G,153FBGA 11.5X13 e.MMC 4.51
Here's what it all means:
153 FBGA (Fine-pitch Ball Grid Array) means there are 153 pin pads arranged in a standard way.
The 11.5X13 refers to the chip's dimensions in millimetres.
The e.MMC 4.51 tells us the chip adheres to the Embedded Multi-Media Card (e.MMC) standard (version 4.51) for NAND Flash chip interfacing. We will discuss the e.MMC standard a little further on.

To double check ifixit.com's data link, we did some Googling and found this link which seems to confirm from multiple sites that the SanDisk Flash chip is 153 FBGA and 11.5 x 13.
Ideally, we would have found the actual datasheet from SanDisk but sometimes you just gotta make do ...


It is also worth noting that not all Flash memory chips are e.MMC compatible. Some devices may use their own proprietary NAND interface. Some chips might be NOR Flash (eg Boot ROM) and thus not really relevant to our quest for user data.
Additionally, the latest Flash memory chips may follow a newer (faster, duplex) standard called Universal Flash Storage (UFS). See here for more details on UFS.
So while it appears the days of e.MMC chips are numbered, there's still a LOT of e.MMC storage devices out there that can be potentially read.

When reading Flash storage for forensics, some key considerations are:
- Does it follow the e.MMC standard?
- Chip pin arrangement (number of pins and spacing)
- Chip dimensions (typically in mm)

The e.MMC standard is used by Flash memory chip manufacturers to provide a common infrastructure / command set for communicating. This way a board manufacturer can (hopefully) substitute one brand of eMMC chip with another brand (probably cheaper) of the same capacity. The standard focuses on the external eMMC chip interfacing and not the internal NAND implementation (which would be manufacturer specific). Having an e.MMC Flash chip makes reading a whole lot easier.

But don't just listen to me, JEDEC - the folks responsible for the eMMC standard (and UFS), state :
"Designed for a wide range of applications in consumer electronics, mobile phones, handheld computers, navigational systems and other industrial uses, e.MMC is an embedded non-volatile memory system, comprised of both flash memory and a flash memory controller, which simplifies the application interface design and frees the host processor from low-level flash memory management. This benefits product developers by simplifying the non-volatile memory interface design and qualification process – resulting in a reduction in time-to-market as well as facilitating support for future flash device offerings. Small BGA package sizes and low power consumption make e.MMC a viable, low-cost memory solution for mobile and other space-constrained products."

To get a copy of the e.MMC standard (free registration required), check out this link.

The e.MMC standard document provides this helpful diagram:

JEDEC e.MMC Electrical Standard v5.1

From this we can see that a "Device controller" handles any interfacing with the actual NAND storage ("Memory Array"). This includes things like reading/writing to NAND, paging, TRIM, error correction, password protection.

The following signals/pins are required when reading an e.MMC memory:
- CLK = Synchronizes the signals between the e.MMC chip and the "Host Controller" (ie CPU of device)
- CMD = For issuing commands/receiving command replies from/to the "Host Controller"
- DATA0 = For receiving the data at the "Host Controller"
- VCC / VCCQ = Power for the NAND memory / Power to the Device Controller. In some cases, this can be the same voltage (1.8 V)
- GND / VSS = Ground

It is not a coincidence that these connections are also required for In-System Programming (ISP) Forensics. But that is probably a topic more suitable for a Part 2 (hint, hint).

We can see these pins labelled in this ForensicsWiki diagram of a BGA 153 e.MMC chip:
 
BGA-153 Layout


Note: ForensicsWiki have labelled it as BGA169 but it does not show the extra 16 (typically unused) pins. Count the number of pins (I dare you!) - there's only 153. At any rate, our target SanDisk chip should look like the BGA153 diagram above. Most of the pins are unused / irrelevant for our reading purposes.
The ever helpful GSMhosting site shows us what a full BGA 169 looks like:

BGA-169 Layout - the extra 16 pins comprise the 2 arcs above/below the concentric squares

Other pin arrangements we've seen include BGA162/186 and BGA/eMCP221. Some Flash chips are combined in the same package as the RAM. These are called eMCP (Multi-Chip Package).
Control-F Digital Forensics have blogged an example list which matches some common devices with their e.MMC pin arrangement/size. They also note that the pitch (spacing between pins) for the previously mentioned layouts is 0.5 mm.

So here's what BGA-162 looks like:

BGA-162 Layout (Source: http://forum.gsmhosting.com/vbb/11016505-post9.html)


And a BGA/eMCP221 looks like:

BGA/e.MCP221 Layout (Source: http://forum.gsmhosting.com/vbb/11260019-post6.html)

Final Thoughts

Due to e.MMC standardisation, reading the data off an e.MMC Flash chip should be straightforward and repeatable - which is great for forensics. Interpreting the subsequent data dump artifacts is usually a more challenging task.
The e.MMC Flash memory content discussed in this post applies equally to Smartphones, Tablets etc.

UPDATE: For even more details on Flash Memory Forensics, check out the following papers:
Forensic Data Recovery from Flash Memory

By Marcel Breeuwsma, Martien de Jongh, Coert Klaver, Ronald van der Knijff and Mark Roeloffs
SMALL SCALE DIGITAL DEVICE FORENSICS JOURNAL, VOL. 1, NO. 1, JUNE 2007

and

Theory and practice of flash memory mobile forensics (2009)
By Salvatore Fiorillo
Edith Cowan University, Western Australia


The paper by Breeuwsma et al. is probably THE paper on Flash memory Forensics.

Please don't forget to click on this link so Monkey can get his Precious Amazon Echo. You might like to do it from a VM if you're worried about security.
If, for whatever reason, monkey doesn't get an Echo - it's no big deal. Just thought it would make for an interesting exercise as we head towards the Internet of Lazy Fatties ... At the very least, we have learnt more about performing e.MMC Flash memory forensics.

In other news, in June 2016, this monkey will be:
- Attending his first SANS DFIR Summit
- Speaking on an "Innovation in Mobile Forensics" panel with Cindy Murphy, Heather Mahalik, Andrew Hoog and Chris Crowley. Monkey is still pinching himself about joining the collective brain power of that panel *GULP*
- Facilitating/Rockin' the Red Apron for SANS FOR585 Advanced Smartphone Forensics with Cindy Murphy (just after the DFIR Summit)

So if you see me around (probably hiding behind/near Cindy or Mari DeGrazia), feel free to say hello and let us know if this blog site has helped you ... I promise I'll try not to fling too much shit (while you're facing me anyway. Hint: Keep eye contact at all times!).

As always, please feel free to leave feedback regarding this post in the comments section below.