2

Suppose I have received a file (it doesn't matter what kind: document, image, video, audio, etc.). I know that the operating system, the programs that create files (such as Office), and even hardware such as digital cameras store a lot of metadata in the file.

Some files, such as MS Office documents, contain some of the metadata in the file itself, while for others it seems that Windows "knows" metadata about the file that is not contained in the file. E.g., if I create a Notepad document, Windows knows its creation date, last access time, etc.

I understand that some of this information is kept in the file system itself, but for many other items I can't see where they are kept.

I have three questions:

  1. What are the different places that metadata is stored about a file?
  2. Is there a free/open source tool that can extract metadata from basically any file you give it (like VLC plays basically any media file)?
  3. Suppose I am doing a forensic analysis of a file, what are the steps I should follow to make sure I get maximum information about the file (especially from metadata)?
schroeder
ose
  • These are properties of the file in the OS, I think. You would use stat on them to get this extra information. – Lighthart Apr 27 '16 at 16:49
  • Open the file in a hex editor. You will see everything that isn't binary. Make a text file in notepad; you won't see anything besides the text. – multithr3at3d Jul 31 '19 at 23:54
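The file system properties that the `stat` comment refers to can be read without opening the file at all. Below is a minimal Python sketch using only the standard library; the helper name `fs_metadata` is made up for illustration, and which timestamps are available or meaningful depends on the OS and file system.

```python
import os
import stat
from datetime import datetime, timezone


def fs_metadata(path):
    """Return metadata the file system keeps about `path` (not stored inside the file)."""
    st = os.stat(path)

    def to_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "size_bytes": st.st_size,
        "mode": stat.filemode(st.st_mode),
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        # On Unix st_ctime is the inode change time; on Windows it has
        # historically been reported as the creation time.
        "ctime": to_utc(st.st_ctime),
    }
```

Calling `fs_metadata` on a plain Notepad file returns these values even though the file itself contains nothing but text, which matches what a hex editor would show.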

4 Answers

7
  1. Where metadata is stored is up to the OS and the program that created the file (as you say about Notepad and Word docs). Some file types even use a separate file just to hold the metadata.
  2. Because of #1, there is no free "give me all the metadata" tool. There are tools that can find the metadata of a wide range of well-known file types, though.
  3. Because of #1, it would take too long to try and lay out all the steps required to find the "maximum" amount of data.
schroeder
  • OK can you answer with respect to the main types of files one would expect to find in an investigation? Even let's say Images, docs, media. – ose Apr 27 '16 at 16:58
  • @ose the type of file is not the issue - it's how the file was created, and the OS on which it resides. – schroeder Apr 27 '16 at 17:28
  • It's also worth noting that to get all that metadata it's not sufficient to "receive the file" but you need to receive the *filesystem* where the file resides - either the hardware itself or an image of that filesystem. If you receive only "that file" (i.e. not *that* file but *a copy* of that file) then you don't get any metadata that's outside the file contents. – Peteris Jul 30 '19 at 22:25
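The point about copies can be illustrated with a small experiment. The following Python sketch (standard library only; the helper names `sha256_of` and `copy_and_compare` are hypothetical) backdates a file, copies only its contents, and checks what survived. It assumes a content-only copy such as `shutil.copyfile`; other transfer methods may preserve more or less.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Hash the file contents only; file system metadata is not part of the hash."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def copy_and_compare(src, dst):
    """Copy only the contents of src to dst and report what survived the copy."""
    shutil.copyfile(src, dst)  # copies data only, not timestamps or permissions
    return {
        "same_contents": sha256_of(src) == sha256_of(dst),
        "same_mtime": int(os.stat(src).st_mtime) == int(os.stat(dst).st_mtime),
    }


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    copy = os.path.join(tmp, "copy.txt")
    with open(original, "w") as fh:
        fh.write("evidence")
    # Backdate the original to September 2001 so the difference is obvious.
    os.utime(original, (1_000_000_000, 1_000_000_000))
    result = copy_and_compare(original, copy)
    print(result)
```

The contents hash identically, but the copy carries a fresh modification time, which is why an image of the whole file system is needed to recover metadata stored outside the file.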
4

You can use Apache Tika and write your own program to extract metadata. It is pretty easy to do, and here is a tutorial on how to do that. As the other answer says, there is no surefire way to extract metadata from every type of file, but Tika covers a reasonable range of them.

Ian
2

The Unix/Linux file command will extract a lot of metadata from inside files. If you are using Windows, you can install Cygwin to gain access to that command or, on recent Windows 10 versions, use WSL (Windows Subsystem for Linux).

Some example output:

C:\Users\stewmark\ScreenShots>file *.png
ChangePW.png:                  PNG image data, 1167 x 1046, 8-bit/color RGB, non-interlaced
ChangePW_link.png:             PNG image data, 603 x 468, 8-bit/color RGB, non-interlaced
Color_Wheel.png:               PNG image data, 306 x 391, 8-bit/color RGB, non-interlaced

C:\Users\stewmark\>file *.xlsx
Project Plan_25March2016.xlsx:                 Microsoft Excel 2007+
Charges Preview SummaryClient_20160420.xlsx:   Microsoft OOXML
Invoice Details Report_20160414.xlsx:          Microsoft OOXML

C:\Users\stewmark\Music\Seal\Fly Like an Eagle>file *.mp3
01 Fly Like an Eagle [Radio Edit].mp3:   Audio file with ID3 version 2.3.0
02 Fly Like an Eagle [Instrumental].mp3: Audio file with ID3 version 2.3.0
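The file command works largely by matching known byte signatures at the start of a file and then reading a few header fields. The Python sketch below is a rough illustration of that idea, not a reimplementation: the signature table is a tiny hand-picked subset, and the names `SIGNATURES`, `sniff` and `png_dimensions` are made up for this example.

```python
import struct

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

# A few well-known signatures; the real command consults a far larger database.
SIGNATURES = [
    (PNG_MAGIC, "PNG image data"),
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"%PDF-", "PDF document"),
    (b"ID3", "Audio file with ID3 tag"),
    (b"PK\x03\x04", "Zip container (used by OOXML files such as .xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document (legacy Office)"),
]


def sniff(data):
    """Guess a file type from its leading bytes."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def png_dimensions(data):
    """Return (width, height) from a PNG header, or None if it is not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if not data.startswith(PNG_MAGIC) or data[12:16] != b"IHDR" or len(data) < 24:
        return None
    return struct.unpack(">II", data[16:24])
```

Reading the first 32 bytes of a file with `open(path, "rb").read(32)` and passing them to `sniff` and `png_dimensions` is enough to recover the kind of information shown for the PNG files above.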
Mark Stewart
  • Yes this would give the type of the file, but this is something I already know. What interests me is the metadata about the file such as when you right click a file in Windows Explorer and it shows various information in the "properties" window, like where a photo was taken, the type of camera used etc. – ose Apr 27 '16 at 19:37
  • http://www.sno.phy.queensu.ca/~phil/exiftool/ exiftool for Windows is great for images. The data you mention is actually in the data file, and other items in the Windows properties window such as "Rating", "Tags" and "Assistant's Name" are probably also file specific. But as was mentioned, file size, access time, creation time, and (I think) whether an executable file was downloaded from the Internet are stored in the file system. – Mark Stewart Apr 27 '16 at 19:50
2

To complete #2 of this answer: there is exiftool, which can show you the metadata (inside the file as well as file system metadata) of quite a range of file types, from JPEG images through PDF files to Microsoft Word documents. It certainly can't parse every file type, but in most cases it was able to extract metadata from the files I gave it.

Example output:

$ exiftool /usr/share/texlive/texmf-dist/tex/latex/pdfslide/bg.jpg
ExifTool Version Number         : 10.40
File Name                       : bg.jpg
Directory                       : /usr/share/texlive/texmf-dist/tex/latex/pdfslide
File Size                       : 11 kB
File Modification Date/Time     : 2006:01:13 01:02:12+01:00
File Access Date/Time           : 2018:09:14 18:40:02+02:00
File Inode Change Date/Time     : 2017:03:20 12:29:01+01:00
File Permissions                : rw-r--r--
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
JFIF Version                    : 1.01
Resolution Unit                 : inches
X Resolution                    : 66
Y Resolution                    : 66
Image Width                     : 652
Image Height                    : 492
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Image Size                      : 652x492
Megapixels                      : 0.321
$ exiftool /usr/share/texlive/texmf-dist/tex/latex/notes/info.pdf
ExifTool Version Number         : 10.40
File Name                       : info.pdf
Directory                       : /usr/share/texlive/texmf-dist/tex/latex/notes
File Size                       : 3.5 kB
File Modification Date/Time     : 2008:09:20 20:31:15+02:00
File Access Date/Time           : 2018:09:14 18:41:46+02:00
File Inode Change Date/Time     : 2017:03:20 12:29:01+01:00
File Permissions                : rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : No
Page Count                      : 1
XMP Toolkit                     : XMP toolkit 2.9.1-13, framework 1.6
About                           : cc6b5cda-bf5e-11e8-0000-fcfe446dd206
Producer                        : GPL Ghostscript 8.62
Modify Date                     : 2008:09:20 20:30:50+02:00
Create Date                     : 2008:09:20 20:30:50+02:00
Creator Tool                    : fig2dev Version 3.2 Patchlevel 4
Document ID                     : cc6b5cda-bf5e-11e8-0000-fcfe446dd206
Format                          : application/pdf
Title                           : info.fig
Creator                         : karl@tug.org \(Karl Berry\)
Author                          : karl@tug.org (Karl Berry)
$ exiftool /usr/share/clamav-testfiles/clam.ole.doc
ExifTool Version Number         : 10.40
File Name                       : clam.ole.doc
Directory                       : /usr/share/clamav-testfiles
File Size                       : 16 kB
File Modification Date/Time     : 2018:07:21 13:13:59+02:00
File Access Date/Time           : 2018:09:14 18:43:18+02:00
File Inode Change Date/Time     : 2018:08:01 06:51:25+02:00
File Permissions                : rw-r--r--
File Type                       : DOC
File Type Extension             : doc
MIME Type                       : application/msword
Title                           : 
Subject                         : 
Author                          : acab
Keywords                        : 
Comments                        : 
Template                        : Normal.dot
Last Modified By                : acab
Revision Number                 : 1
Software                        : Microsoft Office Word
Total Edit Time                 : 0
Create Date                     : 2008:08:03 22:09:00
Modify Date                     : 2008:08:03 22:09:00
Pages                           : 1
Words                           : 3
Characters                      : 18
Security                        : None
Code Page                       : Windows Latin 1 (Western European)
Company                         : 
Lines                           : 1
Paragraphs                      : 1
Char Count With Spaces          : 20
App Version                     : 11.5606
Scale Crop                      : No
Links Up To Date                : No
Shared Doc                      : No
Hyperlinks Changed              : No
Title Of Parts                  : 
Heading Pairs                   : Titolo, 1
Comp Obj User Type Len          : 35
Comp Obj User Type              : Documento di Microsoft Office Word
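For further processing, output like the above can be turned into a dictionary. This is a minimal Python sketch that assumes the default "Tag Name : value" text layout shown here; the function name `parse_exiftool_text` is hypothetical. exiftool also has a `-j` option that emits JSON, which is likely the more robust choice when scripting.

```python
def parse_exiftool_text(output):
    """Parse exiftool's default text output into a {tag name: value} dict."""
    tags = {}
    for line in output.splitlines():
        # Split on the first colon only, so colons inside values
        # (for example in timestamps) are preserved.
        name, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines without a colon, such as a shell prompt line
        # If a tag name appears more than once, the last value wins.
        tags[name.strip()] = value.strip()
    return tags


sample = """\
File Name                       : bg.jpg
File Modification Date/Time     : 2006:01:13 01:02:12+01:00
File Type                       : JPEG
Image Size                      : 652x492
"""

tags = parse_exiftool_text(sample)
print(tags["File Modification Date/Time"])
```

Note that the "File ..." timestamps and permissions come from the file system, while tags such as Author or Create Date are read from inside the file, so the two groups can disagree.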

Edit: I just noticed that Mark Stewart already mentioned exiftool in his comment on this answer, which covers the file command.

Axel Beckert