Recover UTF16 Strings from Slack Space

Question

Using the disk image provided by the CFReDS project here, we are tasked with recovering deleted text, much of it in Russian and some in English, encoded as UTF-16BE. In the allocated space, this goes relatively quickly using fls and icat. However, some data seems to be in the unallocated (slack) space. If this were ASCII, we could use grep -abi. But that's not working.

This is different from the question here, which is more straightforward and can be solved with grep -abi. Because this problem requires recovering UTF-16, it seems more closely related to questions such as this one, which asks how to grep UTF-16. Also, the answer provided here is a bit skimpy on details. For those seeking simply to recover ASCII text and deleted files from slack space, the walkthrough provided by Linux LEO here is much more detailed.

I've tried using xxd to dump the hex, piping it through sed to remove 'B9', and converting it back from hex, with no luck. Grepping non-ASCII text in slack space seems to interest a few people (cf. here). I also looked at liblightgrep here, which unfortunately fails on an unspecified build dependency (right after libboost_options).

How can I recover non-ASCII text that has had the file headers purposefully removed from slack space?

Please see the explanation of how this is not a duplicate. — d-cubed, Apr 15 '17 at 15:36

score 2 · Accepted Answer · edited Apr 14 '17 at 13:27

Maybe strings can be of help? With the -eb parameter it will extract UTF-16BE strings.

As for Russian characters not being found, I am not aware of a tool that will find them directly. But, Python to the rescue:

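russian_str = ''
with open('fakeslack', 'rb') as f:
    while True:
        c = f.read(2)  # one UTF-16 code unit; assumes the text is 2-byte aligned
        if len(c) < 2:  # end of file (or a trailing odd byte)
            break
        if c[0] == 0x04:  # russian chars start with 04 (U+0400-U+04FF)
            russian_str += c.decode('utf-16-be')  # big endian
        elif russian_str:
            print(russian_str)
            russian_str = ''

if russian_str:  # flush a run that ends exactly at end of file
    print(russian_str)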

This works for me on a fake slack file that I generated by pasting some words from Russian Wikipedia between some English words and whitespace:

$ cat fakeslack 
foobar
fewufew
fweu
ncd
Википедию кораблях  fweufweАйоваnwefwe
$ recode utf8..utf16be fakeslack 
$ python ru_strings.py
Википедию
кораблях
Айова
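
The loop above reads in 2-byte steps from offset 0 and keeps only Cyrillic, so it would miss text that starts at an odd offset and it drops the English words. A minimal sketch of one way around both, assuming the whole input fits in memory: a bytes regex that matches runs of UTF-16BE code units which are either printable ASCII or in the Cyrillic block. The name utf16be_runs and the minimum run length of 4 (borrowed from the default of strings) are illustrative choices, not part of any existing tool, and the match is a heuristic that can still misfire on binary data that happens to look like text.

```python
import re

# One UTF-16BE code unit: printable ASCII (00 20..7E) or Cyrillic U+0400-U+04FF (04 xx).
UNIT = rb'(?:\x00[\x20-\x7e]|\x04[\x00-\xff])'
# A run is at least 4 consecutive units (assumed threshold, similar to strings).
RUN = re.compile(UNIT + rb'{4,}')


def utf16be_runs(data):
    """Yield (byte_offset, text) for each English/Russian UTF-16BE run in data."""
    for m in RUN.finditer(data):
        # The regex scans byte by byte, so runs at odd offsets are found too.
        yield m.start(), m.group().decode('utf-16-be')


if __name__ == '__main__':
    with open('fakeslack', 'rb') as f:
        for offset, text in utf16be_runs(f.read()):
            print(offset, text)
```

Because the offset of each hit is reported, a match can be checked against the raw image afterwards, for example with xxd.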

Cool. Now, I'll just need to tweak the script to show English and Russian characters. I was hoping that there was a way with just xxd, bbe, or other Linux tools, but it was looking increasingly as if just coding a solution would be easier. — d-cubed, Apr 14 '17 at 13:09
