Convert xxd output to shellcode

15

1

Taking some output from xxd and turning it into usable shellcode by hand is no fun, so your job is to automate the process.

Rules

Your submission can be a function, lambda, script, or any reasonable equivalent of those. You may print the result, or if your submission is a function/lambda then you may also return it.

Your program must take three arguments, the first being a string containing the output of xxd, run with no arguments other than a filename, like this: xxd some_file. Here's an example of what the first argument will look like:

00000000: 31c0 b046 31db 31c9 cd80 eb16 5b31 c088  1..F1.1.....[1..
00000010: 4307 895b 0889 430c b00b 8d4b 088d 530c  C..[..C....K..S.
00000020: cd80 e8e5 ffff ff2f 6269 6e2f 7368 4e58  ......./bin/shNX
00000030: 5858 5859 5959 59                        XXXYYYY

You need to take that middle section containing the bytes (the first 8 columns after the :) and turn it into shellcode by removing any whitespace, then putting a \x before each byte.

Here's what the output should be for the input above (ignoring any other arguments):

\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e\x58\x58\x58\x58\x59\x59\x59\x59

You can assume the first argument will always be valid xxd output, run with no arguments other than the filename.

Your output should also be a string where the backslashes are literal backslashes, not used as escape characters. So when I say "\x65", I'm not talking about the byte 0x65, or even the letter "e". In code, it would be the string "\\x65".
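To make the distinction concrete, here is a short check in Python (chosen only for illustration; the variable names are made up):

```python
escaped = "\x65"    # one character: the byte value 0x65, i.e. the letter e
literal = "\\x65"   # four characters: backslash, x, 6, 5

print(len(escaped), escaped)   # 1 e
print(len(literal), literal)   # 4 \x65
```

The challenge asks for the second form.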

The second argument specifies where in the xxd output the shellcode should start, and the third specifies where it should end. If the third argument is -1, it will end at the end of the xxd output. The second and third arguments will also always be non-negative, except when the third is -1.
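For reference, the rules above can be sketched as an ungolfed Python function. This is only an illustration, not one of the submissions: the name xxd_to_shellcode is made up, and it assumes that the start and end arguments are hexadecimal strings and that the range is inclusive, as the test cases below suggest.

```python
DUMP = "\n".join([
    "00000000: 31c0 b046 31db 31c9 cd80 eb16 5b31 c088  1..F1.1.....[1..",
    "00000010: 4307 895b 0889 430c b00b 8d4b 088d 530c  C..[..C....K..S.",
    "00000020: cd80 e8e5 ffff ff2f 6269 6e2f 7368 4e58  ......./bin/shNX",
    "00000030: 5858 5859 5959 59" + " " * 24 + "XXXYYYY",
])


def xxd_to_shellcode(dump, start="0", end="-1"):
    hex_digits = ""
    for line in dump.splitlines():
        if not line.strip():
            continue
        # Columns 10..48 hold the eight groups of hex digits; the offset
        # before them and the ASCII preview after them are ignored.
        hex_digits += line[10:49].replace(" ", "")
    pairs = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
    first = int(start, 16)
    # "-1" means "up to the last byte"; otherwise the end offset is inclusive.
    last = len(pairs) - 1 if end == "-1" else int(end, 16)
    return "".join("\\x" + pair for pair in pairs[first:last + 1])


print(xxd_to_shellcode(DUMP, "7", "e"))   # \xc9\xcd\x80\xeb\x16\x5b\x31\xc0
```

Slicing by column rather than splitting on whitespace keeps the ASCII preview out of the result even when the last line is short.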

Here are some test cases:

Argument 1:

00000000: 31c0 b046 31db 31c9 cd80 eb16 5b31 c088  1..F1.1.....[1..
00000010: 4307 895b 0889 430c b00b 8d4b 088d 530c  C..[..C....K..S.
00000020: cd80 e8e5 ffff ff2f 6269 6e2f 7368 4e58  ......./bin/shNX
00000030: 5858 5859 5959 59                        XXXYYYY

Argument 2: 7, Argument 3: e (these are both strings representing hexadecimal numbers)

Output: \xc9\xcd\x80\xeb\x16\x5b\x31\xc0

Argument 1:

00000000: 31c0 b046 31db 31c9 cd80 eb16 5b31 c088  1..F1.1.....[1..
00000010: 4307 895b 0889 430c b00b 8d4b 088d 530c  C..[..C....K..S.
00000020: cd80 e8e5 ffff ff2f 6269 6e2f 7368 4e58  ......./bin/shNX
00000030: 5858 5859 5959 59                        XXXYYYY

Argument 2: 0, Argument 3: 2e

Output: \x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e

Argument 1:

00000000: 31c0 b046 31db 31c9 cd80 eb16 5b31 c088  1..F1.1.....[1..
00000010: 4307 895b 0889 430c b00b 8d4b 088d 530c  C..[..C....K..S.
00000020: cd80 e8e5 ffff ff2f 6269 6e2f 7368 4e58  ......./bin/shNX
00000030: 5858 5859 5959 59                        XXXYYYY

Argument 2: a, Argument 3: -1

Output: \xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e\x58\x58\x58\x58\x59\x59\x59\x59

The code with the fewest bytes wins. The winner will be announced in seven days, on August 15, 2016 (but submissions after then are still appreciated).

Update

Congrats to @Adnan for winning the challenge!

addison

Posted 2016-08-08T14:03:50.000

Reputation: 993

Just to clarify, can entries return a string or must they print it? – Jordan – 2016-08-08T15:24:53.213

Returning a string is fine as long as it's a function, lambda, or something like that (I updated the rules to specify that after you asked). – addison – 2016-08-08T15:26:04.640

Can we also return the regular ASCII codes when the code is printable? E.g. ~ instead of \x7e. And can we return \t instead of \x09? – orlp – 2016-08-08T15:38:00.673

@orlp Sorry no, it needs to be in a consistent format. – addison – 2016-08-08T16:05:34.533

Are the arguments required to be in hex? Also, the way you've given the second example, 7 looks like a zero-based index and e is a one-based index (e-7=7 but there are 8 hex codes in your output), or am I overlooking something? – Neil – 2016-08-08T18:22:21.097

@Neil e-7 doesn't account for the beginning byte of the range. The formula should be e-7+1. It's an inclusive range, so the beginning and ending indexes are both included. – addison – 2016-08-08T19:26:28.347

@addison make sure you explain all the components of your question in the question itself. For example, many golfers may not know what xxd code and shell code are – MayorMonty – 2016-08-09T00:40:03.240

You mention that start and end arguments in your examples are hexadecimal strings, but that isn't mentioned in the specification. If we submit a function, do we really have to parse two strings for the arguments? – Dennis – 2016-08-09T04:42:25.877

Answers

5

05AB1E, 39 38 bytes

Input in the form:

arg2
arg3
arg1

Code:

²\|vy9F¦}40£ðK}J2ô„\xì²H>²®Qi²}£¹HF¦}J

Explanation:

²\                                       # Get the first two inputs and discard them.
  |                                      # Take the rest of the input as an array.
   vy         }                          # For each line...
     9F¦}                                #   Nine times, remove the first character.
         40£                             #   Keep only the substring [0:40].
            ðK                           #   Remove spaces.
               J                         # Join the string.
                2ô                       # Split into pieces of 2.
                  „\xì                   # Prepend a "\x" at each string.
                      ²H                 # Convert the second line from hex to int.
                        >                # Increment by one.
                         ²               # Push the second input again.
                          ®Qi }          # If equal to -1...
                             ²           #   Push the second input again.
                               £         # Take the substring [0:(² + 1)].
                                ¹H       # Convert the first input from hex to int.
                                  F¦}    # Remove that many characters at the beginning.
                                     J   # Join the array and implicitly output.

Uses the CP-1252 encoding. Try it online!
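For readers who do not know 05AB1E, the same pipeline can be followed step by step in Python. This is a rough transliteration, not Adnan's code: the function name is invented, and the -1 case is handled with a plain if instead of the stack manipulation.

```python
def shellcode_by_steps(dump, start, end):
    cleaned = []
    for line in dump.splitlines():
        line = line[9:]                        # drop nine characters: the offset and the colon
        line = line[:40]                       # keep the 40-character hex region
        cleaned.append(line.replace(" ", ""))  # remove spaces, including the leading one
    digits = "".join(cleaned)                  # join the lines
    pairs = ["\\x" + digits[i:i + 2]           # split into pieces of 2, prepend \x
             for i in range(0, len(digits), 2)]
    if end != "-1":
        pairs = pairs[:int(end, 16) + 1]       # keep the first end + 1 entries
    pairs = pairs[int(start, 16):]             # drop the first `start` entries
    return "".join(pairs)
```

Taking the head first and dropping the front afterwards means both offsets can be used as given, with no subtraction.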

Adnan

Posted 2016-08-08T14:03:50.000

Reputation: 41 965

12

Bash + coreutils + xxd, 73 71 69 bytes

printf \\x%s `xxd -r|xxd -p -s0x$1 -l$[(e=1+0x$2)?e-0x$1:-1]|fold -2`

Expects the hexdump on STDIN and start/end as command-line arguments.

This prints some warnings to STDERR, which is allowed by default.
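The length arithmetic handed to -l is the dense part. Here is a sketch of the same idea in Python, assuming the dump has already been turned back into raw bytes (the job xxd -r does in the pipeline); the names window and dump_bytes are hypothetical.

```python
def window(start_hex, end_hex):
    """Mirror $[(e=1+0x$2)?e-0x$1:-1]: return (seek, length); length -1 means 'to the end'."""
    start = int(start_hex, 16)
    e = 1 + int(end_hex, 16)          # an end argument of -1 makes e zero
    length = e - start if e else -1   # inclusive range: end + 1 - start bytes
    return start, length


def dump_bytes(raw, start_hex, end_hex):
    seek, length = window(start_hex, end_hex)
    chunk = raw[seek:] if length < 0 else raw[seek:seek + length]
    return "".join("\\x%02x" % b for b in chunk)


raw = bytes.fromhex("31c0b04631db31c9cd80eb165b31c088")
print(dump_bytes(raw, "7", "e"))   # \xc9\xcd\x80\xeb\x16\x5b\x31\xc0
```

Working on the raw bytes avoids parsing the ASCII column at all, which is what makes the xxd round trip attractive.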

Dennis

Posted 2016-08-08T14:03:50.000

Reputation: 196 637

I was hoping someone would actually use xxd in their solution! – addison – 2016-08-08T16:08:03.983

@addison I tried to, but my lang doesn't support newlines in commandline input. :c – Addison Crump – 2016-08-08T16:30:26.753

I am able to replace 16# with 0x? – Digital Trauma – 2016-08-08T21:30:57.883

@DigitalTrauma I thought it was an xxd thing, but it appears to work everywhere. – Dennis – 2016-08-08T22:10:26.697

Yes, bash parses 0xn hex style and 0m octal style numbers out of the box: https://www.gnu.org/software/bash/manual/bash.html#Shell-Arithmetic. echo $[0x2a] $[052]. – Digital Trauma – 2016-08-08T22:45:56.677

5

Ruby: 90 89 87 79 63 bytes

-2 bytes thanks to @addison
-8 bytes thanks to @PiersMainwaring

->s,x,y{'\x'+s.scan(/(?<=.{9})\w\w(?=.* )/)[x.hex..y.hex]*'\x'}

See the tests on repl.it: https://repl.it/Cknc/5

Jordan

Posted 2016-08-08T14:03:50.000

Reputation: 5 001

You can replace .join with *"" to save 2 bytes. – addison – 2016-08-08T19:50:38.307

You can replace .map{|h|h.to_i(16)} with .map(&:hex) to save 8 more bytes! – piersadrian – 2016-08-09T19:40:04.200

Thanks @PiersMainwaring! Silly of me to forget that. It actually saved me 16 because it turned out to be shorter to call .hex on the arguments individually! – Jordan – 2016-08-09T20:00:05.980

5

JavaScript, 84 bytes

(s,f,t,u)=>s.replace(/.*:|  .*\n?| /g,'').replace(/../g,'\\x$&').slice(f*4,++t*4||u)

Explanation: Removes all the unwanted parts of the dump, prepends \x to each hex pair, then extracts the desired portion of the result. ||u is used to convert the zero obtained by incrementing the -1 parameter into undefined which is a magic value that causes slice to slice to the end of the string. 101 bytes if f and t are strings of hex digits:

(s,f,t,u)=>s.replace(/.*:|  .*\n?| /g,``).replace(/../g,`\\x$&`).slice(`0x${f}`*4,t<0?u:`0x${t}`*4+4)
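The slicing trick carries over to other languages with a falsy zero. Below is a Python sketch of the numeric (84-byte) variant, with `or None` playing the role of `||u`. It is an illustration only; the function name is made up, and like the regex above it assumes the ASCII column of the dump contains no colon.

```python
import re


def slice_escaped(dump, start, end):
    # Strip "offset:", everything from the two-space gap to the end of the
    # line (the ASCII column), and the single spaces between hex groups.
    digits = re.sub(r".*:|  .*\n?| ", "", dump)
    # Prepend \x to each hex pair; a function replacement avoids having to
    # escape the backslash in a replacement template.
    escaped = re.sub(r"..", lambda m: "\\x" + m.group(0), digits)
    # Each byte now occupies four characters. With end == -1 the stop value
    # (end + 1) * 4 is 0, which is falsy, so `or None` slices to the end.
    return escaped[start * 4:(end + 1) * 4 or None]
```

Here start and end are plain integers, matching the shorter JavaScript version.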

Neil

Posted 2016-08-08T14:03:50.000

Reputation: 95 035

Instead of (s,f,t,u)=>, you can do s=>f=>t=>u=>, to save a few bytes. – Ismael Miguel – 2016-08-09T08:30:37.623

@IsmaelMiguel Sorry, that only works for a function with exactly two actual parameters. In my specific case the u must be an additional parameter and can't be curried. – Neil – 2016-08-09T08:39:51.263

@IsmaelMiguel Also that's actually longer... – Jakob – 2018-06-17T01:40:16.063

4

Jelly, 48 44 bytes

ØhiЀɠ’ḅ¥®L’¤Ạ?⁴‘
ṣ⁷ṫ€⁵ḣ€40Fḟ⁶s©2ḣ¢ṫ¢[“\x”]p

This expects the hexdump as sole command-line argument, and the end and start points on STDIN, in that order, separated by a linefeed.

Try it online!

Dennis

Posted 2016-08-08T14:03:50.000

Reputation: 196 637

I'd love to see an explanation for this ;) – Conor O'Brien – 2016-08-09T00:17:45.860

I'll add one later, but I'll try to golf it a bit first. 51 bytes of Jelly vs 69 bytes of Bash can't be right... – Dennis – 2016-08-09T00:19:00.350

3

PowerShell v2+, 175 157 142 133 129 bytes

param($a,$b,$c)'\x'+(($z=$a-split"`n"|%{$_[10..48]-ne32-join''-split'(..)'-ne''})["0x$b"..(("0x$c",$z.count)[$c-eq-1])]-join'\x')

Takes input $a, $b, $c, with $a as either a literal newline-separated string, or with the PowerShell `n character separating the lines. We set helper string $z as the heavily processed $a as follows --

First we -split on newlines, then, for each line |%{...}, we slice the middle section [10..48], use the -ne32 to remove spaces, -join it back together into one long string, -split it on every two characters (keeping the two characters), and -ne'' to remove the empty elements. This results in an array of two-element strings, like ('31','c0','b0'...).

We then slice into that array based on $b cast with the hexadecimal operator up to the value of $c. We need to use a pseudo-ternary here that accounts for whether $c is -1 or not. If it is, we choose the .count (i.e., the end element) of $z. Otherwise, we just prepend the 0x hexadecimal operator with $c in a string. Note that this is zero-indexed.

That array slice has its elements -joined together with a literal \x to form one string. That's prepended with another literal \x and the result is left on the pipeline. Printing is implicit.
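The split-on-a-captured-group step is not specific to PowerShell. A Python sketch of the same processing is given below for comparison; the function names are invented, and this is an approximation of the steps described above rather than a port of the golfed script.

```python
import re


def pairs_from_dump(dump):
    pairs = []
    for line in dump.split("\n"):
        # Like [10..48] followed by removing spaces and joining.
        middle = line[10:49].replace(" ", "")
        # Splitting on a captured two-character group keeps the groups and
        # leaves empty strings between them, which are filtered out.
        pairs += [p for p in re.split("(..)", middle) if p]
    return pairs


def shellcode(dump, start, end):
    z = pairs_from_dump(dump)
    # -1 selects the element count as the (inclusive) upper bound; running
    # one past the end of the list is harmless when slicing.
    last = len(z) if end == "-1" else int(end, 16)
    return "\\x" + "\\x".join(z[int(start, 16):last + 1])
```

As in the original, the first \x is prepended separately because join only places the separator between elements.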

Example

PS C:\Tools\Scripts\golfing> .\xxd-output.ps1 "00000000: 31c0 b046 31db 31c9 cd80 eb16 5b31 c088  1..F1.1.....[1..
00000010: 4307 895b 0889 430c b00b 8d4b 088d 530c  C..[..C....K..S.
00000020: cd80 e8e5 ffff ff2f 6269 6e2f 7368 4e58  ......./bin/shNX
00000030: 5858 5859 5959 59                        XXXYYYY" a -1
\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x4e\x58\x58\x58\x58\x59\x59\x59\x59

AdmBorkBork

Posted 2016-08-08T14:03:50.000

Reputation: 41 581

Can you access shell with this language? – Addison Crump – 2016-08-08T15:40:57.643

@VTCAKAVSMoACE In theory, given the new Windows Subsystem for Linux, it should be possible to pipe things together and/or pass parameters via command line. Implementation is left as an exercise for the reader. ;-) – AdmBorkBork – 2016-08-08T15:43:11.623

2

Jelly, 39 38 37 bytes

ØhiⱮɠ’ḅ¥ȷ9Ṃ?⁴‘
Ỵṫ€⁵ḣ€40Fḟ⁶s2ṭ€⁾\xḣ¢ṫ¢

Try it online!

Now beats 05AB1E! (despite the lack of "convert from hexadecimal" builtin)

Same input format as Dennis' solution.

Uses Ɱ, which is a new feature (short for Ѐ). Without it, this would take 38 bytes.

user202729

Posted 2016-08-08T14:03:50.000

Reputation: 14 620

Only works for input with len up to 1e9. – user202729 – 2018-06-15T07:20:49.350

But if it's on FAT32 (where input size is at most 2GB) it's enough. – user202729 – 2018-06-15T07:45:43.043

1

Perl, 114 bytes

($_,$a,$b)=@ARGV;s/^.*:|\S*$|\s//gm;@_=(m/../g);for(@_){s/^/\\x/}$"='';say substr"@_",4*$a,$b!=-1?4*($a+$b):2<<20;

Arguments are given on the command line as a quoted string followed by two numbers. The numbers are taken in decimal (I know hex was used in the examples, but it wasn't specified in the post).

Technically this only works on inputs with up to 2^21 bytes, since Perl's substring method is silly.

theLambGoat

Posted 2016-08-08T14:03:50.000

Reputation: 119

Apparently the range is inclusive, so for instance 7 to e should result in a string of length 32. – Neil – 2016-08-08T20:21:28.170

1

Python, 140 bytes

lambda O,a,b:''.join(sum([['\\x'+x[:2],('','\\x')[len(x)>2]+x[2:]]for x in O.split()if len(x)<5],[])[int(a,16):(int(b,16)+1,None)[b=='-1']])

https://repl.it/ClB3

Splits the original string and dumps the elements if they're less than five characters, prepends \x, and slices by the second and third arguments.
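An ungolfed sketch of the same token-based idea, for readability. The function name is made up, and the approach relies on every whitespace-separated token in the ASCII column being at least five characters long, which holds for the example dump but is an assumption about the input.

```python
def tokens_to_shellcode(dump, start, end):
    pairs = []
    for token in dump.split():
        # Offsets such as "00000000:" and the ASCII previews are long tokens;
        # hex groups are 4 characters (or 2 for a trailing single byte).
        if len(token) < 5:
            pairs += [token[i:i + 2] for i in range(0, len(token), 2)]
    # "-1" means no upper bound; otherwise the end offset is inclusive.
    stop = None if end == "-1" else int(end, 16) + 1
    return "".join("\\x" + p for p in pairs[int(start, 16):stop])
```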

162 byte version if we need to handle other types of output not specified by the question:

import re
J=''.join
def f(x,a,b):r=J(J(re.findall(':(.*?)  ',x)).split());print J(['\\x'+i+j for i,j in zip(r,r[1:])][::2][int(a,16):(int(b,16)+1,None)[b=='-1']])

atlasologist

Posted 2016-08-08T14:03:50.000

Reputation: 2 945

This won't work if, e.g., the last line is something like 00000030: 5858 58 XXX since it'll pull out the last portion and you'll get something like \x58\x58\x58\xXX\xX. – AdmBorkBork – 2016-08-09T12:43:48.470

@TimmyD I didn't think that case needed to be handled, going off the specs of the challenge. – atlasologist – 2016-08-09T12:51:41.193

I read the challenge as the given first argument is just an example, so there could be other xxd output used as the argument instead. "Here's an example of what the the first argument will look like:" – AdmBorkBork – 2016-08-09T13:01:44.023

0

Python 2 and 3 - 164 162 150 146 134 150 bytes

Now accepts hex strings for second and third arguments.

j=''.join
def f(a,b,c):s=j(x[10:49].replace(' ','')for x in a.split('\n'));print(j('\\x'+s[i:i+2]for i in range(int(b,16)*2,1+2*int(c,16)%len(s),2)))

ceilingcat

Posted 2016-08-08T14:03:50.000

Reputation: 5 503

0

Python 3.5, 125 bytes

import re
lambda s,b,e:re.sub(r'(\w\w)',r'\\x\1',re.sub(r'^.*?:|  .*$|\s','',s,0,8)[2*int(b,16):[2*int(e,16)+2,None][e<'0']])

Ungolfed:

def f(s,b,e):
    b = 2*int(b,16)
    e = [2*int(e,16)+2,None][e<'0']
    x = re.sub(r'''(?x)   # verbose (not in golfed version)
            ^.*?:     # match beginning of line to the ':'
           |[ ]{2}.*$ # or match '  ' to end of line
           |\s        # or match whitespace
           ''',
           '',        # replacement
           s,
           0,         # replace all matches 
           re.M       # multiline mode
           )
    y = re.sub(r'(\w\w)', # match pairs of 'word' characters
           r'\\x\1',  # insert \x
            x[b:e])
    return y

RootTwo

Posted 2016-08-08T14:03:50.000

Reputation: 1 749