Escape control characters in a C string


C/C++ use double quotes "" for making string literals. If I want the double-quote itself, or certain other characters, to appear in the string, I have to escape them, like so:

char *s2 = "This is not \"good\".\nThis is awesome!\n";

Here, I used \" to represent a double-quote, and \n to represent the newline character. So, if my program prints the string, I see

This is not "good".
This is awesome!

However, if I examine the string in a debugger (Visual Studio or gdb), it shows me the escaped form: the one that would appear in the source code. This is good because it makes it unambiguous whether a \ backslash character is in the string verbatim or is part of an escape sequence.

The goal of this challenge is to do the same job the debuggers do, but better - the escaped string must contain as few escaping characters as possible! The input to your program (or subroutine) is a string of arbitrary 8-bit bytes, and the output must be suitable for putting into the following boilerplate C code:

#include <stdio.h>
int main() {char str[] =
"### the escaped string goes here ###";
puts(""); // put a breakpoint here to examine the string
puts(str); return 0;}

For example:

Input (hexdump, 3-byte string): 64 0a 06

Output: d\n\6

Here, \006 and \x6 mean the same as \6 but are longer, and are therefore unacceptable. Likewise, \012 and \xa mean the same as \n, but only \n is an acceptable encoding for the byte with the value 10.
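
As a quick sanity check of that claim, here is a small C sketch that compares three spellings of the same three bytes. The array names and the function spellings_agree are made up for illustration; only the literals come from the example above.

```c
#include <string.h>

/* Three spellings of the bytes 64 0a 06 (names are illustrative only). */
const char short_form[] = "d\n\6";       /* the only acceptable output */
const char octal_form[] = "d\012\006";   /* same bytes, longer source  */
const char hex_form[]   = "d\xa\x6";     /* same bytes, longer source  */

/* Returns 1 when all three literals hold identical bytes,
   terminating NUL included. */
int spellings_agree(void) {
    return sizeof short_form == sizeof octal_form
        && sizeof short_form == sizeof hex_form
        && memcmp(short_form, octal_form, sizeof short_form) == 0
        && memcmp(short_form, hex_form, sizeof short_form) == 0;
}
```

Each array occupies four bytes (three data bytes plus the terminating NUL), so the spellings differ only in how long the source text is.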


Another example:

Input (hexdump, 5-byte string): 61 0e 38 02 37

Output: a\168\0027

Here, code 0x0e is escaped as \16, which the C compiler parses correctly because the following 8 is not a valid octal digit. However, code 0x02 is escaped as \002, not \2, because \2 followed by 7 would be read as the single escape \27 (the byte 0x17).

Output: a\168\2\67 - another possible form
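
The difference between the two acceptable forms and the tempting shorter one can be checked directly in C. This is only an illustrative sketch; the array names and forms_match_input are hypothetical and not part of the challenge.

```c
#include <string.h>

/* The two acceptable outputs from the example, plus the too-short one. */
const char padded_form[] = "a\168\0027"; /* \002, then a literal '7'       */
const char split_form[]  = "a\168\2\67"; /* \2, then '7' written as \67    */
const char greedy_form[] = "a\168\27";   /* \27 is read as ONE byte, 0x17  */

/* Returns 1 if both acceptable forms encode the bytes 61 0e 38 02 37. */
int forms_match_input(void) {
    static const unsigned char want[] = {0x61, 0x0e, 0x38, 0x02, 0x37};
    return sizeof padded_form == sizeof want + 1
        && sizeof split_form == sizeof want + 1
        && memcmp(padded_form, want, sizeof want) == 0
        && memcmp(split_form, want, sizeof want) == 0;
}
```

The array greedy_form ends up one byte shorter than the other two, because an octal escape takes up to three octal digits and so swallows the 7.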


See Wikipedia for the exact definition of escaping in C.

Additional (reasonable) rules for this challenge:

  • Output must be printable ASCII (no tab characters; no characters with codes above 126)
  • The input can be in the form of a string or an array of 8-bit elements. If your language doesn't support it, use the closest alternative. In any case, there should be no UTF encoding for input.
  • No trigraphs, so no need to escape the ? character
  • No need to handle the null character (code 0) in input - let it be an end-of-string marker, or a regular byte like others - whatever is more convenient
  • Code golf: shortest code is the best
  • You are not allowed to vandalize the Wikipedia article to bend the rules :)

P.S. Summary of the Wikipedia article:

Input byte | Output | Note
0x01       | \1     | octal escape
0x02       | \2     | octal escape
0x03       | \3     | octal escape
0x04       | \4     | octal escape
0x05       | \5     | octal escape
0x06       | \6     | octal escape
0x07       | \a     | alarm
0x08       | \b     | backspace
0x09       | \t     | tabulation
0x0a       | \n     | newline
0x0b       | \v     | vertical tabulation
0x0c       | \f     | form feed
0x0d       | \r     | return
0x0e       | \16    | octal escape
0x0f       | \17    | octal escape
...        |        | octal escape
0x1f       | \37    | octal escape
0x22       | \"     | escape with a backslash
0x5c       | \\     | escape with a backslash
0x7f       | \177   | octal escape
0x80       | \200   | octal escape
...        |        | octal escape
0xff       | \377   | octal escape
other      |        | no escape
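
For reference, an ungolfed C sketch of the table above might look like the following. The function name escape_min and its signature are my own assumptions, not part of the challenge. It always pads an octal escape to three digits when the next input byte is an octal digit, which is one of the minimal forms (the challenge also accepts escaping the following digit instead).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the escaped form of in[0..n) to out,
   NUL-terminates it, and returns its length.
   Assumption: out has room for at least 4*n + 1 chars. */
size_t escape_min(const unsigned char *in, size_t n, char *out) {
    static const char named[] = "abtnvfr";   /* bytes 0x07..0x0d */
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = in[i];
        if (c == '"' || c == '\\') {          /* escape with a backslash */
            out[o++] = '\\';
            out[o++] = (char)c;
        } else if (c >= 32 && c < 127) {      /* printable: no escape */
            out[o++] = (char)c;
        } else if (c >= 7 && c <= 13) {       /* \a \b \t \n \v \f \r */
            out[o++] = '\\';
            out[o++] = named[c - 7];
        } else {
            /* Octal escape. If the next output character would be an
               octal digit, pad to three digits so it is not absorbed. */
            int pad = i + 1 < n && in[i + 1] >= '0' && in[i + 1] <= '7';
            o += (size_t)sprintf(out + o, "\\%0*o", pad ? 3 : 1, c);
        }
    }
    out[o] = '\0';
    return o;
}
```

Looking at the next input byte is enough here, because a byte in the range '0' to '7' is always emitted unescaped, while every escaped byte starts with a backslash.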

anatolyg

Posted 2015-09-29T18:57:34.223

Reputation: 10 719

Can solutions be in any language, or must they be in C? – Jasen – 2015-09-29T22:14:22.860

Any language, of course! – anatolyg – 2015-09-30T06:23:12.107

In your hexdump example, I assume you mean 64 0A 06? – Alchymist – 2015-10-05T09:33:53.757

@Alchymist I do! Confusion fixed. – anatolyg – 2015-10-05T09:35:48.293

Answers


JavaScript (ES6) 152

First time ever that .reduceRight seems useful. (But probably the good old .map can do better). The point is, I need to see the next character before deciding whether to output a short or long octal escape.

f=s=>[...s].reduceRight((a,c)=>
((x=c.charCodeAt())==34|c==(q='\\')?q+c:x>31&x<127?c:q+('abtnvfr'[x-7]||(a>='0'&a<'8'?512+x:x).toString(8).slice(-3)))+a
,'')

edc65

Posted 2015-09-29T18:57:34.223

Reputation: 31 086