C and C++ use double quotes ("") to delimit string literals. If I want a double quote itself, or certain other characters, to appear in the string, I have to escape them, like so:
char *s2 = "This is not \"good\".\nThis is awesome!\n";
Here, I used \" to represent a double-quote, and \n to represent the newline character. So, if my program prints the string, I see
This is not "good".
This is awesome!
However, if I examine the string in a debugger (Visual Studio or gdb), it shows me the escaped form: the one that would appear in the source code. This is good because it lets me tell unambiguously whether the string contains a \ backslash character verbatim or as part of an escape sequence.
The goal of this challenge is to do the same job the debuggers do, but better - the escaped string must contain as few escaping characters as possible! The input to your program (or subroutine) is a string of arbitrary 8-bit bytes, and the output must be suitable for putting into the following boilerplate C code:
#include <stdio.h>
int main() {char str[] =
"### the escaped string goes here ###";
puts(""); // put a breakpoint here to examine the string
puts(str); return 0;}
For example:
Input (hexdump, 3-byte string): 64 0a 06
Output: d\n\6
Here, \006 and \x6 have the same meaning as \6, but they are longer and therefore unacceptable. Likewise, \012 and \xa have the same meaning as \n, but only \n is an acceptable encoding for the byte with the value 10.
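To make the equivalence concrete, here is a small sketch that spells the same 3-byte string three ways and checks that the compiler produces identical bytes. The names `shortest`, `octal`, `hex` and `same_bytes` are made up for this illustration and are not part of the challenge.

```c
#include <assert.h>
#include <string.h>

/* Three spellings of the same 3-byte string 64 0a 06. */
static const char shortest[] = "d\n\6";       /* 5 source characters */
static const char octal[]    = "d\012\006";   /* 9 source characters */
static const char hex[]      = "d\xa\x6";     /* 7 source characters */

/* Returns 1 if all three literals hold identical bytes. */
int same_bytes(void) {
    /* Each array holds 3 data bytes plus the terminating NUL. */
    assert(sizeof shortest == 4);
    return sizeof octal == sizeof shortest
        && sizeof hex == sizeof shortest
        && memcmp(shortest, octal, sizeof shortest) == 0
        && memcmp(shortest, hex, sizeof shortest) == 0;
}
```

Only the first spelling is acceptable under the rules, since the other two spend extra characters on the same bytes.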
Another example:
Input (hexdump, 5-byte string): 61 0e 38 02 37
Output: a\168\0027
Here, code 0x0e is escaped as \16, and parsed successfully by the C compiler because 8 is not a valid octal digit. However, the code 0x02 is escaped as \002, not \2, to avoid confusion with the byte \27.
Output (another acceptable form of the same length): a\168\2\67
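The digit ambiguity can be checked with the compiler itself. The sketch below (all names are hypothetical) compares both acceptable forms against the expected bytes and also keeps the too-greedy spelling, where \27 is read as a single byte 0x17.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both acceptable encodings of 61 0e 38 02 37, plus a broken one. */
static const char form_a[] = "a\168\0027";   /* \002 followed by '7' */
static const char form_b[] = "a\168\2\67";   /* \2 followed by \67 ('7') */
static const char wrong[]  = "a\168\27";     /* \27 is ONE byte, 0x17 */

/* Returns 1 when both acceptable forms decode to the expected 5 bytes. */
int forms_match(void) {
    static const unsigned char expected[] = {0x61, 0x0e, 0x38, 0x02, 0x37};
    assert(sizeof form_a == sizeof expected + 1);   /* +1 for the NUL */
    return sizeof form_a == sizeof form_b
        && memcmp(form_a, expected, sizeof expected) == 0
        && memcmp(form_b, expected, sizeof expected) == 0;
}

/* Number of data bytes in the broken spelling (4, not 5). */
size_t wrong_length(void) {
    return sizeof wrong - 1;
}
```

Note that \16 needs no padding here: the character after it is 8, which cannot continue an octal escape.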
See Wikipedia for the exact definition of escaping in C.
Additional (reasonable) rules for this challenge:
- Output must be printable ASCII (no tab characters; no characters with codes above 126)
- The input can be in the form of a string or an array of 8-bit elements. If your language doesn't support it, use the closest alternative. In any case, there should be no UTF encoding for input.
- No trigraphs, so no need to escape the ? character
- No need to handle the null character (code 0) in input - let it be an end-of-string marker, or a regular byte like others - whatever is more convenient
- Code golf: shortest code is the best
- You are not allowed to vandalize the Wikipedia article to bend the rules :)
P.S. Summary of the Wikipedia article:
Input byte | Output | Note
0x01 | \1 | octal escape
0x02 | \2 | octal escape
0x03 | \3 | octal escape
0x04 | \4 | octal escape
0x05 | \5 | octal escape
0x06 | \6 | octal escape
0x07 | \a | alarm
0x08 | \b | backspace
0x09 | \t | tabulation
0x0a | \n | newline
0x0b | \v | vertical tabulation
0x0c | \f | form feed
0x0d | \r | return
0x0e | \16 | octal escape
0x0f | \17 | octal escape
... | | octal escape
0x1f | \37 | octal escape
0x22 | \" | escape with a backslash
0x5c | \\ | escape with a backslash
0x7f | \177 | octal escape
0x80 | \200 | octal escape
... | | octal escape
0xff | \377 | octal escape
other | | no escape
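For reference, the table can be turned into a plain, non-golfed encoder along the following lines. This is only a sketch under a few assumptions: the function name `escape_bytes` is made up, the caller supplies an output buffer of at least 4*n+1 characters, a zero byte is treated like any other octal escape, and a short octal escape is padded to three digits whenever the next input byte is one of the characters 0 to 7 (the \002 choice from the example above, rather than escaping the following digit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the challenge: writes the escaped form
   of in[0..n-1] into out, NUL-terminates it, and returns its length.
   out must have room for at least 4*n + 1 characters. */
size_t escape_bytes(const unsigned char *in, size_t n, char *out) {
    static const char named[] = "abtnvfr";   /* escapes for 0x07..0x0d */
    size_t len = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned c = in[i];
        if (c >= 7 && c <= 13) {
            /* Named escapes: \a \b \t \n \v \f \r */
            out[len++] = '\\';
            out[len++] = named[c - 7];
        } else if (c == '"' || c == '\\') {
            /* Escape with a backslash */
            out[len++] = '\\';
            out[len++] = (char)c;
        } else if (c >= 32 && c <= 126) {
            /* Printable ASCII: no escape */
            out[len++] = (char)c;
        } else {
            /* Octal escape. Pad to three digits only when the next input
               byte is an octal digit character, which would otherwise be
               absorbed into this escape. */
            int pad = i + 1 < n && in[i + 1] >= '0' && in[i + 1] <= '7';
            len += (size_t)sprintf(out + len, pad ? "\\%03o" : "\\%o", c);
        }
    }
    out[len] = '\0';
    return len;
}
```

Called on the bytes 64 0a 06 this yields d\n\6, and on 61 0e 38 02 37 it yields a\168\0027, matching the two examples in the challenge.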
Can solutions be in any language, or must they be in C? – Jasen – 2015-09-29T22:14:22.860
Any language, of course! – anatolyg – 2015-09-30T06:23:12.107
In your hexdump example, I assume you mean 64 0A 06? – Alchymist – 2015-10-05T09:33:53.757
@Alchymist I do! Confusion fixed. – anatolyg – 2015-10-05T09:35:48.293