I'm trying to use the command `base64 somefile.ext` to convert files to text. The only problem is that the file size increases by 35%, which is unacceptable for my larger files. I suspect that the files could be encoded in a way that makes them smaller. Currently the encoding of the output file is us-ascii.

Is there an encoding that would make for a smaller file size?

  • It's not clear what the requirements are. Note that `base64` is usually used to represent binary files in an ASCII string format, so it will always use extra space. – golja Aug 23 '12 at 02:20
  • It all depends on how far you're willing to stretch the definition of "text" (i.e. **why are you doing it**) and how standardized you want the format to be. – Alan Curry Aug 23 '12 at 02:39
  • @AlanCurry: I'm doing it so that I can store the binary file in text-based storage. @golja: ASCII also includes characters like `null` and `CR`, but from what I can see `base64` doesn't return any of those characters. – Kevin Johnson Aug 23 '12 at 03:19
  • What is "Text-based storage"? – Alan Curry Aug 23 '12 at 03:19
  • @AlanCurry, If you must know, I was going to try it out with Google Docs. – Kevin Johnson Aug 23 '12 at 03:23
  • Well, that would at least make it possible to answer the question, if someone knows what sort of text you can put in there. Let me ask the obvious question: you are compressing the files before textifying, right? – Alan Curry Aug 23 '12 at 03:29
  • I tried compressing them into .tar.gz, but the reduction in size was tiny. – Kevin Johnson Aug 23 '12 at 12:27
  • Then that would be your answer. The mathematical foundations behind data compression prove quite succinctly that you cannot compress already-compressed data any further - so if compressing your data offers no size advantage, nothing will. Can't be done (and is silly, to boot). – adaptr Aug 23 '12 at 12:35
  • This question displays a fundamental lack of understanding of the difference between binary and text files. I find it astonishing that this could be the case for anyone on this site. – John Gardeniers Aug 25 '12 at 22:33
  • @JohnGardeniers Well, I understand what a binary file is and I know that a text file is just a binary file with a certain encoding. My question was whether there is an encoding that only uses visible characters, as opposed to `null` and `CR`. This was made moot by @adaptr and @AlanCurry, as I wouldn't be able to choose an encoding while uploading to Google Docs. I'm sorry if I didn't make that clear initially. – Kevin Johnson Aug 26 '12 at 02:29

1 Answer

Just compress before encoding.

  $ wc -c < /bin/ls
  114024
  $ < /bin/ls base64 | wc -c
  154033
  $ xz < /bin/ls | base64 | wc -c
  59878

(You can use gzip, bzip2 or any other compressor you want, but you need to remember to decompress on the receiving end.)
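For the receiving end, the pipeline is simply reversed. Here is a minimal sketch of the round trip, assuming `gzip` and a `base64` that accepts `-d` for decoding (GNU coreutils does; other implementations may spell the flag differently). The helper names `pack` and `unpack` are made up for illustration:

```shell
# Assumes gzip and a base64 that accepts -d for decoding (GNU coreutils does).
# pack and unpack are hypothetical helper names.

# stdin: raw bytes -> stdout: compressed, base64-encoded text
pack() { gzip -c | base64; }

# stdin: text produced by pack -> stdout: the original bytes
unpack() { base64 -d | gzip -dc; }

# Round-trip check on a small sample: the input should come back unchanged
printf 'some binary payload\n' | pack | unpack
```

Swapping `gzip` for `xz` or `bzip2` in both helpers should work the same way, as long as both ends use the same compressor.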

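There aren't many printable ASCII characters (95, counting the space). base64 uses 64 of them, which means every 6 bits of input become 8 bits of output. There aren't many more you can use, so a denser text encoding could only trim that overhead a little.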
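The 154033 figure in the transcript above can be reproduced from that arithmetic. This is a small sketch, assuming the GNU `base64` default of wrapping output at 76 characters with a newline after each line; `base64_size` is a made-up helper name:

```shell
# base64_size N: predicted size in bytes of `base64` output for N input bytes.
# Assumes 76-character output lines, each terminated by a newline (GNU default).
base64_size() {
    b64_n=$1
    # Every 3 input bytes (24 bits) become 4 output characters (6 bits each);
    # a trailing partial group is padded up to a full 4 characters.
    b64_chars=$(( (b64_n + 2) / 3 * 4 ))
    # One newline per started 76-character line.
    b64_lines=$(( (b64_chars + 75) / 76 ))
    echo $(( b64_chars + b64_lines ))
}

base64_size 114024   # prints 154033, matching the wc -c result for /bin/ls above
```

The 4/3 ratio alone accounts for about 33%; the line breaks add the remainder of the roughly 35% growth.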

sch