Storing a DNS name as compactly as possible in memory (preferably in C#)

1

What is the most compact way to encode/save DNS names in memory?

For example, storing "www.google.com" as a string in .NET encodes it as UTF-16, which consumes twice as much memory as plain ASCII. But for the purpose of storing DNS names, even ASCII is overkill.

The only characters that are needed are:

  • A..Z (case insensitive)
  • 0..9
  • Hyphen
  • Underscore
  • Period
  • Asterisk (not legal in DNS, but used in ACLs such as *.google.com), always as the leading character and never elsewhere

That is a total of 40 characters, which fits within a single byte with plenty of room to spare.
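To put rough numbers on this, here is a minimal sketch in Python (used for brevity rather than C#). The function name `size_estimates` is hypothetical, and the figures cover the payload only: a real store would also need a length or terminator, which is not counted here.

```python
import math

# 26 letters + 10 digits + hyphen, underscore, period, asterisk
ALPHABET_SIZE = 40


def size_estimates(name: str) -> dict:
    """Payload size in bytes for a few candidate encodings of `name`."""
    n = len(name)
    return {
        # What a .NET string uses per character (2 bytes each).
        "utf16_bytes": len(name.encode("utf-16-le")),
        # One byte per character.
        "ascii_bytes": len(name.encode("ascii")),
        # Fixed-width codes: 6 bits are enough for 40 symbols (2**6 = 64).
        "six_bit_bytes": math.ceil(n * 6 / 8),
        # Treating the whole name as one base-40 number: log2(40) bits per character.
        "radix40_bytes": math.ceil(n * math.log2(ALPHABET_SIZE) / 8),
    }


print(size_estimates("www.google.com"))
```

For a 14-character name, the fixed 6-bit packing and the single-big-number approach land within a byte of each other, so the choice probably comes down to how simple the decoder should be.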

My code golf question/challenge is: what is the most compact way to store such a string of characters in memory, so that as many names as possible fit in a given amount of memory?

The inspiration for this challenge came from this blog entry.

goodguys_activate

Posted 2012-10-21T03:26:10.933

Reputation: 119

Question was closed 2012-10-21T19:07:27.560

1 How do you win the competition? – beary605 – 2012-10-21T06:44:40.950

3 You meant UTF-16 for the .NET framework. In UTF-8, this name would take up just as much as ASCII. – Mr Lister – 2012-10-21T07:00:35.277

Try http://cs.stackexchange.com/ - there's probably already a question about basic information theory which would tell you all you need to know. – Peter Taylor – 2012-10-21T08:01:15.513

http://en.wikipedia.org/wiki/Kolmogorov_complexity? – beary605 – 2012-10-21T15:48:11.640

Actually, underscores are not allowed in DNS names. RFC 3696 – Mormegil – 2012-10-21T18:26:57.243

1 As stated this looks like a question for Stack Overflow (that is, it is a question). CodeGolf.SE is a place for playing certain programming games. It would be essentially trivial to turn this into a [code-golf]---you just add the tag (and probably the one @beary suggests as well) and resign yourself to getting answers in many languages. But I am closing until that is done just to encourage you to read the FAQ before posting to a new Stack Exchange site. Flag when you are ready for this to be re-opened. – dmckee --- ex-moderator kitten – 2012-10-21T19:10:09.317

Answers

3

You could interpret it as a base-39 number. Since only the first character can be an asterisk, you can encode it as the sign. If I use the characters as digits in the order you named them (A..Z = 0..25, 0..9 = 26..35, hyphen = 36, underscore = 37, period = 38), www.google.com would be 10903065870001232914011 in decimal and 24f0e6f41d8ecd3a65b in hex, which could be stored in 10 bytes.

*.google.com would be -310664672884413873 in decimal and fbb04c2040762a4f in hex, if you store it as a two's-complement value in 8 bytes.
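A minimal sketch of this scheme in Python (picked because its integers are arbitrary precision; in C# something like `System.Numerics.BigInteger` would presumably play the same role). The names `encode`, `decode` and `byte_length` are hypothetical, and the digit order is the one assumed above.

```python
import string

# Digit order follows the question's list: A..Z, 0..9, hyphen, underscore, period.
DIGITS = string.ascii_lowercase + string.digits + "-_."
BASE = len(DIGITS)  # 39


def encode(name: str) -> int:
    """Signed base-39 value of `name`; a leading '*' is stored as the sign."""
    wildcard = name.startswith("*")
    body = name[1:] if wildcard else name
    value = 0
    for ch in body.lower():  # DNS names are case insensitive
        value = value * BASE + DIGITS.index(ch)  # ValueError on illegal characters
    return -value if wildcard else value


def decode(value: int) -> str:
    """Inverse of encode(), subject to the leading-'a' caveat noted below."""
    wildcard = value < 0
    value = abs(value)
    chars = []
    while value:
        value, digit = divmod(value, BASE)
        chars.append(DIGITS[digit])
    body = "".join(reversed(chars))
    return "*" + body if wildcard else body


def byte_length(value: int) -> int:
    """Bytes needed for a two's-complement representation (one extra bit for the sign)."""
    return (value.bit_length() + 8) // 8


print(encode("www.google.com"))               # 10903065870001232914011
print(encode("*.google.com"))                 # -310664672884413873
print(byte_length(encode("www.google.com")))  # 10
```

One caveat with the sketch as written: because 'a' maps to digit 0, a name that starts with 'a' loses its leading characters on decode, just as leading zeros vanish from a number. Storing the length separately, or shifting every digit up by one and using base 40, would be one way around that.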

quasimodo

Posted 2012-10-21T03:26:10.933

Reputation: 985

0

Oh, that's easy. You just zip it.

Mr Lister

Posted 2012-10-21T03:26:10.933

Reputation: 3 668

1 gzip on a file containing www.google.com gives 42 bytes. – ugoren – 2012-10-21T11:11:56.903

Ah... when you use Windows' built in zipping routines, you get a 132 byte file. Oh well. – Mr Lister – 2012-10-21T14:33:51.953
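The effect described in these comments is easy to reproduce. The sketch below uses Python's standard `gzip` and `zlib` modules; the helper name `compressed_sizes` is hypothetical, and the exact byte counts depend on the tool and its header options, so they will not necessarily match the 42 or 132 bytes quoted above.

```python
import gzip
import zlib


def compressed_sizes(name: str) -> dict:
    """Compare the raw ASCII size of `name` with general-purpose compressors."""
    raw = name.encode("ascii")
    return {
        "raw": len(raw),
        # gzip adds a fixed header and trailer around the deflate stream.
        "gzip": len(gzip.compress(raw)),
        # zlib framing is lighter, but still adds a header and a checksum.
        "zlib": len(zlib.compress(raw, 9)),
    }


print(compressed_sizes("www.google.com"))
```

On a string this short, the container overhead outweighs anything the compressor can save, which is why the zipped result ends up larger than the input.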