Further golfing this C# code from Scramble words while preserving their outlines

7

I'll start by saying that I checked, and checked again, that this question is on topic.

I will also state clearly that this is a question about further golfing a specific piece of code, not a challenge.


I recently answered the following challenge: Scramble words while preserving their outlines:

It is well known that a text can still be read while the innards of its words have been scrambled, as long as their first and last letters plus their overall outlines remain constant. Given a printable Ascii+Newline text, scramble each word according to these rules:

  1. Scrambling must be (pseudo) random.

  2. A word is a sequence of the Latin characters, A through Z.

  3. Only initial letters will ever be uppercase.

  4. The first and last letters must stay untouched.

  5. When scrambling, only letters within one of the following groups may exchange places:

    1. acemnorsuvwxz

    2. bdfhkl

    3. gpqy

    4. it

    5. j (stays in place)
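To make rule 5 concrete, here is a minimal sketch of the group lookup. The names `Outlines`, `GroupOf` and `MayExchange` are made up for illustration and are not part of the challenge or of my answer.

```csharp
using System;

public static class Outlines
{
    // The five outline groups from rule 5, in the order listed in the challenge.
    static readonly string[] Groups = { "acemnorsuvwxz", "bdfhkl", "gpqy", "it", "j" };

    // Returns the 1-based group number of a letter, or 0 for anything that is not A-Z/a-z.
    public static int GroupOf(char c)
    {
        c = char.ToLowerInvariant(c);
        for (int i = 0; i < Groups.Length; i++)
            if (Groups[i].IndexOf(c) >= 0) return i + 1;
        return 0;
    }

    // Two letters may swap places only when they share an outline group.
    public static bool MayExchange(char a, char b)
    {
        int g = GroupOf(a);
        return g != 0 && g == GroupOf(b);
    }

    public static void Main()
    {
        Console.WriteLine(MayExchange('e', 'x')); // both in group 1
        Console.WriteLine(MayExchange('e', 'l')); // groups 1 and 2
    }
}
```

Since j is alone in its group, any "exchange" involving it is a no-op, which is why it effectively stays in place.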

My answer is in C# and currently comes in at 394 bytes:

namespace System{using Text.RegularExpressions;using Linq;s=>Regex.Replace(s,@"[A-Za-z](([acemnorsu-xz])|([bdfhkl])|([gpqy])|([it]))*?[a-z]?\b",m=>{var a=m.Value.ToCharArray();for(int i=1,j;++i<6;){var c=m.Groups[i].Captures;var n=c.Cast<Capture>().Select(p=>p.Index-m.Index).ToList();foreach(Capture p in c){a[j=n[new Random().Next(n.Count)]]=p.Value[0];n.Remove(j);}}return new string(a);});}
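For anyone who would rather not read the golfed one-liner, the following is an ungolfed sketch of the same approach. The class and method names (`OutlineScrambler`, `Scramble`) are invented here, and it deliberately uses one shared `Random` rather than `new Random()` per pick, so it is not byte-for-byte equivalent to the code above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OutlineScrambler
{
    // Same pattern as the golfed code: group 1 wraps the alternation,
    // groups 2..5 capture the interior letters of each outline class.
    const string Pattern = @"[A-Za-z](([acemnorsu-xz])|([bdfhkl])|([gpqy])|([it]))*?[a-z]?\b";

    // Assumption: a single shared generator instead of one per pick.
    static readonly Random Rng = new Random();

    public static string Scramble(string s)
    {
        return Regex.Replace(s, Pattern, m =>
        {
            char[] a = m.Value.ToCharArray();
            for (int i = 2; i < 6; i++)
            {
                CaptureCollection c = m.Groups[i].Captures;
                // Positions (relative to the word) that this group's letters occupy.
                var free = c.Cast<Capture>().Select(p => p.Index - m.Index).ToList();
                foreach (Capture p in c)
                {
                    // Drop each captured letter into a random still-free position.
                    int j = free[Rng.Next(free.Count)];
                    a[j] = p.Value[0];
                    free.Remove(j);
                }
            }
            return new string(a);
        });
    }

    public static void Main()
    {
        Console.WriteLine(Scramble("It is well known that a text can still be read"));
    }
}
```

The first letter is consumed by `[A-Za-z]` and the last by `[a-z]?`, so neither is ever captured, and letters only move between positions captured by the same group.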

I think there's room for golfing in the Linq statement and foreach loop at least.

Can this code be further golfed down?

TheLethalCoder

Posted 2017-05-15T11:09:27.673

Reputation: 6 930

Off the bat, (i|t) is shorter than ([it]). And if you order your character classes differently, you can probably replace the really long one with [a-z] because all the other letters will have already been covered by earlier groups. I'm also not sure your code works correctly for words containing j. As far as I can tell those words wouldn't be matched (instead of just leaving the j untouched). – Martin Ender – 2017-05-15T11:22:14.770

@MartinEnder It wouldn't have handled the j correctly, good spot on that and with the re-ordering the regex is a bit shorter so that helps too! I'm not the best with regex if you hadn't realised... – TheLethalCoder – 2017-05-15T11:32:00.223

Since the input is limited to ASCII, you can also use \p{L} for [A-Za-z]. – Martin Ender – 2017-05-15T11:35:45.357

@MartinEnder I'd seen that suggestion on other answers and forgotten about it! Regex is now \p{L}(([bdfhkl])|([gpqy])|(i|t)|(j)|([a-z]))*?[a-z]?\b – TheLethalCoder – 2017-05-15T11:44:22.177

If you put the first group before the last one, it can be shortened to [bdf-l], because gij have already been taken care of. – Martin Ender – 2017-05-15T11:48:37.300

@MartinEnder All that has saved 10 bytes so far: \p{L}(([gpqy])|(i|t)|(j)|([bdf-l])|([a-z]))*?[a-z]?\b. Can you add an answer with the improvements so far? – TheLethalCoder – 2017-05-15T12:07:28.417

Answers

1

As I am already using Linq in my program, I can change ToCharArray to ToArray to save 4 bytes.

I can also change the enclosing namespace to either System.Linq or System.Text.RegularExpressions, which saves a further 6 bytes because the using directive for that namespace is no longer needed:

namespace System.Text.RegularExpressions
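A small sketch of how these two savings fit together; the class `NamespaceTrickDemo` and its `Letters` method are hypothetical and only there to show that the names resolve.

```csharp
namespace System.Text.RegularExpressions
{
    // Inside this namespace, "Linq" is looked up through the enclosing
    // namespaces and resolves to System.Linq.
    using Linq;

    public static class NamespaceTrickDemo
    {
        public static char[] Letters(string word)
        {
            // Regex needs no using directive here because we are inside its namespace.
            // ToArray() is the Linq extension on IEnumerable<char>, 4 bytes shorter
            // to type than ToCharArray() and producing the same char[].
            return Regex.Match(word, @"\p{L}+").Value.ToArray();
        }

        public static void Main()
        {
            // Console lives in System, which is also an enclosing namespace.
            Console.WriteLine(new string(Letters("golf!")));
        }
    }
}
```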

As @MartinEnder♦︎ points out in the comments, I can save 10 bytes by re-ordering the groups in the regex and changing [A-Za-z] to \p{L}:

\p{L}(([gpqy])|(i|t)|(j)|([bdf-l])|([a-z]))*?[a-z]?\b
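To check which group each interior letter lands in with the re-ordered pattern, something like the sketch below can help. `GroupProbe` and `Classify` are names invented for this illustration. Note that the pattern now has capture groups 2 to 6 instead of 2 to 5, so as far as I can tell the loop bound in the golfed code has to go up by one to match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupProbe
{
    // The re-ordered pattern: 2 = gpqy, 3 = i|t, 4 = j, 5 = bdf-l (what is left of it), 6 = the rest.
    const string Pattern = @"\p{L}(([gpqy])|(i|t)|(j)|([bdf-l])|([a-z]))*?[a-z]?\b";

    // Lists every interior letter of the first matched word with its capture group number.
    public static string Classify(string word)
    {
        Match m = Regex.Match(word, Pattern);
        var hits = new SortedDictionary<int, int>(); // input position -> group number
        for (int g = 2; g <= 6; g++)
            foreach (Capture c in m.Groups[g].Captures)
                hits[c.Index] = g;
        return string.Join(" ", hits.Select(h => word[h.Key] + ":" + h.Value));
    }

    public static void Main()
    {
        Console.WriteLine(Classify("adjoining"));
    }
}
```

Because the alternatives are tried left to right, g, i and j are claimed by groups 2 to 4 before [bdf-l] ever sees them, which is what makes the shorter range safe.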

TheLethalCoder

Posted 2017-05-15T11:09:27.673

Reputation: 6 930

Oh god, that second person really messes with me. Talking about yourself as you? Do you mind if I change it? – caird coinheringaahing – 2017-06-14T14:58:12.903

@cairdcoinheringaahing Can if you want yeah. It's because I've done a few on SO where it makes more sense to use second person in a self answer. I don't suppose it matters either way though. – TheLethalCoder – 2017-06-14T15:02:03.093