End the tabs versus space war

24

1


There has been a great deal of debate over whether to use tabs or spaces to indent/format code. Can you help the university settle the dispute by switching to an incredibly crazy, unique method of formatting?


Your job is to write a full program or function which expands all tabs into four spaces and then replaces each run of n leading spaces with "/(n - 2 stars here)/", that is, a slash, n - 2 asterisks and another slash. You will receive input over multiple lines in any reasonable format (a single string, an array of strings with one entry per line, a columnar array, etc.).

Sample input shamelessly stolen. Note that since tabs get automagically expanded to four spaces on SE I represent it as the "^" character, but you must handle tabs (codepoint 0x09) as well. All "^" characters represent a tabulation.
Calculate the value 256 and test if it's zero
If the interpreter errors on overflow this is where it'll happen
++++++++[>++++++++<-]>[<++++>-]
+<[>-<
    Not zero so multiply by 256 again to get 65536
    [>++++<-]>[<++++++++>-]<[>++++++++<-]
    +>[>
        # Print "32"
        ++++++++++[>+++++<-]>+.-.[-]<
    <[-]<->] <[>>
        # Print "16"
        +++++++[>+++++++<-]>.+++++.[-]<
<<-]] >[>
    # Print "8"
    ++++++++[>+++++++<-]>.[-]<
<-]<
# Print " bit cells\n"
+++++++++++[>+++>+++++++++>+++++++++>+<<<<-]>-.>-.+++++++.+++++++++++.<.
>>.++.+++++++..<-.>>-
Clean up used cells.
[[-]<]l
^this is preceded by a tab
^^two tabs
^^^three tabs etcetera! 

Sample output

Calculate the value 256 and test if it's zero
If the interpreter errors on overflow this is where it'll happen
++++++++[>++++++++<-]>[<++++>-]
+<[>-<
/**/Not zero so multiply by 256 again to get 65536
/**/[>++++<-]>[<++++++++>-]<[>++++++++<-]
/**/+>[>
/******/# Print "32"
/******/++++++++++[>+++++<-]>+.-.[-]<
/**/<[-]<->] <[>>
/******/# Print "16"
/******/+++++++[>+++++++<-]>.+++++.[-]<
<<-]] >[>
/**/# Print "8"
/**/++++++++[>+++++++<-]>.[-]<
<-]<
# Print " bit cells\n"
+++++++++++[>+++>+++++++++>+++++++++>+<<<<-]>-.>-.+++++++.+++++++++++.<.
>>.++.+++++++..<-.>>-
Clean up used cells.
[[-]<]l
/**/this is preceded by a tab
/******/two tabs
/**********/three tabs etcetera! 

Because the university needs space to download both Vim and Emacs, you are allowed very little storage for your code. Therefore this is code-golf, and the shortest code wins. You may assume that input is well formed; lines with fewer than four leading spaces (after replacement of tabs) may result in undefined behavior.
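For reference, the two steps can be sketched ungolfed in Python. This is only an illustration of the rules as stated above, not a submission; the function name reformat is made up here, and lines with one to three leading spaces are left untouched, which is one arbitrary choice among the allowed undefined behaviors.

```python
def reformat(text: str) -> str:
    """Expand tabs to four spaces, then turn n leading spaces into '/' + (n-2) stars + '/'."""
    out = []
    for line in text.split("\n"):
        line = line.replace("\t", "    ")      # every tab becomes four spaces
        body = line.lstrip(" ")
        n = len(line) - len(body)              # length of the leading run of spaces
        if n >= 4:                             # shorter runs are undefined by the spec
            line = "/" + "*" * (n - 2) + "/" + body
        out.append(line)
    return "\n".join(out)


print(reformat("no indent\n\tone tab\n        eight spaces"))
```

Taking one newline-separated string is an assumption; the challenge equally allows a list of lines.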

Disclaimer

This "excellent" formatting strategy came courtesy of Geobits, and is reproduced with his permission. No programmers were harmed during the production of this challenge.

Rohan Jhunjhunwala

Posted 2016-08-31T22:28:48.663

Reputation: 2 569

1Will tabs only occur at the start of lines (i.e. as indentation)? Can lines have mixed indentation (tabs + spaces)? – Lynn – 2016-09-01T02:08:35.553

20

Someone please submit an answer written in Whitespace.

– GuitarPicker – 2016-09-01T04:23:30.417

2Should we consider lines starting with /*, or can that be assumed as not a "well formed input"? A C++ source file would have been a better test, because its multiline comment /* */ would possibly break some answers that replace first and last of the leading spaces with an /, and then proceed to fill spaces with *. – seshoumara – 2016-09-01T05:52:16.547

I'm guessing we can't assume the number of spaces is a multiple of four, right? – someonewithpc – 2016-09-01T09:53:55.390

@someonewithpc no you may not – Rohan Jhunjhunwala – 2016-09-01T20:35:07.030

@seshoumara this formatting implementation need not result in compilable code. – Rohan Jhunjhunwala – 2016-09-01T20:36:05.317

1

The war has ended: https://medium.com/@hoffa/400-000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs-7cfe0b5dd7fd#.pmnalkp87 (Unless you're programming in C, apparently.)

– beaker – 2016-09-01T21:03:37.680

@RohanJhunjhunwala Imagine this C++ line: [4 spaces]int c; /* */. According to your spec it should be transformed to /**/int c;/* */, but wrong solutions when searching the regex /* to fill spaces with asterisks between // might end up doing: /**/int c;/****/. Is this considered a wrong output? – seshoumara – 2016-09-01T21:48:14.133

@seshoumara yes that is incorrect output – Rohan Jhunjhunwala – 2016-09-01T23:33:59.340

1@RohanJhunjhunwala So now I ask my first question again, since it wasn't about compilable code. Imagine the same /* */ C++ code, but this time at the beginning of the line. According to your spec it should be left as is. Here the trap is, and spotted wrong answers already, that a regex like say /\** / used to fill those spaces between // with asterisks would turn the line into /***/. I've seen this conversion as well /*//*/. I assume both are incorrect. – seshoumara – 2016-09-01T23:51:29.800

Could the "multiple line" also mean "a string separated by \n"? – Vale – 2016-09-02T08:03:39.540

@Rohan, please can you add a test-case for /* */->/* */ (i.e. no change for a line beginning with /*), and for /* */->/**//* */? Thanks. Also for a and b (i.e. indentation less than 3 chars). – Toby Speight – 2016-09-02T14:26:59.987

@Vale yes that is correct – Rohan Jhunjhunwala – 2016-09-03T20:51:42.817

@Toby you may assume that there will either be no indentation or more than four lines – Rohan Jhunjhunwala – 2016-09-03T20:53:24.317

@seshoumara both are incorrect – Rohan Jhunjhunwala – 2016-09-03T20:53:52.627
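The trap described in these comments can be reproduced in a few lines of Python. The sketch below is hypothetical (the names careless and anchored are invented for illustration): the careless version fills anything that looks like "/ ... /" with asterisks, while the anchored version only rewrites the run of spaces at the very start of the line.

```python
import re

def careless(line: str) -> str:
    # Mark the first and last leading space with '/', then fill every
    # "/<stars or spaces>/" group, wherever it occurs, with asterisks.
    line = re.sub(r"^ ( *) ", r"/\1/", line)
    return re.sub(r"/[* ]*/", lambda m: "/" + "*" * (len(m.group()) - 2) + "/", line)

def anchored(line: str) -> str:
    # Only touch a run of at least four spaces at the start of the line.
    return re.sub(r"^ {4,}", lambda m: "/" + "*" * (len(m.group()) - 2) + "/", line)

for sample in ["    int c; /* */", "/* */"]:
    print(repr(careless(sample)), repr(anchored(sample)))
```

The careless version damages the existing comment in both samples; the anchored one leaves it alone.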

Answers

2

V, 21, 20 bytes

Íô/    
Î^hr/hv0r*r/

Try it online!

This is literally just a direct port of my vim answer. The notable differences:

  • The Í command (Global substitute) automatically fills in the /g flag, which saves two bytes

  • ô is identical to \t

  • Î is a mnemonic for :%norm, and it also fills in the necessary space between :%norm and the set of keystrokes.

  • The trailing carriage return at the end is implicitly added.

James

Posted 2016-08-31T22:28:48.663

Reputation: 54 537

27

Vim, 37, 34, 33, 32 bytes

:%s/\t/    /g|%norm ^hr/hv0r*r/

Try it online!

Note that this requires a trailing carriage return (enter) in vim, although not in the online interpreter.

This uses the V interpreter because it's backwards compatible. A very straightforward solution.

Here's a gif that lets you see the solution happen in real time. This uses a slightly older version, and I added some extra keystrokes to make it run slower so you can see what happens:

[GIF: the Vim solution running step by step]

And here is the explanation of how it works:

:%s/\t/    /g           "Replace every tab with 4 spaces
|                       "AND
%norm                   "On every line:
      ^                 "  Move to the first non-whitespace char
       h                "  Move one character to the left. If there is none, the command will end here.
         r/             "  Replace it with a slash
           h            "  Move to the left
            v0          "  Visually select everything until the first column
              r*        "  Replace this selection with asterisks
                r/      "  Replace the first character with a slash
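
For readers who do not use Vim, the normal-mode sequence can be mirrored step by step in Python. This is a rough model rather than an exact emulation of Vim: the function name is invented, and it simply returns early where the Vim command would abort because a motion fails.

```python
def norm_keystrokes(line: str) -> str:
    chars = list(line.replace("\t", "    "))   # :%s/\t/    /g
    # ^ : cursor on the first non-blank character
    # (assumed to land on the last column if the line is all blanks)
    col = next((i for i, c in enumerate(chars) if c != " "), len(chars) - 1)
    # h : move left; in column 0 the motion fails and :norm stops
    if col <= 0:
        return "".join(chars)
    col -= 1
    chars[col] = "/"                           # r/
    if col == 0:                               # the second h fails, :norm stops
        return "".join(chars)
    col -= 1                                   # h
    for i in range(col + 1):                   # v0r* : star everything back to column 0
        chars[i] = "*"
    chars[0] = "/"                             # r/ : the cursor is now in column 0
    return "".join(chars)


print(norm_keystrokes("\tthis is preceded by a tab"))
```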

James

Posted 2016-08-31T22:28:48.663

Reputation: 54 537

I was gonna +1 for using g but then you edited to not use g :/ +1 anyway :D – Downgoat – 2016-08-31T23:35:33.747

@downgoat Haha, thanks! I'm actually much prouder of the version without :g because it abuses a lesser known feature: the norm command gets cancelled if ^F<space> fails. So :%norm ^F<space>foo is essentially the same thing as :g/^ /norm foo fun Vim hacks. :D – James – 2016-08-31T23:39:18.997

huh, I thought ^F was used to position the screen. does it have different behavior inside norm? – Downgoat – 2016-08-31T23:41:02.243

1@downgoat Haha, no it's ^F, not <C-f> Silly Vim key notation. In this case it's ^, jump to first non-whitespace char, and F<space> Which is find the first space behind the cursor. – James – 2016-08-31T23:52:37.177

ohhh, that makes so much more sense now >_> – Downgoat – 2016-08-31T23:53:11.217

I think you could use :retab (haven't tested it though). – Loovjo – 2016-09-01T05:32:06.860

@Loovjo Yeah, I had thought the same thing, but you have to :set expandtab for that to work, so it's actually one byte longer: :se et<cr>:ret 4<cr> – James – 2016-09-01T05:33:44.263

@DJMcMayhem Oh, right, didn't think of that – Loovjo – 2016-09-01T05:34:37.143

11

Perl, 41 bytes

s,␉,    ,g;s,^  ( +),/@{[$1=~y| |*|r]}/,

Run with the -p flag, like so:

perl -pe 's,␉,    ,g;s,^  ( +),/@{[$1=~y| |*|r]}/,'
#     ↑   └───────────────────┬───────────────────┘
#     1 byte               40 bytes

Replace ␉ with a literal tab (in Bash, try typing Control-V Tab).
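
The second substitution is the interesting part: it captures everything after the first two leading spaces, and y| |*|r returns that capture with each space turned into an asterisk. A rough Python equivalent, assuming one line at a time as under -p (the function name is made up):

```python
import re

def perl_like(line: str) -> str:
    line = line.replace("\t", "    ")          # first substitution: tab -> four spaces
    # Second substitution: two spaces, then a captured run of one or more spaces.
    # The capture is returned with every space translated to '*'.
    return re.sub(r"^  ( +)",
                  lambda m: "/" + m.group(1).replace(" ", "*") + "/",
                  line)


print(perl_like("\t\ttwo tabs"))
```

Because the pattern needs at least three leading spaces to match, unindented lines pass through unchanged.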

Lynn

Posted 2016-08-31T22:28:48.663

Reputation: 55 648

1The way perl replaced that backreference on the spot, I wish sed had that too. – seshoumara – 2016-09-01T05:29:50.920

7

Cheddar, 60 57 56 bytes

Saved 3 bytes thanks to @Conor O'Brien

@.sub(/\t/g," "*4).sub(/^ +/gm,i->"/"+"*"*(i.len-2)+"/")

I wish Cheddar had better string formatting.

Try it online!

Explanation

This is a function. @ represents a functionized property (similar to Ruby's &:), letting you do things like: ar.map(@.head(-1))

@                      // Input
 .sub( /\t/g, " "*4)   // Replace tabs with four spaces
 .sub(
   /^ +/gm,            // Regex matches leading spaces
   i ->                // i is the matched leading spaces
     "/"+              // The / at the beginning
     "*"*(i.len-2)+    // Repeat *s i-2 times
     "/"                // The / at the end
 )

If you aren't familiar with regex, consider:

/^ +/gm

This matches one or more (+) spaces ( ) at the beginning (^) of every (g) line (m).
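
The same two-pass idea in Python, for comparison. This sketch assumes the whole input arrives as one newline-separated string; re.MULTILINE plays the role of the m flag, and re.sub already replaces every match, so nothing corresponds to g. The function name is invented.

```python
import re

def two_pass(text: str) -> str:
    # Pass 1: every tab becomes four spaces.
    expanded = re.sub(r"\t", " " * 4, text)
    # Pass 2: each leading run of spaces becomes '/', (length - 2) stars, '/'.
    return re.sub(r"^ +",
                  lambda m: "/" + "*" * (len(m.group()) - 2) + "/",
                  expanded,
                  flags=re.MULTILINE)


print(two_pass("a\n    b\n\t\tc"))
```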

Downgoat

Posted 2016-08-31T22:28:48.663

Reputation: 27 116

do literal tabs work in cheddar regexes? also, /^ +/ suffices as a regex, since we can assume that the leading spaces will be at least 4 in length. – Conor O'Brien – 2016-08-31T23:16:57.177

@ConorO'Brien I believe they do but I haven't tested – Downgoat – 2016-08-31T23:17:43.603

The tabs are supposed to be replaced before the transformation. – Conor O'Brien – 2016-09-01T00:47:00.517

@ConorO'Brien oh >_> I had it that way originally and then I changed it – Downgoat – 2016-09-01T00:47:42.967

6

Mathematica, 97 bytes

a=StringReplace;a[a[#,"\t"->"    "],StartOfLine~~b:" "..:>"/"<>Table["*",StringLength@b-2]<>"/"]&

Anonymous function. Takes a string as input and returns a string as output.

LegionMammal978

Posted 2016-08-31T22:28:48.663

Reputation: 15 731

5

Python 3, 124 bytes

Uses good ol' regex.

import re
lambda I:re.sub('^\s*(?m)',lambda m:'/'+'*'*len(m.group()[:-2])+'/',re.sub('\t+',lambda g:' '*4*len(g.group()),I))

Ideone it!

Beta Decay

Posted 2016-08-31T22:28:48.663

Reputation: 21 478

4

Java 210 207 bytes

This is the reference solution which implements it naively.

void a(String[]a){for(String s:a){s=s.replaceAll("\t", "    ");String x,y="";int j,i=s.length()-(x=s.replaceAll("^\\s+", "")).length();if(i>3){y="/";for(j=0;++j<i-1;)y+="*";y+="/";}System.out.println(y+x);}}

Rohan Jhunjhunwala

Posted 2016-08-31T22:28:48.663

Reputation: 2 569

6Vim: 37 bytes, Cheddar: 65 bytes, JavaScript: 75 bytes, then there's Java at 210 bytes :P why am I not surprised – Downgoat – 2016-08-31T23:19:45.943

1Very concise code in java :P – Rohan Jhunjhunwala – 2016-08-31T23:22:12.207

You can change the last for-loop to save 1 byte: for(int j=0;++j<i-1;). Also, you can remove the int before j, and put it after the already present int: int i=s.length()-(x=s.replaceAll("^\\s+", "")).length(),j; – Kevin Cruijssen – 2016-09-01T10:56:05.203

can't it be a lambda to shave bytes using (a)->{...}? – bunyaCloven – 2016-09-01T14:20:09.563

At least it's still readable and doesn't need further comments :o) – René – 2016-09-01T15:21:13.357

@bunyaclovem it could but I'll leave it a fully qualified function for ease of use – Rohan Jhunjhunwala – 2016-09-01T20:40:11.720

All suggestions implemented. Could someone test it out. I'm on mobile – Rohan Jhunjhunwala – 2016-09-01T20:40:40.853

3

JavaScript ES6, 75 bytes

s=>s.replace(/\t/g,"    ").replace(/^ +/gm,k=>`/${"*".repeat(k.length-2)}/`)

Replace \t with a literal tab in your code.

Conor O'Brien

Posted 2016-08-31T22:28:48.663

Reputation: 36 228

3

Java, 185 184 167 152 bytes

S->S.map(s->{s=s.replace("\t","    ");String t=s.replaceAll("^ +","");int n=s.length()-t.length();if(n>3){s="/";for(;n-->2;)s+="*";s+="/"+t;}return s;})

Given the very loose definition of string array given in the initial post, I've used Stream<String> which allows for some consequent byte savings.

I used different techniques than the reference implementation to achieve the same goal. The algorithm itself is much the same.

Testing and ungolfed:

import java.util.Arrays;
import java.util.stream.Stream;

public class Main {

  // Functional interface assumed by the lambda below; it is not shown in the original post.
  interface StringStreamTransformer {
    Stream<String> map(Stream<String> lines);
  }

  public static void main(String[] args) {
    StringStreamTransformer sst = lines -> lines.map(line -> {
      line = line.replace("\t","    ");
      String trimmed = line.replaceAll("^ +", "");
      int startingSpaces = line.length() - trimmed.length();
      if (startingSpaces > 3) {
        line = "/";
        for(;startingSpaces > 2; startingSpaces--) {
          line += "*";
        }
        line += "/" + trimmed;
      }
      return line;
    });


    Stream<String> lines = Arrays.stream(new String[]{
      "lots of spaces and tabs after\t\t    \t\t         \t\t\t\t\t",
      "no space",
      " 1 space",
      "  2 spaces",
      "   3 spaces",
      "    4 spaces",
      "     5 spaces",
      "      6 spaces",
      "       7 spaces",
      "        8 spaces",
      "\t1 tab",
      "\t\t2 tabs",
      "\t\t\t3 tabs"
    });
    sst.map(lines).map(s -> s.replace(" ", ".").replace("\t","-")).forEach(System.out::println);


  }
}

Olivier Grégoire

Posted 2016-08-31T22:28:48.663

Reputation: 10 647

2

Python, 125 111 bytes

lambda s:'\n'.join(('/'+(len(L.replace('\t',' '*4))-len(L.strip())-2)*'*'+'/'+L.strip(),L)[L[0]>' ']for L in s)

https://repl.it/DGyh/2

atlasologist

Posted 2016-08-31T22:28:48.663

Reputation: 2 945

2

Retina, 25 bytes

The \t should be replaced with an actual tab character (0x09).

\t
4$* 
%`^  ( +)
/$.1$**/

Try it online!

Explanation

\t
4$* 

Replace each tab with four spaces.

%`^  ( +)
/$.1$**/

Transform each line separately (%) by matching 2+N spaces at the beginning of the line and replacing it with /.../ where ... is N copies of *.

Martin Ender

Posted 2016-08-31T22:28:48.663

Reputation: 184 808

2

SED (56 + 1 for -r) 57

s/⇥/    /g;tr;:r;s,^ ( *) ,/\1/,;T;:l;s,^(/\**) ,\1*,;tl

Where ⇥ is a tab.
1. Replaces tabs with spaces.
2. Replaces the first and last leading space with /.
3. Replaces the first space after / and 0+ *s with a * until there isn't a match.
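
The three steps translate almost directly into a loop. The Python sketch below mirrors the sed commands, including the early exit that T provides when the line has no leading run to mark; it is an illustration with an invented function name, not the answer's actual code.

```python
import re

def sed_like(line: str) -> str:
    line = line.replace("\t", "    ")                    # step 1
    line, marked = re.subn(r"^ ( *) ", r"/\1/", line)    # step 2
    if not marked:                                       # T: nothing was marked, leave the line alone
        return line
    while True:                                          # step 3, repeated like :l ... tl
        line, filled = re.subn(r"^(/\**) ", r"\1*", line)
        if not filled:
            return line


print(sed_like("\t\ttwo tabs"))
```

Without the early exit, a line that already starts with /* */ would enter the fill loop and have its inner space replaced.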

Riley

Posted 2016-08-31T22:28:48.663

Reputation: 11 345

Since sed is specified, no single quotes are needed around the code, same with removing -r'' from your other sed answers, because you can consider the script as being stored in a source file that you run with -f. Any extra flags used like n or r should be counted as one byte each. Thus here, you save 2 bytes. – seshoumara – 2016-09-01T23:14:32.510

That's what I thought, but I wasn't sure. Thanks. – Riley – 2016-09-01T23:15:53.520

The ; after the t command is not necessary either. As for the code itself, you need an ^ at the beginning of the third s command, otherwise an input like this "3 / 5" is turned into "3 /*5". In the first s command you actually have a tab there, but it's not correctly shown and misleading, so either use \t or specify after, that char was a tab. – seshoumara – 2016-09-01T23:21:12.203

@seshoumara Thanks, I'm trying to post from my phone... It's not the easiest thing to do. – Riley – 2016-09-01T23:23:53.787

I think I've spent more time editing this answer than all the others combined. Thanks for the help! – Riley – 2016-09-02T01:51:01.197

Upvoted you, can't press proportional to your edit work :) Only 1 byte down in the end, but errors were solved as well. I'm a sed fan, so I go through many posts using it and I can be a tough customer. – seshoumara – 2016-09-02T02:01:22.040

Can you confirm that this doesn't change a line that already begins with /*? It looks like you need a Tend after that first replacement. – Toby Speight – 2016-09-02T14:28:40.253

@TobySpeight You're right. I'll fix it as soon as i can – Riley – 2016-09-02T14:36:36.867

@TobySpeight Fixed. I Should have tested this better... – Riley – 2016-09-02T15:00:53.567

1

GNU sed, 66 64 + 1(r flag) = 65 bytes

Edit: 1 byte less thanks to Riley's suggestion.

s/\t/    /g
s,^ ( *) ,/\1\n,
:
s,^(/\**) ( *\n),\1*\2,
t
s,\n,/,

Run: sed -rf formatter.sed input_file

The reason for separating the leading spaces from the rest of the text on that line with a \n is that otherwise a C++ line starting with a comment like this /* */ would be turned into /*****/ by a simpler line 4 like s,^(/\**) ,\1*, or even s,^(/\**) ( */),\1*\2,. Since sed executes the script for each input line, no \n is introduced in the pattern space on reading.
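
The newline sentinel can be modelled in Python as well. In this sketch (function name invented), the leading run is cut off from the rest of the line by a temporary \n, so the fill loop can never reach a /* */ that belongs to the original text.

```python
import re

def sentinel_fill(line: str) -> str:
    line = line.replace("\t", "    ")
    # The first leading space becomes '/', the last one becomes the '\n' sentinel.
    line = re.sub(r"^ ( *) ", r"/\1\n", line)
    # Fill one space per pass, but only while the spaces run up to the sentinel.
    while True:
        line, n = re.subn(r"^(/\**) ( *\n)", r"\1*\2", line)
        if n == 0:
            break
    return line.replace("\n", "/", 1)          # finally turn the sentinel into the closing '/'


print(sentinel_fill("        /* */ int c;"))
```

This relies on each line being handed over without its own newline, which matches how sed fills the pattern space.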

seshoumara

Posted 2016-08-31T22:28:48.663

Reputation: 2 878

You can save a byte by not putting in the closing / until you replace the \n. That saves you having to match it in line 4. – Riley – 2016-09-01T17:58:46.707

@Riley Good catch. Updated the code. – seshoumara – 2016-09-01T19:08:11.770

You can save another by replacing \t with a tab character. – Riley – 2016-09-01T19:17:47.423

@Riley That is true, but since it won't be printed as a tab here, I'm in doubt. I will keep this in mind for future sed answers with byte counts more competitive. – seshoumara – 2016-09-02T00:55:43.473

1

Ruby, 52 47 + 1 (p flag) = 48 bytes

Edit: saved a whole 5 bytes, thanks to Value Ink

ruby -pe 'gsub ?\t," "*4;sub(/^ +/){?/+?**($&.size-2)+?/}'

michau

Posted 2016-08-31T22:28:48.663

Reputation: 111

1Can you use the p flag to take advantage of the fact that (g)sub modifies $_ and thus changes the printed value? ruby -pe 'gsub ?\t," "*4;sub(/^ +/){?/+?**($&.size-2)+?/}' – Value Ink – 2016-09-01T22:03:41.047

Thanks, I didn't know (g)sub without bang can modify $_ here. – michau – 2016-09-02T08:33:16.027

1

The university should consider allowing a bit more space for programs in Emacs Lisp (or default to tabify and untabify alone), as they get even more verbose than Java. It should also pay close attention to students (or teachers) whose indentation size is smaller than four or who happen to code in some non-C-like language.

The following solution has 206 bytes

(lambda (b e)(let((tab-width 4))(untabify b e)(goto-char b)(while(re-search-forward"^ +"e t)(replace-match(format"/%s/"(apply'concat(mapcar(lambda(x)"*")(number-sequence 1(-(length(match-string 0))2)))))))))

Assuming that tab-width does not need to be set explicitly, we can save 20 of them.

(lambda(b e)(untabify b e)(goto-char b)(while(re-search-forward"^ +"e t)(replace-match(format"/%s/"(apply'concat(mapcar(lambda(x)"*")(number-sequence 1(-(length(match-string 0))2))))))))

An ungolfed version would look like this

(defun end-tab-war (beg end)
  (let ((tab-width 4))
    (untabify beg end)
    (goto-char beg)
    (while (re-search-forward "^ +" end t)
      (replace-match
       (format
        "/%s/"
        (apply 'concat
               (mapcar (lambda(x) "*")
                       (number-sequence 1
                                        (- (length (match-string 0))
                                           2)))))))))

We first untabify the region before jumping to its start. Then, while we see whitespace at the beginning of a line, we replace it with a comment that is as long as said whitespace. To be exact, the comment to be inserted is constructed by

 (format"/%s/"(apply'concat(mapcar(lambda(x)"*")(number-sequence 1(-(length(match-string 0))2)))))

which itself takes up 97 bytes. A shorter solution to copy some string n times is highly appreciated.

Lord Yuuma

Posted 2016-08-31T22:28:48.663

Reputation: 587