Recursively count lines of code, excluding empty lines and comments

7

3

Requirement: Write a program (in any language) that counts the lines of code in all files matching *.sh in the directory tree rooted at the directory the program is executed in. Exclude lines that are empty, contain only whitespace, or are single-line comments (lines whose first non-whitespace character is #). Files will only contain printable ASCII characters.

Output: A single integer representing the total count (can be followed by a newline too).

Here is an expanded solution in Python:

import os, sys

def recurse(dir = './'):
    count = 0

    for file in os.listdir(dir):
        if not os.path.isfile(dir + file):
            count += recurse(dir + file + '/')
        elif file.endswith('.sh'):
            with open(dir + file, 'r') as f:
                for line in f.read().split('\n'):
                    if (not line.strip().startswith('#')) and (not line.strip() == ''):
                        count += 1

    return count

# sys.stdout.write() only accepts strings, so convert the integer count first
sys.stdout.write(str(recurse()))
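For comparison, here is a minimal sketch of the same count built on os.walk instead of manual recursion. The function name count_sh_lines and its root parameter are my own, not part of the challenge; treat it as one possible reading of the requirement rather than the reference implementation.

```python
import os


def count_sh_lines(root="."):
    """Count non-blank, non-comment lines in *.sh files under root."""
    total = 0
    # os.walk visits root and every directory below it
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".sh"):
                continue
            with open(os.path.join(dirpath, name)) as f:
                for line in f:
                    stripped = line.strip()
                    # skip blank/whitespace-only lines and '#' comment lines
                    if stripped and not stripped.startswith("#"):
                        total += 1
    return total


if __name__ == "__main__":
    print(count_sh_lines())
```

Taking the root as a parameter makes it easy to try on a scratch directory before running it in the current one.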

Aaron Esau

Posted 2018-09-25T00:35:19.580

Reputation: 197

Files will only contain printable ASCII characters. -- can a file contain a /? – Jonathan Frech – 2018-09-26T18:18:18.550

Answers

3

PowerShell 3+, 40 bytes

(ls *.sh -r|sls '^\s*(#|$)' -a -n).Count

ls *.sh -r gets the file names from the directory tree. sls (alias for Select-String) returns all lines that do not match (-n is short for -NotMatch) the pattern '^\s*(#|$)'; -a is short for -AllMatches.
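To see what the pattern '^\s*(#|$)' accepts and rejects, here is a small Python sketch of the same "keep what does not match" filter. The names SKIP and is_code are made up for illustration; this is not a translation of Select-String itself.

```python
import re

# Lines to skip: optional leading whitespace, then either '#' or end of line
SKIP = re.compile(r"^\s*(#|$)")


def is_code(line):
    """True when the line does NOT match the skip pattern."""
    return SKIP.match(line) is None
```

A trailing comment after real code (for example "ls # list") still counts as code, since only the start of the line is inspected.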

mazzy

Posted 2018-09-25T00:35:19.580

Reputation: 4 832

Side note: ().Count was introduced in PowerShell 3; in PowerShell 2 you can use |measure instead although that also tries to give you other measurements not relevant here. – Neil – 2018-09-26T08:18:34.747

I agree: (....|measure).Count. But PowerShell 2 has been deprecated since summer 2017. I'm not sure version 3+ requires an explicit statement. In any case, I've specified the version. Thanks.

– mazzy – 2018-09-26T08:27:38.733

Ah! I keep forgetting about Select-String! Nice solution. – AdmBorkBork – 2018-09-26T12:39:11.810

4

PowerShell, 47 46 bytes

ls *.sh -r|gc|%{$o+=!($_-match'^\s*(#|$)')};$o

Try it online! (will always return 0 since there aren't any files)

Try it online! (here's a link that populates a dummy file so you can see the process)

-1 byte thanks to Neil

ls is alias for Get-ChildItem, specifying *.sh with the -recurse parameter, then we get-content of files. For each of those lines |%{...}, we accumulate into our $output a one if the Boolean !(...) statement is truthy. Inside the statement is a simple regex -match against the whitespace-only/comment/blank lines. Finally we leave $o on the pipeline.
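The "add a Boolean to a counter" idea carries over to other languages. A rough Python sketch, assuming the lines have already been read into a list (count_code is a hypothetical name):

```python
import re


def count_code(lines):
    total = 0
    for line in lines:
        # 'not match' is a bool; adding True increments the total by 1
        total += not re.match(r"^\s*(#|$)", line)
    return total
```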

The implicit Write-Output that happens at program completion adds a trailing newline, but that shouldn't matter in this case because the variable $o itself doesn't have a trailing newline nor does the actual return variable. It's a quirk of the shell, not a quirk of the program. For example, saving this to a script and executing that script in a pipeline will not have a newline.

AdmBorkBork

Posted 2018-09-25T00:35:19.580

Reputation: 41 581

You can write $_-notmatch'^\s*(#|$)' instead of !($_-match'^\s*(#|$)'). The length is the same, but the expression is clearer. – mazzy – 2018-09-26T06:12:59.067

ls *.sh -r|gc| saves a byte. – Neil – 2018-09-26T08:13:27.083

Thanks @Neil - nice catch. – AdmBorkBork – 2018-09-26T12:38:22.007

2

R, 70 64 bytes

sum(!grepl("^(#|$)",unlist(lapply(dir(,".sh$",r=T),readLines))))

Explanation:

The dir function has the recursive flag set.

readLines returns the lines of a file in a vector, which are then flattened with unlist.
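One thing worth checking about the pattern: as written, "^(#|$)" anchors the # to the first column, so indented comments and whitespace-only lines would not be filtered. The sketch below (Python rather than R, with a made-up helper count_non_matching) compares it with a whitespace-tolerant variant. Whether the stricter behaviour was intended is not stated in the answer.

```python
import re


def count_non_matching(pattern, lines):
    """Count lines that do not match the given skip pattern."""
    return sum(1 for line in lines if not re.match(pattern, line))


sample = ["  # indented comment", "echo hi", "   "]

strict = count_non_matching(r"^(#|$)", sample)       # '#' must be in column 1
tolerant = count_non_matching(r"^\s*(#|$)", sample)  # leading whitespace allowed
```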

ngm

Posted 2018-09-25T00:35:19.580

Reputation: 3 974

I haven't tested it... but can you save bytes by using grepl, adding a ! and getting rid of |1 and of inv=T? – JayCe – 2018-09-28T19:25:07.050

2

Linux Shell, 30 60 bytes

confirmed in dash

cat `find . -name \*.sh`|tr -d " \t"|grep .|grep -v ^#|wc -l

  1. find . -name \*.sh: find files matching the pattern, list with path
  2. cat `...`: print the contents of these files
  3. tr -d " \t": delete all tabs and spaces
  4. grep .: remove empty lines
  5. grep -v ^#: remove comments
  6. wc -l: count lines of output
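
The stages after cat can be mirrored one-to-one in Python. This is only a sketch of the data flow, assuming the concatenated file contents are already in a string; pipeline_count is a hypothetical name.

```python
def pipeline_count(text):
    lines = text.splitlines()
    # tr -d " \t": delete every space and tab
    squeezed = [l.replace(" ", "").replace("\t", "") for l in lines]
    # grep .: keep lines with at least one character
    nonempty = [l for l in squeezed if l]
    # grep -v ^#: drop lines that now start with '#'
    code = [l for l in nonempty if not l.startswith("#")]
    # wc -l: count what is left
    return len(code)
```

Because whitespace is deleted before the comment check, an indented # ends up in the first column, which is why the later grep -v ^# is sufficient.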

Titus

Posted 2018-09-25T00:35:19.580

Reputation: 13 814

the comment # can be found after spaces. – mazzy – 2018-09-26T09:29:51.160

@mazzy, nope, after tr -d " \t" they can not. – manatwork – 2018-09-26T09:46:29.987

2

Haskell, 211 210 bytes

import System.Directory
f x=listDirectory x>>=fmap sum.mapM(\d->doesFileExist d>>=(#d)).map((x++"/")++)
p#d|p=do c<-readFile d;pure$sum[1|take 3(reverse d)=="hs.",(q:_):_<-map words$lines c,q/='#']|1<2=f d
f"."

Oh dear, without shell globs and regexes you have to do all the work yourself. Maybe there's a module for it somewhere. Also, IO code in Haskell requires some overhead to get the types right.

f x =                             -- main function, expects a directory 'x'
    listDirectory x >>=           -- read content of directory (without "." and "..")
              map((x++"/")++)     --   for each entry: prepend current directory 'x' and a slash
          mapM(\d->doesFileExist d>>=(#d))
                                  --   for each entry: call function '#' with
                                  --        first parameter: a boolean, True if it's a regular file, False if it's a directory
                                  --        second parameter: the filename itself
                                  --        '#' returns a list of valid lines for each file
     fmap sum                     --   sum this list

p#d
   |p                             -- if 'p' is True (i.e. 'd' is regular file)
     do c<-readFile d;            --   read the content 'c' of file 'd' and 
        pure$sum[1|      ]        --   return the number of lines
              take 3(reverse d)=="hs."
                                  --     (only if the file end with .sh)
              (q:_):_<-map words$lines c,q/='#'
                                  --     where the first word doesn't start with a hash sign
                                  --       (function 'words' strips leading whitespace)
   |1<2                           -- else ('d' is a directory)
       =f d                       --   examine d

f"."                              -- start with current directory
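
For readers less familiar with Haskell, the first-word test has a close Python analogue: str.split() with no arguments behaves much like words here, dropping leading whitespace and returning an empty list for blank lines. A sketch (is_code is a hypothetical name):

```python
def is_code(line):
    words = line.split()  # [] for empty or whitespace-only lines
    # code line: there is a first word and it does not start with '#'
    return bool(words) and words[0][0] != "#"
```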

nimi

Posted 2018-09-25T00:35:19.580

Reputation: 34 639

1

Bash + GNU utilities, 55

find . -name \*.sh -exec cat {} +|grep -Evc '^\s*(#|$)'

Try it online!

Digital Trauma

Posted 2018-09-25T00:35:19.580

Reputation: 64 644

Does find . -name \*.sh|xargs cat|grep -Evc '^\s*(#|$)' work? – ovs – 2018-09-25T13:29:04.400

Or even cat **/*.sh|..., though it needs globstar to be turned on (or a different shell such as zsh, csh/tcsh).. Not sure if you'd need to include it in your bytecount. – ბიმო – 2018-09-25T18:34:05.383

1

Batch + Internal tools, 80 76 Bytes

@for /F %%F IN ('findstr/SV "^\s*# ^\s*$" *.sh^|find/C":"')DO @set/P=%%F<NUL

Uses the builtin findstr to fetch the lines, then counts these lines using find /C.

However, this produces output with a trailing newline, so we need to convert it into output without one. This is done by using for /F to capture the output and then <NUL set /P to print it without the trailing newline.
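
The same "print without a trailing newline" step looks like this in Python. A minimal sketch: emit and its stream parameter are illustrative names only.

```python
import sys


def emit(count, stream=sys.stdout):
    # write() adds nothing, so no newline follows the number
    stream.write(str(count))
```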

user83079

Posted 2018-09-25T00:35:19.580

Reputation:

1

Python 2, 149 146 144 140 149 146 116 bytes

import os
print sum(l.strip()[:1]not in'#'for a,b,c in os.walk(".")for n in c for l in open(a+"/"+n)if'.sh'==n[-3:])

Try it online!

Reports 0 on TIO but works locally. Probably not any .sh files in whatever is the current directory.

I think it now works correctly on TIO and fixed a bug for +6, both thanks to @JonathanAllan

-30 with thanks to @ovs
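
The test l.strip()[:1]not in'#' is compact enough to deserve a note. Slicing with [:1] gives the first non-whitespace character, or the empty string for a blank line, and the empty string counts as a substring of any string. Spelled out as a sketch (is_code is a made-up name; this runs on Python 3 as well):

```python
def is_code(line):
    first = line.strip()[:1]  # '' for blank lines, else the first visible char
    # '' in '#' and '#' in '#' are both True, so both cases are rejected
    return first not in "#"
```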

Alternative for 139 bytes but only works on Windows.

import os
os.system('dir/s/b *.sh>f')
print sum(sum((0,1)[x.strip()>''and'#'!=x.strip()[0]]for x in open(l[2:].strip()))for l in open("f"))

Creates a temporary file f to store the results for the dir command.

ElPedro

Posted 2018-09-25T00:35:19.580

Reputation: 5 301

You can create some files at TIO. Unfortunately that shows up a bug - I believe you'd need to open(os.path.join(a_path_prefix, n)) for it to actually work (although just using + will of course be golfier!))

– Jonathan Allan – 2018-09-25T17:34:54.290

(0,1)[l.strip()>''and'#'!=l.strip()[0]] can become all([l.strip()>'','#'!=l.strip()[0:]]) to save 1 byte. – mypetlion – 2018-09-25T17:55:16.037

Thanks @JonathanAllan. Will delete for now as I don't have time to fix. – ElPedro – 2018-09-25T18:04:04.677

@mypetlion - thanks but temporarily deleted due to a bug spotted by JonathanAllan – ElPedro – 2018-09-25T18:06:38.493

That's impressive, nice job. I tried to cut off a byte or two, but couldn't think of any way to do it. – Aaron Esau – 2018-09-29T03:00:48.463

Thanks. Some serious help from @ovs to lose 30 bytes at the end tho so credit and respect due. – ElPedro – 2018-09-29T07:09:20.830

1

Ruby, 61 bytes

p Dir.glob("**/*.sh").sum{|x|open(x).grep(/^\s*[^#\s]/).size}

Try it online!

G B

Posted 2018-09-25T00:35:19.580

Reputation: 11 099

1

Röda + find, 98 bytes

{bufferedExec"find",".","-name","*.sh"|{|x|try readLines x}_|{|x|x~=`\s|#.*`,"";[1]if[#x>0]}_|sum}

Try it online!

Pure Röda, 129 bytes

{["."]|[_]if isFile(_1)else unpull(x)for x in[ls(_1)]|{|x|try readLines x if[x=~`.*\.sh`]}_|{|l|l~=`\s|#.*`,"";[1]if[#l>0]}_|sum}

Try it online!

Explanation:

{
["."]|                     /* Push "." to the stream */
                           /* For each _1 in the stream: */
[_]if isFile(_1)           /*   If _1 is a file, push it to the output stream */
else                       /*   Else (if _1 is a directory): */
unpull(x)for x in[ls(_1)]| /*     Push each file/dir in _1 to the *input stream* */
{|x|                       /* For each x in the stream: */
  try readLines x          /*   Push lines of x to the stream ignoring errors */
    if[x=~`.*\.sh`]        /*     if x ends in .sh */
}_|
{|l|                       /* For each l in the stream: */
  l~=`\s|#.*`,"";          /*   Remove whitespace and comments from l */
  [1]if[#l>0]              /*   Push 1 to the stream if l is not empty */
}_|
sum                        /* Sum all numbers in the stream */
}
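
The last filtering stage (delete whitespace and everything from # onwards, then check what is left) can be sketched in Python with re.sub. The name is_code is hypothetical.

```python
import re


def is_code(line):
    # remove each whitespace character and any '#...' tail
    remainder = re.sub(r"\s|#.*", "", line)
    return len(remainder) > 0
```

Note that this also strips trailing comments, but a line with real code before the # still leaves a non-empty remainder and is counted.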

fergusq

Posted 2018-09-25T00:35:19.580

Reputation: 4 867

0

C (clang), 248 209 bytes

#import <regex.h>
*l,i,j;*k;regex_t r;f(n,s,t){if(fnmatch("*.sh",n,0))return 0;k=fopen(n,"r");for(regcomp(&r,"^\\s*(#|$)",1);i=getline(&l,&i,k)>0;)regexec(&r,l,0,0,0)&&j++;}main(){ftw(".",f,1);printf("%d",j);}

Try it online!

Logern

Posted 2018-09-25T00:35:19.580

Reputation: 845

Save one byte char i,j,*l; ==> char*l,i,j; – cleblanc – 2018-09-26T15:37:14.023

189 Bytes here

– cleblanc – 2018-09-26T16:15:55.257

Why are you not counting your definition and include bytes? – Jonathan Frech – 2018-09-26T17:43:58.000

@JonathanFrech, because it's just an argument, it could be any directory – Logern – 2018-09-26T18:02:10.953

I do not think that is legal I/O; some inclusions you perform are necessary and not counted. – Jonathan Frech – 2018-09-26T18:03:47.560

See this default I/O consensus.

– Jonathan Frech – 2018-09-26T18:16:59.720

Suggest #import<regex.h> instead of #include <regex.h>. Also, can ditch ftw.h and fnmatch.h – ceilingcat – 2018-09-26T22:58:23.490

0

PHP, 105 140 174 186 184 172 bytes

function f($d){foreach(glob("$d/*")as$n)$n[strlen($d)+1]^_^A&&$s+=!is_dir($n)?fnmatch("*.sh",$n)?preg_match_all("/^\s*[^#]/m",join(file($n))):0:f($n);return$s;}echo f(".");

skips hidden directories and files (name starts with a dot). Run with -nr.

breakdown

function f($d)
{
    # loop through directory entries
    foreach(glob("$d/*")as$n)
        # if name does not start with a dot
        $n[strlen($d)+1]^_^A
        # then increase sum:
        &&$s+=
            # not a directory?
            !is_dir($n)
                # filename matches pattern?
                ?fnmatch("*.sh",$n)
                    # count lines that are neither empty nor begin with a "#"
                    ?preg_match_all("/^\s*[^#]/m",join(file($n)))
                    # else 0
                    :0
                # is a directory: recurse
                :f($n)
        ;
    return$s;
}
echo f(".");

I could save two more bytes with fnmatch()*preg_match_all() instead of fnmatch()?preg_match_all():0; and then another one with is_dir()?B:A instead of !is_dir()?A:B; but that could make it insanely slow.
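
My reading of the slowdown is that a ternary only evaluates the expensive branch when the name matches, while multiplication always evaluates both operands, so every file would be read. A Python sketch of the difference, with expensive standing in for the read-and-count step (all names here are made up):

```python
calls = []


def expensive(name):
    """Stand-in for reading a file and counting its lines."""
    calls.append(name)
    return 3


def lazy(matches, name):
    # conditional: expensive() runs only when the name matches
    return expensive(name) if matches else 0


def eager(matches, name):
    # multiplication: expensive() runs regardless of the match result
    return matches * expensive(name)
```

Both return the same number; only the amount of work differs.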

Titus

Posted 2018-09-25T00:35:19.580

Reputation: 13 814

substr($n,-3)=='.sh' would be shorter than preg_match("/\.sh$/",$n). – manatwork – 2018-09-26T09:06:02.837

Sorry, actually fnmatch('*.sh',$n) is even shorter. – manatwork – 2018-09-26T09:49:14.433

@manatwork Wow I didn't even know that function! – Titus – 2018-09-26T09:54:19.507

Yepp, weird thing. I only found out about it after using the similar File.fnmatch in Ruby for years.

– manatwork – 2018-09-26T10:06:29.683

0

Zsh, 57 bytes

for f (**/*.sh(D))a+=(${${(f)"$(<$f)"//[ 	]}%%#*})
<<<$#a

Try it online!

Almost beat bash+coreutils with native zsh constructs. -o globdots would save 3 bytes, and -o globstarshort would save 2 more (the glob would be **.sh instead of **/*.sh(D)).

for f (**/*.sh(D))                    # the (D) flag enables globdots
   a+=(${${(f)"$(<$f)"//[   ]}%%#*})
 #       ${(f)"$(<$f)"       }        # read in file, split on newlines
 #       ${           //[   ]}        # remove all spaces and tabs
 #     ${                     %%#*}   # remove longest trailing #comment
 # a+=(                            )  # append non-empty lines to array
<<<$#a                                # print array length
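
The same three steps, written out in Python as a sketch: zsh_style_count is a hypothetical name, and the input is assumed to be already split into lines.

```python
def zsh_style_count(lines):
    kept = []
    for line in lines:
        # //[ <tab>]: delete all spaces and tabs
        squeezed = line.replace(" ", "").replace("\t", "")
        # %%#*: drop the longest suffix starting at a '#', i.e. from the first '#'
        before_hash = squeezed.split("#", 1)[0]
        # empty results are not appended, as with unquoted expansion in zsh
        if before_hash:
            kept.append(before_hash)
    return len(kept)
```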

GammaFunction

Posted 2018-09-25T00:35:19.580

Reputation: 2 838