
Since I am new to Linux, when writing scripts I have always followed the rule "less code, less attack surface", so I try to write scripts that run with privileged access (sudo, root, etc.) in sh and to use as few third-party programs (grep, sed, etc.) as possible.

But for some reason many people talk only about POSIX compatibility, not about the vulnerabilities a larger codebase might introduce. The compiled dash binary is about a tenth the size of bash; it is even smaller than grep.

Tell me, am I wrong? Is it a bad idea to focus only on code size?

Update: I understand that the quality of the code (who writes it and how) plays a major role, but I am interested in the specific example of bash vs sh (dash). sh is used as the shell for root. I am curious whether this is only due to POSIX compatibility or whether there are security issues involved.

  • I would suppose Zoom has substantially fewer lines of code than MS Teams, but you know what disastrous security problems Zoom had ;-) Focusing on code size is an interesting idea, but your question will get opinion-based answers. That's why I suggest closing it. – mentallurg Jan 20 '22 at 07:05
  • Code size is relevant but cannot be taken as the only criterion - insecure code can easily be written in a few lines. Code complexity is relevant too, and it is not the same as lines of code. History is relevant, as is code cruft. So are the developers' knowledge, support in terms of money, criticality in terms of user base, ... I agree with mentallurg that this will primarily lead to opinion-based answers and should therefore be closed. – Steffen Ullrich Jan 20 '22 at 07:22
  • @SteffenUllrich Obviously, who writes the code and how it is written also play an important role. But I am interested in the comparison between bash and sh (dash), because sh is used as the default shell for root and scripts are also run via sh. I want to know if this is only due to POSIX compatibility or if security also plays a role. – NewLinux Jan 20 '22 at 10:08
  • @NewLinux: *"because SH is used as default shell for root and scripts are also run via sh"* - this depends on the system. AFAIK /bin/sh is dash on Debian-based systems (Ubuntu etc.) and bash on Red Hat systems (CentOS, Fedora, ...). And at https://wiki.ubuntu.com/DashAsBinSh you can read the reason why Ubuntu/Debian changed to dash (it used to be bash too): efficiency and startup time. – Steffen Ullrich Jan 20 '22 at 11:41

1 Answer


When a measure becomes a target, it ceases to be a good measure.

--Charles Goodhart

Your general reasoning is correct: smaller codebases present a smaller attack surface and tend to be easier to audit. So we can say: "In general, the larger a codebase becomes, the more likely it is that it contains vulnerabilities."

However, code does not magically become more secure by becoming smaller. Once you take "small code size" as a target, then your code will look like what you see on Code Golf, and I assure you, that code does not include any security measures whatsoever.

An Example

Compare the following two code snippets:

#include <stdio.h>

char name[20];

printf("Enter name: ");
gets(name);   // no bounds check: any input longer than 19 characters overruns name

and

#include <stdio.h>
#include <string.h>

#define OK       0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
    int ch, extra;

    // Get line with buffer overrun protection.
    if (prmpt != NULL) {
        printf ("%s", prmpt);
        fflush (stdout);
    }
    if (fgets (buff, sz, stdin) == NULL)
        return NO_INPUT;

    // If it was too long, there'll be no newline. In that case, we flush
    // to end of line so that excess doesn't affect the next call.
    if (buff[strlen(buff)-1] != '\n') {
        extra = 0;
        while (((ch = getchar()) != '\n') && (ch != EOF))
            extra = 1;
        return (extra == 1) ? TOO_LONG : OK;
    }

    // Otherwise remove newline and give string back to caller.
    buff[strlen(buff)-1] = '\0';
    return OK;
}

According to the simple metric "smaller is more secure", the bottom snippet should be orders of magnitude more vulnerable, shouldn't it? But I am certain you have already guessed from my framing that this is not the case: the short snippet uses gets(), which performs no bounds checking at all, while the long one goes out of its way to prevent a buffer overrun.

bash vs. sh vs. dash

Now, let's get to the meat of your question. Is it "safer" to write smaller scripts? Again, it depends. Just the fact that some binary is smaller than another really has no bearing on its security whatsoever.

You also seem to believe that sh or dash is used for root shells because it is smaller than bash, and thus more secure. The reason, as you stated in your question, is actually POSIX compliance.

You see, not every shell works the same. Put a group of Linux enthusiasts in a room, declare that (whichever) shell is clearly the best, and you will get a heated debate lasting for hours, and probably one or two new GitHub repositories soon after. The reason is that more "modern" shells like zsh and bash (yes, bash is more modern than sh) contain features that other shells do not have, or features that other shells also have but that behave slightly differently, or features that other shells also have but implement with a slightly different syntax.
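To make that concrete, here is a tiny illustrative snippet (assuming a typical setup where dash is installed alongside bash, as on Debian/Ubuntu). Run it once with bash and once with dash: bash accepts the array assignment and expands the braces, while dash aborts on the array line with a syntax error.

# bashism demo: run as "bash demo.sh" and then as "dash demo.sh"

# Arrays are a bash feature, not POSIX sh. dash stops here with
# 'Syntax error: "(" unexpected'.
colors=(red green blue)
echo "second color: ${colors[1]}"

# Brace expansion is also a bashism: bash prints "1 2 3",
# dash would print the literal string "{1..3}".
echo {1..3}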

What you absolutely do not want, under any circumstances, is a sysadmin logging into some production system, trying to run a maintenance script, and then being greeted by a syntax error on line 5192. As a result, root shells generally remain sh (or other POSIX-compliant shells).

It's not a matter of security, but of usability.

Further, you mentioned dash specifically, which is a very small, POSIX-compliant implementation of sh. And indeed, small size is one way that dash reaches its three stated project goals:

  • Speed of Execution.

    By having fewer features, the parser can be simplified and thus run faster (a quick way to see this for yourself is sketched after this list).

  • Availability with Limited Resources

    By having fewer features, the binary itself can be smaller, and can also have a smaller memory footprint.

  • Security

    By having fewer features, the existing features are easier to audit.

My wording here is very deliberate: it's not the byte size of the binary that matters, but the number of features in the program.
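If you want a rough feel for the speed goal in practice, you can time interpreter startup yourself. This is an unscientific sketch run from an interactive shell; the numbers depend entirely on your machine, which is why none are quoted here:

# Rough, unscientific comparison of interpreter startup cost:
# launch each shell 1000 times doing nothing (":" is a no-op builtin).
time sh -c 'i=0; while [ "$i" -lt 1000 ]; do dash -c :; i=$((i+1)); done'
time sh -c 'i=0; while [ "$i" -lt 1000 ]; do bash -c :; i=$((i+1)); done'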


In a comment, you wrote that you try to use fewer third-party programs such as sed or grep. That is generally a bad idea. These programs have existed for decades and have been audited countless times. The chance of sed containing a vulnerability is orders of magnitude lower than that of the code you would have to write yourself to do what sed already does.

  • I'm not saying that fewer lines of code are necessarily better. For example, if in a C program you save a few lines of code by not initializing pointer variables to NULL or not checking them, that's bad. Regarding counting features rather than code size: I agree, that is the better measure. – NewLinux Jan 20 '22 at 13:34
  • About grep, what if I don't need all the functionality to solve a simple problem? For example, I need to check an argument to be only a number: Via grep: `echo "$1" | grep -Evq '^[+-]?[0-9]+$'` Through the built-in case: `case "${1#[+-]}" in ''|*[!0-9]*) exit ;; esac` What's wrong with the 'case' example? And I don't need to use an extra external program in my case. Certainly, in a more complex case the use is justified. – NewLinux Jan 20 '22 at 13:34
  • @NewLinux Because grep is the completely wrong tool for the job. If you are trying to parse command-line arguments, use [`getopt`](https://linux.die.net/man/1/getopt). –  Jan 20 '22 at 14:13
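For reference, here is a minimal sketch of that suggestion using the POSIX getopts builtin (a close relative of the getopt utility linked above). The -n option and the num variable are purely illustrative, not something from the original question:

# Illustrative only: parse a "-n NUMBER" option and validate its value
# without calling any external program.
num=""
while getopts "n:" opt; do
    case "$opt" in
        n) num=$OPTARG ;;
        *) echo "usage: $0 -n number" >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))

# Same integer check as in the comment above: strip one leading sign,
# then reject anything that is empty or contains a non-digit.
case "${num#[+-]}" in
    ''|*[!0-9]*) echo "not a number: '$num'" >&2; exit 1 ;;
esac

echo "got number: $num"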