154

The Jurassic Park scene referenced in the title is infamous for how ludicrous it sounds to those who are tech literate. But it also illustrates what seems to me to be a glaringly huge hole in web security, particularly IoT devices--as soon as attackers find out a server or camera or baby monitor is running linux, they instantly know volumes about how it works. They know that commands like sudo are big juicy targets and they know that shell access will bring with it gobs of useful tools like ls and cat.

So why isn't OS obfuscation more of a thing? I'm not talking about just hiding the version in web headers. Similar to JavaScript minification or obfuscation, I'm talking about changing the names of binaries and filepaths in the OS itself. Wouldn't entire classes of attacks be practically useless if the OS had ha7TrUO and RRI6e29 commands instead of sudo and ls? Imagine a hacker that somehow gained remote root access--what are they even going to do if they don't know any commands?

Implementation would be fairly easy for compilers. Take the simplest case of "rename this function and all calls to it." You could give an OS compiler and an application compiler the same randomized names and they'd be able to talk to each other. But even if the application has poor security and is vulnerable to bash injection, such attacks would be fruitless.

Obviously this technique can't be used in all scenarios. Setting aside scenarios like servers maintained by human sysadmins, it seems to me that any device or server managed by automation is a prime candidate for this defense.

I guess the question(s) needs to be a bit more concrete:

  1. Is OS obfuscation as described used widely and I just haven't encountered it?
  2. If not used widely, what are the practical or technical barriers to usage?
Indigenuity
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/102384/discussion-on-question-by-indigenuity-why-is-this-defense-against-its-a-unix-s). – Rory Alsop Dec 19 '19 at 17:34
  • 1
    One could design a build environment which did all this symbol renaming automatically, and randomized, every build. As long as you had source for everything, it would be novel, but not research level hard. And if you did it once, you'd have it forever. Debugging would be hard. But in a way, I like the idea. It would be a similar thought as https://en.wikipedia.org/wiki/Address_space_layout_randomization but at compile time instead of runtime and much more ambitious. The more I think about it, the more I respect the idea. It WOULD increase security for embedded devices. (No silver bullet tho.) – Prof. Falken Mar 09 '20 at 09:30

14 Answers

239

Before I tear your idea apart, let me say that it's a really interesting idea and it was super fun to think about.

Please continue to think outside the box and ask interesting questions!

Alright, let's do this!


Let's take a step back and ask why that baby monitor is running Linux in the first place. What if there was no operating system and the application was written in bare microcontroller code (think Arduino code)? Then there would be no sudo or ls or even a shell for the attacker to use, right?

I'm not an expert here, but I expect that we, as an industry, have gravitated towards putting Linux on anything big enough to run it largely for developer convenience:

  1. Reducing dev time: When building your WiFi-and-bluetooth-capable web-administered cloud-syncing self-patching whizpopper with companion Android and iOS apps, Linux comes with all the libraries, utilities, and drivers you need to do that.
  2. Increasing testability: If the device is running bash or busybox with an SSH port, then it's super easy to connect in and figure out what went wrong during your product testing phase.

For your obfuscation idea to work, you'd need to obfuscate not only the names of command-line utilities like sudo and ls, but also every Linux kernel API to prevent the attacker from dropping in their own compiled binary that calls the kernel directly. So let's take another look at your idea:

Implementation would be fairly easy for compilers. Take the simplest case of "rename this function and all calls to it." You could give an OS compiler and an application compiler the same randomized names and they'd be able to talk to each other.

You'll need to do this randomized compilation yourself; otherwise someone could look up the mappings on google.

So, you'll need to build the kernel from source with your "obfuscating compiler" so that only you know the mappings of the obfuscated kernel APIs. (Ever built the Linux kernel from source? It's certainly more of a chore than docker pull alpine, which is the direction that dev culture seems to be going.)

But an operating system is more than just the kernel. You want drivers for the Broadcom BCM2837 wifi chip that comes on that mini-pc device? You'll need to build that driver against your obfuscated kernel with your compiler, if Broadcom will even give you the source code. Then you'll need to build the entire GNU wifi and networking software stacks. How many other things will you need to find source for and add to your build pipeline before you have a functioning OS?

Oh, and if the upstream repo of any of those things issues a patch, you're now responsible for rebuilding it (assuming you saved the compiler obfuscation mapping files that match your kernel binary) and pushing it out to your devices because, by design, your devices cannot use patch binaries produced by the vendor.

Oh, and in order to foil hackers, there'll be none of this "Here's the binary files for Whizpopper 1.4.7", oh no, you'll need to build a uniquely obfuscated version of everything from the kernel up per device that you ship.
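To make the bookkeeping concrete, here is a minimal Python sketch of the kind of per-device name mapping such a build pipeline would have to generate and keep around forever. Everything in it is an assumption made up for illustration: the function names `obfuscated_name` and `build_mapping`, the HMAC-based derivation, the secret, and the device IDs. It is not how any real toolchain works.

```python
import hashlib
import hmac
import string

# Hypothetical illustration only: no real build tool is being described here.
ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def obfuscated_name(build_secret: bytes, device_id: str, original: str, length: int = 7) -> str:
    """Derive a stable, pseudo-random replacement name for one command on one device."""
    message = f"{device_id}:{original}".encode()
    digest = hmac.new(build_secret, message, hashlib.sha256).digest()
    # Map digest bytes onto the alphabet (slightly biased, which is fine for a sketch).
    return "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])


def build_mapping(build_secret: bytes, device_id: str, commands: list) -> dict:
    """Return {original name: obfuscated name} for a single device."""
    mapping = {cmd: obfuscated_name(build_secret, device_id, cmd) for cmd in commands}
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("name collision; use a longer name length")
    return mapping


if __name__ == "__main__":
    secret = b"example-build-secret"  # assumption: known only to the vendor
    for device in ("device-0001", "device-0002"):
        print(device, build_mapping(secret, device, ["sudo", "ls", "cat"]))
```

Generating the names is the easy part. The pain is that every binary, script, driver, and patch for a given device has to be rebuilt against exactly that device's mapping, and losing the secret means losing the ability to update the device at all.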


So to your questions:

  1. Is OS obfuscation as described used widely and I just haven't encountered it?
  2. If not used widely, what are the practical or technical barriers to usage?

I think the answer is that what you're describing pretty much completely defeats the purpose of using pre-existing software components if you need to find and build literally everything from source. It might actually be less effort to ditch the operating system entirely, pretend it's 1960, and write your application directly in CPU microcode.

I like security more than most developers, but like f* that.

Pang
Mike Ounsworth
  • 48
    You would also need to rebuild every single tool, package and script which uses these commands and tools so that it makes the 'obfuscated' call. That's a *lot* of work. – Teun Vink Dec 17 '19 at 11:58
  • 13
    Days of compile time on Core i7? CLFS compiles much faster than this: several hours at most. Even faster if all you need is kernel+busybox+several_small_utilities. What will really take time is doing this tedious function renaming and syscall-number-remapping, trying not to break anything. And why is it a chore to build Linux the kernel from source? Do you try to avoid its `Makefile`s? – Ruslan Dec 17 '19 at 15:20
  • 4
    @Ruslan lol, I should have known I would get into this debate. I'll just say that given the pace and style of today's dev culture, the effort of setting up a custom linux build pipeline >> grabbing COTS CentOS and `yum install nodejs`. – Mike Ounsworth Dec 17 '19 at 15:43
  • Great answer, thanks for your time answering! I'm not entirely convinced that dev time would have to be sacrificed to allow for a special compile step during a "release." That aside, your point about device drivers I think is a rather large barrier. I doubt that can be overcome without sizable changes to the OS. Though now I wonder about just leaving kernel space alone and making some user space obfuscation... – Indigenuity Dec 17 '19 at 16:06
  • 5
    @Indigenuity, regarding just obfuscating the userspace: there are much more effective ways to secure commodity consumer electronics, such as setting a per-device random admin password and following well established best practices. Also, if a manufacturer isn't going to bother using anything other than "admin123", they're also not interested in obfuscating the commands. – Ghedipunk Dec 17 '19 at 16:12
  • There was a time when baby monitors were purely analog devices, with no code or computer at all... – Phil Frost Dec 17 '19 at 18:30
  • 5
    @MikeOunsworth, most people developing IoT build everything that goes on the device including kernel and bootloader and often also the cross-compilation toolchain. It is what the embedded development tools do. Yocto starts with just host C compiler and Python, Ptxdist usually gets the cross-compiler pre-built. ARM boards require board-specific setup as they don't have any standard BIOS, so some specific steps are required, and for most boards you just don't get pre-built binaries. So there is one toolchain to modify if somebody wanted. The real problem would be debugging it. – Jan Hudec Dec 17 '19 at 20:31
  • 15
    Even if OS system calls/IOCTLs were obfuscated, known binaries could be used to guess at the obfuscation pattern. If I saw a binary that invoked syscall 7551 and then printed using a format string that looked like the output of `stat`, I could guess pretty reasonably that it was the `stat` binary and the syscall nr of `fstat` or friends was 7551. – nanofarad Dec 17 '19 at 20:32
  • 3
    Based on my experience running Gentoo, you could probably compile a complete Whizpopper stack in four hours or so, less if you're using lightweight packages such as uclibc, busybox, and lighttpd rather than glibc, the GNU stack, and apache. – Mark Dec 17 '19 at 22:02
  • 3
    "It might actually be less effort to ditch the operating system entirely, pretend it's 1960, and write your application directly in CPU microcode." - Well, people have actually thought about this and released frameworks like [includeos](https://www.includeos.org/) (no affiliation) that basically enable you to do that (the intended target seems to be VMs or similar containers on servers, stripped of everything but the necessities to get the job done). I'm not sure however on the practicality of distributing such minimal systems to the end user... – hoffmale Dec 17 '19 at 23:04
  • @hoffmale The research term for such systems I'm aware of is "unikernel"; essentially a (set of) static library/ies to be compiled against that provides all of the OS functionality desired by the application(s) in use. – JAB Dec 18 '19 at 01:20
  • 2
    Pardon me, but will it work even if we assume "ideal obfuscation" (i.e. all was re-compiled)? Hackers are people and they will be able to match calls and potentially rebuild the mapping, at least partially. See that thing calls that thing? Let's connect these dots. And again, and again. I got myself a graph! What does this look like? Let's compare with some graph of readable OS / scripts calls.. Oh, now I get it, `RRI6e29` is actually `ls`! (well, maybe not this blunt but you get the idea) – Alma Do Dec 18 '19 at 08:50
  • 15
    I disagree with your statement that linux is used mostly for developer convenience. Writing an OS from scratch for your internet connected camera would require to implement quite a lot of supporting tools and libraries. Any layer is hard to implement and chances are your home-developed system would have tons of vulnerabilities as you are not an expert in all of those layers. Taking a light-weight linux and adding support for your custom camera driver plus a layer for convenient web configuration is safer. (and still a lot of companies fail to do the configuration in a safe way!) – Manziel Dec 18 '19 at 10:01
  • 1
    "you're looking at hours and hours (possibly days?) of compile time here even on an i7" I think threadripper is now the benchmark for project compilation (especially the ones that can be heavily parallelized). cc @Ruslan – Braiam Dec 18 '19 at 19:37
  • 14
    +1 for encouraging to keep thinking differently – Clockwork Dec 18 '19 at 19:56
  • 2
    In particular if you used Linux you would have to release all of your source code or make it available upon request: `The “Corresponding Source” for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities.` Which I think would include your mappings... – Jason Goemaat Dec 18 '19 at 22:56
  • @MikeOunsworth I have a question, why not have a password before all calls `(like sudo)`. System setup ask if you want calls to be password protected, and what that password would be. This prevents changing dev time, and testability. Naturally I am not saying we develop it, just asking if you think it could be feasible. – Nightwolf Dec 19 '19 at 05:19
  • 3
    @Nightwolf That's actually a step BACKWARDS from `sudo`. Before `sudo`, if you wanted to run root commands you had to know the machine's root password and run them using `su`. `sudo` allows users to run root commands (or, in theory, run commands as any other user, its config is extremely powerful and flexible) by typing _their own_ password, because it turns out having a single central password that provides access to root powers and is known to everyone who needs to run privileged commands is terrible for security. – FeRD Dec 19 '19 at 07:07
  • @FeRD `That's actually a step BACKWARDS from sudo` . You are assuming that the password removes other security. No what I mean is all existing security in addition with obfuscation in the form of a password. Alternatively you could call it a salt instead of a password. – Nightwolf Dec 20 '19 at 15:41
  • One could design a build environment which did all this symbol renaming automatically, and randomized, every build. As long as you had source for everything, it would be novel, but not research level hard. And if you did it once, you'd have it forever. Debugging would be hard. But in a way, I like the idea. It would be a similar thought as https://en.wikipedia.org/wiki/Address_space_layout_randomization but at compile time instead of runtime and much more ambitious. The more I think about it, the more I respect the idea. It WOULD increase security for embedded devices. (No silver bullet tho.) – Prof. Falken Mar 09 '20 at 09:29
91

Mike's answer says basically everything I have to offer about why this is a bad idea from a development perspective (and, as Ghedipunk's comment says, an unusable security feature provides no security). So instead, I'm going to talk about why from a security perspective, you would never do this.

The answer is actually surprisingly simple: it's a waste of time, and there are strictly better options. Every stupid IoT doodad (remember, the "s" in "IoT" stands for Secure) that doesn't bother to implement such features sure as hell wouldn't go with an approach like you suggest.

  1. The whole idea just won't work for restricting system calls. An attacker can just set a few registers and invoke an opcode and boom, they're in the kernel executing the syscall of their choice; who cares what the symbolic name of it is? Sure, you can tamper with the syscall table to complicate this (if you don't mind needing to recompile everything and making debugging your custom kernel a form of utter hell) but it's like obfuscating the OS in use; why bother when there are so few candidates, anyhow? Even if the attacker doesn't want to reverse engineer existing code on the system, brute-forcing should be possible unless the search space of usable call indices is wider than I've ever seen on an embedded system.
  2. Why obfuscate command names when you can just make them entirely inaccessible? A chroot won't work for shell built-ins if you're running a shell, but it works fine for everything else and really, why would your custom-built single-purpose app-in-a-box be running a shell? I mean, dev units would have one installed, for test purposes, and maybe that doesn't get removed in your retail image because you're lazy or think you'll need it again. But an attacker wouldn't be able to run it from the context that its app runs as. A simple chroot (or a more complicated sandbox/jail/container) can make the program unable to run - or even access - any files beyond those required for its job.
  3. Why obfuscate the kernel APIs when you can just remove access to them? There are a number of sandboxing systems that can restrict what calls a process (or its descendants... if it's even allowed to create any) can make. See https://stackoverflow.com/questions/2146059/limiting-syscall-access-for-a-linux-application
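Point 1 can be illustrated with a toy simulation in Python. This is not a real kernel interface: the "syscalls" are ordinary Python functions with made-up return values, the table is shuffled with a random seed, and the attacker identifies each slot purely from how it behaves when probed. All names here (`make_obfuscated_table`, `fingerprint`, `recover_mapping`, and so on) are hypothetical.

```python
import random

# Toy stand-ins for syscalls: each has a distinctive observable behavior.
def sys_getpid(arg):
    return 4242                     # always the same positive integer

def sys_write(arg):
    return len(str(arg))            # number of "bytes" written

def sys_close(arg):
    return 0 if arg == 3 else -9    # -9 mimics EBADF for an unknown descriptor

def sys_time(arg):
    return 1_576_000_000            # a large timestamp-like value

REAL_SYSCALLS = {"getpid": sys_getpid, "write": sys_write,
                 "close": sys_close, "time": sys_time}
PROBES = (3, 99, "hello")           # arguments the attacker tries on every slot


def make_obfuscated_table(seed, table_size=512):
    """Vendor side: scatter the syscalls over secret slot numbers."""
    rng = random.Random(seed)
    slots = rng.sample(range(table_size), len(REAL_SYSCALLS))
    return dict(zip(slots, REAL_SYSCALLS.values()))


def invoke(table, number, arg):
    """All the attacker can do: call a number and observe the result."""
    handler = table.get(number)
    return handler(arg) if handler else -38   # -38 mimics ENOSYS


def fingerprint(table, number):
    """Probe one slot; the tuple of results identifies the call behind it."""
    return tuple(invoke(table, number, arg) for arg in PROBES)


def recover_mapping(table, table_size=512):
    """Attacker side: match every slot against fingerprints from a stock system."""
    reference = {tuple(fn(arg) for arg in PROBES): name
                 for name, fn in REAL_SYSCALLS.items()}
    recovered = {}
    for number in range(table_size):
        name = reference.get(fingerprint(table, number))
        if name is not None:
            recovered[name] = number
    return recovered


if __name__ == "__main__":
    secret_table = make_obfuscated_table(seed=2019)
    print(recover_mapping(secret_table))
```

A few hundred probes are enough to rebuild the whole mapping in this toy model, which is the sense in which the search space is too small to matter.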
CBHacking
  • 10
    What's more, a widely tested security approach, like chroot on a popular Linux or OpenBSD distro, is likely to be more robust in the wild than an entirely new, intentionally hard-to-test, approach. Closing unneeded ports: also good. – O. Jones Dec 17 '19 at 23:58
  • 67
    +1 for ‘remember, the “s” in “IoT” stands for Secure’ X^D – ErikE Dec 18 '19 at 17:40
  • If you control the distro, you can scramble the order of the kernel syscalls. You just rebuild glibc accordingly and so on. – Kaz Dec 20 '19 at 07:59
  • 3
    Regarding issue 1, the PlayStation 4 operating system actually did this. It is based on FreeBSD and the syscall numbers themselves are randomized. Of course, it was not difficult for a hacker to discover what syscall numbers referred to what syscalls, since each syscall has a more-or-less unique behavior when given various arguments: https://cturt.github.io/ps4.html – forest Dec 21 '19 at 11:33
30

If your objective is to deprive an attacker of ls and cat, there's an even better alternative to obfuscation: just don't install those utilities.

While I wouldn't say this is a widely implemented approach, it is at least implemented. For example, consider distroless, a collection of Docker images with pretty much nothing in them. Some of them (like the one for Go) literally have nothing in them. An attack on a system running in such a container can't get shell access because there isn't any shell to run.

The only way to get shell access by attacking an application in such a container is then to circumvent the container runtime, which is designed to prevent precisely that.

While I gave an example of a docker image, the same concept can be applied to operating systems generally. For example, ls and cat are part of the coreutils package in Debian. You could run apt-get remove coreutils and be confident an attacker won't be able to use ls or cat as part of an attack. Of course this means you can't use them either, and there's probably a lot of other stuff that depends on coreutils that will also have to be removed, but for an embedded device or a server that does just one thing that may be OK.

The general principle is reducing "attack surface": the more "stuff" a target has, the easier it is to compromise. The stuff could be open network ports, lines of code, or binaries installed. If increasing security is the objective, then removing all unnecessary "stuff" is a good start.
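As a rough way to see how big that tool box is on a given image, here is a small Python sketch that checks which commonly abused utilities are still on the search path. The `HANDY_TOOLS` list and the `installed_tools` helper are my own assumptions for illustration, not part of distroless or Debian, and the check itself needs a Python interpreter on the device, so it belongs on a build or test host rather than in the shipped image.

```python
import shutil

# Hypothetical list of tools an attacker would find handy after getting in.
HANDY_TOOLS = ["sh", "bash", "ls", "cat", "wget", "curl", "nc", "python3", "sudo"]


def installed_tools(candidates, path=None):
    """Return {tool: resolved location} for every candidate found on the search path."""
    found = {}
    for tool in candidates:
        location = shutil.which(tool, path=path)  # None when the tool is absent
        if location is not None:
            found[tool] = location
    return found


if __name__ == "__main__":
    for tool, location in installed_tools(HANDY_TOOLS).items():
        print(f"{tool}: {location}")
```

An empty result doesn't prove the system is safe, but every line it does print is something an attacker gets for free.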

Phil Frost
  • 3
    Arguably if I had an installation with ls, cat, docker and not much else on, the first thing I'd want to remove to improve security would be docker. Most sane IoT stuff is using busybox ls, cat if they have it installed which is *the same binary* for every tool. – abligh Dec 17 '19 at 19:41
  • 3
    Thanks for introducing me to _distroless_, interesting concept! – ElectricWarr Dec 18 '19 at 15:09
  • Good luck diagnosing issues when random scripts break because they can't find `cat`... – Federico Poloni Dec 19 '19 at 12:02
  • @FedericoPoloni: `set -euxo pipefail` and `shellcheck` are a good start...and tbh, most Unix scripting felines are candidates for That Notorious Award. – Piskvor left the building Dec 19 '19 at 12:48
  • 5
    Years ago Bruce Schneier gave a talk at a tech conference in which he explained that the systems his company installed on their customers' networks ran a stripped-down Linux kernel without `bash` or many of other tools that could be used by an attacker. He gave the analogy of someone leaving a tool box under the outside of their bedroom window, which would be a convenience for a burglar. The basic rule is that if you don't _need_ it, you don't _install_ it, thus reducing the contents of that tool box. – Monty Harder Dec 19 '19 at 20:50
  • Around 15 years ago I worked on interfacing with some 3rd-party equipment. Part of their system was a "real" embedded system, part was a Windows box that talked to the other system on one side and (essentially) remote API calls for my system on the other side. One time they needed some help debugging based on the test system they sent to my office. Had me hook up the Windows system and check stuff for them - **and it had everything, games and all**. There was physical protection (locked box) but if it were not truly locked down it would have been a major security hole. – manassehkatz-Moving 2 Codidact Dec 20 '19 at 01:44
13

Because obfuscation isn't security and because OS obfuscation is basically nonsense.

There's only so many common OSes, and so many ways to make educated guesses. If you detect that I'm running IIS or MSSQL Server, you have one guess at what OS is running underneath.

Even if I somehow manage to run a stack that tells you nothing about my underlying OS, and also mask every other hint at it (fingerprinting is a thing), I still haven't won much.

Security-wise, you knowing that I'm running Linux, or even that I'm running Debian 8, doesn't give you much to work on nor me much to worry about. If I'm properly hardened and patched, you could know whatever you want. If I'm on a patch level that has applied to the software museum yesterday, your suite of attacks will break me wide open simply through trying them all, and obfuscating my OS only forces you to try a couple useless exploits more. In an automated attack, it'll slow you down all of a few seconds.

Obfuscation doesn't work. People can scan you, fingerprint you or just throw their entire library of exploits to see what works.

If you waste time on that which you could've spent on actual hardening, you're actually harming your security.

Proper hardening works. I said above that "you could know whatever you want". I've posted my IP address and root password at security conferences where I gave a speech. SSH with remote root login and a bunch of services enabled. Seriously hardened SELinux machine. Nobody has ever managed to interrupt my speech, though one person once managed to drop a text file into the root directory when my policy wasn't yet perfect.


Addendum: Your starting point was a movie. Obfuscation is a great movie device because a revelation like that tells the viewers that the hero (or villain) found some piece of information and is thus making progress. It's the computer equivalent to finding out where the safe is, even though you still need the code. It doesn't matter if it's factually correct, if it conveys the proper emotional message to the audience.

Tom
  • You are running... Linux with Wine? Anyway, the JP system was not *intentionally* obfuscated; there was no reason it would be, any more than your desktop OS is intentionally obfuscated. Security in JP would rely on authentication (and *physical* security), just like on modern desktop systems. – Matthew Dec 18 '19 at 17:50
  • Minor quibble: SQL Server 2019 does actually run on Linux (and also Powershell, and ASP.NET). I don't know of anyone who actually does that... but it is possible. – GrandOpener Dec 18 '19 at 17:51
7

For an extremely widespread example, Intel Management Engine, which has its own OS, does something like that, where there exists a firmware update mechanism, but the firmware has to be in a very specific format the details of which are confidential. It appears to involve Huffman encoding with unknown parameters. Similarly to your proposal (which is basically symmetric encryption of firmware), ME requires specific modifications at firmware preparation time and has a matching deviation from standard mechanisms at execution time.

Roman Odaisky
  • Hmm. I thought Intel Firmware was required to be code-signed by the Intel CA. Do they need more protection on top of that? What is the threat that this obfuscation prevents? – Mike Ounsworth Dec 17 '19 at 15:40
  • 10
    The obfuscation tries to prevent reverse engineering, so attackers cannot find even more security problems in their implementation. – eckes Dec 17 '19 at 16:41
  • 8
    Or it’s not impossible Intel themselves are doing something shady they’d like to conceal. – Roman Odaisky Dec 17 '19 at 17:40
  • 3
    Worth to note that a lot of [security vulnerabilities](https://en.wikipedia.org/wiki/Intel_Management_Engine#Security_vulnerabilities) existed and very likely still exist in the ME. And the whole obfuscation also makes it harder for security researchers to analyze it. – Josef Dec 18 '19 at 11:28
  • 1
    Intel ME is a sick joke – john doe Dec 18 '19 at 22:02
  • 1
    @johndoe I take it you've never heard of the newer Intel "Innovation Engine". :P – forest Dec 21 '19 at 11:35
5

What you are describing is called "Security through Obscurity" and is a widely known security antipattern. That means it's not a new idea; rather, it's an old, bad idea that people new to security philosophy have to be taught not to fall into.

Imagine you were designing a house to be secure, and thought obscurity was a promising strategy. All houses are constructed out of hallways and rooms, doors with doorknobs, light switches, etc. Since these are commonly known, you might reason this is insecure since an intruder could easily navigate the house by these commonly known construction elements. You might try to redesign the principles of construction from scratch, replace door knobs with Rubik's cubes, make half-height ceilings to confuse the intruder, etc. You end up with a house that is terrible to live in, cannot be maintained because no contractor wants anything to do with it, and worst of all it's for nothing because any intruder, once inside, can look around with his eyes and use his brain to figure it out. Hackers are puzzle addicts at heart.

The solution to securing your house is not to have a non-standard construction, it is to have better locks, secured windows, a proper security system, etc. You don't want to hide your gold in a secret passageway, because eventually Abbott and Costello will lean on that candlestick and open it accidentally. Put your gold in a safe with a good secret combination and a tamper alarm. The computer equivalent is having restricted credentialed access by public key cryptography, limited user access roles, monitoring systems, reducing exposed surface area, entry threat vector mitigation, etc.

Efforts to make a computer system more obscure only make your system farther from support, harder to get security patches, harder to use in a secure way.

Edit: Sometimes, some security through obscurity is acceptable, when used in addition to actual security. A common example is running SSH on a high port like 30000 something. SSH is encrypted, and access is behind a credentialed authentication, that's your real security. But having it on a high port just makes it less noticeable to begin with if someone is doing quick scans. Anything more convoluted than this, such as trying to obfuscate your OS, would just make it an operational nightmare, that would make actual security measures (like being up to date on patches) more difficult.

user1169420
  • 1
    "some security through obscurity is acceptable": your example of using a non-default port is not *security*, that's *convenience*: in this case, "I don't want my ssh logs full of failed attempts." No bearing on security at all. – Piskvor left the building Dec 19 '19 at 12:38
  • You are correct. Poor wording on my part. – user1169420 Dec 19 '19 at 16:41
  • For more discussion on security through obscurity, see [Isn't all security through obscurity?](/a/44096/129883) and [The valid role of obscurity](/a/2431/129883). The comments are insightful on that second link. As for changing default ports, there's some caveats, as discussed in [Should I change the default SSH port on linux servers?](/a/32311/129883). Again, some useful stuff in the comments. – Fire Quacker Dec 19 '19 at 17:22
  • @Piskvor Using a non-default port does contribute to security by keeping out the casual riff-raff. My father taught me when I was very young: Locks are to keep out the average person, not the hardened criminal. He'll pick your lock, break it, smash the window and gain entry that way. That having been said, rather than changing the SSH port on a server, simply don't route SSH connections through your firewall to them (except for an SFTP server that is designed to be Internet-facing). Make legit admins get on your VPN to gain remote access. – Monty Harder Dec 19 '19 at 21:28
  • 2
    **Don't put SSH on a high port.** You can put it on a non-default port, sure, but putting it on a high port allows a non-root local process to crash the SSH server and bind to that high port in its place. A low port is important for servers because it requires root (well, certain capabilities that come with root) to bind. – forest Dec 21 '19 at 11:36
  • @MontyHarder That's exactly what I wrote: less spam in the failed logins log. Secure against my kid sister trying root:toor? And there was much rejoicing. (Yaay.) – Piskvor left the building Dec 23 '19 at 20:09
4

Think usability

  • Try to explain to your new sysadmin that he has to type ha7TrUO instead of sudo, RRI6e29 instead of ls, and so on... Yes, there is a translation list you just have to learn!

  • Browsing the local network must not be possible either: you would have to maintain a paper list just to keep an overall view!?

  • If you rename all critical commands, you will have to carry that renaming through every system upgrade!

  • And as Nick2253 points out in the comments, if you build an embedded system and then obfuscate it before publishing the end product, you will have to test the final product.

    • You have to create home-made scripts to bind everything together for obfuscation.
    • If something goes wrong, debugging becomes tricky.
    • You have to create home-made deobfuscation scripts in order to be able to do some debugging.
    • Custom feedback (with log files) will have to be deobfuscated too.

Doing so adds a significant layer of work, with new potential bugs.

All this for very little security improvement; see below.

Obscurity can harm you.

Think lower level

  • Even if you rename files and functions at the application and OS level, all of it will still use standard libraries, which can be called directly if an attacker sends binary executable files. You would even have to consider obfuscating all the standard libraries! (Including filesystem, network protocols...)

  • As long as you use an unmodified kernel, it will use standard variables and namespaces. So you will have to create an obfuscated version of the kernel...

... Then bind everything together!

Well, even when everything is obfuscated, the application will have to deal with the internet. Obfuscation won't protect against design bugs there!

Obscurity, if not applied at all levels, won't improve security.

Full obscurity is not really possible, because standard protocols are needed in order to deal with the internet.

Think license

If you plan to distribute GNU/Linux, you have to share source code as described in GNU GENERAL PUBLIC LICENSE GPLv3 and/or GNU LESSER GENERAL PUBLIC LICENSE LGPLv3 (depending on application and libraries used). This will imply publication of obfuscation method along with every distributed product.

Strictly answering to your two questions:

I'm something of an expert, but with a very narrow focus. Working on many different small business infrastructures, I made some choices some years ago. Evolution and history seem to confirm my choices, but that's just my personal point of view.

Is OS obfuscation as described used widely and I just haven't encountered it?

I really don't know how widely obfuscation is used, but I don't use security by obscurity and recommend against it.

If not used widely, what are the practical or technical barriers to usage?

No barriers! It's just counter-productive. The time cost quickly becomes gigantic, and the security improvement is an illusion.

Of course, there are some minimal things to do:

  • Don't start an ssh server if it isn't needed.
  • Don't install sudo. Use correct permissions, groups and a coherent structure.
  • Keep your whole infrastructure up to date!!!

A small example where light beats obscurity

  • Securing ssh: two ways.

    • Move the ssh port from 22 to 34567

      • Light and quickly done, but

      if attackers find the port, they can run a slow brute-force attack against it for a long time before the end user notices.

    • Install a firewall

      • Stronger, requires more knowledge, but

      more secure, as long as it is kept up to date.

Pang
  • 185
  • 6
  • 1
    This doesn't really answer @OP's question. They specifically set aside systems maintained by humans, so the usability arguments don't really apply. – Nick2253 Dec 18 '19 at 13:17
  • Even embedded systems can be maintained, edited and upgraded. – F. Hauri - Give Up GitHub Dec 18 '19 at 13:22
  • Of course. And I would assume, based on OP's question, that those manually maintained systems would be excluded. In particular, OP calls out systems "managed by automation" for this type of obfuscation. Obviously, people are involved in all system management at some level, but I took this to mean systems that are directly managed by automation; or, put another way, where automation tools abstract away the management to the point where OS-level obfuscation would be irrelevant. – Nick2253 Dec 18 '19 at 13:34
  • @Nick2253 I've added some more from my point of view. – F. Hauri - Give Up GitHub Dec 20 '19 at 11:16
4

Wouldn't entire classes of attacks be practically useless if the OS had ha7TrUO and RRI6e29 commands instead of sudo and ls? Imagine a hacker that somehow gained remote root access--what are they even going to do if they don't know any commands?

A better alternative would be to simply not install these commands in the first place. I don't believe you actually need a shell to run a Linux kernel, although this implies that your start-up process is something other than sysV-init or systemd.

As others have noted, however, it's a trade-off between security and ease of development. Unfortunately, many device makers care much more about the latter than the former.

Implementation would be fairly easy for compilers. Take the simplest case of "rename this function and all calls to it." You could give an OS compiler and an application compiler the same randomized names and they'd be able to talk to each other. But even if the application has poor security and is vulnerable to bash injection, such attacks would be fruitless.

I'm not sure you need the compiler's help at all. I think you can mostly achieve this by modifying the linker¹ to apply a randomization to all symbol names. Probably just applying a hash function with a salt that is known only to the manufacturer would suffice. I'm not sure you even need source code to do this, i.e. I think you could apply the mangling to already compiled code.

(¹ I think this is something of a lie. You may need to modify the object code in order to replace the symbol names, but this is still more like the assembler level, which is to say, a) you aren't doing all the hard bits of compiling C/C++/etc. code and b) you don't need the C/C++/whatever source code. OTOH I'm not sure you can do this sort of thing at all if your device code is instead something like Python.)
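As a rough illustration of the salted renaming idea above, here is a minimal Python sketch. It only computes the old-to-new name mapping; it does not touch any object file. The function names `mangle_symbol` and `build_rename_map`, the salt value, and the choice of HMAC-SHA256 truncated to 16 hex characters are all assumptions made for this example, not something the answer specifies.

```python
import hashlib
import hmac


def mangle_symbol(name: str, salt: bytes, length: int = 16) -> str:
    """Return a stable, salt-dependent replacement for a symbol name."""
    digest = hmac.new(salt, name.encode("utf-8"), hashlib.sha256).hexdigest()
    # The leading underscore keeps the result a valid identifier even when
    # the hex digest starts with a digit. Truncating the digest makes
    # collisions possible in principle, so a real tool would check for them.
    return "_" + digest[:length]


def build_rename_map(symbols, salt: bytes) -> dict:
    """Map every symbol in `symbols` to its mangled name."""
    return {name: mangle_symbol(name, salt) for name in symbols}


if __name__ == "__main__":
    salt = b"per-device-secret"  # hypothetical value, known only to the maker
    rename_map = build_rename_map(["open", "execve", "printf"], salt)
    for old, new in rename_map.items():
        print(old, new)  # one "old new" pair per line
```

The same salt always yields the same names, so separately built components would agree with each other, while a different salt yields an unrelated set. The "old new" line format is the one GNU `objcopy --redefine-syms` reads for object files; whether that is sufficient for dynamically linked symbols is not something this sketch tries to establish.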

There is still some possibility to reverse-engineer the process, however, and moreover this may violate the GPL² (especially GPLv3) unless you give out the salt, which would defeat the purpose.

In fact, "because of the GPL" is probably the main reason why you don't see this; it would be hard to implement in a way that is actually useful aside from making each device different. OTOH, that would at least mean that attackers can only target specific devices rather than being able to exploit a vulnerability on "any device running Linux x.y.z".

(² For simplicity, I will just use "GPL" throughout, but note that this generally applies even for LGPL'd stuff.)

That said, note that the GPL doesn't require you to publish the salt because you "modified" the sources. If the symbol name randomization happens at compile time, you haven't modified the sources. The reason you would need to publish the salt is because the GPL requires that users may substitute their own versions of a GPL'd library, which they can't do unless they know the salt. (As noted, you may be able to weasel out of this with GPLv2, but the technical effects are similar to "only run signed software", which GPLv3 was specifically written to address.)


Ultimately, there could be some advantage here, but you aren't going to make any particular system notably more secure (it's well known that "security through obscurity" generally is no security at all). What you can accomplish is making it harder to target many systems via a single vector.

Matthew
  • 423
  • 2
  • 8
  • "I think you could apply the mangling to already compiled code." Is a very interesting thought that would affect many other answers given here. I'm not familiar with the idea of separating linking from compilation, at least not outside of university classrooms. – Indigenuity Dec 19 '19 at 15:40
4

Fair Disclosure: I'm the CTO of a company that builds just this. De-bias for that all you want.

It IS indeed possible to completely build a system kernel-up, device drivers, packages, the whole stack PER host, multiple times per day, and it can be quite effective. That's exactly what we do, and we help our customers do. We're not the only ones - we can infer that at least Google does this for all their machines as well (reference coming soon.)

Now if you had the ability to rebuild per-machine from scratch what are some of the things you can change?

The Kernel Self Protection Project already allows for this through random reordering of kernel structures: https://lwn.net/Articles/722293/. This article also points to the famous exchange where Linus Torvalds calls it security theatre, but the author of the project (who works at Google) comments, "Well, Facebook and Google don't publish their kernel builds. :)"

This lends credence to the inference that at least Google does this and considers it useful. Can we do MORE types of scrambling over a closed set? At the most fundamental level, why doesn't a Windows virus run on Linux or a Mac? Because the formats are different. It's all x86 underneath, and yet it's not the same. Well, what if two Linuxes were different in a similar fashion? Windows vs Linux isn't "obfuscation" merely because we don't have a lot of ways to make Linuxes as different as Windows is from Linux. But it's not impossible, and it's not really even that hard. Take the KSPP's approach and apply it to syscalls, then recompile everything on top against those syscalls. That's going to be hella difficult to break, at least in a flyby fashion.
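To make the syscall idea a little more concrete, here is a toy Python sketch of a per-build, seeded permutation of syscall numbers. It is only a model of the bookkeeping: the function name `permute_syscall_numbers`, the seed string, and the five-entry table are assumptions for illustration, and nothing here reflects how the KSPP or any real kernel build implements randomization.

```python
import random


def permute_syscall_numbers(names, seed):
    """Assign each syscall name a number drawn from a seeded permutation."""
    rng = random.Random(seed)  # not a cryptographic RNG; fine for a sketch
    numbers = list(range(len(names)))
    rng.shuffle(numbers)
    return dict(zip(names, numbers))


if __name__ == "__main__":
    names = ["read", "write", "open", "close", "execve"]  # illustrative subset
    table = permute_syscall_numbers(names, seed="build-42")  # hypothetical seed
    for name, number in sorted(table.items(), key=lambda kv: kv[1]):
        print(number, name)
```

Because the table is a function of the seed, the kernel and everything compiled against it can be built from the same seed and still agree, while a binary built for a different seed would invoke the wrong calls.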

Your question though was about symbol renaming (names of executables, libraries, etc.) This has two aspects: (a) is it useful? (b) Can it be done reliably?

We were looking for a way to solve PHP code injections once and for all. Despite what HackerNews would have you believe, PHP code injections are not a solved problem outside of internet message boards. Real life PHP developers, admins and users are exposed to a continuous number of code injections.

So we set out to try Polyscripting (permissively MIT-licensed Open Source): https://github.com/polyverse/polyscripted-php

We share this publicly to solicit feedback, and we run two websites, polyscripted.com and nonpolyscripted.com, which do exactly what you'd expect based on their names. We also hired pentesters to try to break it.

And I'm just beginning to experiment with executable, shared libraries and exported-symbol renaming over a closed set (in a docker container, for instance). I don't personally think this adds as much value, but will it add a little? I think so. So it comes down to cost. If you can get little value for even littler cost, why not just do it? That's exactly why we all have ASLR - it's not exactly the greatest defense since sliced bread, but if you already have relocatable code, and you're going to reorder it anyway, why NOT push it to the limit and randomize it?

In short, the approach you're describing is being attempted by many people (including Google, Facebook, the Linux kernel, etc.), along with many academicians under the field of Moving Target Defense, and a handful of companies like ours who're trying to make it trivially consumable like ASLR.

  • 2
    This would be improved if it was edited to sound less like a press release and more like a generic answer with a "we have experience doing this" slant. Can you focus more on the concept and less on your company? – schroeder Dec 20 '19 at 07:17
1

I'll describe how I see this from a hacker's point of view.

Definitely, if you rename all the utilities, it will make things much harder and less comfortable for me. But what could I do?

  1. If I already have shell access to the system, I would just upload either busybox (all the basic utilities in one binary) or the full set of binaries I need for basic operations.

  2. To escalate privileges further, I may need to look for suid-root binaries, and there are just a few of them (sudo, mount, etc.). A hacker cannot upload such files (unless they are already root). My simple small Linux system has just 24 suid binaries. It is very easy to try each of them manually.
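The second point works because the setuid bit is file metadata, not part of the name, so renaming a binary does not hide it. A minimal Python sketch of such an enumeration follows, roughly what an auditor would do to review their own system. The helper names `is_setuid_root` and `find_setuid_root` and the `/usr` starting directory are assumptions for this example.

```python
import os
import stat


def is_setuid_root(mode: int, uid: int) -> bool:
    """True for a regular file owned by root with the setuid bit set."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID) and uid == 0


def find_setuid_root(root: str = "/"):
    """Walk `root` and return the paths of setuid-root regular files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if is_setuid_root(st.st_mode, st.st_uid):
                found.append(path)
    return found


if __name__ == "__main__":
    for path in find_setuid_root("/usr"):
        print(path)
```

The search never looks at what a file is called, which is why a short list of candidates remains easy to find however the binaries are named.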

But anyway, I agree, this would make hacking the box harder. It would be a pain to use (hack) this system, but it would be possible. And a hacker works on a system for an hour, a day or a month... but an admin/user may work on the system for years and cannot remember names like ha7TrUO and RRI6e29. Maybe nobody will even try to hack the system, but the admin will suffer every day.

But... if you make security too strict and uncomfortable for the user/admin, you often end up lowering security in practice. For example, if you enforce complex passwords of 20+ characters, most likely the passwords will be written on post-it notes stuck to the monitor. If you obfuscate the system this much, it will be very hard for legitimate users to work on it, and they will have to do something to make their lives easier. For example, they can upload their own busybox binary. Temporarily. And then they will forget about it, or leave it there intentionally (because they plan to use it again a little later).

yaroslaff
  • 59
  • 3
0

It actually does happen, because ssh connections are encrypted. Changing the names of some core binaries doesn't accomplish anything more than this. Also, it would cause a lot of chaos: e.g. any scripts relying on these utilities would either be rendered unusable, or you'd need some kind of proxy "deobfuscator", which would surely end up as an attack vector. However, I've seen some servers in the wild that don't allow root login, forcing you to use sudo instead. The usernames of the sudoers remain somewhat secret. I'm not sure how secure that is, but I'm no expert.

0

Why don't all doors have ten keyholes, only one of which actually worked for keys or lock picks? Answer: mostly the bother isn't worth the ten extra seconds of a thief's time.

If there were millions of unique operating systems whose source code had been created in isolation, obscurity might be common. As a practical matter, though, we have roughly four unique code bases in common use. All of them leak information about themselves through event timing, and for several low-level OS functions, where chip manufacturers prescribe machine-code sequences to activate or access certain CPU features, they all likely contain short sequences of exactly the same code.

Dusty
  • 11
-2

If you obfuscate GNU/Linux on an IoT device distributed to the public you will be forced to publish your code because it is under the GPL, or take the device off the market.

Obfuscating GPLed software — a strategy dependent on secrecy — is not a viable idea.

Not sure why this is downvoted; but if the objections are in the vein of Charles' comments: Simply renaming the core utilities will not work (for a simple start, you'll have to edit the GPLed init scripts). I strongly suspect that you'll also have to re-compile many binaries which interact with other binaries like the gcc compiler driver, ld.so, sh/bash etc. All shell scripts and many configuration files need to know the names and locations of binaries, so they have to be changed.

All this is GPLed and you'll be forced to publish the changed sources, scripts and config files, thus frustrating the obfuscation efforts. If you use the device only in-house there is no obligation to publish the sources, but at the same time the case for obfuscation is much weaker: You'll frustrate mostly yourself.

  • 4
    You're certainly not required to publish your filenames -- and I've never ever seen anyone claim that, say, an author publishing `someprog.tar.gz.sig` (even when there are other copyright holders) is required to publish their private signing key to let others reproduce the signature. Such a claim would be laughed out of the courtroom. – Charles Duffy Dec 18 '19 at 21:33
  • ...moreover, if you patch something for use *only in your own datacenter*, you don't need to publish that patch at all; it's only distribution that triggers the license requirements. – Charles Duffy Dec 18 '19 at 21:36
  • 2
    @CharlesDuffy If you change the Linux kernel and gnu tools source and GPL'ed shell scripts (which is what needs to be done if I understand the OP correctly), compile it and put it on embedded devices which you ship to your customers, you surely need to publish your source code which will reveal all the laboriously obfuscated system calls and gnu binaries. I did not talk about signing keys, nor did the OP, I think. You don't have to publish changed file names, true; but any script or program *using* that renamed binary or script needs to be changed, and its source published. – Peter - Reinstate Monica Dec 18 '19 at 22:00
  • 1
    @CharlesDuffy If a binary or script is simply renamed the new name probably does not need to be published; but any *caller* will have to be modified. If the program or script is never used on the embedded device, it's probably a good idea to not include it in the shipped image in the first place. – Peter - Reinstate Monica Dec 18 '19 at 22:06
  • Sure. "Ship to your customers" is a different thing from "use internally". Something can be widely implemented even if it's not widely implemented *and shipped to external clients*. – Charles Duffy Dec 19 '19 at 16:38
  • @CharlesDuffy The question is specifically about IoT devices -- "web security, [....] server or camera or baby monitor", implying distribution. – Peter - Reinstate Monica Dec 19 '19 at 16:42
  • Yup -- I'll give you scope on the camera or baby monitor, certainly. – Charles Duffy Dec 19 '19 at 17:19
  • a) you don't need to ship the source until requested. b) even if you shipped it, why should the receiver give it to a hacker? – Thomas Weller Dec 19 '19 at 20:33
  • @ThomasWeller (a) The hacker requests it. (b) You give it to the hacker because they requested it. – Peter - Reinstate Monica Dec 19 '19 at 22:09
  • If someone other than the original client requests the source, you can inform the client that the system is already compromised. If it were not compromised, that person would never have had the possibility to read the GPL copyright notice and thus would never have had the opportunity to request the source. – Thomas Weller Dec 19 '19 at 22:40
  • @ThomasWeller It is not unusual that third parties receive information about suspected GPL violations and investigate. I suppose it is detectable that a Linux is running on a device without hacking it. There are harder side channel attacks than that. But my point was that a wide-spread device running an undisclosed Linux will be found out and will be forced to publish the sources or take it off the market. Obfuscating Gnu/Linux for IoT is not a viable idea. – Peter - Reinstate Monica Dec 19 '19 at 23:13
  • While this is a legal, licensing-related answer, and applies specifically to Linux and other copy-left software, Unix is not Linux (usually), and this answer doesn't apply to operating systems in general, which seems to be the main gist of the question (though it does name Unix specifically as an example). Infosec and legality are not the same thing. – Ghedipunk Dec 20 '19 at 17:49
  • @Ghedipunk The OP said "as soon as attackers find out a server or camera or baby monitor is running **linux...".** I also would be amazed if you could obfuscate a closed-source OS because as I explained you would at least need to patch ld.so or equivalent, and probably a host of other processes. – Peter - Reinstate Monica Dec 20 '19 at 22:16
  • Ehhh, just trying to give a suggestion as to why you've gotten 5 downvotes: You're addressing a security question from a licensing standpoint. – Ghedipunk Dec 20 '19 at 22:20
  • @Ghedipunk The OP did not think through what their idea entails. It is not viable. The legal problems are a consequence of the fact that much more is needed than just `mv /bin /nib && mv /nib/bash /nib/hsab` etc. – Peter - Reinstate Monica Dec 20 '19 at 22:52
  • I do not think GPL is a factor in any of this. And even if you did need to publish the result, it would not defeat the protection. Because who, in their right mind, would hardcode the new randomised filenames? You would create a translation function, just like ***any*** obfuscation process. You do not have to publish, under GPL, the result, and even if you did, do you publish each, individual randomised result? No, and even if you did, an attacker would have to find and sort out which version they are working with. – schroeder Dec 24 '19 at 11:46
-2

Compilers (actually, linkers) do this already. It's one of the main reasons reverse engineering is hard.

In your source code you might have these functions:

#include <stdio.h>

void print_hello() {
    printf("Hello!\n");
}

void print_hello_twice() {
    print_hello();
    print_hello();
}

Here's how they look after they're compiled into a program:

0x400582 push rbp
0x400583 mov rbp,rsp
0x400586 mov edi,0x400634
0x40058b call 0x400490
0x400590 nop
0x400591 pop rbp
0x400592 ret
0x400593 push rbp
0x400594 mov rbp,rsp
0x400597 call 0x400582
0x40059c call 0x400582
0x4005a1 nop
0x4005a2 pop rbp
0x4005a3 ret

As you can "clearly" see, print_hello is now called 0x400582, printf is now called 0x400490 and print_hello_twice is now called 0x400593. All the calls were also given the same randomized names, so that functions are able to call each other.

However, the linker is only able to do this within a program, since it links one program at a time.

user253751
  • 3,885
  • 3
  • 19
  • 15
  • 1
    Compilation is not obfuscation: [Obfuscation and Decompilation](https://jonskeet.uk/csharp/obfuscation.html), [Obfuscating C-based binaries to avoid decompilation](https://stackoverflow.com/a/2273676/1765658). – F. Hauri - Give Up GitHub Dec 20 '19 at 13:11
  • @F.Hauri It does what the asker is asking for, does it not? But, only within the boundaries of a single program. – user253751 Dec 20 '19 at 14:56
  • This is *not* what the OP is asking for. Your answer is that "this is done already", which, if in fact true, would make the question moot. The context is: "I'm talking about changing the names of binaries and filepaths in the OS itself" and the commands mentioned in the post. – schroeder Dec 24 '19 at 11:41