20

We have an AI model that needs to be deployed on premises. We provide the hardware, so we can do whatever we want with the device. The device is a mini PC running Ubuntu 18.04.

The UI is launched in kiosk mode, so only our program will be running. The user should be able to start the machine and log in with a username and password, after which the software starts automatically.

This device should be locked down. That is, we need to:

  1. Encrypt the hard disk so that nobody can take it and copy the files by plugging it into a different computer.
  2. Enable the user to log in and use the application.
  3. Store and use the AI model in such a way that the user cannot copy it or use it with any software other than ours.

While planning this, I ran into a problem: for the user to use the system, we need to share the encryption password with them. Hence they will always be able to plug the hard disk into another system and use it (or copy the model).

How can I overcome this issue?

Hari
  • 311
  • 2
  • 6
  • 3
    Is it a requirement that the end user be able to restart the system, or are you able to provide it with sufficient battery backup and reliability to prevent the need for an end-user restart under most likely circumstances? – Slartibartfast Nov 06 '19 at 08:37
  • 12
    I think you should reconsider your design. As you say, you do not control the disk, the end user does. Does the machine require internet access? Can you send a key at runtime to decrypt the code (instead of the disk)? – schroeder Nov 06 '19 at 09:23
  • @Slartibartfast : Yes, the end user will be able to restart the system – Hari Nov 06 '19 at 09:39
  • @schroeder : The machine does have internet access. However, it may not have it all the time, so we cannot depend on sending keys at runtime. – Hari Nov 06 '19 at 09:40
  • 11
    In short, FDE is not the control that meets your needs. You need another control. – schroeder Nov 06 '19 at 09:44
  • 12
    If you are afraid that customers will take your AI model and use it in some other software than yours, then the capitalist solution would be to have the better software. The more realistic approach is to consider your AI model as a copyrightable good, as long as it is not merely a collection of publicly available data. In any case, you are probably better off from a legal angle than a security angle. –  Nov 06 '19 at 16:01
  • 2
    This is a legal question, not a security question. – erickson Nov 06 '19 at 18:40
  • 1
    @erickson It can be both. For example, "How can I make sure contents in my mail are not read?" Can be answered legally ("It is illegal to open letters that are not your own") but that alone won't *stop* people from trying. – Captain Man Nov 06 '19 at 19:42
  • Re-examine your constraints is my best advice. If you can wiggle on one or more, the problem might be tractable. Otherwise, you're putting sensitive information into hostile territory with tissue-paper armor. – Slartibartfast Nov 06 '19 at 19:47
  • @schroeder How would you prevent an attacker (malicious customer) from spoofing the hardware and stealing the decryption key when you send it? Any authentication key you put in the device could be hijacked by the attacker. For that matter, if you're only encrypting the product data, not the whole OS, the attacker could simply modify the disk image to backdoor the network unlock component (or entire OS), boot the system, and wait for you to unlock the prize. – CBHacking Nov 07 '19 at 01:34
  • 2
    Would locking this AI model behind a web API that could be secured be feasible, or would this software only function best on a client-controlled system? That is the **only** decent solution I can see. You cannot trust the client. – zero298 Nov 07 '19 at 02:04
  • @MechMK1 : Legal option is always there. We just want to secure technically too. – Hari Nov 07 '19 at 05:33
  • @zero298 : That is an angle that we are looking at. However, there are business constraints that do not allow that for now. – Hari Nov 07 '19 at 05:34
  • You can do what hardware crypto wallets do (like Trezor or Ledger) - delete the AI model if the user opens the case. But this requires rigging up some hardware to detect opening of the case (or removal of the hard drive). – slebetman Nov 07 '19 at 06:13
  • @CBHacking that's a risk too, but the level of sophistication for that is much, much higher than the proposed solution. I'm not suggesting silver bullets here. – schroeder Nov 07 '19 at 07:25
  • @MechMK1 DRM is the capitalist solution. – user253751 Nov 07 '19 at 14:35
  • @user253751 I'm talking about the idealist "free market" solution, in which people would stick to my product because it is the best product. DRM is quite the opposite of "free market", because for instance with DRM'd music, I would not be able to buy music in one store and listen to it on another device. –  Nov 07 '19 at 14:52
  • 1
    @MechMK1 In a free market there will be both DRMed and non-DRMed music. It's up to you to buy the non-DRMed music. Oh, and in a free market it's up to particular artists to choose whether their music is DRMed. If you want to listen to a particular artist, and they chose DRM, then your choices are to take it or leave it. – user253751 Nov 07 '19 at 16:51

5 Answers

22

It comes down to this: you want to implement a DRM scheme. Many before you have tried; all of them have failed. It is not possible to give something (hardware, data) to users and prevent them from using it in an unintended way or copying it. You can make it harder, but you cannot prevent it. Others with far more resources have tried (e.g. Sony, Microsoft, and Nintendo trying to prevent pirating of console games), and in the end it was all broken.

Josef
  • 5,903
  • 25
  • 33
  • 5
    Not really sure this is about DRM. Most DRM problems arise because it has to be deployed on hardware the customer owns. The OP owns the hardware/software; it just has to be deployed in a network/location that isn't 100% trusted. Sure, it won't be possible to protect it 100%, but having the proper hardware combined with good software, configuration, and monitoring may be enough to minimize the risk. – Zoredache Nov 06 '19 at 21:37
  • 4
    @Zoredache what difference does ownership make if the hardware is installed at a customer location? If you assume the customer doesn't open the hardware, it's trivial to secure it. But why not just assume the customers don't steal the AI model then? – Josef Nov 06 '19 at 23:15
  • Perhaps not that different from the standpoint of securing it, but IMO a pretty big difference from a legal/ethical standpoint. Messing around with your own personally owned hardware is a lot different from attacking hardware you don't own or have any permission to touch. Most DRM systems have to work on hardware someone personally owns. `If you assume the customer doesn't open the hardware` is the assumption that I believe was clear in the question, which is why I think this is not like most DRM issues. – Zoredache Nov 07 '19 at 00:28
  • 1
    @Zoredache : If there is an option to secure it even against opening the hardware, that's great. However, "does not open the hardware" is acceptable if there is no other way – Hari Nov 07 '19 at 05:36
  • 4
    But if you assume the customer doesn't open the hardware, you don't have to encrypt the disk, because the customer won't open the hardware and connect the disk to another device! If you feel the need to encrypt the disk, that proves the constraint "customer doesn't open the hardware" doesn't actually exist! – Josef Nov 07 '19 at 08:48
19

Following up on my previous answer, a suggestion by @longneck, and a bit of google-fu, I came to the conclusion that it is possible to implement the scheme requested in the question with available tools and a lot of patience/expertise. This approach should satisfy the OP's request, but it is only design guidance, as it has a number of drawbacks/pitfalls. It will help the OP defend against the naive attacker trying to copy data from the encrypted disk.

My previous idea of using a TPM chip remains valid. This guide, which I have never tried, shows how to encrypt the disk without having to type a password, while preventing anyone from decrypting the data on another machine.

The TPM is normally embedded in the motherboard, so it cannot be moved to other hardware. The TPM performs hardware attestation, so you can verify that the hardware environment is compliant and that no third-party card has been plugged into your mobo.

If hardware attestation passes, the TPM unlocks itself. The LUKS module can then request the disk's encryption key, which is stored inside the unlocked TPM.

If the disk is moved to another machine, the key does not move with it.

As described in the guide, the process involves the following (a hedged command sketch follows the list):

  • Installing the distro of your choice
  • Taking ownership of the TPM chip using trousers and tpm-tools
  • Installing trustedgrub2 and using it as the bootloader
  • Adding the LUKS decryption key to the TPM
  • Sealing the TPM
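
To make this concrete, here is a minimal command sketch of those steps with trousers and tpm-tools, assuming a TPM 1.2 chip as in the linked guide. The device path (`/dev/sda2`), file names, and the PCR selection are my illustrative assumptions, not values from the guide; the PCRs must match whatever trustedgrub2 actually measures on your system.

```
# Install the TPM 1.2 userspace stack and the LUKS tooling.
sudo apt install trousers tpm-tools cryptsetup

# Take ownership of the TPM (prompts for an owner password); -z sets the
# well-known all-zeros SRK secret so the OS can use the chip unattended.
sudo tpm_takeownership -z

# Generate a random key and enroll it as an additional LUKS keyslot.
sudo dd if=/dev/urandom of=/root/luks.key bs=32 count=1
sudo cryptsetup luksAddKey /dev/sda2 /root/luks.key

# Seal the key to the current platform state: each -p names a PCR whose
# measurement must be identical at unseal time.
sudo tpm_sealdata -z -i /root/luks.key -o /root/luks.key.sealed -p 0 -p 2 -p 4 -p 8

# At boot, the initramfs can recover the key only on this machine, in this
# measured state, and pipe it straight into cryptsetup:
sudo tpm_unsealdata -z -i /root/luks.key.sealed -o /dev/stdout \
  | sudo cryptsetup luksOpen /dev/sda2 cryptroot --key-file=-

# Once the sealed blob is verified to work, destroy the plaintext copy.
sudo shred -u /root/luks.key
```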

Sealing the TPM means doing hardware attestation. The guide I linked speaks about BIOS boot rather than UEFI (UEFI users will be familiar with the concept of secure boot). Basically, trustedgrub2 will measure the boot software; simplified, this amounts to checksumming the kernel to make sure it has not been altered. The TPM will also measure the hardware to check that no other PCI card or similar has been installed since the TPM was sealed.

During the boot process, if the hardware has not been touched/tampered with and the booting kernel is the same one that sealed the TPM, the TPM releases the secret LUKS key to the system so that the system can decrypt itself.

Result: 1) the disk is encrypted; 2) no password is required at boot time, so the user can reboot the machine at any time; and 3) the key cannot be recovered by a user because it lives in hardware storage.

Note that this solution, similar to Microsoft's BitLocker, is not perfect and shares the same security pitfalls as Microsoft's solution. In fact, BitLocker backed by a TPM alone, without a PIN, has been repeatedly criticized for its weaker security, which I will not discuss here.

usr-local-ΕΨΗΕΛΩΝ
  • 5,310
  • 2
  • 17
  • 35
  • 2
    I might be reading this incorrectly, but doesn't this solution mean that you are effectively locked into a specific Ubuntu version with that exact kernel, and as such can never again install security updates that affect the kernel? – Nzall Nov 06 '19 at 21:45
  • 1
    @Nzall I think so, but there might be a way to have a backup key (not shared with the client) that can re-seal the kernel after an update. – Michael Nov 06 '19 at 21:58
  • 1
    You will need to reseal the TPM after each update that affects it. From the TPM's POV, a kernel update that patches a vulnerability is no different from a new kernel that added a backdoor. If you trust the live machine, you could add a hook that automatically re-seals the TPM on every kernel update. Doing unattended updates on machines that verify the running kernel is not without peril, though. – Ángel Nov 07 '19 at 00:30
  • This is essentially how Bitlocker-with-TPM works. The Windows Update process (any time the boot-time attestation would change) temporarily disables BitLocker (writes a key to the disk) at reboot time, and then after rebooting re-seals the TPM and removes the key. I would hope LUKS-with-TPM does the same, but I have not tried. Even if it does, if the attacker knows when this process is happening, they'd be able to interrupt the reboot and steal the key. – CBHacking Nov 07 '19 at 01:16
  • 4
    Note that this approach is going to be vulnerable to the same threats as TPM-only BitLocker. In particular, an attacker with physical access to the RAM (by tapping the pins directly, or by freezing the RAM and moving it to another machine before the state decays) ***will*** be able to extract the disk encryption key! You might be able to make the RAM tamper-resistant (and of course disable all DMA interfaces), but typical SODIMMs or similar are easy. – CBHacking Nov 07 '19 at 01:19
  • 5
    Also note that [TPM](https://dl.acm.org/citation.cfm?id=2442472) [are](https://www.sciencedirect.com/science/article/pii/S0898122112004634) [not actually that secure](https://pulsesecurity.co.nz/articles/TPM-sniffing) – Josef Nov 07 '19 at 09:42
  • My post is **freely** and **intentionally** inspired by BitLocker. I am fully aware of the security weaknesses of TPMs, but at least this provides an answer to the OP's question. I want to provide *guidance* to the OP, not a full out-of-the-box solution that is government-proof. Better to add a clarification – usr-local-ΕΨΗΕΛΩΝ Nov 07 '19 at 16:17
  • Stressing what Josef said: TPMs are not that secure. It is entirely a matter of how much gain there is in breaking your encryption. The links Josef shared refer mostly to the tip of the iceberg: a quick look makes me think no acid or micro-probes are used, nor any fancy microscope. Oh, by the way, I myself do "secure" machines on customer premises using a TPM ;) – user1532080 Nov 08 '19 at 06:00
  • @user1532080 for some TPMs it is even possible to extract the keys without any hardware modifications, in software only. But afaik the specifics aren't public yet. – Josef Nov 08 '19 at 09:57
  • This works to the extent that your BIOS vendor got their secure boot implementation right (at least, right enough that it can't be subverted without an incorrect hash being loaded into a register first). That's... not always a safe assumption. There are some very, *very* bad UEFI implementations floating around, with big-name vendors behind them. – Charles Duffy Nov 08 '19 at 18:48
0

The TPM solution above is not 100% safe (see @Josef's comment). If you hand the goods over to the customer, you should expect your data to be exposed.

For a safe solution, you'll need your gizmo to phone home to a server that is under your own control.

Typically, each time your software has to compute something, it will ask for a computation token, with some way to authenticate the client. This token is generated by your server (so it is not in your customer's hands) and is checked for validity before the computation is performed.

You can use any signature scheme here; you'll need to sign the message plus a timestamp. The token is accepted if the signature matches the public key of your server and the timestamp is not too far in the past.
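
As a minimal sketch of that check with plain openssl (assuming an RSA server key pair; the message layout and the five-minute freshness window are illustrative choices, not a spec):

```
# Server side: sign "<request-id>:<unix-timestamp>" with the private key.
printf '%s:%s' "$REQUEST_ID" "$(date +%s)" > token.msg
openssl dgst -sha256 -sign server.key -out token.sig token.msg

# Device side: verify the signature against the pinned server public key...
openssl dgst -sha256 -verify server.pub -signature token.sig token.msg || exit 1

# ...and reject tokens whose timestamp is too far in the past (replay guard).
ts=$(cut -d: -f2 token.msg)
now=$(date +%s)
[ $(( now - ts )) -le 300 ] || { echo "token expired" >&2; exit 1; }
```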

If you absolutely don't want your AI model disclosed, also embed in this token a key to decrypt the AI model file (the encryption key must be per-device, to avoid a global leak in case one device's key is disclosed).

Ideally, upgrade your AI model from time to time and include the maximum timestamp at which it may be used. If someone succeeds in capturing the "private" key to decrypt it, the key will be useless a few days/weeks later, so the damage will be limited.
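
A rough sketch of the per-device encryption with openssl (paths and names are illustrative; `-pbkdf2` needs OpenSSL 1.1.1 or newer):

```
# Server side, once per device: generate and keep a device-specific key,
# then encrypt that device's copy of the model with it.
openssl rand -hex 32 > device42.key
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in model.bin -out model-device42.bin.enc -pass file:device42.key

# Device side: after a valid token delivers the key, decrypt into volatile
# storage (/run is a tmpfs), so no plaintext copy ever lands on disk.
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in model-device42.bin.enc -out /run/model.bin -pass file:/run/device42.key
```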

Finally, you can also encrypt the classes your AI produces as results (if your AI is about classifying data), so that even with the model it is useless/hard to make use of it, unless you are connected to a server giving you the decryption keys.

Once you've done all of this, and spent weeks/months on it, you'll realize your competitors spent their time actually improving their products, and theirs are now better than yours.

xryl669
  • 119
  • 2
  • If the AI model is decrypted on the device at the customer's location, it's still possible to extract the unencrypted version. The only way to absolutely prevent this is to never put the model on hardware you don't control (e.g. only compute results on your own server, located in your own facility). – Josef Nov 08 '19 at 19:55
  • You're right that there is no limit to an *unlimited-power* hacker. Yet the scheme above makes a leak much less interesting, because it would be: 1. time-limited (the model evolves and is timestamp-constrained), 2. not replicable (the effort to get one model must be spent again for another model), 3. not disclosable (the extracted model is linked to the hacked target, which can be blacklisted) – xryl669 Nov 12 '19 at 10:43
  • 1
    My point is that there is no way to solve the actual question. It's just not possible. What one has to do now is not do some arbitrary stuff and call it "good enough", but instead do **threat modeling**. That means actually analyzing, in a structured way, which threats exist, which can be mitigated, and which residual risks are accepted. – Josef Nov 12 '19 at 10:49
  • Please note that if you split your AI model in two (the last part being smaller, and only stored and run on your **secure** server), then capturing the model will be useless. – xryl669 Nov 12 '19 at 10:54
  • @Josef You are right, as stated in my first sentence. Yet, if the model is **on the device**, there is a compromise to be made between having the model in the clear (so it's a two-minute job to capture it) and having it encrypted with many authentication steps (more complex to capture, but not impossible). This trade-off has to be weighed against the value of the model to see if it's worth it; that's the last sentence of my post. – xryl669 Nov 12 '19 at 10:57
-1

Edit: this answer is based on a wrong assumption, but it contains interesting elements of discussion

If Linux is a requirement, you'll have a hard time.

I don't think that you can implement that out of the box with available tools (Ubuntu etc.). LUKS helps you perform FDE.

If you could use MS Windows for your application, you would have BitLocker and Secure Boot to help. Ultimately, you could hack up a host node running MS Windows and emulate/virtualize/dockerize your application. Here we are focusing on FDE.

If you have the resources to implement kernel code and customize Ubuntu's boot process, feel free to consider re-implementing your own BitLocker using a hardware token, such as a TPM, along with secure boot.

I will give you an answer based on MS BitLocker. As described in the Wikipedia article, TPM serves two main purposes: hardware attestation and key management (for FDE).

The idea is to build your own (uhmmmm.....) Linux module. I don't know of an available solution. The TPM is normally embedded in the motherboard, so it cannot be moved to other hardware. The TPM performs hardware attestation, so you can verify that the hardware environment is compliant and that no third-party card has been plugged into your mobo.

If hardware attestation passes, the TPM unlocks itself. The LUKS module can then request the disk's encryption key, which is stored inside the unlocked TPM.

If the disk is moved to another machine, the key does not move with it.

Note that this solution, similar (again) to what Microsoft does with BitLocker, is not perfect and could have security pitfalls. In fact, BitLocker backed by a TPM alone, without a PIN, has been repeatedly criticized for its weaker security.

usr-local-ΕΨΗΕΛΩΝ
  • 5,310
  • 2
  • 17
  • 35
  • Thanks for the answer. However, Linux is a "requirement". – Hari Nov 06 '19 at 09:35
  • 14
    Using the TPM to escrow the LUKS key in a secure boot environment on Linux has been possible for years. Why are you talking about writing a custom "Linux module"? – longneck Nov 06 '19 at 17:00
  • 1
    @longneck please, could you provide reference? I am not up to date with Linux – usr-local-ΕΨΗΕΛΩΝ Nov 06 '19 at 21:11
  • 3
    Note that TPM-only BitLocker+Secure Boot is not fully secure in this scenario either. TPM-only BitLocker is vulnerable to various hardware attacks; I know of no reason the same wouldn't apply to LUKS. It raises the bar for an attacker, but not beyond the capabilities of a motivated and skilled individual, and if you use removable RAM modules then I don't know any way to meaningfully mitigate such attacks. – CBHacking Nov 07 '19 at 01:25
  • https://security.stackexchange.com/questions/124338/right-way-to-use-the-tpm-for-full-disk-encryption – longneck Nov 07 '19 at 02:59
-1

The use of a TPM is a good option, but assuming that doesn't work for you, you could also consider some kind of network-based unlock.

Assuming you will have some kind of remote SSH access to the machine, you can add a small SSH daemon like dropbear to the initrd image and configure networking in your initrd. With that in place, you can SSH in and provide the passphrase/key as needed.
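
On Ubuntu this is mostly packaged already; a rough sketch (the addresses and key names are placeholders you would adapt):

```
# Install the initramfs-integrated dropbear and authorize a support key
# for the pre-boot environment.
sudo apt install dropbear-initramfs
cat support_key.pub | sudo tee -a /etc/dropbear-initramfs/authorized_keys

# Give the initrd a static network config via the kernel ip= parameter
# (client-ip::gateway:netmask), e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="ip=192.0.2.10::192.0.2.1:255.255.255.0"
sudo update-grub && sudo update-initramfs -u

# From the support side, unlock the root volume during early boot;
# cryptroot-unlock ships with Ubuntu's cryptsetup initramfs hooks.
ssh -i support_key root@192.0.2.10 cryptroot-unlock
```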

You could also purchase hardware with some kind of remote management controller built in, basically giving you an IP-KVM to remotely access the machine. With that in place, you can remote in and provide credentials as needed. Securing and keeping the remote management controller patched becomes pretty important if you take this route.

Zoredache
  • 633
  • 1
  • 6
  • 14
  • 4
    The problem with this idea is that, if the SSH server is running pre-decryption (and it must be, since the decryption key is sent via SSH), the server key cannot be stored under encryption. If the server key is stored in plain text, the attacker can extract it, set up their own box in place of yours, signal that it has just rebooted, and when you SSH in and send over the encryption key, steal it and use it to decrypt the whole disk. **In general**, there's no way to remotely authenticate hardware if the attacker physically controls that hardware. – CBHacking Nov 07 '19 at 01:28