There are a couple of enterprise general-purpose (GP) HSMs that can run use-case specific, custom firmware inside the hardware of the HSM.
For my street cred: I'm the principal member of technical staff in the Americas for this use-case, at one of the major HSM vendors. If you contact us commercially, it will end up on my desk. Not gonna straight up tell you which one; you should do your own due diligence.
One vendor will do the work for you, as professional services; the other will sell you the SDK, give you training, and then (if that is what you want) walk away. The second approach ensures that any IP generated by you for your use-case does not end up in the mainline branch and get sold to everyone who buys the HSM.
There will also be licensing issues, e.g., is there a per-HSM fee to pay to run your custom firmware?
Which way you choose to go will depend on your corporate needs/requirements/cash flow/risk assessments, etc. Call these your 'non-functional requirements'.
So.
There are two use-cases where custom firmware is interesting. Workflow, and Crypto.
Workflow: When you want to do a complicated series of cryptographic steps (derive a key, generate a hash from some data, encrypt that data with the key, wrap the derived key with a public key of the recipient, containerize all that plus other metadata which might or might not require security ...), each of those steps is a call to the HSM. Workflowing it turns all that into a single call to the HSM, i.e., pack all the discrete inputs into one message, and get back your container at the end. See below for the benefits.
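To make "a call per step" concrete, here is roughly what the host-side code looks like without custom firmware. The "hsm" client wrapper and every method name below are hypothetical, purely for illustration; they are not any vendor's real SDK.

    def build_container(hsm, base_key, recipient_pub, data, metadata):
        derived = hsm.derive_key(base_key, kdf_info=b"transport")   # call 1: derive a key
        digest = hsm.hash(data)                                     # call 2: hash the data
        ciphertext = hsm.encrypt(derived, data)                     # call 3: encrypt with the derived key
        wrapped = hsm.wrap_key(recipient_pub, derived)              # call 4: wrap the derived key for the recipient
        # ... plus however many more calls your container format needs ...
        return {                                                    # host-side packaging of the results
            "ciphertext": ciphertext,
            "digest": digest,
            "wrapped_key": wrapped,
            "metadata": metadata,
        }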
Crypto: GP HSM vendors try to pack all the "modern", "common" crypto algorithms, mechanisms, etc. into their HSMs, but they will never be able to provide all of it. There might be regional algorithms (South Korea, France, China...) that it doesn't make sense for the vendor to support because the market is too small and won't cover the development costs, or for other reasons. There might also be "new" crypto that is * cough * ahead of the curve -- the various post-quantum (PQ) algorithms come to mind, but also things like ecvrf-edwards25519-sha512-elligator2 or hash-to-curve, which are at IETF draft status, but maybe you want to take advantage of them now, rather than after they've been finalized and then later implemented by the vendor.
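To give a flavour of how a host application reaches custom crypto once it is loaded: PKCS#11 reserves mechanism numbers from 0x80000000 (CKM_VENDOR_DEFINED) upward for vendors, and a custom firmware module typically hangs its algorithms off that range. A minimal sketch; the specific number and the session wrapper are assumptions, not a real SDK:

    CKM_VENDOR_DEFINED = 0x80000000                       # real PKCS#11 base for vendor mechanisms
    CKM_CUSTOM_ECVRF_PROVE = CKM_VENDOR_DEFINED | 0x0101  # made-up number for the custom module

    def vrf_prove(session, key_label, alpha):
        """Ask the custom firmware to run the (draft) ECVRF prove step entirely in-HSM."""
        key = session.find_key(label=key_label)           # hypothetical key lookup
        return session.sign(key, alpha, mechanism=CKM_CUSTOM_ECVRF_PROVE)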
Now that you know what you can do, what are the benefits? As the person controlling the custom firmware payload:
First, performance. When you combine the workflow above into a single command to the HSM (send all the various inputs, get back the secure transport container), you cut out all the network and stack latency. Six or seven calls (sometimes with duplicated inputs) drop down to one call. The threading model is easier to understand, and the host-side application programming is easier.
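As a sketch, the container-building workflow from above becomes a single custom command. The command name and the client call are made up, but the shape is what matters: all the inputs go in, the finished container comes out, one round trip.

    def build_container_one_call(hsm, base_key, recipient_pub, data, metadata):
        # The custom firmware does the derive/hash/encrypt/wrap internally and
        # returns the finished container; the transient artifacts never leave the HSM.
        return hsm.custom_command(
            "BUILD_TRANSPORT_CONTAINER",          # made-up command name in the custom firmware
            base_key=base_key,
            recipient_public_key=recipient_pub,
            data=data,
            metadata=metadata,
        )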
Second, security. In that workflow above, there are certain transient artifacts that, when implemented in-HSM, never leave the HSM. The security around those artifacts is controlled by you, on a system that malware can't get to. Malware can't observe the derived key (yes, I'm aware that this derived key is probably encrypted, so the malware won't see the key material, but having it pass through the host does open up analysis-based side-channel attacks, i.e., kind of key, key size, etc.).
Third, functionality: When you add ecvrf or hash-to-curve/test-and-increment, or Walnut or ..., because the vendor doesn't support it out of the box.
You could also target additional security, such as in your Square case, above.
Using custom firmware, it is possible to implement what I call a "policy engine". Let's say that your HSM admins have configured the HSM to allow you, as the credentialed user(s), to use a key to sign something. That's a straightforward use-case.
But what if your key should only be used from a certain location (well, IP address/range), or during a certain time window (from 8 to noon on M, W, F)? Or maybe you are limited in how many times you can use the key (if you are in EMEA, this is relevant to eIDAS and signature activation modules). There are other 'claims' (think OAuth, and the Square use case) that might need to be checked/validated/considered before you are allowed to use that key.
A policy engine can be implemented as part of the workflow; however, there are some constraints that make this slightly more interesting from a code-monkey point of view (i.e., insert code-level/OS-neepery jargon-laden commentary here).
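To make that concrete, here is a logic-only sketch of the kind of check a policy engine might run before letting the signing operation proceed. A real module would be written against the vendor's embedded SDK (usually C), with the policy data held inside the HSM; all field and function names here are hypothetical.

    from datetime import datetime, timezone
    from ipaddress import ip_address, ip_network

    def policy_allows(policy, request, now=None):
        """Hypothetical in-firmware check: location, time window, usage count."""
        now = now or datetime.now(timezone.utc)
        # Location: is the caller inside the allowed address range?
        if ip_address(request["source_ip"]) not in ip_network(policy["allowed_cidr"]):
            return False
        # Time window: e.g. 08:00-12:00 on Mon/Wed/Fri ({0, 2, 4} in Python's weekday numbering).
        if now.weekday() not in policy["allowed_weekdays"]:
            return False
        if not policy["window_start_hour"] <= now.hour < policy["window_end_hour"]:
            return False
        # Usage count: eIDAS-style limit on how many times the key may be used.
        if policy["uses_so_far"] >= policy["max_uses"]:
            return False
        return True

    # The firmware only performs the sign (and increments the use counter) when
    # policy_allows(...) returns True; otherwise it returns an error to the host.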
One other thing to mention is that it is absolutely possible to combine "workflow" with "crypto". They are orthogonal to each other. "My workflow contains custom crypto" is perfectly legit.
Downside:
As soon as you insert your custom firmware, you've changed the environment, and the HSM can no longer run in FIPS-validated mode. At most you might get "FIPS -- Restrictions Applied", but you can't make the 'certified' claim, only 'capable'. It is certainly possible to recertify the module with your code in it -- $$$. You'll need to work with both the HSM vendor and the certification lab, i.e., $$$ to both. The vendor will be able to quote that for you. So, again, non-functional requirements will decide.
Happy to address any questions you might have.
Follow-up, in response to a question in the comments:
So my question would be: is this a common strategy for organizations that use HSMs? Sending challenges out to some external database or API that can possibly identify a bad/fraudulent request to the HSM? What kinds of data might be retrieved from what kinds of databases for applications outside of cryptocurrency?
Actually, that's ... bad.
The HSM should not contain any code that would allow it to "call out to ...". HSMs should only react to an incoming request, perform the requirements of the request, and return the result.
The issue with an HSM calling out to something is that, rather than attacking the HSM to get something, maybe now you can attack a resource that the HSM is using. Your HSM is now held hostage by the security of those other systems.
It might be that what they are describing is a secure server that does all the 'other' work, and then uses an HSM (network-attached, or directly connected PCIe) for either standard or custom requirements within their process. So the description you read elsewhere may be a bit over-simplified, and treats something higher in the stack as the "HSM" (i.e., a lot of people point at a network-appliance form-factor HSM and say "HSM" when really the HSM is only the PCIe card inside the network appliance).
But the rest of the question isn't related to the HSM -- it's related to credential and attack-surface management.
In the instances I'm familiar with where the HSM is doing policy checks, the HSM's "database" of relevant info was stored internally, on the HSM. Updating the policy engine configuration and internal check data required 6-eyes separation of control. It was implemented that way on purpose. And since they controlled all of the config, the data, and the firmware module code, they were reasonably confident in their security around it.
Keep in mind that this was after the "higher level stack" applications had already done their own due diligence. Even if the apps were cracked, and someone gained access to the HSM capabilities they were requesting, the HSM would still provide that last wall of defense.
But, I digress.
So my question would be: is this a common strategy for organizations that use HSMs? Sending challenges out to some external database or API that can possibly identify a bad/fraudulent request to the HSM? What kinds of data might be retrieved from what kinds of databases for applications outside of cryptocurrency?
If this were the HSM, no it is not common.
If it were someplace higher in the stack, it might be common, and it might not be.
But if you do want to implement it locally, remember your 'non-functional requirements', i.e., what do you need?
Which of that can be managed on the HSM? What data can be used to identify fraudulent use, and what steps/policies (dual control, MFA, etc.) are required to manage/make use of it?
It eventually comes down to Convenience vs. Security.