There are a couple of enterprise general-purpose (GP) HSMs that can run use-case specific, custom firmware inside the hardware of the HSM.
For my street cred: I'm the principal member of technical staff in the Americas for this use-case, at one of the major HSM vendors. If you contact us commercially, it will end up on my desk. Not gonna straight up tell you which one; you should do your own due diligence.
One vendor will do the work for you, as professional services; the other will sell you the SDK, give you training, and then (if that is what you want) walk away. The second approach ensures that any IP generated by you for your use-case does not end up in the mainline branch and get sold to everyone who buys the HSM.
There will also be licensing issues, e.g., is there a per-HSM fee to pay to run your custom firmware?
Which way you choose to go will depend on your corporate needs/requirements/cash flow/risk assessments, etc. Call these your 'non-functional requirements'.
So.
There are two use-cases where custom firmware is interesting. Workflow, and Crypto.
Workflow: When you want to do a complicated series of cryptographic steps (derive a key, generate a hash from some data, encrypt that data with the key, wrap the derived key with a public key of the recipient, containerize all that plus other metadata which might or might not require security ...), each of those steps is a call to the HSM. Workflowing it turns all that into a single call to the HSM, i.e., pack all the discrete inputs into one message, and get back your container at the end. See below for the benefits.
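To make "a call per step" concrete, here is roughly what the host-side code looks like without custom firmware. The "hsm" client wrapper and every method name below are hypothetical, purely for illustration; they are not any vendor's real SDK.

    def build_container(hsm, base_key, recipient_pub, data, metadata):
        derived = hsm.derive_key(base_key, kdf_info=b"transport")   # call 1: derive a key
        digest = hsm.hash(data)                                     # call 2: hash the data
        ciphertext = hsm.encrypt(derived, data)                     # call 3: encrypt with the derived key
        wrapped = hsm.wrap_key(recipient_pub, derived)              # call 4: wrap the derived key for the recipient
        # ... plus however many more calls your container format needs ...
        return {                                                    # host-side packaging of the results
            "ciphertext": ciphertext,
            "digest": digest,
            "wrapped_key": wrapped,
            "metadata": metadata,
        }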
Crypto: GP HSM vendors try to pack all the "modern", "common" crypto algorithms, mechanisms, etc. into their HSMs, but they will never be able to provide all of it. There might be regional algorithms (South Korea, France, China...) that it doesn't make sense for the vendor to support because the market is too small and won't cover the development costs, or for other reasons. There might also be "new" crypto that is * cough * ahead of the curve -- the various post-quantum (PQ) algorithms come to mind, but also things like ecvrf-edwards25519-sha512-elligator2 or hash-to-curve, which are at IETF draft status, but maybe you want to take advantage of them now, rather than after they've been finalized and then later implemented by the vendor.
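To give a flavour of how a host application reaches custom crypto once it is loaded: PKCS#11 reserves mechanism numbers from 0x80000000 (CKM_VENDOR_DEFINED) upward for vendors, and a custom firmware module typically hangs its algorithms off that range. A minimal sketch; the specific number and the session wrapper are assumptions, not a real SDK:

    CKM_VENDOR_DEFINED = 0x80000000                       # real PKCS#11 base for vendor mechanisms
    CKM_CUSTOM_ECVRF_PROVE = CKM_VENDOR_DEFINED | 0x0101  # made-up number for the custom module

    def vrf_prove(session, key_label, alpha):
        """Ask the custom firmware to run the (draft) ECVRF prove step entirely in-HSM."""
        key = session.find_key(label=key_label)           # hypothetical key lookup
        return session.sign(key, alpha, mechanism=CKM_CUSTOM_ECVRF_PROVE)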
Now that you know what you can do, what are the benefits? As the person controlling the custom firmware payload:
First, performance. When you combine the workflow above into a single command to the HSM (send all the various inputs, get back the secure transport container), you cut out all the network and stack latency. Six or seven calls (sometimes with duplicated inputs) drop down to one call. The threading model is easier to understand, and the host-side application programming is easier.
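As a sketch, the container-building workflow from above becomes a single custom command. The command name and the client call are made up, but the shape is what matters: all the inputs go in, the finished container comes out, one round trip.

    def build_container_one_call(hsm, base_key, recipient_pub, data, metadata):
        # The custom firmware does the derive/hash/encrypt/wrap internally and
        # returns the finished container; the transient artifacts never leave the HSM.
        return hsm.custom_command(
            "BUILD_TRANSPORT_CONTAINER",          # made-up command name in the custom firmware
            base_key=base_key,
            recipient_public_key=recipient_pub,
            data=data,
            metadata=metadata,
        )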
Second, security. In that workflow above, there are certain transient artifacts that, when implemented in-HSM, never leave the HSM. The security around those artifacts is controlled by you, on a system that malware can't get to. Malware can't observe the derived key (yes, I'm aware that this derived key is probably encrypted, so the malware won't see the key material, but having it pass through the host does open up analysis-based side-channel attacks, i.e., kind of key, key size, etc.).
Third, functionality: When you add ecvrf or hash-to-curve/test-and-increment, or Walnut or ..., because the vendor doesn't support it out of the box.
You could also target additional security, such as in your Square case, above.
Using custom firmware, it is possible to implement what I call a "policy engine". Let's say that your HSM admins have configured the HSM to allow you, as the credentialed user(s), to use a key to sign something. That's a straightforward use-case.
But what if your key should only be used from a certain location (well, IP address/range), or during a certain time window (from 8 to noon on M, W, F)? Or maybe you are limited in how many times you can use the key (if you are in EMEA, this is relevant to eIDAS and signature activation modules). There are other 'claims' (think OAuth, and the Square use case) that might need to be checked/validated/considered before you are allowed to use that key.
A policy engine can be implemented as part of the workflow; however, there are some constraints that make this slightly more interesting from a code-monkey point of view (i.e., insert code-level/OS-neepery jargon-laden commentary here).
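To make that concrete, here is a logic-only sketch of the kind of check a policy engine might run before letting the signing operation proceed. A real module would be written against the vendor's embedded SDK (usually C), with the policy data held inside the HSM; all field and function names here are hypothetical.

    from datetime import datetime, timezone
    from ipaddress import ip_address, ip_network

    def policy_allows(policy, request, now=None):
        """Hypothetical in-firmware check: location, time window, usage count."""
        now = now or datetime.now(timezone.utc)
        # Location: is the caller inside the allowed address range?
        if ip_address(request["source_ip"]) not in ip_network(policy["allowed_cidr"]):
            return False
        # Time window: e.g. 08:00-12:00 on Mon/Wed/Fri ({0, 2, 4} in Python's weekday numbering).
        if now.weekday() not in policy["allowed_weekdays"]:
            return False
        if not policy["window_start_hour"] <= now.hour < policy["window_end_hour"]:
            return False
        # Usage count: eIDAS-style limit on how many times the key may be used.
        if policy["uses_so_far"] >= policy["max_uses"]:
            return False
        return True

    # The firmware only performs the sign (and increments the use counter) when
    # policy_allows(...) returns True; otherwise it returns an error to the host.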
One other thing to mention is that it is absolutely possible to combine "workflow" with "crypto". They are orthogonal to each other. "My workflow contains custom crypto" is perfectly legit.
Downside:
As soon as you insert your custom firmware, you've changed the environment, and the HSM can no longer run in FIPS-validated mode. At most you might get "FIPS -- Restrictions Applied", but you can't make the 'certified' claim, only 'capable'. It is certainly possible to recertify the module with your code in it -- $$$. You'll need to work with both the HSM vendor and the certification lab, i.e., $$$ to both. The vendor will be able to quote that for you. So, again, non-functional requirements will decide.
Happy to address any questions you might have.
Follow-up, in response to a question in the comments:
So my question would be: is this a common strategy for organizations that use HSMs? Sending challenges out to some external database or API that can possibly identify a bad/fraudulent request to the HSM? What kinds of data might be retrieved from what kinds of databases for applications outside of cryptocurrency?
Actually, that's ... bad.
The HSM should not contain any code that would allow it to "call out to ...". HSMs should only react to an incoming request, perform the requirements of the request, and return the result.
The issue with an HSM calling out to something is that, rather than attacking the HSM to get something, maybe now you can attack a resource that the HSM is using. Your HSM is now held hostage by the security of those other systems.
It might be that what they are describing is a secure server that does all the 'other' work, and then uses an HSM (network-attached, or directly connected PCIe) for either standard or custom requirements within their process. So the description you read elsewhere may be a bit over-simplified, and treats something higher in the stack as the "HSM" (i.e., a lot of people point at a network-appliance form-factor HSM and say "HSM" when really the HSM is only the PCIe card inside the network appliance).
But the rest of the question isn't related to the HSM -- it's related to credential and attack-surface management.
In the instances I'm familiar with where the HSM is doing policy checks, the HSM's "database" of relevant info was stored internally, on the HSM. Updating the policy engine configuration and internal check data required 6-eyes separation of control. It was implemented that way on purpose. And since they controlled all of the config, the data, and the firmware module code, they were reasonably confident in their security around it.
Keep in mind that this was after the "higher level stack" applications had already done their own due diligence. Even if the apps were cracked, and someone gained access to the HSM capabilities they were requesting, the HSM would still provide that last wall of defense.
But, I digress.
So my question would be: is this a common strategy for organizations that use HSMs? Sending challenges out to some external database or API that can possibly identify a bad/fraudulent request to the HSM? What kinds of data might be retrieved from what kinds of databases for applications outside of cryptocurrency?
If this were the HSM, no it is not common.
If it were someplace higher in the stack, it might be common, and it might not be.
But if you do want to implement it locally, remember your 'non-functional requirements', i.e., what do you need?
Which of that can be managed on the HSM? What data can be used to identify fraudulent use, and what steps/policies (dual control, MFA, etc.) are required to manage/make use of it?
It eventually comes down to Convenience vs. Security.