Imagine a system architecture in which an API server can send requests to an HSM, and the HSM can decrypt data belonging to a particular user/customer for some hypothetical purpose. In this setup, if the API server is compromised, an attacker could send the HSM as many requests as they like and decrypt the data of every customer.
Typically, people suggest configuring the HSM with something like a rate limit to reduce the damage from a server compromise. I'm curious whether it's possible to take this idea a step further and control access to the HSM in a way that requires requests from multiple servers.
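For reference, the usual rate-limit mitigation amounts to something like a token-bucket check in front of the decrypt operation. This is a minimal sketch in plain Python, not any HSM vendor's API: `RateLimitedDecryptor` and the `decrypt` callable it wraps are hypothetical stand-ins. Note that it only slows an attacker down; it does not stop them.

```python
import time
from typing import Callable


class RateLimitedDecryptor:
    """Hypothetical wrapper: allows on average `rate` decryptions per second,
    with bursts of up to `capacity`, using a token bucket."""

    def __init__(self, decrypt: Callable[[bytes], bytes], rate: float,
                 capacity: int, clock: Callable[[], float] = time.monotonic):
        self._decrypt = decrypt      # stand-in for the real HSM key operation
        self._rate = rate            # tokens added per second
        self._capacity = capacity    # maximum burst size
        self._clock = clock          # injectable for testing
        self._tokens = float(capacity)
        self._last = clock()

    def decrypt(self, ciphertext: bytes) -> bytes:
        now = self._clock()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens < 1:
            raise PermissionError("rate limit exceeded")
        self._tokens -= 1
        return self._decrypt(ciphertext)
```

Even with a low rate, a compromised server can keep draining the bucket indefinitely, which is why this only reduces the damage rather than preventing it.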
For example, suppose the API servers sit behind a load balancer. Could the load balancer send a message to the HSM notifying it that an API server will shortly send a request for a particular customer? The HSM would then receive a notification from the load balancer, followed soon after by a request from the API server. If the HSM could be programmed to require both before decrypting anything, an attacker who compromised only the API server (and not the load balancer as well) would not be able to steal any data unless a legitimate request from a particular customer came in, and then only for that customer. This kind of "pseudo-quorum" architecture seems very secure to me, but I'm not an expert, so I could easily be missing something obvious. Would this actually be a secure architecture for an API? And are there any HSMs that can be programmed to do something like this?
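To make the proposed check concrete, here is a minimal sketch of the logic the HSM would have to enforce, written in plain Python rather than for any real HSM. Everything here is a hypothetical assumption: the `QuorumGate` class, the per-party HMAC keys, and the idea that the load balancer assigns a unique request ID and forwards it to the API server (for example in a header). Authenticating the notification matters; otherwise a compromised API server could simply forge it.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict, Set, Tuple


def sign(key: bytes, request_id: str, customer_id: str) -> bytes:
    """HMAC-SHA256 tag binding a request ID to a customer ID."""
    message = f"{request_id}:{customer_id}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


class QuorumGate:
    """Hypothetical HSM-side check: decrypt a customer's data only if the
    load balancer announced that exact request shortly before the API
    server asks for it."""

    def __init__(self, lb_key: bytes, api_key: bytes,
                 decrypt: Callable[[str], bytes], window: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lb_key = lb_key        # shared only with the load balancer
        self._api_key = api_key      # shared only with the API server
        self._decrypt = decrypt      # stand-in for the real key operation
        self._window = window        # seconds a notification stays valid
        self._clock = clock
        # request_id -> (customer_id, expiry time)
        self._pending: Dict[str, Tuple[str, float]] = {}
        # request IDs already announced, to block replayed notifications
        self._seen: Set[str] = set()

    def notify(self, request_id: str, customer_id: str, tag: bytes) -> None:
        """Step 1: the load balancer announces an incoming customer request."""
        if not hmac.compare_digest(tag, sign(self._lb_key, request_id, customer_id)):
            raise PermissionError("bad load balancer tag")
        if request_id in self._seen:
            raise PermissionError("replayed request ID")
        self._seen.add(request_id)
        self._pending[request_id] = (customer_id, self._clock() + self._window)

    def request_decrypt(self, request_id: str, customer_id: str, tag: bytes) -> bytes:
        """Step 2: the API server asks for decryption; consumes the notification."""
        if not hmac.compare_digest(tag, sign(self._api_key, request_id, customer_id)):
            raise PermissionError("bad API server tag")
        entry = self._pending.pop(request_id, None)
        if entry is None:
            raise PermissionError("no notification from load balancer")
        announced_customer, expiry = entry
        if announced_customer != customer_id or self._clock() > expiry:
            raise PermissionError("notification does not match or has expired")
        return self._decrypt(customer_id)
```

Binding the request ID to the customer ID is what stops a compromised API server from turning one legitimate request for customer A into a decryption for customer B, and popping the notification means each one authorizes a single decryption. The `_seen` set grows without bound here; a real device would need to prune it.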