How does TrueVault's de-identification process work?

Question

I'm thinking of using TrueVault, but I'm not entirely sure about the sequence of events involved in de-identifying data and re-identifying it. More info here. Here is the process, as I understand it, but I don't know exactly in what order:

PHI is stored in a HIPAA-compliant TrueVault server. TrueVault returns an opaque identifier that can be stored alongside de-identified data in AWS.
To fetch this data, the data is returned from two different sources: the TrueVault server and AWS, where analyses on the data-sets are being performed. Does this happen simultaneously? What happens first?
To pull de-identified data, a request is made to the AWS server, which would also return the opaque identifier. Then, a request would be made to the TrueVault server to identify the user associated with the opaque identifier. Is this the right order of events?
The data would be re-identified in the client application when requested by an authenticated user.

What is the sequence of events here? Is PHI data first sent to the TV server, and then an opaque identifier is sent to AWS along with the separated data? What information does the client application have at any one time? Clarity on the process would be appreciated.

score 2 · Accepted Answer · edited Jun 16 '20 at 09:49

TrueVault recommends doing data de-identification and re-identification client side (within your web or mobile app logic). This makes two interesting flows:

De-Identification

Data is entered into your application by the user.
Your application (e.g. JavaScript code in a browser or Swift code on an iOS device) splits that data into what is identifying (names, phone numbers, etc.) and what is not identifying (other health data).
The identifying information is sent directly to TrueVault from the client application through the TrueVault API. Ideally the end-user is authenticated as a TrueVault user, so the API permissions can be granularly defined. The API returns an opaque identifier that can be associated with the de-identified data and stored on your server.
The de-identified health data (along with the TrueVault id) is sent to your server. Your server never handles identifying data together with health data, so it doesn't fall under the purview of HIPAA regulations.
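
The client-side split described above can be sketched as a pure function. This is a minimal TypeScript sketch, not TrueVault's API: the field list `IDENTIFYING_FIELDS`, the helpers `splitRecord` and `withVaultId`, and the `vaultId` property name are all assumptions made for illustration. The upload to TrueVault itself is left out; `withVaultId` only shows where the opaque id it returns would be attached before the record goes to your server.

```typescript
// Hypothetical list of identifying fields. Which fields count as
// identifying in a real application is an assumption of this sketch.
const IDENTIFYING_FIELDS: readonly string[] = ["name", "phone", "email"];

type PatientRecord = Record<string, unknown>;

interface SplitResult {
  identifying: PatientRecord;  // sent to the vault from the client
  deidentified: PatientRecord; // sent to your own server
}

// Split one record into its identifying and non-identifying parts.
function splitRecord(
  record: PatientRecord,
  identifyingFields: readonly string[] = IDENTIFYING_FIELDS
): SplitResult {
  const identifying: PatientRecord = {};
  const deidentified: PatientRecord = {};
  for (const key of Object.keys(record)) {
    if (identifyingFields.indexOf(key) !== -1) {
      identifying[key] = record[key];
    } else {
      deidentified[key] = record[key];
    }
  }
  return { identifying, deidentified };
}

// Attach the opaque id returned by the vault to the de-identified part.
// "vaultId" is a hypothetical property name.
function withVaultId(deidentified: PatientRecord, vaultId: string): PatientRecord {
  return { ...deidentified, vaultId };
}

// Usage: the id below is a placeholder for whatever the vault returns.
const exampleSplit = splitRecord({ name: "Ada", phone: "555-0100", systolic: 120 });
const examplePayload = withVaultId(exampleSplit.deidentified, "opaque-id-from-vault");
```

Only `examplePayload` would be sent to your server; `exampleSplit.identifying` goes to the vault and nowhere else.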

Re-identification

Let's say your end-users want to search for data based on criteria stored on your server (e.g. blood pressure) but they want to see identifying data in the result set. This re-identification should happen on the client. Something like:

Client device makes a request to your server to retrieve information matching the query.
Client device pulls TrueVault ids out of the server results. These ids are opaque and not identifying, but they can be used to retrieve identifying data from the TrueVault API (if properly authenticated).
Client device requests identifying data from TrueVault API corresponding to the TrueVault ids returned by your server. Ideally the end-user is authenticated as a TrueVault user, so the API permissions can be granularly defined.
Client device merges the datasets so the UI shows a re-identified view of the data.
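
The id extraction and merge steps above can be sketched the same way. Again, this is a hedged TypeScript illustration and not TrueVault's API: `ServerRow`, `extractVaultIds`, `reidentify`, and the `vaultId` property are hypothetical names, and the identifying documents are assumed to have already been fetched from the vault and keyed by id.

```typescript
// A row returned by your own server: de-identified data plus the opaque id.
// "vaultId" is a hypothetical property name.
interface ServerRow {
  vaultId: string;
  [field: string]: unknown;
}

type IdentifyingDoc = Record<string, unknown>;

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Collect the distinct vault ids from the server results, so each
// identifying document only needs to be requested once.
function extractVaultIds(rows: ServerRow[]): string[] {
  const seen: Record<string, boolean> = {};
  const ids: string[] = [];
  for (const row of rows) {
    if (!hasOwn(seen, row.vaultId)) {
      seen[row.vaultId] = true;
      ids.push(row.vaultId);
    }
  }
  return ids;
}

// Merge identifying documents (keyed by vault id) into the server rows.
// Rows without a matching document are returned unchanged, and server
// fields win if both sides define the same key.
function reidentify(
  rows: ServerRow[],
  docsById: Record<string, IdentifyingDoc>
): Array<Record<string, unknown>> {
  return rows.map((row) =>
    hasOwn(docsById, row.vaultId) ? { ...docsById[row.vaultId], ...row } : { ...row }
  );
}
```

In this sketch the merged rows exist only in client memory for display; nothing identifying is written back to your server.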

Compliance

Following these steps, you can create a seamless user experience. There's no reason the end-user should know that you're de-identifying and re-identifying data.

This process is designed to keep your server infrastructure outside the scope of HIPAA regulations, so you can host this de-identified data on non-compliant infrastructure if you choose. This is all above board with HHS, which has specific guidance on it: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html.

By avoiding storing Identified PHI on your servers, you can skip the physical and technical safeguards that apply to server-side infrastructure. Please understand that you don't escape the administrative requirements of HIPAA entirely. For example, in the system described above there are still human beings looking at Identified PHI. Those humans must be trained and their access must be limited to the minimal amount needed for them to perform their jobs. All of this work is a hard requirement if you're making an application that must be HIPAA Compliant, whether you use TrueVault or not. What TrueVault saves you is all of the work to have secure and compliant infrastructure for storing data, maintaining audit logs, secure backups, high availability, controlling access, managing users, and lots more.
