Is it okay to store this health data in a public cloud?

Question

Is it okay to store relative health data in the public cloud, accessible by all users of my mobile app? Put another way, is this not considered PHI?

For example:

Bob has a resting heart rate of 40 BPM. When he did a plank for 3 minutes, his heart rate rose to 80 BPM. The percentage by which his heart rate increased (100%) is stored in the public cloud, while his actual heart rates are stored in a HIPAA-compliant private cloud.
Joe has a resting heart rate of 70 BPM. When he did a plank for 3 minutes, his heart rate rose to 100 BPM. The percentage by which his heart rate increased (~43%) is stored in the public cloud, while his actual heart rates are stored in a HIPAA-compliant private cloud.
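
To make the relative value concrete, here is a minimal sketch of how it could be computed on the device before upload. The function name `percent_increase` is hypothetical, not something from my app.

```python
def percent_increase(resting_bpm: float, active_bpm: float) -> float:
    """Return the rise from resting to active heart rate as a percentage."""
    if resting_bpm <= 0:
        raise ValueError("resting_bpm must be positive")
    return (active_bpm - resting_bpm) / resting_bpm * 100.0


bob = percent_increase(40, 80)   # 100.0
joe = percent_increase(70, 100)  # about 42.86
```

Only the returned percentage would go to the public cloud; the two inputs would stay in the private store.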

Another example:

Bob has asthma. Bob's best peak flow is 700 L/min. This morning, Bob measured his peak flow and got 660 L/min. How close he got to his best peak flow, as a percentage (~94%), and his location¹ are stored in the public cloud, while his actual peak flows are stored in a HIPAA-compliant private cloud.

All data in the public cloud will be stored anonymously. Bob's name and its association with his heart rate rising by 100% and location are not stored.

Is this kind of data allowed to be stored and shared with other users? The idea is to see if those in your general location are experiencing the same percentage increase/decrease.

¹ First 3 digits of the zip code if population is greater than 20,000?

Sure, you've added a few technical and possibly security-related keywords, but as *dotproi* accurately instructed, this is a forum for security-related topics and what you need is a lawyer. Your question itself asks `Is this kind of data allowed to be ...`. Now, instead of asking people in a general location if they are experiencing the same percentage of increase/decrease, why not simply calculate it on the backend, figure out if it is true, and inform users? You might want to rethink the design. — theabhinavdas, Aug 22 '16 at 17:52
@0x23212f Yes, that's the idea. The percentage is uploaded and then the users get back if others are experiencing the same. — tktsubota, Aug 22 '16 at 17:55

score 2 · Accepted Answer · edited Jun 16 '20 at 09:49

Might be OK. Some due diligence is required: you might have to hire a statistician, and you will need sign-off from your legal department. Here's the rule:

De-Identified Information

De-identified data (e.g., aggregate statistical data or data stripped of individual identifiers) require no individual privacy protections and are not covered by the Privacy Rule. De-identifying can be conducted through

statistical de-identification --- a properly qualified statistician using accepted analytic techniques concludes the risk is substantially limited that the information might be used, alone or in combination with other reasonably available information, to identify the subject of the information [45 CFR § 164.514(b)]; or the

Safe-harbor method --- a covered entity or its business associate de-identifies information by removing 18 identifiers (Box 2) and the covered entity does not have actual knowledge that the remaining information can be used alone or in combination with other data to identify the subject [45 CFR § 164.514(b)].
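
For the location part of the question, my understanding of the safe-harbor rule is that a ZIP code may keep only its first three digits, and only when the combined area for that prefix has more than 20,000 people; otherwise it becomes 000. A minimal sketch, assuming a lookup table of population per prefix (the table, its numbers, and the function name are all made up for illustration):

```python
from typing import Mapping

# Hypothetical population totals per 3-digit ZIP prefix (illustrative numbers only).
PREFIX_POPULATION = {"941": 850_000, "036": 12_000}


def generalize_zip(zip_code: str, prefix_population: Mapping[str, int]) -> str:
    """Keep the 3-digit prefix only if its area has more than 20,000 people."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        raise ValueError("expected a ZIP code with at least three leading digits")
    if prefix_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"  # small or unknown areas are collapsed to 000
```

With the sample table, `generalize_zip("94110", PREFIX_POPULATION)` gives `"941"`, while `generalize_zip("03601", PREFIX_POPULATION)` gives `"000"`. Real population figures would have to come from census data, and this covers only one of the 18 identifiers.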

Aria · Answer 2 · 2016-08-22T17:45:23.087

Sharing health data is a very concrete problem.

First of all, if the data can't be tied to a particular person and location, then yes, it is OK to do it.

If the sharing is based on location, I'd make the region much wider and a bit fuzzier: instead of city range, I would use state range and also include people from other states on a random basis.

The way it is "fuzzied" might be a bit tricky. For example, a person from San Francisco can be stored as a person from San Jose. This method is used very often. It is called a Bloom filter; however, other methods can be used.

Another thing is storing the location. When using geospatial information, one can fuzz it by adding random numbers, but for health data, cities, states, and countries would be much better, and for that a Bloom filter is better if applied well.

It works by storing the relevant information one way, so it's possible to tell that the person might be from SF, but not for sure.
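
To illustrate the "might be, but not for sure" property, here is a minimal Bloom filter sketch. The size, the number of hashes, and the salted SHA-256 scheme are arbitrary choices for illustration, not a recommendation for production use.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into a single integer

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


locations = BloomFilter()
locations.add("San Francisco")
print(locations.might_contain("San Francisco"))  # always True once added
print(locations.might_contain("San Jose"))       # usually False, occasionally True
```

The filter stores only bit positions, so the original city names cannot be read back from it; the smaller the filter, the more often an unrelated city will also test as "possibly present".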

When the location is fuzzy enough, it's OK to include people of similar age and sex, and this would give very good accuracy while keeping the data anonymous.

The resulting data set (people of the same sex, age, and state, plus some from the same country) can then be processed to remove any bogus info (measurement problems), making it even more accurate, and this way it can raise the alarm if something is not right. Sharing one's own data and comparing it with other data can also help make sure the measurements are being taken correctly and that the device and software are used in the right way.

This sounds to me like the location data is more sensitive than my main concern. Is that right? In the worst case I can follow the HIPAA rule of using the first 3 digits of the zip code if the population covering that area is greater than 20,000. — tktsubota, Aug 22 '16 at 18:14

score 0 · Answer 3 · answered Aug 22 '16 at 16:52

Assuming you are based in the US, or work with or are associated with American institutions in general, you should take a close look at HIPAA.

Assuming the above, this link might be of assistance.

dotproi

I have looked at the HIPAA (see the last link in my question), but I couldn't find any information about this _relative_ sort of data. – tktsubota Aug 22 '16 at 17:00
I see. My apologies. Even if there is a tangential infosec component to this question, I feel that most of your inquiries might be best addressed by legal minds. Have you considered migrating this particular question to the "Law" stack exchange site? law.stackexchange.com – dotproi Aug 22 '16 at 17:05
That might be the better place, but I decided to ask here because I saw other questions like this and there was even a tag here for HIPAA. Thanks for letting me know, though. – tktsubota Aug 22 '16 at 18:07

score 0 · Answer 4 · answered Aug 22 '16 at 19:30

The crux of the question is about relative measures, and that would make me uncomfortable. I can measure someone's resting heart rate without them knowing, for instance, so the relative number is not necessarily a form of fuzzing. Similarly, if the relative data can be combined with other information (service provider, type of phone, association with a particular healthcare provider), its being relative may not be of any benefit.
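
As a quick sketch of why the relative number alone hides little: anyone who learns the baseline can invert the percentage. The function name is hypothetical, and the figures are the ones from Bob's example in the question.

```python
def recover_active_bpm(known_resting_bpm: float, published_percent_increase: float) -> float:
    """Reconstruct the private reading from a known baseline and the public percentage."""
    return known_resting_bpm * (1 + published_percent_increase / 100.0)


print(recover_active_bpm(40, 100))  # 80.0
```

So the published 100% plus an observed resting rate of 40 BPM gives back the 80 BPM that was supposed to stay in the private store.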

Aggregating by truncated zip code may provide useful fuzziness, but are you sure you have enough entries for each 3-digit area?
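
One way to check this, sketched below, is to count entries per 3-digit area and flag the areas that fall under some minimum. The threshold `min_entries` and the sample data are assumptions for illustration; HIPAA's figure concerns the population of the area, not the number of app users in it.

```python
from collections import Counter


def small_groups(zip3_values, min_entries):
    """Return the 3-digit areas that have fewer than min_entries records."""
    counts = Counter(zip3_values)
    return {zip3: n for zip3, n in counts.items() if n < min_entries}


entries = ["941", "941", "941", "950", "941", "950", "036"]
print(small_groups(entries, 3))  # {'950': 2, '036': 1}
```

Areas flagged this way could be withheld or merged into a wider region before anything is shown to other users.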
