
Recently, we have run into a challenge with potential future clients, which are banks. Our product requires gathering static data (e.g. address, loans, last 50 transactions, etc.) about the banks' clients. These banks run their PoC in the public cloud and are OK with us working on data that is not sensitive, like client addresses, but they will not allow access to data like previous loans, as that is classified as sensitive. We are stuck at this point and don't know how to convince the banks, or offer them the right way, to provide the data for our ML algorithm.

Filipon
  • The solution lies in the direction of anonymizing the data so that you can't connect sensitive data to a person's identity. For the PoC, do you need the address? An address can be used to identify a person or group of people, so it shouldn't be combined with loan history and exposed to a third party... like yourself. – DarkMatter Apr 03 '19 at 16:29
  • Yes, the address is required. – Filipon Apr 03 '19 at 17:19
  • Do you need the precise address? What if the addresses were fuzzed into a particular zip code or something? – DarkMatter Apr 03 '19 at 17:29

1 Answer


This is not just an issue with third parties - banks have strict regulations about what PII they can use in development environments if controls there are not up to the same strictness as in the production environment.

The usual route is anonymisation or pseudonymisation. From https://gdpr.report/news/2017/11/07/data-masking-anonymisation-pseudonymisation/:

With anonymisation, the data is scrubbed for any information that may serve as an identifier of a data subject. Pseudonymisation does not remove all identifying information from the data but merely reduces the linkability of a dataset with the original identity of an individual (e.g., via an encryption scheme).

Both pseudonymisation and anonymization are encouraged in the GDPR and enable its constraints to be met. These techniques should, therefore, be generalised and recurring. Those in possession of personal data should implement one or other of these techniques to minimise risk, and automation can reduce the cost of compliance.
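For illustration, here is a minimal sketch of what the two techniques could look like in a preprocessing step that runs before any data leaves the bank. It is written in Python; the field names, the postcode generalisation, and the HMAC key are assumptions made up for the example, not a real banking schema:

```python
import hmac
import hashlib

# Hypothetical secret held only by the bank; never shared with the vendor.
# With the key kept inside the bank, the token is a pseudonym: stable enough
# to link records, but not reversible by a third party.
PSEUDONYM_KEY = b"bank-managed-secret"

def pseudonymise_id(customer_id: str) -> str:
    """Pseudonymisation: replace a direct identifier with a keyed token."""
    return hmac.new(PSEUDONYM_KEY, customer_id.encode(), hashlib.sha256).hexdigest()

def anonymise_record(record: dict) -> dict:
    """Anonymisation: drop or generalise identifying fields.

    The full address is coarsened to a postcode area (as suggested in the
    comments above) and the name is removed entirely.
    """
    return {
        "postcode_area": record["postcode"].split()[0],  # "SW1A 1AA" -> "SW1A"
        "loan_history": record["loan_history"],
        "transactions": record["transactions"][-50:],    # last 50 only
    }

# Illustrative record, not a real schema
raw = {
    "customer_id": "C-102938",
    "name": "Jane Doe",
    "postcode": "SW1A 1AA",
    "loan_history": [{"amount": 12000, "status": "repaid"}],
    "transactions": [{"amount": -42.50}, {"amount": 1500.00}],
}

print(pseudonymise_id(raw["customer_id"]))
print(anonymise_record(raw))
```

Note that coarse generalisation like this is only a sketch, not a guarantee: whether the result still counts as anonymised under the GDPR depends on whether individuals remain re-identifiable from the fields that are left.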

If those are not possible, have you looked at running your learning phase within the bank's secure testing/development environment? It's worth considering.

Rory Alsop
  • For companies like banks it does not matter in what environment the algorithm runs; as long as you don't have the client's consent you can't do it anywhere, per my understanding. So doing it internally won't help. – Filipon Apr 04 '19 at 08:58
  • That is incorrect, Filipon - you do not need consent for everything, especially if it is to deliver an expected service. GDPR explains this quite well. – Rory Alsop Apr 04 '19 at 09:22
  • Can you help me by pointing out such a statement in the GDPR, or elsewhere, saying it's OK to use clients' data for training purposes when it is shared with a vendor/product that feeds the dataset into machine learning algorithms? – Filipon Apr 05 '19 at 02:39
  • For example, banks deliver banking services to customers. As the GDPR states, they are unlikely to need consent when delivering those. GDPR often doesn't require consent. Have a look at https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/consent/ – Rory Alsop Apr 05 '19 at 07:45
  • This is not really helping, unfortunately, as that applies to banks internally, not to banks allowing a TPP/startup like us to use their data as our dataset. – Filipon Apr 05 '19 at 10:34
  • GDPR is also designed around people entrusting their information to one single party, which is the bank in this case; the question is how we, as a vendor trying to provide a product to the bank for its own benefit, are going to use that data. – Filipon Apr 05 '19 at 10:52