
I have a general encryption question. I have read through many of the encryption related questions here and I can't find any specifically addressing my concern.

This is the hypothetical scenario:

  • I am storing credit card numbers in a database
  • The numbers are either encrypted or hashed
  • A public facing application checks if customer requests contain credit card information on a blacklist
  • An attacker has compromised the database and the ability to add credit cards to the blacklist.

It may seem unrealistic that an attacker could add credit cards without knowing the encryption key, but one scenario is that they make a request to the public facing application which appears fraudulent, causing the application (which knows the encryption key) to automatically blacklist the credit card (this is a huge simplification of reality).

I want a solution which provides the following:
1) Insert a blacklisted credit card with O(log(n)) work or less
2) Check if a credit card is on the blacklist with O(log(n)) work or less. For example, a B-tree index can provide O(log(n)) lookup work.
3) Have the credit card numbers secured with either encryption or a hashing function so that if the data is compromised the numbers will not be usable.
4) The attacker is unable to check if the card is on the list, even though they can insert values and can see the encrypted/hashed values.

My question is closely related to Hashing a credit card number for use as a fingerprint but the selected answer says "When a new card comes in, we look it up by comparing it to a hashed + salted column. If it matches that existing column we know we can return the same unique number identifier". This solution is not acceptable as this would require an O(n) lookup time. In other words I would need to check every row in the blacklist to see if the number is on the list.

First proposed solution: Typical lazy programmer answer
Don't encrypt. These are just bad fraudulent numbers anyway.
Failure: Just because we think these are bad customers does not mean we should make the numbers public. It also does not mean we blacklist correctly 100% of the time, and furthermore, even if a customer is bad and committing fraud, we still don't have the right to make their number public.

Second proposed solution: Hash the credit cards without salt
This provides quick insertion and quick lookup. Simply hash the card number again and check if the hashed value is on the list.
Failure: The problem is that the attacker can brute force credit cards by simply hashing random card numbers and checking if the value is on the list. This is a problem even with a slow hashing algorithm because the space of valid card numbers is low and each hash checks against potentially millions of rows on the blacklist (Remember this is a hypothetical situation. I am not actually storing millions of credit card numbers).
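The offline attack on unsalted hashes can be sketched in a few lines (a minimal illustration, not any real application's code; the sample PANs are made-up test numbers):

```python
import hashlib

def hash_pan(pan: str) -> str:
    # Unsalted hash: the same PAN always yields the same digest.
    return hashlib.sha256(pan.encode()).hexdigest()

# Hypothetical compromised blacklist dump (hashes only).
blacklist = {hash_pan("4111111111111111")}

# Offline attack: enumerate the small space of valid PANs and test each one.
def brute_force(candidates, stolen_hashes):
    return [pan for pan in candidates if hash_pan(pan) in stolen_hashes]

recovered = brute_force(["4111111111111111", "5500005555555559"], blacklist)
```

Every candidate hash is checked against the whole stolen set at once, which is why a large blacklist actually speeds the attacker up.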

Third proposed solution: Hash with a unique salt for each row
This solution can be provided easily with crypt(3) or something similar, which seamlessly stores the salt in the hashed value. Now if the attacker tries to brute force numbers they will also have to brute force the salt, making the attack infeasible.
Failure: Performing a blacklist lookup now takes O(n) work. We need to call the slow hashing function on each row and the performance becomes unacceptable.
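The O(n) lookup cost is visible in a short sketch (PBKDF2 stands in for crypt(3) here so the example stays stdlib-only; this is an assumption, not the actual stack):

```python
import hashlib
import os

def slow_hash(pan: str, salt: bytes) -> bytes:
    # Stand-in for crypt(3)/bcrypt: a deliberately slow, salted hash.
    return hashlib.pbkdf2_hmac("sha256", pan.encode(), salt, 10_000)

blacklist = []  # rows of (salt, digest)

def insert(pan: str) -> None:
    salt = os.urandom(16)
    blacklist.append((salt, slow_hash(pan, salt)))

def is_blacklisted(pan: str) -> bool:
    # Each row carries its own salt, so the candidate PAN must be
    # re-hashed against every row: O(n) slow hashes per lookup.
    return any(slow_hash(pan, salt) == digest for salt, digest in blacklist)
```

No index can help here, because the stored digests of equal PANs are all different.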

Fourth proposed solution: Hash with a global salt stored outside the database (HMAC)
Now the attacker needs to use the public facing API to perform a hashing operation instead of being able to perform millions of offline hashes per second. The reason they cannot perform an offline attack is that the global salt stored outside the database is long enough that it cannot be brute forced.
Failure: There is still the fact that an insertion checks against potentially millions of existing rows and the credit card state space is small. The attacker can perform thousands of requests a day and log the ones which result in a duplicate in the database. The duplicates are credit card numbers which were already on the blacklist.
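The keyed-hash part of this scheme amounts to an HMAC over the PAN (a sketch; the pepper value is hypothetical and would live outside the database, e.g. in app config or an HSM):

```python
import hashlib
import hmac

# Hypothetical secret kept outside the database.
PEPPER = b"long-random-secret-held-outside-the-db"

def keyed_hash(pan: str) -> str:
    # Deterministic keyed hash: equal PANs produce equal values, so the
    # column can carry an ordinary unique/B-tree index for fast lookups,
    # while an attacker holding only the database cannot hash offline.
    return hmac.new(PEPPER, pan.encode(), hashlib.sha256).hexdigest()
```

The determinism is exactly what enables both the O(log(n)) lookup and the duplicate-detection attack described above.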

Fifth proposed solution: Security through obscurity
Failure: This is not real security. It is tempting, but with the assumption that the attacker has compromised the database there is a real chance they are an admin internal to the company and have access to whatever solution and algorithm we have implemented.

Sixth proposed solution: Make another smaller table.
When blacklisting a number, store the hash of the full number with salt in table 1, and the hash of the last 4 digits in table 2 without salt (with duplicates removed). When checking if a number is blacklisted, check table 2 first. In most cases there will not be a hit and the check is quick. In rare cases there will be a hit, and then a slow check is done over table 1.
Failure: If I am storing thousands of records there is a very good chance that the last 4 digits of a new number already exist on this list. 4 digits give 10,000 unique combinations, so with 10,000 card entries there is a large chance of a hit, resulting in a slow check. Further, the attacker knows that every entry in table 2 has at least one match in table 1. They can brute force table 2 with only 10,000 hashes and thereby significantly reduce the search space for table 1. The post quoted 100,000,000 as the likely size of the possible credit card number space; table 2 would effectively reduce this space to 100,000,000/10,000 = 10,000. This means the time to reverse one hash would be roughly the time it takes my application to do the check over table 1 (10,000 rows in table 1 would mean 10,000 slow hashes, and a brute force would also be 10,000 slow hashes).
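The two-table scheme can be sketched as follows (a toy model with in-memory tables; PBKDF2 again stands in for the slow hash, and the PANs are test numbers):

```python
import hashlib
import os

table1 = []      # rows of (salt, slow hash of the full PAN)
table2 = set()   # unsalted hashes of the last 4 digits; duplicates collapse

def slow_hash(pan: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pan.encode(), salt, 10_000)

def insert(pan: str) -> None:
    salt = os.urandom(16)
    table1.append((salt, slow_hash(pan, salt)))
    table2.add(hashlib.sha256(pan[-4:].encode()).hexdigest())

def is_blacklisted(pan: str) -> bool:
    # Fast negative: most PANs miss table 2 and return immediately ...
    if hashlib.sha256(pan[-4:].encode()).hexdigest() not in table2:
        return False
    # ... but any hit (real entry or last-4 collision) falls back to an
    # O(n) scan of slow hashes over table 1.
    return any(slow_hash(pan, salt) == d for salt, d in table1)
```

Because table 2 only has 10,000 possible inputs, an attacker can invert it completely with 10,000 hashes, which is the failure described above.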

All the solutions also apply to encryption instead of hashing. The benefit of encryption is that the attacker has to perform their attacks online against the public facing application. This still does not solve the problem pointed out in proposed solution 4. Furthermore, the risk of encryption, as opposed to hashing, is that if the encryption key is compromised the attacker has all the plaintext values right away. At least with hashing they would only be able to brute force some of the card numbers on the list.

I have also looked into tokenization https://securosis.com/assets/library/reports/Securosis_Understanding_DBEncryption.V_.1_.pdf and the same problem exists, just in duplicate tokens instead of duplicate hashes.

Please note I want something stronger than PCI-DSS compliance. Solution 2 is technically PCI-DSS compliant because a salt is not required https://www.pcisecuritystandards.org/documents/PCI_DSS_v3.pdf (PDF link)

Sorry for the long post. Does anyone know if this is possible?

Andrew
    Whatever you choose, remember that you at LEAST have to meet PCI-DSS standards. So #1 and #5 are out. – schroeder Feb 03 '15 at 17:59
  • Even hashing credit card numbers is problematic. Given the restricted nature of valid card numbers, it's likely that there's only one valid number which hashes to a given hash. – Bobson May 29 '15 at 14:23
    If the attacker knows the lookup algorithm then there is nothing you can do to prevent them from brute-forcing the list in (number of CCNs × average lookup time) parallel time. For example, your sixth solution (another smaller table) buys you nothing because attackers also benefit from the small table. (In fact they get a larger speedup from it than you do because they only have to hash each truncated CCN once.) Your third solution (salted hashes), if it was fast enough, would have the same problem. – benrg Mar 09 '16 at 22:03
  • Like @bbozo said, if someone has write access to the database, you have deeper problems to care for... – ThoriumBR Mar 14 '17 at 13:39
  • @Bobson bcrypt( "table UUID as salt", "First Last Name", "Expire", "CC#" ) ... should provide enough entropy – CaffeineAddiction Mar 14 '17 at 13:42
  • @Andrew ... food for thought, credit card numbers eventually get recycled. You will need to either `Card Holder` to your hash ... or have a timeout period in which the hash is eventually removed from the db – CaffeineAddiction Mar 14 '17 at 13:45
  • @CaffeineAddiction Not if they're stored as plaintext in the table linked to the hash. That just expands the salt - it doesn't add any additional entropy. You still only have 10^9 possible hashes, which is 8 times as easy to brute force as a 7-character all lowercase password (26^7). – Bobson Mar 14 '17 at 14:55
  • @Bobson but why would you need to store it as text in the table linked to the hash ... its a stolen CC do you really need plaintext of who it was stolen from sitting next to it? I mean its not like you are intending on running the card so all you need is the info to filter it. – CaffeineAddiction Mar 14 '17 at 15:00
  • @CaffeineAddiction see "failure" under first proposed solution. – Andrew Mar 15 '17 at 08:56
  • @ThoriumBR This question is about security in layers. The reasoning "You have a bigger problem" (therefore ignore this one) is dangerous in the field of security. We don't leave card numbers in plain text with the reasoning "Someone getting access to your db is a bigger problem, so don't worry about encryption". – Andrew Mar 15 '17 at 08:58
  • @Andrew I believe you misunderstood, i was not recommending storing anything in clear text, but rather grabbing `table UUID as salt`, `First Last Name`, `Expire", "CC#` and making one big hash ... being as how this would give you more entropy than just the `CC#` and none of the info would be in the filter db in any type of reversible fashion. It would be useful for filtering ... and thats about it. – CaffeineAddiction Mar 15 '17 at 15:35
  • @CaffeineAddiction It seems I did misunderstand you. The reason I can not make one big hash of this information is that this information (full name) is not available when checking the blacklist. Additionally it is a requirement to not allow use of the cc number even if the name (etc.) is different. – Andrew Mar 15 '17 at 17:07
  • You can improve method 6, instead of storing against 4 digit PIN, you can instead store against truncated hash, to create a [bloom filter](https://en.m.wikipedia.org/wiki/Bloom_filter). This gives you three things: 1) fast negative check, most entries in the filter will be unset, which implies that the card is valid, 2) flexibility on the length of the hash that you store, 3) knowing where a number is in the first table won't help in cracking the second table, which are salted individually and with higher iteration count and doesn't all share the same four digits. – Lie Ryan Mar 16 '17 at 01:48
  • To ensure that false positives from the bloom filter don't negatively affect user's experience too much, you can store in table 1 the truncated hash from table 2. Most of table 2 buckets should contain a very small number of entries (e.g. less than five), so you could pick a very high iteration count which would make brute force not feasible, but still allow users whose card is a false positive to check only against at most five or so entries. – Lie Ryan Mar 16 '17 at 02:04
  • @LieRyan The link to a bloom filter is interesting. I am new to the concept of a bloom filter, but I think it will not solve this problem. The attacker can use the bloom filter to iterate over the entire cc number space (roughly 100,000,000 bloom filter lookups) producing a list of every cc number on the blacklist plus 1-10% false positives. This is as good as having the data fully compromised. It means if you have 1000 entries in the blacklist the attack will produce about 1010 cc numbers which contains ALL of the 1000 on the blacklist. Can you elaborate on this? I think I missed something. – Andrew Mar 16 '17 at 09:35

4 Answers


This is a pretty tall order, OP, and I don't think it is possible to do it perfectly. Nonetheless here are a couple ideas.

The attacker is unable to check if the card is on the list, even though they can insert values and can see the encrypted/hashed values.

The answer to this is to insert false positives. If a card is to be blacklisted, compute a hash (or hash + crypt; see below) of the PAN alone. If it is not, compute the hash of the PAN + a nonce. In both cases, insert a record. To check whether a card is blacklisted, recompute the hash without the nonce and look it up.

This way, a hacker can see that a record is inserted, but the insertion itself doesn't tell him anything.
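One way to sketch this decoy-insertion idea (my illustration of the scheme, not the answerer's code; names and pepper are hypothetical):

```python
import hashlib
import hmac
import os

PEPPER = b"secret-held-outside-the-db"  # hypothetical systemwide secret

def keyed_hash(data: bytes) -> str:
    return hmac.new(PEPPER, data, hashlib.sha256).hexdigest()

table = []  # one row per decision, real entries and decoys alike

def record(pan: str, blacklist_it: bool) -> None:
    # A row is inserted on every decision. Blacklisted PANs hash
    # deterministically; others are mixed with a random nonce, so a
    # reader of the table cannot tell a real entry from a decoy, and
    # repeated requests never reveal duplicates.
    if blacklist_it:
        table.append(keyed_hash(pan.encode()))
    else:
        table.append(keyed_hash(pan.encode() + os.urandom(16)))

def is_blacklisted(pan: str) -> bool:
    # Recompute without any nonce; only real entries can match.
    return keyed_hash(pan.encode()) in table
```

The attacker watching inserts learns only that *something* was written, never whether the card matched an existing entry.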

Have the credit card numbers secured with either encryption or a hashing function so that if the data is compromised the numbers will not be usable.

Hash the PAN + long, systemwide cryptographic pepper (secret salt)

The pepper will add enough entropy to make brute force impossible without knowing the key. Store the pepper outside of the database.

To check for a match, perform the hash on the PAN of interest, then search the table. Because the secrets are all systemwide, you should be able to leverage an index, including the nice B-tree index that gives you the O(log(n)) performance you are looking for.
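A sorted in-memory list can stand in for the database's B-tree to show the O(log(n)) lookup (a sketch under assumed names; a real system would use the database index itself):

```python
import bisect
import hashlib
import hmac

PEPPER = b"systemwide-secret-outside-the-db"  # hypothetical

def keyed_hash(pan: str) -> str:
    return hmac.new(PEPPER, pan.encode(), hashlib.sha256).hexdigest()

# Sorted column of peppered hashes, standing in for the B-tree index.
index = sorted(keyed_hash(p) for p in ["4111111111111111", "5500005555555559"])

def lookup(pan: str) -> bool:
    # Binary search over the sorted hashes: O(log n), just like the
    # database index the answer describes.
    h = keyed_hash(pan)
    i = bisect.bisect_left(index, h)
    return i < len(index) and index[i] == h
```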

John Wu
  • I think this might actually solve the problem! Wow, 2 years later and I have an answer. Specifically the first part of your answer mixed with "Proposed solution 4". You are correct about "secrets will be available to the hacker", but only the ones in the database, so if a global secret salt is stored outside the db and I use your strategy of inserting a different row on every attempt (even when the row already exists) the attacker can't perform an online brute force attack anymore! – Andrew Mar 16 '17 at 12:43
  • Sorry about the lack of clarity of what secret information is available to the attacker. The question was already very long. Do you mind changing the second half of your answer assuming that a global salt outside the db can be kept secret from the attacker? Using a global long secret salt is much more simple than the progressive hashing scheme. Then I will mark this answer as the accepted one. – Andrew Mar 16 '17 at 12:45
  • Generally speaking, safe RSA implementations use different paddings, meaning that the encrypted text will not be the same over 2 different attempts at encryption, meaning that hash + RSA will not result in something behaving like a hash (google "does encrypted text change with rsa") – bbozo Mar 16 '17 at 18:54
  • Options 1 & 2 are basically what I suggested, but for option 2 I suggested using 3DES/AES with ECB in an HSM, which seems generally like a better idea. I also suggested using a stretching scheme to make the brute force attack more costly. – bbozo Mar 16 '17 at 18:56
  • Also, the solution to "The attacker is unable to check if the card is on the list, even though they can insert values and can see the encrypted/hashed values" opens you up to a DDOS attack on the database by somebody who has no special access to your systems. On the other hand, knowing the strongly encrypted values of known clear texts brings you pretty much no closer to a cryptographic attack vector. – bbozo Mar 16 '17 at 19:00
  • Good point about the padding, you'd have to use textbook RSA, and it'd be hard to find an implementation of that. As for the DDOS, I don't understand your concern. I'm assuming record creation is a natural side effect of performing a purchase, so one more record does not exactly increase the attack surface. – John Wu Mar 16 '17 at 21:03
  • @bbozo that is true that 2 different encryption attempts will result in 2 different values. There is a way to always initialize the IV the same (and keep the IV secret) so that it will result in the same value. You obviously lose some security by reusing the IV on every encryption (this is not what I am doing). Similarly you can hash always using the same salt (keeping the global salt secret) and hashing the same value twice will result in the same hash (this is what I am doing). – Andrew Mar 17 '17 at 10:01
  • @bbozo The particular hashing or encryption algorithm was not of importance to me. I was assuming that a strong algorithm was already chosen. Essentially any algorithm that is widely used without any known exploits and is "slow enough" meets my requirements. The main reason I have accepted this answer is the "Always insert a new hash even when the number is already on the blacklist" suggestion removes the attackers ability to detect duplicates when they have db read access and are performing an online attack on a public api. – Andrew Mar 17 '17 at 10:04

The biggest problem I see with your question is: Have the credit card numbers secured with either encryption or a hashing function so that if the data is compromised the numbers will not be usable.

This is a problem because as @Bobson stated in the comments Even hashing credit card numbers is problematic. Given the restricted nature of valid card numbers, it's likely that there's only one valid number which hashes to a given hash.

At first I thought that this was because the search space was limited to [0-9]{15-16}, or 1.11 x 10^16 ... which is kinda bad, but at an offline attack rate of 1.29 days per card number ... it's not end-of-the-world bad. But after doing a bit more research I found that the problem is actually much worse than this.

The first number on your card: If it starts with 3, it is an American Express, Diner’s Club, or Carte Blanche; 4 is reserved for Visa, 5 for MasterCard, and 6 for Discover. The next five digits will indicate the card issuer such as the bank or credit union, as well as the type of credit card. So for example, all Citi AAdvantage Executive MasterCards will begin with 546616 with the 5 representing MasterCard, and the 46616 representing Citi as well as the fact that it is one of their American Airlines AAdvantage Executive cards. The remainder of the 16 digits (or 15, in the case of American Express), represent both the cardholder’s account number as well as one or more “check digits.” source

For 16-digit numbers, start with the first number on the left, and double each alternating number on the card. (For 15-digit numbers, start with the second number from the left.) Add any double-digit numbers so that they reduce to a single-digit number. (For example, if one of the numbers is 8, it doubles to 16, then you add 1 + 6 to get 7.) Next, add those together with the alternating numbers that you did not double. If the total you get is divisible by 10, the credit card number is likely valid. source
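The check-digit rule quoted above is the Luhn algorithm, which can be implemented directly (a standalone sketch of that standard algorithm):

```python
def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number]
    total = 0
    # Walk from the rightmost digit; double every second digit,
    # and reduce any doubled value above 9 by summing its digits
    # (equivalent to subtracting 9).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    # The number passes the check if the total is divisible by 10.
    return total % 10 == 0
```

Since the last digit is fully determined by the others, the Luhn check removes one digit's worth of entropy from the search space.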

So this means that in the number 1222-2233-4444-4444 your search space is only slightly more than [0-9]{8} which would mean:

  • 1.29 days per card - Assuming one thousand hashes per second (online attack)
  • 1.11 ms per card - Assuming one hundred billion hashes per second (offline attack)

source
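A back-of-the-envelope check of those two figures (the ~1.11 x 10^8 space and both hash rates are the answer's assumptions, not measurements):

```python
# Candidate PANs once the issuer prefix and Luhn check digit are fixed.
SPACE = 1.11e8

# Online attack at one thousand hashes per second, expressed in days.
online_days = SPACE / 1_000 / 86_400

# Offline attack at one hundred billion hashes per second, in milliseconds.
offline_ms = SPACE / 100e9 * 1_000

# online_days comes out to roughly 1.3 days; offline_ms to about 1.1 ms.
```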

This problem gets even worse when you factor in how many cards are actually in circulation:

Type of Credit Card 2000    2003    2007    2008    2010    2012
Visa                 255     283     321     304     240     261
Mastercard           200     267     279     260     171     174
Store                597     521     546     539     318     455
Oil Company           98      86      73      65      35      60
Discover              50      54      57      58      55      59
American Express      33      36      52      54      49      52
Other                192     185     166     160     124     106
Total              1,425   1,432   1,494   1,440     992   1,167

*cards in millions 

source

This means that unless you have an additional source of entropy besides the card number itself - one stored neither in the database (i.e. not a salt) nor in the program doing the hash (i.e. not a pepper) - storing credit card hashes at all is not advisable.

CaffeineAddiction
  • Thank you for the submission. I really appreciate the detailed list of sources. The solution we implemented 2 years ago was to use "proposed solution 4" and monitor for suspicious online brute force attempts, which wasn't actually solving all the requirements as a sneaky online attack could still reverse some numbers, but this was deemed acceptable risk because it was already PCI DSS compliant (I have left some details out). The global salt outside the database increases the state space to make hashing acceptable. If the db is compromised the hashes can't be reversed because the salt is secret – Andrew Mar 16 '17 at 12:58
  • The problem we still had was that an online brute force to a public api blacklisting cards on purpose could indirectly insert into the table through the global salt known by the application running the api (again some details have been left out). I believe the answer from John Wu finally solves the online brute force problem by always inserting a new hash even if the card is already on the blacklist. – Andrew Mar 16 '17 at 13:31
  • @Andrew, if I understood right, you are running a blacklisting service? And you return hashes to the API client in the response? I'm sorry, I'm struggling to figure out your use case here ^_^ – bbozo Mar 16 '17 at 19:03
  • @bbozo I will try to make a summary of what happens, but I will have to leave out some details. You can think of it as an internal blacklisting microservice. A request comes to a public facing api. The api sends the cc number to the blacklist microservice to check if it is blacklisted. If it is, the request is denied immediately. If it is not blacklisted the api continues calling other services. One of these other services may determine the request to be fraudulent and blacklist some information contained in the request (including the cc number) to ensure similar future requests are blocked. – Andrew Mar 17 '17 at 09:53
  • @bbozo I'm sorry I can't elaborate more due to proprietary reasons and limited comment space. I have tried my best to avoid the XY problem, but I have some restrictions about what I am allowed to say. – Andrew Mar 17 '17 at 09:54

But what if a hacker gets access to my live database and is able to arbitrarily enter information into it...

You could (actually should) argue that if a person can add things to your database that you have bigger problems than the security of your blacklist - let's start with privilege escalation, admin super user hacking etc (it's much easier to add/remove cards from blacklists through your admin interface than through your database).

And if we're talking about an offline dump of your database getting stolen then it's your duty to keep the PANs safe. According to the PCI DSS council "the good enough for PAN safekeeping is" defined in Requirement 3: Protect stored cardholder data of the PCI DSS' "Requirements and Security Assessment Procedures":

3.4 Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs) by using any of the following approaches:

  • One-way hashes based on strong cryptography, (hash must be of the entire PAN)
  • Truncation (hashing cannot be used to replace the truncated segment of PAN)
  • Index tokens and pads (pads must be securely stored)
  • Strong cryptography with associated key-management processes and procedures.

Note: It is a relatively trivial effort for a malicious individual to reconstruct original PAN data if they have access to both the truncated and hashed version of a PAN. Where hashed and truncated versions of the same PAN are present in an entity’s environment, additional controls must be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.

I would argue, though, that the note in the bottom implies that the PCI DSS council expected people to not salt their hashes. Read more about hashes & their security later.

Please note that all of the options I'm listing here are "good enough" per PCI DSS standards, in varying levels of usability and paranoia.

Option 1: use only the masked PAN

(and a masked PAN is first 6 and last 4 chars of a PAN)

Use the masked PAN for blacklisting. I have seen precious few cards in my history where 2 users came in with the same masked PAN and different full PANs; perhaps it's rare enough to make it a Helpdesk issue and not a programming issue.

Option 2: use the masked PAN and the encrypted PAN

If you want to guard against this eventuality, use the masked PAN to grab a number of rows (usually 1, almost never 2), and then compare against a salted, hashed PAN; a global salt will do. But at this point you might as well use the encrypted PAN too (decrypt & compare) because 99.99% of the time you will be comparing against just 1 record.

  • Q: but this doesn't work for me, my masked PAN is just the Luhn digit! What can I do?
  • A: then sorry, obviously you can't use this approach, just remember that the PCI council considers first 6 and last 4 to be the "masked PAN". You can still use hashing though, keep reading

Option 3: use the hashed PAN

No, you don't need to check O(n) records to do a verification. Use a table-level pepper and a database index and ditch the salt.

Hash security is another story altogether, but remember:

  1. a strong table-level pepper should be enough; a salt isn't as helpful because it gets stolen with the database
  2. perform the hashing operation repeatedly, a number of times, a practice called stretching
  3. store & use the pepper via an HSM
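A stretched, peppered hash along those lines might look like this (a stdlib sketch; a real deployment would keep the pepper in the HSM and use a vetted KDF rather than hand-rolled iteration):

```python
import hashlib

def stretched_hash(pan: str, pepper: bytes, iterations: int = 100_000) -> bytes:
    # Stretching: feed the digest back through the hash many times so
    # each brute-force guess pays the full iteration cost.
    digest = pepper + pan.encode()
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    # Deterministic output: the same PAN + pepper always match, so the
    # stored column stays indexable for fast lookups.
    return digest
```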

Take a look at this implementation to give you a general idea.

Does your HSM allow for such a scheme? Digests are generally poorly supported in hardware HSMs, so you can use normal 3DES/AES for the same thing.

  • Q: I know this approach is probably stronger than the security of my admin's passwords but I still don't feel sure about this, it's software-stored peppers after all, and they're table-wide, it feels insecure
  • A: OK, but remember: this is what PAN peppers are for. This approach is above and beyond what's required by PCI DSS. While row-level salts make rainbow attacks harder, they are also stolen together with the hashes themselves, so their security is moot compared to something that's not stored in the database, like a database-level pepper. But OK, keep reading; you can extend this approach by applying your HSM to the problem

Option 3b: hashed + encrypted PAN

Use a 3DES or AES cipher (I just refer to them as "DES" in this section, don't be confused by this) in ECB mode or with fixed IV in CBC mode. For added security through obscurity you can encode something in the IV, such as parts of the PAN or, stronger - another cryptogram that's accessible only via the HSM (ECB encrypted PAN or its derivation comes to mind as a decent IV candidate).

Use the same technique as described before and as shown in the code sample I provided - with the global pepper - with or without the digest & under DES - this will make sure that if an attacker runs away with your database & your code base & your server setup that he will not be able to start a rainbow attack without breaking DES first and this is "impossible" because the DES keys are in the HSM.

Note that ECB and fixed IV CBC in this case is "good enough" because of the underlying randomness of the peppered and hashed material.

  • Q: but I'm still not happy because ....
  • A: well.. keep reading

But in the end...

I'm pretty sure you can't make the scheme more secure than this while still keeping the function it's supposed to do.

Security and usability are two polar opposites, don't go full paranoid mode, be smart. Full paranoid mode can backfire in interesting ways, from my experience, so far:

  • I can already imagine that in your institution while you're looking for the perfect solution some lazy programmer implemented clear PANs in the database 6 months ago because the management was shouting - it's probably not like you're running without a blacklist all this time - I've seen this happen with a global e-Wallet provider - if this is your case - you're not alone
  • a military intelligence institution and a bank with strong enforced passwords - passwords written on a post-it on the back-side of the keyboard and "not passwords.txt" file on the desktops, respectively
  • a super-uber-PCI-DSS-got-nothing-on-this secured national bank in the EU with contactless PKCS11 wallets you carry on your person - being there as a third party contractor I found out they gave us "blank access" cards to all areas of the building and all computers because the "third party" module of the security software wasn't compatible with the "linux drivers" whatever that means
  • a super-secure PCI DSS environment and a single "magic cable" "hidden" from the CSO sticking out of the secured box and into a router because the protocols didn't foresee mass hardware migrations in a secure closet with limited drawers
  • same institution as the last point, a bug in the software in which first 8 chars of 10-20 PANs appeared in a log in a secure environment (by PCI standards). Someone would have encrypted the logs, but the CSO argued that it conflicts with some "ease of use clause" in the PCI specification and insisted that the issuers of the cards be contacted to blacklist the cards and reissue them because "they have been compromised", the backlash of the banks who were also partners was pretty understandable, this CSO lost his job eventually
  • PIN cryptography and protection against cloning of Visa/MC cards relies on double-length DES keys with ECB, never broken cryptographically, but people write their PINs on cards because they're hard to remember, and the US still, afaik, by and large uses magstripe because EMV is hard to implement
  • Mifare cards, single length DES encryption by-and-large - 10 billion chips, 150 million readers, only extended recently after successful breaches by AES with a key negotiation scheme which might as well have been written by a child (one party makes half a key which is concatenated to the other party's half-a-key instead of a proper full-length XORing scheme - and the initial key exchange when the master key is loaded into the card is still DES with a predefined 000..0000 key) - considered secure
  • and let's remember https://xkcd.com/538/ :D

So yeah, if you're trying to be holier than the Pope in relation to security, you do it at your own risk. Don't be a fundamentalist; be smart. Focus your energy on securing your system where it matters most - and the blacklist almost certainly isn't it.

I'm guessing this because generally by the time you have a blacklist you already have a "secure card lookup mechanism" in place for the transactions, risk rules etc and the problem was, generally, solved months ago. And this isn't the case here.

PS: get an online API and try a brute force "scan" ($1 authorizations) of 10,000 card numbers and see what the processor tells you about it. There are checks and balances in place against this; read:

  • "VISA: Global Acquirer Risk Standards"
  • "VISA: Global Brand Protection Program Guide for Acquirers"

to learn a bit more about them.

bbozo
  • The masked PAN is not available at the moment where it is desired to check the blacklist, but let's say hypothetically it is. Using the masked PAN only increases the state space by a factor of 10000 and unfortunately 1,000,000,000,000 is still a space small enough to brute force. Please see "failure" under 6th proposed solution about why two tables is a bad idea. – Andrew Mar 15 '17 at 08:52
    Regarding "...you have bigger problems" the attackers write access is only indirect and very limited by making a special request to a public facing api. I am looking for a "security in layers" solution. If someone has read access to your db you have massive problems, but that doesn't mean you just leave all data in plain text because someone gaining access to your db was a bigger problem – Andrew Mar 15 '17 at 08:54
  • @Andrew, for sure - you need to encrypt stuff for the eventuality of your DB getting stolen - meaning "read-only access" to the attacker, I was talking, and you were talking, about write-access - a different ballpark indeed - really don't even try to protect a blacklist against this - you really have more problems than that and no serious way of protection until you block access to the attacker. Read my answer again, especially the 2nd paragraph, I'm not sure what your problem is with this. – bbozo Mar 15 '17 at 09:09
  • As stated twice now, the write access is indirect and very limited. You might not be aware, but every single application taking customer information and storing it in a db provides indirect write access to the database (how else does the data get there?). In the world of security people should never say "don't even try to protect against this". For one company someone gaining complete write access (again I'm not talking about complete write access) could be not so bad if they just throw out the database and start over. They may only be concerned about not leaking customer data. – Andrew Mar 15 '17 at 09:36
  • I don't have a problem with your answer. I appreciate that you took the time to write a detailed response. I did read your 2nd paragraph multiple times before posting the first comment in which I explained why your submission does not answer the question. Maybe you would like to read the comment again, especially the first sentence. – Andrew Mar 15 '17 at 09:39
  • Andrew, I'm a web developer among other things ^_^ anyway, my suggestion is not using last 4 characters, but first six and last 4 (bin of the card + last 4 chars - this is what is traditionally considered "the masked PAN") - this leaves you 99.99% of the time with only 1 record to choose from once you did the masked PAN lookup. I think if you look hard enough, you'll find that what I proposed to you covers all of the conditions you set in your post - or let me know which of the requirements it doesn't cover so I'll elaborate. – bbozo Mar 15 '17 at 10:45
  • The main requirement it does not cover is that the PAN is not available. The second requirement not covered is explained in "Fourth proposed solution". If an attacker reads the masked pan they have 10 of 16 digits meaning there is a state space of only 1,000,000 possible full PANs to try. If an attacker can make 1,000 requests per second to a public facing api (where 1,000 per second might be reasonable) they could reverse a full PAN on average in 500 seconds. However, the main problem is that at this point in time the PAN is not available. – Andrew Mar 15 '17 at 13:03
  • What do you mean under "the PAN is not available" ? I'll expand my answer now, I'm afraid it's all I can do :) – bbozo Mar 16 '17 at 07:50
  • Wow, thank you for the detailed answer. This will take me some time to sort through. I might not be able to respond today. – Andrew Mar 16 '17 at 09:55
  • Np, I'm fixing some stuff actually still ^_^ – bbozo Mar 16 '17 at 10:07
  • One point I think we don't agree on is the implication of using salt. If you use salt in the database on a per row basis, you only slow the attacker down to cracking the hashes one row at a time (because the attacker sees the salt since the salt is in the database). Same applies to table level salt (since it is also in the database the attacker can see it). This is what the PCI DSS requirement is referring to. The remaining 6 digits of the PAN are trivial to brute force if the attacker knows the salt. The only feasible option to storing salt outside the database is a global salt. – Andrew Mar 16 '17 at 12:24
  • @Andrew, two points I already made: salt (row-level) lives in the same place as the hash, so if you know it and you're doing a known-plaintext attack based on it and PAN candidates, it offers practically no security benefit at all - only the pepper does anything here. Pepper (table-level) lives in the code, or (better) somewhere in the server config, so it's *another thing* that needs to be stolen - more secure. Using salt, additionally, as you noticed, kills the entire idea of hashing (O(n) lookups), while using pepper only enables you to do fast indexed searches. That's the whole idea. – bbozo Mar 16 '17 at 18:38
  • Additionally, adding the HSM ECB encryption to top it all off, adds another thing that needs to be stolen or broken through, and that is the physical PKCS11 device, which is harder to do, and it still allows you fast indexed searches. But if you add salt (row-level) - you get pretty much nothing in terms of security and you lose everything in terms of usability (no more lookups < O(n)). – bbozo Mar 16 '17 at 18:40
  • Whatever you do, stay clear of home-made smart solutions. They are rarely smart. – bbozo Mar 16 '17 at 18:50
  • I think we have different definitions of pepper. Mine is https://en.wikipedia.org/wiki/Pepper_(cryptography) . The pepper is not stored *anywhere*. It is used as a simple way to increase the time requirement of a hash lookup, but keep the initial hash cost low. – Andrew Mar 17 '17 at 10:13
  • About the salt, my previous comment was addressing "I would argue, though, that the note in the bottom implies that the PCI DSS council expected people to not salt their hashes", and the later suggestions implied storing the salt in the database. I was pointing out that no matter how you store the salt, be it row-wise (which I agree is not feasible because of the O(n) problem anyway) or table-wise, if you store it in the DB then you lose the benefit salt provides of increasing the possible state space. – Andrew Mar 17 '17 at 10:16
  • Regarding " stay clear of home-made smart solutions". I could not agree more. We have settled on a well known and proven solution of hashing HMAC style (obviously with a suitable strong hashing function). The only addition is the new extra records inserted from the suggestion of Jhon Wu. These records are essentially a hash of a random string and contain no information even if they are reversed. They are just to remove the ability of an attacker with read access to detect duplicates. – Andrew Mar 17 '17 at 10:28
1

I think that this is a case where you should re-work the problem.

There's no getting around the ridiculously small space of valid PANs. ALL of your security comes from entropy (salt) and work (iterations), and you're asking for that for free. You won't get it for free.
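To make that small space concrete, here is a rough sketch (the salt, PAN, and hash choice are all illustrative assumptions, not anyone's real scheme) of why a fast per-row-salted hash buys almost nothing once the record is stolen: if the attacker also sees the masked PAN (first 6 and last 4 digits), only the six middle digits are unknown - a space of 10^6 candidates:

```python
import hashlib

def hash_pan(pan: str, salt: str) -> str:
    # Single SHA-256 purely for illustration; the speed is exactly the problem.
    return hashlib.sha256((salt + pan).encode()).hexdigest()

# Hypothetical stolen record: the attacker sees the salt, the hash,
# and the masked PAN (first 6 + last 4 digits).
salt = "row-salt-123"
victim_pan = "4532010012340366"          # made-up number for illustration
stolen_hash = hash_pan(victim_pan, salt)
bin_prefix, last4 = victim_pan[:6], victim_pan[-4:]

def brute_force(stolen_hash: str, salt: str, bin_prefix: str, last4: str):
    # Only the 6 middle digits are unknown: at most 10**6 hash attempts.
    for middle in range(10**6):
        candidate = f"{bin_prefix}{middle:06d}{last4}"
        if hash_pan(candidate, salt) == stolen_hash:
            return candidate
    return None

print(brute_force(stolen_hash, salt, bin_prefix, last4))  # recovers the full PAN
```

A commodity machine runs this in seconds; that is the "entropy and work" bill coming due.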

The only way to speed it up is to make sure you don't have to do those computations most of the time, by adding a sort of "pre-filter." All possible pre-filters are equivalent in that they are probabilistic and cannot make guarantees on complexity. All of these pre-filters return only "possibly in set" or "not in set." They leak that data. You can't get around this.

A simpler, faster pre-filter like a Bloom filter doesn't give you a bounded set to search, even though it has a nice fast running time of O(1). A two-table approach like the one in proposed solution #6 gives you the exact set to search, but is more complex: O(log n).
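For illustration, a minimal Bloom-filter sketch (the class and its parameters are hypothetical, not from any particular library) showing the "possibly in set" / "definitely not in set" behavior described above:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.
    might_contain() returns True ("possibly in set") or False ("definitely not in set")."""

    def __init__(self, m_bits: int = 1 << 20, k: int = 7):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: str):
        # Derive k positions from k independent hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("4532010012340366")                     # made-up blacklisted number
print(bf.might_contain("4532010012340366"))    # True: possibly in set
print(bf.might_contain("4111111111111111"))    # False (barring a tiny false-positive chance)
```

Note that a "possibly in set" answer still forces you to do the expensive strong-hash check; the filter only lets you skip it for the overwhelming majority of clean cards.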

But wait - how is solution #6 equivalent to a Bloom filter? Well, a Bloom filter returns "0 to search" or "n to search." It tells you "not in set" or "possibly in set." What do we get out of the two-table solution?

Two tables exist: one with the masked PAN (first 6 and last 4 digits), hashed with a global salt, where each record points to rows in a full table with full PANs, hashed strongly with unique salts. You hash your incoming PAN's masked form and do an O(log n) search against the pre-filter table.

Now that we have done our O(log n) search, we end up with between 0 and n possible hashes to compare against, and smaller numbers are much more likely. A match count of 0 means "definitely not in set," and 1+ means "possibly in set." Then you still have a (probably small) number of difficult computations to make to check against the strong hashes.
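A rough sketch of that two-table scheme (the table layout, global-salt handling, and KDF choice are all illustrative assumptions, not a production design): a fast peppered hash of the masked PAN narrows the search, and the slow per-row-salted hash confirms it.

```python
import hashlib
import os
from collections import defaultdict

GLOBAL_SALT = b"app-level-secret"   # hypothetical pepper, kept outside the DB

def masked(pan: str) -> str:
    # Traditional masked PAN: first 6 (BIN) + last 4 digits.
    return pan[:6] + pan[-4:]

def fast_hash(masked_pan: str) -> str:
    # Cheap, indexable hash of the masked PAN for the pre-filter table.
    return hashlib.sha256(GLOBAL_SALT + masked_pan.encode()).hexdigest()

def strong_hash(pan: str, salt: bytes) -> bytes:
    # Deliberately slow, per-row-salted hash (PBKDF2 standing in for the KDF choice).
    return hashlib.pbkdf2_hmac("sha256", pan.encode(), salt, 100_000)

prefilter = defaultdict(list)   # fast_hash(masked PAN) -> list of row ids
full_table = {}                 # row id -> (salt, strong hash of full PAN)

def blacklist(pan: str) -> None:
    row_id = len(full_table)
    salt = os.urandom(16)
    full_table[row_id] = (salt, strong_hash(pan, salt))
    prefilter[fast_hash(masked(pan))].append(row_id)

def is_blacklisted(pan: str) -> bool:
    # The pre-filter narrows the search to 0..n candidates (usually 0 or 1);
    # the expensive strong-hash check runs only on those.
    for row_id in prefilter.get(fast_hash(masked(pan)), []):
        salt, h = full_table[row_id]
        if strong_hash(pan, salt) == h:
            return True
    return False

blacklist("4532010012340366")                 # made-up number for illustration
print(is_blacklisted("4532010012340366"))     # True
print(is_blacklisted("4111111111111111"))     # False
```

The dictionaries here stand in for indexed DB tables; with a b-tree index on the fast-hash column, the pre-filter lookup is the O(log n) step the question asks for.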

Whether it is a Bloom filter, an O(log n) two-table approach, or anything else, all the work gets done somehow. You just can't cheat it. You can fine-tune it, but you can't cheat it.

Any other method is equivalent to this. Reducing the entropy by having an easy pre-lookup means reducing the problem space, which you then try to map to a larger problem space, which means that you are working with probabilities.

These are just the facts from an information theory standpoint. So, I suggest that you re-work the problem, not the solution.

  • I agree with this two-table approach having a fundamental problem no matter how you disguise the second table. I only included "proposed solution 6" to deter people from proposing similar types of solutions. They all have this fundamental flaw. If you reduce the entropy for you, you also reduce it for the attacker. In the end we are using a global salt outside the db to increase the entropy (similar to solution 4, with an improvement suggested by Jhon Wu). If you see a problem with this solution, please let me know. I'm open to discussion on the topic. – Andrew Mar 17 '17 at 10:23
  • @Andrew Jhon has good insight. Relying on pepper always makes me a bit nervous, because an attacker that compromises your DB is likely also going to compromise that data, and it seems a bit like security through obscurity to me. If an attacker compromises that pepper then all that entropy is effectively gone. But it can't HURT, and it does increase the amount of work required. And his false-positive suggestion is excellent. – std''OrgnlDave Mar 18 '17 at 06:07