Is it security if an alternative of the full credit card number is stored in database

Question

Short Description

I am wondering if there is a way to store an alternative of the full credit card number securely, which the card number is not required to be retrieved from the alternative.

Long Description

My goal here is to somehow store an alternative of the customers' full credit card number in database thus we could identify the same customer on their next purchase. However it should be a huge security problem if we hash/encrypt the full number in database. So I am referring the idea of one-time pad cryptography where used the CC as the private key (Key) and the key will be used to encrypt a random string which will follow a pattern (Plain Text). Thus the Key will not be stored in the database, and there are millions of possibility of the Plain Text.

Exact Flow of the Initial Idea

Testing card number: 4539112781735373 (source: http://www.getcreditcardnumbers.com/)

A) Encrypt full credit card number by bcrypt with cost 14 ($2y$14$g161t43Dcu7beXbIn2a55.LvH.Cpn7pwu05MPCz1PbAzbFEr0SqZS)

B) Convert the bcrypt output to bytes (00100100 00110010 01111001 00100100 00110001 00110100 00100100 01100111 00110001 00110110 00110001 01110100 00110100 00110011 01000100 01100011 01110101 00110111 01100010 01100101 01011000 01100010 01001001 01101110 00110010 01100001 00110101 00110101 00101110 01001100 01110110 01001000 00101110 01000011 01110000 01101110 00110111 01110000 01110111 01110101 00110000 00110101 01001101 01010000 01000011 01111010 00110001 01010000 01100010 01000001 01111010 01100010 01000110 01000101 01110010 00110000 01010011 01110001 01011010 01010011)

C) Generate 6 random 10-digits numbers follow a pattern as the plain text (the pattern should be simple enough to leave more open possibilities and complicate enough for to be identifiable, let's say they all can be divided by 2 and 7) (1410065398 1410065384 1410065370 1410065356 1410065328 1410065342) and convert the numbers to bytes (00110001 00110100 00110001 00110000 00110000 00110110 00110101 00110011 00111001 00111000 00110001 00110100 00110001 00110000 00110000 00110110 00110101 00110011 00111000 00110100 00110001 00110100 00110001 00110000 00110000 00110110 00110101 00110011 00110111 00110000 00110001 00110100 00110001 00110000 00110000 00110110 00110101 00110011 00110101 00110110 00110001 00110100 00110001 00110000 00110000 00110110 00110101 00110011 00110010 00111000 00110001 00110100 00110001 00110000 00110000 00110110 00110101 00110011 00110100 00110010)

D) Perform XOR with bytes from step B) and C), output 101010000011001001000000101000000000100000010000100010101010000001000000011100000000001000000000001010000001101110100010101010100000000000100010110100101000101101001010101100111100001011110000000100101011100000000000001100001100101111100010001110111110000011111011100110100000001011000000000100100001101000010010000110000000100000001011111000110000001110011010011000000010001100011010100000111100101001011010101100111011101110101010000100000011001100110010000100110111001100001

E) When there is another purchase done by the same credit card, step B) will be performed again and the outputting bytes will be used to against all of the records from step D) (regardless the performance)

F) All retrieved bytes will then be converted back to numbers and check which are able be divided by 2 and 7, the matching records will be treated as a 'return customer'

What I am really wondering is...

Is the above flow secure enough to protect the customers' card number?
Is there any further step I could take to make it more secure?
Any better algorithm or pattern could be applied for step C)?
Does cost 14 good enough for step A)? Or is the cost/security matter when getting the bytes from credit card?
Is it possible to crack the full card number by the above method? if so, how long does it take?
What is the possibility that 2 different credit card clashes the above method? (same outcome)

Sorry I am not good at mathematics, encryption or security, any idea is welcome.

Remark

The first 6 and last 4 digits of the card number are store in the database somewhere else.

Why do you think that "hashing the full credit card number is a huge security problem"? Proper cryptographic hash functions are irreversible. — Philipp, Jan 15 '18 at 12:58
@Philipp True, but even for a naïve search, the search space is only 10^15 + 10^16, corresponding to about log2(10^15 + 10^16) ~ 53 bits, which is already quite doable. OP storing 10 out of the 15-16 digits elsewhere makes the problem downright trivial. — user, Jan 15 '18 at 13:48
Also, if whatever approach you use works and allows you to confirm that two customers have the same number, you've effectively created a hash function anyway. — Michael Mior, Jan 15 '18 at 15:18
@Philipp As mentioned by Michael, I am thinking having the salt and 10 digits in the database doesn't really prevent the brute force attack (6^10 is a million possibilities which shouldn't be lots of time), however I could still run more rounds to encrypt one record (let's say 100), but I am not famous with today's hardware as well so I cannot estimate the time to crack one hashed string. — Zay Lau, Jan 15 '18 at 15:29
The problem is insurmountable, presumably your system needs to identify the repeat customer relatively quickly, so it needs to do the hash in <1s. That means your own computer would take at maximum 11.6 days to do all 1 million hashes and find the original credit card number. That's nowhere near long enough, and an attacker can probably do hashes thousands of times faster than your computer with a handful of GPUs. — patstew, Jan 15 '18 at 16:02
One solution would be to use a normal hash function, but only store the first few bits, e.g. if you store 10 bits there are only 1024 possible hashes, so it can't identify a specific cc number among the 1m possibilities. However then you have to deal with collisions, and it might still violate PCI, I have no idea. I think you'd be better off comparing other information you hold like name, address, expiry date, the cc number fragments you're holding anyway etc. — patstew, Jan 15 '18 at 16:12
@patstew Collisions are acceptable but we would like to try our best to minimize the chance, however in the real logic we could only compare the records which match the first 6 and last 4 digits of the card (and the records will be stored up to 3 days), normally it will only be not more than 3 records to compare (10 at most) — Zay Lau, Jan 15 '18 at 16:25

score 10 · Answer 1 · edited Jan 15 '18 at 18:33

Frankly, this idea is not very clever. It's massively overcomplicated, and it isn't clear what you're trying to achieve.

Your XOR obfuscation is really useless because you're XOR-ing bcrypt's prefix with your number series. Since the prefix ($2y$14$) is almost always going to be constant and known to the attacker, you can use that to decode the first 7 characters of your random number series: "$2y$14$" xor output_of_step_D[:7] = "1410065". This narrows down the effective number series to just the last three digits. Further, knowing that the number series must be divisible by 2 and 7 and narrows down the start of the three digit numbers to just about 71 possibilities (1000/14 = 71), and since the number series is going to be in decreasing order, you can eliminate most of the rest by checking which of the 71 possibilities can be produced from xor-ing bcrypt's base64 charset and ASCII digits.

Since the XOR part and the generating not-really-random numbers part here are effectively useless, you're effectively no better than just storing the credit card in bcrypt here.

The worst thing about this scheme is this:

the outputting bytes will be used to against all of the records from step

Because of this part, it will be extremely expensive for your own application to check for return customers, your performance will suffer as the number of customers grow, and you'll have to redo this on every request as caching these checks kinda defeats the purpose. I wouldn't be surprised if this check easily becomes your main CPU bottleneck.

Thanks for the comment, what if we don't use bcrypt? there is a thousand of algorithms in the world to hash string such as SHA family. And by the number could be start from 0000000014, the numbers are not any order form (should be random and unrelated), sorry i didn't mention it clearly. I think I forgot to mention the performance is ignored because 1)Most records are storing up to 3days. 2)There is a table which store 10 digits of the CC num and we will cross-check with that table(hopefully there would only a few records to against per card).However may I have your advise on storing the num? — Zay Lau, Jan 15 '18 at 15:39
An additional remark, I am trying to identify if two purchases are done with the same credit card without storing the actual full card number in anywhere of the system, collisions are acceptable but should be minimized if possible. — Zay Lau, Jan 15 '18 at 15:50

Philipp · Accepted Answer · 2018-01-15T19:45:57.973

Disclaimer: Depending on your location there might be local laws and regulations regarding how to store credit card numbers (like PCI-DSS, for example). Look them up and when they are at odds with what I am proposing here, do what they say, not what I am saying.

You are inventing an own cryptographic algorithm. This is something you should never do, unless you are part of the very small elite of world-leading cryptography experts who have the necessary knowledge in mathematics and computer science to do that. And considering that you describe yourself as "not good at mathematics, encryption or security", you don't seem to be one. When you try to invent your own crypto without having sufficient expertise, you will do things which are pointless at best and actively harming security at worst.

In this case, just use a stock one-way hashing function and call it a day. bcrypt is not an appropriate choice here, because you want to be able to detect collisions with data you already have. But bcrypt adds a random salt to each generated hash, which means multiple bcrypt hashes from the same data look nothing alike. This is good for the purpose bcrypt was designed for (hashing passwords for user authentication) but bad for your purpose (fingerprinting data to detect duplicates). What you want is a cryptographic hash algorithm which gives you the same output for the same input. Current state-of-the-art algorithms with this property are SHA-3 (which has little technical similarities with the obsolete algorithms SHA-1 and SHA-2) and BLAKE2, for example.

If you want to minimize the damage in case your whole database gets stolen:

Add a pepper (append a string to each credit card number which is the same for all numbers) to your hash generation. That way it isn't possible to cross-reference your database with another database which uses the same encryption algorithm.
Use multiple rounds of your hash algorithm by repeatedly feeding the output as input to the next round. This multiplies the time a cracker will need to brute-force your CC numbers, but also the time it takes you to process new numbers.

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/71731/discussion-on-answer-by-philipp-is-it-security-if-an-alternative-of-the-full-cre). — Jeff Ferland, Jan 15 '18 at 17:26