46

Transforming the credit card number into a fingerprint

What's the best way to hash a credit card number so that it can be used for fingerprinting (i.e. so that comparing two hashes will let you know if the card numbers match or not)?

Ideally, I'm looking for recommendations of which hash + salt might be well suited for this use-case. The hash would uniquely identify a particular card number. You can use this attribute to check whether two customers who've signed up with you are using the same card number, for example. (since you wouldn't have access to the card numbers in plain text)

Fingerprints match

To give this some context, see the Stripe API docs and search for fingerprint. It's where I first heard about this concept. The credit card information will be stored on a secure machine (somewhere in the Stripe back-end) that's not accessible by the customer, and the API returns the fingerprint in order to allow the API consumer to make comparisons (e.g. to answer questions like has this card been used before?).

Let's make this one clear, coz I know you're going to ask: I'm not trying to replicate it. I'm just curious to understand how this could be implemented

kalina
  • 3,354
  • 5
  • 20
  • 36
FloatingRock
  • 791
  • 1
  • 6
  • 12
  • Are you going to be checking them sort of like a password, or will you need to be able to query? Salting would straight up break the ability to query. – Andrew Hoffman Jul 16 '14 at 15:33
  • Cause if you want to protect it from bruteforce guessing like you would a password, then you'd need to treat it like a password, except with CC numbers instead. CC hash and salt associated with that user. A good slow hash with stretching, secure comparison, etc. – Andrew Hoffman Jul 16 '14 at 15:37
  • 12
    So salt + hash *won't* work, because salts are unique. So, two identical card numbers, once salted, will produce different hashes, and you will not be able to determine that the card number matches a pre-existing salted hash of the same card number, since the salts will be different. – Xander Jul 16 '14 at 16:40
  • @Xander Thanks for the clarification. Ok, no salt then. Which hash would you suggest? – FloatingRock Jul 16 '14 at 17:07
  • 2
    I tend to agree with [u2707's answer](http://security.stackexchange.com/a/63251/12) and if you can't safely store the credit card numbers, I don't think it's reasonable to store hashes either. I can't suggest doing this to your customers. – Xander Jul 16 '14 at 17:19
  • Redirect them to paypal? – Andrew Hoffman Jul 16 '14 at 17:50
  • @Xander: Reuse the same salt. The salt is normally stored unencrypted. For comparison purposes anyways, just retrieve it and rehash using the same salt, check if they match. That being said, I wouldn't store any credit card number. – Engineer2021 Jul 16 '14 at 18:16
  • 5
    @staticx That isn't a salt, it's a pepper, and it isn't particularly valuable from a security perspective when trying to prevent hash-cracking, as you still only need to compute a single set of hashes to recover all values. – Xander Jul 16 '14 at 18:18
  • @Xander: Interesting, never heard that term before. I passed a PCI audit using a salt, I mean pepper, about 2-3 years ago. – Engineer2021 Jul 16 '14 at 18:19
  • 1
    @staticx It's kind of a cheesy term I know, but since by definition a salt needs to be globally unique, a value that's used more than once doesn't qualify. – Xander Jul 16 '14 at 18:24
  • @Xander: I prefer a one way token personally. – Engineer2021 Jul 16 '14 at 18:28
  • 1
    @staticx so to give this some context, see the [Stripe API docs](https://stripe.com/docs/api) and search for `fingerprint`. It's the same concept. The credit card information will be stored on a secure machine that's not accessible by the customer (API consumer), but the fingerprint is returned in order to allow them to make comparisons. – FloatingRock Jul 16 '14 at 18:32
  • So how is trunk encrypted and how is the card encrypted inside the trunk? – Engineer2021 Jul 16 '14 at 18:39
  • 1
    not quite related to security, but is your goal to make it so that 2 customers cannot use the same CC? I can quickly imagine a few situations where 2 separate accounts use the same CC: a father and son that both play videogames, and the father uses his CC to pay for the son's games on his own account; a married couple that have separate Amazon accounts to keep their kindle collections apart, but use one creditcard for both purchases; A company that needs multiple separate Office365 organizations, but pays for them with the company credit card. avoid making stuff unique that might not be that. – Nzall Jul 17 '14 at 09:47
  • @NateKerkhofs it's so that the customer (the consumer of the API) can decide on what to do in each scenario. If you want to group customers that use the same CC, you'd need some sort of fingerprint to identify the token -- which is what Stripe provides. You could do whatever you like with that fingerprint. – FloatingRock Jul 17 '14 at 09:57
  • In your last edit, just because somebody does it, does not mean tomorrow it will be frowned upon. Plus, what if the Stripe server is down? Do you really want to be tied to that service? – Engineer2021 Jul 17 '14 at 11:33
  • I *have* used Hash( fixed_salt_or_pepper + PAN ) to index blacklisted cards, and that passed PCI audit (and FISMA I believe). However, we did not use that mechanism to store customer's usable cards. – Neil Slater Jul 18 '14 at 12:57
  • 2
    @staticx If Stripe's server is down and you're using them for your merchant services, you can't process cards/transactions anyway, so it's irrelevant if you happen to depend on them for this function as well. – Xander Jul 18 '14 at 13:21

8 Answers8

31

Ok, so if this is a business requirement that you must meet (check to see if a card already exists in the system) this is how I would do it.

First, PCI-DSS does allow for a PAN (the primary account number, or credit card number) to be stored in hashed form using "strong cryptography" and they recommend (but do not require) that a salt be used as well. PCI-DSS v3 § 3.4(PDF link) The reason is to inhibit the ability of a malicious party who gets ahold of the hash from determining the PAN that was used as the input for the hash.

Second, your plan to store this in a separate, secure system and expose it only via an API is a good defense in depth measure. If I were implementing this, I wouldn't have it send a hash back to the primary system, I would simply take the hash as a parameter for the API call and have the API return "true" or "false" depending on whether the hash already exists in the system or not.

For the hashing function, the key to preventing the recovery of the inputs (as required by PCI-DSS) is to chose an algorithm that allows you to make the process computationally hard, so the task of brute-forcing the relatively small input space of valid PANs reasonably slow. SHA-512, suggested by another answer, is not this algorithm. Looking at the test matrix for hash cracking provided by hashcat.net the fastest machine in their matrix can calculate nearly 2 billion SHA-512 hashes per second. With an estimated input space of approximately 30,000,000,000,000 cards**, that means that unsalted, you could compute the SHA-512 of each of those PANs in just over 4 hours, and have the card numbers for ever card in the database, should it ever be stolen.

So, what we need is a slow hash function, however the definition of slow changes all the time as computers get faster and technology evolved. This means that a secure hash function will be one with an adjustable work factor, so over time you can increase the computational effort required to compute the hash of a given input. There are several candidate algorithms that meet this criteria including PBKDF2, bcrypt, and scrypt. The general consenus for password hashing is bcrypt, and that is what I would suggest here. For the cost, set it as high as your application allows. It likely isn't something you're going to be computing terribly frequently (I'd imagine, depending on your application, you're going to be looking at x hashes per minute or hour or day rather than per second) and the higher the cost, the harder it will be for an adversary who captures the hash to brute force.

** My estimate is based on a 9 digit account number, a computable check digit that doesn't increase complexity and a WAG of 30,000 active bank identification numbers (the first 6 digits of the card) based on a list I found.

Xander
  • 35,525
  • 27
  • 113
  • 141
  • 2
    I think you're too optimistic. Bcrypt doest resist FPGAs very well, and I'd assume the last 4 digits are known to the attacker. – CodesInChaos Jul 16 '14 at 23:27
  • @CodesInChaos Well, overall, my advice to not to do this at all in practice still stands. In theory though, if following PCI, the truncated card numbers are well separated from the hashes and a breach of one doesn't compromise the other. In practice... *Shrug.* In a theoretical implementation though, vis-a-vis the lack of resistance to FPGA-based attacks you mentioned, would you suggest scrypt is a better option? – Xander Jul 16 '14 at 23:34
  • 4
    credit card numbers are insecure in the first place anyway. Most serial number systems for games have more entropy than that. When will we get rid of them? – pqnet Jul 17 '14 at 03:01
  • @CodesInChaos what would you do? (if doing nothing wasn't an option) – FloatingRock Jul 17 '14 at 08:46
  • 1
    @CodesInChaos Because BCrypt uses a different salt every time, you wouldn't really be able to compare the resulting hashes. Unless, of course, you used the same salt, which doesn't seem very BCryptish to me. – corsiKa Jul 18 '14 at 15:04
31

I cannot comment on how Stripe does this but I can tell you exactly how Braintree does it (because that is where I work). If I had to guess, Stripe probably uses a similar method.

In the Braintree API, we offer a unique number identifier for a credit card. This identifier is a random opaque token that will always be the same for card number stored in our system. The seed for this number is different per merchant in our system so you cannot compare them across merchants.

When a new card comes in, we look it up by comparing it to a hashed + salted column. If it matches that existing column we know we can return the same unique number identifier. If it doesn't match any existing record, we use a cryptographically secure pseudo-random number generator to create a new unique number identifier and ensure it doesn't conflict with an existing one.

This way the hashed + salted value never leaves our backend but we can still provide a way for a merchant to uniquely identify stored credit cards.

John Downey
  • 1,915
  • 13
  • 12
  • Brilliant! @Xander, I'd love to hear your commentary on the security bits (and anyone else for that matter). – FloatingRock Jul 17 '14 at 16:48
  • 4
    @FloatingRock My take is that this basically extends the method I outlined by (a) peppering the PAN before hashing, and then (b) storing an additional unique, randomly generated token (or fingerprint) alongside the hash. You still have the requirements to strongly secure the app and database computing and storing the hashes and tokens, but you can then return the token to the front-end app to keep, and it doesn't reveal any information about the card it's associated with. It's quite nice in that it lets you offload functions to the front-end without any additional exposure to card data. – Xander Jul 17 '14 at 19:02
  • +1 I absolutely LOVE how the token is completely separate, since it isn't derived from the PAN at all. – FloatingRock Jul 17 '14 at 19:42
  • @Xander: This is better because you aren't storing the hashed and salted PAN at all. It is re-run everytime the transaction occurs. The only problem I see with this is or Stripe is that you have to be online 24/7. If the internet connection ever goes down, then you cannot process credit cards. I used to write credit card processing code for Parking Applications and many parking garages are fully automated with no cashier. We had to devise a method to securely store the credit cards offline up to a certain dollar amount. Once it came back online, then it dumped the transactions. – Engineer2021 Jul 18 '14 at 13:34
  • At the time, and I think still, this was a less risky situation because it wouldn't store cards forever and the database was securely deleted once it was dumped to the master server which never retained card information. It required us to place a hardened PC in the machine created by ADAM called the UNO with special software as an intermediary. – Engineer2021 Jul 18 '14 at 13:35
  • 2
    @staticx I think you missed a bit. "When a new card comes in, we look it up by comparing it to a hashed + salted column" The hashed PAN *is* stored, by Braintree, and I assume Stripe. The OP wants to know how that implementation works. – Xander Jul 18 '14 at 13:38
  • 1
    @Xander: Then there is still a weak point. It offloads the storage to another service hoping that it is secure. – Engineer2021 Jul 18 '14 at 13:49
  • @staticx true, the backend would definitely need to be secured (and PCI compliant). The key is that the fingerprint shared with the end user can not be used to derive the PAN _period_. (more like comma .. unless of course the backend's been compromised). – FloatingRock Jul 18 '14 at 18:33
  • @FloatingRock: Yes, the correlation has to occur within their DB only as there isn't a way to provide the fingerprint or unique number and get back a card number. That would be royally stupid. I think you increase your risk by storing the card number anywhere. PCI does not go far enough in my opinion in preventing storage. It should flat out deny the capability to store the number anywhere. It would lead to a slight inconvenience but not as bad as what happened recently to Target or other retailers. – Engineer2021 Jul 18 '14 at 18:35
  • @staticx that sounds great in theory. In practice however, there are legitimate needs that require storage of this information. Recurring billing (or subscriptions), for example, necessitates storage of the card number. – FloatingRock Jul 18 '14 at 20:29
  • @FloatingRock: It is very, very difficult to securely store credit card information. In fact, it was announced five years ago that 130 million credit card numbers were stolen from major retail and finance companies that have far more resources than you probably do to secure that data. I fully understand the desire to easily facilitate recurring payments. However, think though and understand the risk related to storing of credit card numbers before deciding to do so. If you decide that you need to store the card numbers, I recommend hiring a security expert with a proven track record – Engineer2021 Jul 18 '14 at 20:33
  • In terms of refunds, you could in theory store the truncated card number hashed with a unique salt with the person's name so that (like when you go return something at Target), it matches the secure hash if the card is presented again. This wouldn't rely on Stripe and you wouldn't have to store the full PAN though you still need to protect the truncated version. There are various tricks you can do, the point is once you decide to store it anywhere, there is a business risk than just transmitting it to the processor. – Engineer2021 Jul 18 '14 at 20:36
  • @staticx oh most certainly. I wouldn't attempt to do so, even with guidance from an expert. I'd much rather have someone who's fully qualified doing it, and even then hire someone to attempt a break-in. I was merely highlighting an example of where there's a legitimate business case for it. – FloatingRock Jul 18 '14 at 20:37
  • 1
    Out of curiosity, what prompted you to use a PRNG to generate the token, instead of say an auto-incremented PK, which would not need nearly as computational power and database lookups to ensure lack of collisions? – Mark Jul 19 '14 at 15:05
  • @RainFromHeaven we wanted the returned value to be completely opaque and not give away any information. If use used an auto-incrementing field it could reveal the relative count of cards stored and the velocity of growth – John Downey Jul 21 '14 at 18:43
  • 2
    For everyone commenting on the PCI implications of this. I agree that a hashed version of a PAN is trivially reversed into the full PAN if it is leaked (just because of the small domain of PANs). Braintree is a PCI Level 1 Service Provider and the hashing of PANs for this purpose is included in our yearly on-site PCI audit. The value add of using a service like Braintree is to help reduce the PCI burden on the merchants we support. That is why we provide a way to uniquely identify cards without needing to store the hash on the merchants end. – John Downey Jul 21 '14 at 18:47
  • 3
    @JohnDowney by 'When a new card comes in, we look it up by comparing it to a hashed + salted column', did you mean when a new card coming, you compare it to every credit card you already have? As they are slated, you need try them one by one? – Alfred Jan 16 '16 at 10:03
  • @Alfred yes, but database indexes make it not as scary as it sounds – John Downey Jan 12 '17 at 15:44
16

Hashing credit card numbers is not a substitute for securing the data. If your system isn't secure enough to store raw credit card numbers then it's not secure enough to store CC hashes.

Same thing for anything that's a fixed size and limited character set: date of birth, phone number, zip, etc.

Credit cards are all sixteen digits and while 10^16 seems like a big number, it's pretty straightforward for compute resources. If I have your table of CC# hashes, I can easily attack that with a rainbow table and get a full list of credit cards. Salting the values before you hash makes searching impossible.

All possible CC values: something less than 10^16.

For comparison, imagine your users used a reasonable 8 character password with a 20-byte salt: 62^28.

(If I was your PCI auditor I would consider hashing credit cards to be tantamount to storing the card numbers themselves.)

u2702
  • 2,086
  • 10
  • 11
  • 12
    One addition and one pedantic point. The pedantic point: Some credit card numbers are 15 digits, not 16. Not that this changes your answer much, but I wanted to note it. Additionally, you wouldn't have to brute-force the entire space. The first set of digits in a CC number are one of a few known prefixes, and the last is a check-digit, so the space of possible valid numbers is quite a bit smaller. – Xander Jul 16 '14 at 16:17
  • Gotcha. Updated the question to clarify my intentions – FloatingRock Jul 16 '14 at 16:18
  • 2
    @Xander: Yes, and its smaller than people think because the entire card number has to pass the Luhn algorithm. – Engineer2021 Jul 16 '14 at 18:14
  • 2
    Not enough rep to comment yet, but this is mainly directed at the previous answers suggesting on 10^16 as the "max" number of "randomness" - it's not really close to 10^16, since out of the 16 digits that make up a credit card number, only the digits 7-15 (inclusive) identify the account - the others identify the type (mastercard, visa, etc.), the issuer (bank XXX), the specific type of card (I.E. Delta gold rewards). Finally there is a "checksum" which means you can (quickly for a computer relative to computing a hash) eliminate another digit if that doesn't add up, so realistically you are l – user2813274 Jul 16 '14 at 17:50
  • "something less than 10^16" -- yep, definitely something smaller than 10^16, just too lazy to do the math. – u2702 Jul 17 '14 at 18:41
9

The answer will disappoint you: if a service/oracle/hash can tell you if an input number is on file, it can be broken by a very small amount of brute force.

Consider that in much of the country, credit cards are issued by relatively few banks in their geographical region. You can learn the BIN numbers for the most popular of these banks quite easily. Let's say you have a receipt from a Memphis store, and it says VISA ****1234. Chances are about 20% that it will be one of ten different BINs that belong to the largest banks in the Memphis area. That's 10^1 guesses for the first six digits. Next, you know the last 4, because they're printed on the receipt. That's now only 10^1 guesses that cover 10 of the 16 digits. Of the remaining six digits, iterate over 5 and compute the last using Luhn. That's 10^6 tests that will yield a valid account number 20% of the time.

All you have to do to test a number is loop through the service 100,000 times. It doesn't matter if the service uses a hash or encryption or a magic wand to protect the account numbers, as long as it tells you whether or not your input was a match.

You may note that this works only about 20% of the time. Consider that the guy who stole from Target took 40 million account numbers, and he couldn't sell them all, as there weren't enough crooks in the world to buy that many. 20% of that is 8 million, and he probably didn't sell that many, either. So 20% is good enough odds for an attacker.

John Deters
  • 33,650
  • 3
  • 57
  • 110
  • I updated the question with a thought .. many of you consider this practice insecure, so perhaps a responsible disclosure to Stripe is in order. – FloatingRock Jul 17 '14 at 08:47
  • And Braintree. But honestly, no one is interested in this vulnerability, because removing it would necessarily kill the business justification. – John Deters Jul 21 '14 at 04:03
4

tl;dr:
It is not possible to do such a thing in a truly secure manner, but it can be done in a "probably good enough" way for being usable and acceptable.

Your options are basically some form of encrypted storage (with the risk of encryption keys being stolen, as they need to be present for decryption), or something involving hashing, with the well-known issues of hashing passwords (a credit card number is basically a "password").

I would implement this as as two-level system where the first level, would be a simple CRC32 and the second level would be a many times iterated secure hash function. Yes, CRC32 is not a cryptographically secure hash, but that doesn't matter in this particular case, since the CRC is not the weak link.
If the thought of using a not-cryptographically-secure algorithm scares you too much, you can still replace the CRC with a cryptographically secore algorithm and only use e.g. the least significant 32 (or 16, for a bigger margin) bits from the resulting hash.

The reason for this approach is that the range of possible inputs only spans around 39 bits[1]. It is practically impossible to hash this keyspace in a truly secure way while keeping the system operational for the designated use.
A brute-force attack on a 39 bit keyspace is not only theoretically feasible, but in fact quite trivial -- unless each single operation is extremely expensive.

The main threat you need to defend against is theft of your database followed by brute-forcing your hashes to retrieve valid card numbers. The next best thing an attacker could do would be querying online whether some unknown number that hashes identically to a randomly submitted number has been used before, or querying whether an already known valid number has been used (neither one is very useful, though!).

Credit cards are usually valid for 5 years, thus for the system to be secure, it must withstand at least 5 years of brute-force. In other words, the work performed for a single verification must be so computionally complex that a dedicated multi-GPU cracking machine cannot perform more than 2^39/(5*365*86400) = 3486.5 verifications per second.

Taking the well-known 348-billion-per-second figure from 2012, the only really secure approach would thus have to store the value of having hashed at least 100 million times. Taking Moore's Law into account, that'd be 200 million by now, and we're not even considering the possibility that someone owns 2 or 3 such machines (or maybe 10).

However, this still does not take into account the fact that banks are inherently stupid and will simply increment the second last digit for a new valid number (and update the check digit). All my credit cards issued during the last 20 years have had the same number with the exception of the second-last digit incrementing 0, 1, 2, 3, ...
Which means that planning for 5 years still isn't enough. Someone knowning an expired credit card number can trivially derive the valid number of the follower card. You therefore need to plan rather for something like 30 years, not 5 years. I'll leave extrapolating Moore's Law to another 20-30 years for homework.

In conclusion, for the system to be really, really secure, your server would have to perform the equivalent of a trillion hashes (or more) per verification, which at a rate of, say 10 million per second for a typical CPU implementation, would take over a day. In other words, the system is completely unusable for its designated purpose.

Something that's several orders of magnitude faster is needed. Most queries will be negative queries, and a simple, fast hash could be used to prune out 99.9% of all negatives, if only it doesn't render the already weak security even weaker. This is where CRC32 comes in.

If the CRC32s of two credit card numbers don't match, it is guaranteed that the numbers are not the same. Most of the time, that will be the case, and you can immediately return "No!". You can skip 99.9% of the work without giving anything away.
If the CRCs do match, it might be a hit, but there is a chance of a hash collision. Now, the secure cryptographic hash (which would have at least 160, better 256 bits) is calculated and compared. The likelihood of a collision is neglegible now. If the hash matches, it is certain that the number matches.

CRC32 contains 32 bits of information. It is impossible to restore 39 bits of information from 32 bits, no matter what computional resources you may own. Therefore, the CRC is safe to use in this case (or at least, it is not less safe than anything else).

As pointed out by Ben Voigt, it is of course still possible to guess the information that cannot be restored, and 7 bits to guess aren't a terrible lot (concatenating the expiry date will make this 15 bits). That is true, but there is not much you can do about it.

What about CRC and security anyway? CRC is not cryptographically secure. It is relatively easy to tamper with bits to get a desired output (not a problem here, would be with logins), and the function can be trivially reversed (4 table lookups, plus a few shifts and xor) to a 32 bit input, a somewhat bigger problem. Luckily, but this bit pattern is pretty worthless since it isn't a valid number, nor a subset of any 39-bit pattern that might be a valid number (other than by sheer coincidence).

Brute-forcing the complete 39-bit keyspace is more straightforward than trying to play tricks on the CRC, since, unluckily, this is easy enough and fast: A SSE4.2 implementation uses about half a cycle per byte (so, 2 cycles per hash), which means a desktop computer could do the complete keyspace in a second or two (assuming the keys to compare with all fit in L1).

The exhaustive search would reveal all 39-bit patterns that have the same CRC, which means they could possibly be credit card numbers, and using Luhn's algorithm the attacker can immediately discard 9 out of 10 numbers which cannot be valid. That's bad, admittedly, but it is really just a consequence of not having too many input choices and having a check digit implanted on top!

Still, the CRC is actually stronger in this scenario than a cryptographically secure algorithm: Given 2-3 billion credit cards in circulation worldwide, the chance of a collision using a 160-bit hash is somewhere around 1 to 1020, and don't even think about the chances using a 256-bit hash. Which means that any hit during your exhaustive search of the 39-bit keyspace will immediately give you the one and only guaranteed valid card number. A hit on the CRC32 will only give you one of many possible numbers.

One thing you could do to trade some speed for security would be to further throw away some information from the CRC, for example, you could just discard the most significant 8 or 16 bits of the CRC, which would add these to the bits that an attacker will have to guess. Of course this will somewhat reduce the effectiveness of the pruning, but not too drastically. Instead of pruning 99.99% of the work, you might end up pruning 99% or 98%, which is still mighty fine.

The second level of verification, as stated above, needs to be a cryptographically secure hash that runs at least a million iterations much like it is done in a key stretching scheme (actually it needs much more than that, but the system needs to remain practically usable, too).
The implementor needs to trade risk versus usability here, a server should at least still be able to process a few dozen requests per second. Customers/merchants will not consent if looking up one number takes several minutes (or longer), no matter what the security implications are. Nor is it practical to have more or less one dedicated server dedicated per merchant.

Since matching is only interesting on a per-merchant base, one could use a constant-per-merchant salt and store card numbers redundantly along with an index on the merchant's ID, which should be within the powers of any reasonable database system.


[1] A credit card number has 16 digits of which the last is a checksum derived from the others (Luhn algorithm). The second but last digit is usually (not necessarily, but usually) the sequential card number, starting at zero for the account owner's first card, 1 for the spouse or a replacement card, 2 for the next card thereafter, and so on. Further, the first three digits encode the card issuer and the bank, and only use about half of the possible number of 3-digit combinations.
All in all, this leaves us with betwen 12 and 13 complete decimal digits of entropy, which is anywhere between slightly over 36 bits, and 39 bits.

I would recommend concatenating the expiry date to the number, as this will make this situation a bit less miserable. My credit cards are always valid for 5 years (not sure if that is the universal rule, but I'd assume so), so adding the expiry date would add 5*12 possibilities, or 5.9 bits (these come pretty much for free, since you need the expiry data in a transaction anyway!).

Damon
  • 5,001
  • 1
  • 19
  • 26
  • Thanks Damon. Just to clarify, RE: _(only a query of the type "did this particular person use that number before" would be possible)._ If I'm a merchant using Stripe, I don't care if the card was used with another merchant or not. The scope would be limited to the cards used on one particular merchant (i.e. _Me_). – FloatingRock Jul 17 '14 at 13:46
  • This is an absolutely *terrible* approach. Sure you cannot recover 39 bits from a CRC-32. But the space you need to brute-force is reduced to 7 bits :( It doesn't matter how large your cryptographic hash is, anything larger than the input space isn't expanding the search. – Ben Voigt Jul 17 '14 at 21:56
1

Complementing the previous suggestions. How about using SHA-3 (which apparently was not official until 2015, later than this discussion) prepending the card number with a merchant specific secret key in order to make the hashes merchant-specific.

Simply prepending a secret key to the message was insecure for SHA-1 and SHA-2 due to their length-extension weakness. This is not the case with SHA-3, which is resistant to length-extension attacks by design. As explained here, one could even use SHA-3 for MAC computation by simply prepending the message with the key.

Of course the problem of how to securely store the secret keys remains. I have no experience with the payment card industry. I just came across this question and found it interesting. I would love to get some comments from more experienced users on the issues specific to dealing with card numbers (and their relatively small search space) by using this approach.

KurtWegner
  • 11
  • 2
0

You must not store it per the PA-DSS standards defined here: https://www.pcisecuritystandards.org/documents/pa-dss_v2.pdf

It states:

1.1 Do not store sensitive authentication data after authorization (even if encrypted): Sensitive authentication data includes the data as cited in the following Requirements 1.1.1 through 1.1.3.

In the adjacent column it states:

If this payment application stores sensitive authentication data, verify that the application is intended only for issuers and/or companies that support issuing services

I assume you are not an issuer or a company that support issuing services. Keep in mind that Target Corporation was recently hacked and credit card information was stolen in real-time and from servers who did not encrypt the data end to end. It is my understanding that it is being rectified and made more secure. Frankly, do not store it period. Even if you use SHA512 hash, your implementation might be flawed. With the best countermeasures, I would not want to be on the end of a lawsuit.

Edit:

It technically is permitted in section 2.3 of PA-DSS v2.0. But don't do it.

Engineer2021
  • 125
  • 6
  • 4
    There's actually specific guidance around storing PAN hashes in section 3.4 of PCI-DSS v3.0. It is allowed, grudgingly and with caveats. – Xander Jul 16 '14 at 18:22
  • @Xander: Yea, I was hoping to avoid saying that and just say carte blanche don't do it :) – Engineer2021 Jul 16 '14 at 18:26
  • 2
    Actually, it's permitted, because the PAN isn't on that list of sensitive data (see page 12.) Sensitive data is either full track data, CAV2/CVC2/CVV2/CID, or PIN blocks. (Disclaimer: I am not a QSA, consult your QSA for your own situation.) – John Deters Jul 17 '14 at 00:41
  • @JohnDeters: Yes, but the PAN is included in Track 1 and 2. It's a bad paragraph really. They shouldn't allow it – Engineer2021 Jul 17 '14 at 10:44
  • There's no such thing as 'SHA512 encryption'. Hashing and encryption are very different, and it's dangerous to confuse them. – sapi Jul 17 '14 at 12:45
  • @staticx, the spec clearly states the PAN is customer data, just like cardholder name, and is not in the sensitive data column. Track data is sensitive because it contains CVV, not because it contains PAN. – John Deters Jul 17 '14 at 23:48
  • @JohnDeters: I understand that. My point is it is ridiculous to protect the tracks more than the PAN – Engineer2021 Jul 18 '14 at 00:02
  • @staticx, it's not ridiculous. The CVV data in the track is used as "evidence" that the mag stripe was present for the transaction, as opposed to the CVV2 number, which only indicates a human saw it. Banks assign less risk to Card Present transactions, so the retailer gets a reduced rate on processing them. Keeping CVV2 would let a retailer commit fraud. – John Deters Jul 21 '14 at 04:11
  • @JohnDeters: Yes, I understand all that. I have been through the PA-DSS audit process. Anyone can generate a Track 1 only credit card using the Customer's Name, the PAN, and the expiration date. Neither of which are considered sensitive data. Therefore, you could create a credit card easily using that data. I used to create my own for testing, it isn't that hard. My point is that they assign less risk to retaining the PAN than retaining the full track data. – Engineer2021 Jul 21 '14 at 09:50
  • @staticx, no, you cannot create a valid track 1 that will pass authorization unless you have the correct CVV. The bank will reject it. The PAN is less risky to keep than the CVV. – John Deters Jul 22 '14 at 02:27
  • @JohnDeter: I have created my own cards. It's not that hard. All you need is track one. It's easy to generate. I used to create them all the time for testing. And they would pass authority – Engineer2021 Jul 22 '14 at 08:40
  • @JohnDeters: The discretionary data where the CVC1 is stored is entirely optional. You can put 0's there. CVC1 is to there as a sort of checksum to make sure the data has not been tampered with and that is in the discretionary data. ALOT of issuers ignore this discretionary data as it is well, discretionary. CVC2 that is on the back of card usually, is meant for eyes only. – Engineer2021 Jul 22 '14 at 09:51
0

If you're going to do it, then add some kind of obfuscation. I just guaranteed myself downvotes, but obfuscation is not without value like some will insist.

If you assume the system is known, then there is no value. But if the system isn't known, there is absolutely value. An IV used to 'pepper', as Xander put it, would be a form of obfuscation, while not increasing collisions, and maintaining the ability to query.

A pepper wouldn't be a very effective obfuscation however, unless the pepper is large enough to not be bruteforce guessed in a reasonable amount of time. 100 random bytes would may be overkill but theres no reason not to go big.

Another thing that might be effective obfuscation would be to not assume that both the pepper and the message are attained at the same time, so that if you periodically rehash the hashes in a nightly batch (keeping track of how many rounds), they'd need another piece of info - how many rounds, or how the rounds are calculated.

Keep in mind that I don't think this is a good idea, I'm just thinking out loud here. If your environment is secure enough for credit cards, not only would I not hash them, I'd store them in a table designed for the fastest lookup performance.

Andrew Hoffman
  • 1,987
  • 14
  • 17
  • Check out the accepted answer .. it uses a random number generator that reveals nothing about the card number. – FloatingRock Jul 17 '14 at 18:28
  • 1
    Oh, well I read this all yesterday and there wasn't any context or use case, which changes everything. – Andrew Hoffman Jul 17 '14 at 18:33
  • @FloatingRock: Your weak point is the transit of the card from the application to Braintree. Also, there is a weakness at the swipe reader to the PC to the application. There are solutions out there that encrypt end-to-end but it costs money. – Engineer2021 Jul 22 '14 at 11:23