According to this paper from the Proceedings of the National Academy of Sciences (PNAS), Social Security numbers (SSNs) are pretty easy for hackers, identity thieves, and other miscreants to predict from publicly available data. I found this interesting partly because a few months ago I wrote a chapter for a book discussing security for SSNs.
Here's the deal: all SSNs have a very regular structure that looks like this: xxx-yy-zzzz. With 9 numeric digits, there are 1 billion (10^9) possible combinations that can be assigned. And of course we have the same information that identity thieves have: the rules for SSN assignment are posted for the public on the Social Security Administration (SSA) website.
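To make the xxx-yy-zzzz layout concrete, here is a minimal Python sketch that splits an SSN string into its three fields and records the raw size of the 9-digit space. The names `split_ssn` and `TOTAL_COMBINATIONS` are hypothetical, and the example number is made up.

```python
import re
from typing import Tuple

# xxx-yy-zzzz, with the hyphens treated as optional in this sketch.
SSN_PATTERN = re.compile(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})")

def split_ssn(ssn: str) -> Tuple[str, str, str]:
    """Split an SSN into (Area Number, Group Number, Serial Number) strings."""
    match = SSN_PATTERN.fullmatch(ssn)
    if match is None:
        raise ValueError(f"not in xxx-yy-zzzz form: {ssn!r}")
    return match.group(1), match.group(2), match.group(3)

# Nine decimal digits allow 10**9 strings before any assignment rule applies.
TOTAL_COMBINATIONS = 10 ** 9

if __name__ == "__main__":
    print(split_ssn("123-45-6789"))   # ('123', '45', '6789')
    print(f"{TOTAL_COMBINATIONS:,}")  # 1,000,000,000
```

Keeping the fields as strings preserves leading zeros such as "001" or "0042", which matter when comparing against SSA's published lists.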
Here are some of the key rules that determine how SSNs are assigned, summarized from the SSA website:
- xxx is a 3-digit Area Number, and is assigned based on the ZIP Code from which the request to assign the SSN originates.
- yy is a 2-digit Group Number, which is assigned in a predictable (nonconsecutive) order. The order of assignment of Group Numbers is documented on the SSA website as well. It's always a number between "01" and "99".
- zzzz is a 4-digit Serial Number, which is a number between "0001" and "9999".
- There are a few stray SSNs that have been taken out of circulation for various reasons (for example, numbers used in marketing campaigns).
- And of course no SSN is ever reassigned.
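The "predictable (nonconsecutive) order" for Group Numbers is what makes SSA's high group list so useful to an attacker. Below is a minimal Python sketch, assuming the issuing order SSA has published: odd groups 01-09, then even groups 10-98, then even groups 02-08, then odd groups 11-99. The helper names `group_assignment_order` and `groups_issued_through` are hypothetical.

```python
from typing import Dict, List

def group_assignment_order() -> List[int]:
    """Group Numbers 01-99 in the assumed SSA issuing order within an Area Number."""
    odd_low = list(range(1, 10, 2))     # 01, 03, 05, 07, 09
    even_high = list(range(10, 99, 2))  # 10, 12, ..., 98
    even_low = list(range(2, 10, 2))    # 02, 04, 06, 08
    odd_high = list(range(11, 100, 2))  # 11, 13, ..., 99
    return odd_low + even_high + even_low + odd_high

GROUP_ORDER: List[int] = group_assignment_order()
# Position of each Group Number in the issuing sequence.
GROUP_RANK: Dict[int, int] = {group: rank for rank, group in enumerate(GROUP_ORDER)}

def groups_issued_through(high_group: int) -> List[int]:
    """Every Group Number already used in an area whose current high group is `high_group`."""
    return GROUP_ORDER[: GROUP_RANK[high_group] + 1]

if __name__ == "__main__":
    print(groups_issued_through(12))  # [1, 3, 5, 7, 9, 10, 12]
```

Given one high group entry for an area, the function returns exactly the groups that have been used there, so every other group in that area can be dropped from the search.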
Using these rules, a bad guy can narrow down the scope of his search substantially just by eliminating all SSNs that begin with 8xx, 9xx, 666, or 000. That alone eliminates more than 200 million. No SSNs have been assigned with an Area Number above 772, eliminating tens of millions more in the 773-799 range. No SSNs have been, or will be, assigned with a Group Number of 00 or a Serial Number of 0000, eliminating millions more. In addition, the Group Numbers that have been assigned are available from the high group list on the SSA website, knocking hundreds of millions more possible SSNs off the list.
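The eliminations above are simple counting. Here is a short Python sketch, assuming exactly the structural exclusions described (Area Numbers 000, 666, and anything above 772; Group Number 00; Serial Number 0000) and ignoring the high group list and the stray retired numbers. The function names are hypothetical.

```python
def area_is_possible(area: int) -> bool:
    """True if an Area Number survives the exclusions described in the text."""
    if area in (0, 666):
        return False
    # Assumed cutoff: nothing above 772 has been assigned, which also
    # removes every 8xx and 9xx area.
    return 1 <= area <= 772

def count_candidates() -> int:
    """Count xxx-yy-zzzz strings left after the structural exclusions."""
    areas = sum(1 for area in range(1000) if area_is_possible(area))  # 771
    groups = 99     # 01-99; Group Number 00 is never used
    serials = 9999  # 0001-9999; Serial Number 0000 is never used
    return areas * groups * serials

if __name__ == "__main__":
    print(f"{count_candidates():,} left of {10 ** 9:,}")
```

Running it prints `763,213,671 left of 1,000,000,000`, so roughly a quarter of the space disappears before the high group list or any personal information comes into play.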
This is just the beginning -- it gets better:
If you know where a person applied for their SSN (in many cases this will be where they were born, or close to it), you can use the SSN Allocations list to narrow down the search substantially. This won't always work, though, since some parents don't apply for an SSN for their child immediately at birth.
All this shows how an identity thief can use a person's birthplace and approximate date of birth to accurately guess the first 5 digits of the SSN: the location points to the Area Number, and the date points to the Group Numbers being issued at the time. The PNAS authors correctly guessed the first 5 digits of SSNs in a single try for 44% of their test records.
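The narrowing for the first 5 digits is essentially a Cartesian product: the Area Numbers allocated to the applicant's state, times the Group Numbers being issued around the application date. Below is a minimal Python sketch of that idea, not the PNAS authors' actual model. The state name, its area range, and the group list are made up for illustration, and `candidate_prefixes` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical, made-up allocation: real ranges would come from SSA's
# Area Number allocations list.
HYPOTHETICAL_AREAS_BY_STATE: Dict[str, range] = {"Example State": range(501, 503)}

def candidate_prefixes(state: str, groups_near_birth: List[int]) -> List[Tuple[int, int]]:
    """Every (Area Number, Group Number) pair, i.e. every guess at the first
    5 digits, for someone who applied in `state` while the given Group
    Numbers were being issued."""
    areas = HYPOTHETICAL_AREAS_BY_STATE[state]
    return [(area, group) for area in areas for group in groups_near_birth]

if __name__ == "__main__":
    for area, group in candidate_prefixes("Example State", [42]):
        print(f"{area:03d}-{group:02d}-????")
```

With two areas and one active group there are only two prefixes to try (`501-42-????` and `502-42-????`). A state with a single Area Number and a single active group leaves exactly one, which corresponds to the single-try case.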
At the other end of the spectrum, identity thieves can use the SSA's Death Master File (DMF) to narrow down the last 4 digits (the Serial Number). The PNAS authors used the DMF to estimate the statistical distribution of Serial Numbers, dramatically narrowing down the last 4 digits. They correctly guessed the complete SSN for 8.5% of the test records in fewer than 1,000 attempts each, making those SSNs less secure than a 4-digit ATM card PIN (in fact, the authors compared them to an insecure 3-digit financial PIN).
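One simplified way to see how DMF records help with the Serial Number: SSA issues serials consecutively within a group, so if issuance roughly tracks birth date (an assumption), two DMF entries from the same area and group bracket the target's serial. The Python sketch below interpolates between them and ranks guesses by distance. It is a stand-in for illustration, not the authors' statistical method; the function names and example records are hypothetical. A budget of 1,000 guesses covers every 3-digit PIN but only a tenth of the 4-digit PIN space.

```python
from datetime import date
from typing import List, Tuple

DmfRecord = Tuple[date, int]  # (birth date, Serial Number) from a DMF entry

def estimate_serial(target_birth: date, before: DmfRecord, after: DmfRecord) -> int:
    """Linearly interpolate a Serial Number from two DMF records that share
    the target's Area and Group Numbers."""
    (d0, s0), (d1, s1) = before, after
    span_days = (d1 - d0).days
    if span_days <= 0:
        raise ValueError("'after' must have a later birth date than 'before'")
    fraction = (target_birth - d0).days / span_days
    return round(s0 + fraction * (s1 - s0))

def guess_order(estimate: int, attempts: int = 1000) -> List[int]:
    """Serial Numbers 0001-9999 ranked by distance from the estimate,
    cut off at the attempt budget (ties go to the lower number first)."""
    ranked = sorted(range(1, 10000), key=lambda s: (abs(s - estimate), s))
    return ranked[:attempts]

if __name__ == "__main__":
    est = estimate_serial(date(1990, 1, 6),
                          before=(date(1990, 1, 1), 1000),
                          after=(date(1990, 1, 11), 2000))
    print(est)                  # 1500
    print(guess_order(est, 5))  # [1500, 1499, 1501, 1498, 1502]
```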
Overall, the authors' testing showed that full SSNs can be guessed with an accuracy of between 0.08% and 10% in fewer than 1,000 attempts each. In rural areas, they guessed complete SSNs on the very first attempt at a rate above 60%.
To put some hard numbers on it, the authors estimated (based on various fairly reasonable assumptions) that an identity thief targeting a specific location (such as a given state) could guess SSNs and obtain credit card accounts at a rate of about 47 per minute.
Makes you wonder how secure your SSN is, really.