Matching passwords – a look into the wonderful world of regular expressions

Ok, so I am a big fan of regular expressions. There lies a great strength in those few characters that you write to validate your input, but it can be a little bit tricky to get the expressions to do exactly what you want.

I got a question from a colleague who wanted to validate passwords with a regex, but a task that seemed so trivial was, as always, not that easy. His original regex was ^.*(?=.{8,})(?=.*[0-9]{2,})(?=.*[a-z\d]).*$ but the problem was that it only accepted strings that had two numbers in a row, and not two numbers that appear at different positions in the string.
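
The symptom can be reproduced with a small Python sketch. It isolates only the digit lookahead from the expression above; the names ADJACENT_DIGITS and has_adjacent_digits are made up for this illustration, and Python's re module is assumed as the regex engine.

```python
import re

# Digit lookahead taken from the original expression, anchored at the start.
# ADJACENT_DIGITS and has_adjacent_digits are hypothetical names for this sketch.
ADJACENT_DIGITS = re.compile(r"^(?=.*[0-9]{2,})")

def has_adjacent_digits(candidate):
    """Return True if the lookahead from the original regex is satisfied."""
    return ADJACENT_DIGITS.search(candidate) is not None

print(has_adjacent_digits("password12"))  # True: the two digits are next to each other
print(has_adjacent_digits("pass1word2"))  # False: the two digits are separated
```

Both sample strings contain two numbers, but only the first one passes, which is exactly the behaviour my colleague ran into.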

I began to dive into the problem based on the rules that were stated for the passwords to be validated:

  • Minimum 8 characters in the password.
  • Minimum 2 numbers somewhere in the password.

Let's analyze the regular expression above:

  • The characters at the beginning, ^.*, say that zero or more characters of any kind may occur between the beginning of the string and the point where the rest of the pattern starts to match. The caret, ^, says that matching should start at the beginning of the text. The dot, ., means any character (except a line break, by default). The asterisk, *, means that the preceding element may occur zero or more times.
    • This part of the regular expression was not ok since it had no purpose.
  • The construct (?=.{8,}) is a lookahead that says the string should contain at least 8 characters of any kind.
    • So this construct was ok since it did what it was intended to do.
  • The construct (?=.*[0-9]{2,}) says that we are looking for zero or more characters of any kind followed by at least two numbers in a row.
    • This part of the expression was the actual problem: [0-9]{2,} only matches numbers that are directly next to each other.
  • The construct (?=.*[a-z\d]) works almost the same way as the one above, but only looks for a single lowercase letter between a and z or any number.
    • This part of the regular expression had no purpose, since any string that contains two numbers already satisfies it.
  • The last part of the regular expression, .*$, says that we are looking for zero or more characters of any kind before the end of the string to be searched. The dollar sign, $, marks the end of the string.
    • This part of the regular expression was not ok since it had no purpose.
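
One thing that makes the (?=...) constructs easier to reason about is that a lookahead only checks what follows and does not consume any characters. A minimal sketch in Python (assuming the standard re module; the variable names are only for illustration):

```python
import re

# A lookahead checks the text ahead but consumes nothing,
# so a successful match is empty and sits at position 0.
long_enough = re.match(r"(?=.{8,})", "abcdefgh")
print(long_enough.span())  # (0, 0)

# With only 7 characters the lookahead fails and there is no match at all.
too_short = re.match(r"(?=.{8,})", "abcdefg")
print(too_short)  # None

# Because nothing is consumed, several lookaheads can be chained and
# each one inspects the string from the same starting position.
chained = re.match(r"(?=.{8,})(?=.*[a-z\d])", "abcdefgh")
print(chained.span())  # (0, 0)
```

This is why the constructs can be stacked one after another: each of them is an independent test on the same string.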

So how do we find the solution to this problem? Well, the construct (?=.{8,}) gives us a good start. The requirement was that the string should contain at least 8 characters and that at least two of those characters should be numbers. Hmm, two numbers, that would be (?=\d{2,}), since \d is the shorthand for any number in regex and {2,} says that it should occur two or more times. But this is not enough, since this pattern requires two numbers in a row at the current position. Putting .* in front, as in (?=.*\d{2,}), lets the pair appear anywhere in the string, but the two numbers still have to be next to each other. The correct pattern is (?=(.*\d){2,}), which repeats the whole group "any characters followed by a number" at least twice, so the two numbers may be separated by other characters.

What happens if we combine these two constructs? Well, we get what we want. The solution to the problem was a combination of the patterns described above, which yielded the pattern (?=.{8,})(?=(.*\d){2,})
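
To see how the different ways of asking for two numbers behave, here is a small comparison sketch in Python. It assumes the standard re module and uses re.match, so every pattern is tried at the start of the string; the sample strings and the names VARIANTS and check are made up for this illustration.

```python
import re

# Three ways of asking for "two numbers", each tried from the start of the string.
VARIANTS = {
    "at_start": r"(?=\d{2,})",      # two numbers in a row, right at the start
    "adjacent": r"(?=.*\d{2,})",    # two numbers in a row, anywhere
    "anywhere": r"(?=(.*\d){2,})",  # two numbers, anywhere, not necessarily adjacent
}

def check(sample):
    """Return which variants accept the sample, as a dict of booleans."""
    return {name: re.match(pattern, sample) is not None
            for name, pattern in VARIANTS.items()}

for sample in ["12abcdef", "abcd12ef", "a1bcde2f", "abcdefg1"]:
    print(sample, check(sample))
```

Only the last variant accepts "a1bcde2f", where the two numbers are separated, and all three reject "abcdefg1", which contains a single number.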

To state that the password should contain at least one non-alphanumeric character we could add the pattern (?=(.*\W){1,}), which would give us the final pattern (?=.{8,})(?=(.*\d){2,})(?=(.*\W){1,}). Note that \W treats the underscore as a word character, so _ does not count as non-alphanumeric here.
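
As a rough sketch of how the final pattern could be used, here it is wrapped in a small Python helper. The names PASSWORD_RE and is_valid_password and the sample passwords are invented for this example, and the standard re module is assumed; other regex engines may differ in details.

```python
import re

# Final pattern from the text: at least 8 characters, at least two numbers,
# and at least one non-word character.
PASSWORD_RE = re.compile(r"(?=.{8,})(?=(.*\d){2,})(?=(.*\W){1,})")

def is_valid_password(candidate):
    """Return True if the candidate satisfies all three lookaheads.

    re.match tries the pattern at the start of the string, so no explicit
    ^ is needed, and the lookaheads themselves scan the rest of the string.
    """
    return PASSWORD_RE.match(candidate) is not None

print(is_valid_password("pa55word!"))   # True
print(is_valid_password("p4ssw0rd"))    # False: no non-word character
print(is_valid_password("a1!b2"))       # False: shorter than 8 characters
print(is_valid_password("pass_w0rd1"))  # False: \W does not count the underscore
print(is_valid_password("p4ss w0rd"))   # True: a space is a non-word character
```

The last two samples show a design consequence of using \W: an underscore does not satisfy the rule, while a space does. If that is not the intended policy, an explicit character class listing the accepted symbols would be a reasonable alternative.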

Hope that this gives you some clues on the power and complexity of regular expressions.