Playing with Regular Expressions, part 1 – Find the last word in a sentence

As a developer I quite often run into situations where I need to find an occurrence of a word or phrase in a text, or some kind of number or pattern in a string. Regular expressions make these tasks relatively simple, and you will usually find loads of examples on the internet showing how to match your specific pattern. This blog series covers how to think when working with regular expressions.

In all the examples in this blog series I use the same sample text (you will see it in the samples). I have used an excellent tool named Expresso to evaluate all expressions in this blog entry. Of course, you can tweak the expressions in the examples below so that they search for other patterns as well.

Problem

I want to retrieve the last word in a sentence.

Solution

Use the \b anchor together with a pattern for whatever ends the sentence. This tells the regex engine that you only want the word that appears right before a dot character, a carriage return followed by a line feed, or the $ anchor, which represents the end of the string.

The pattern below matches the last word of each sentence. It covers the dot and the carriage return/line feed cases; add $ as a third alternative if the text may end without either of them.

(?<myNamedGroup>\b\w+)(?:\.|\r\n)

 

You should see the following result when executing the expression against the sample text. The matches are the words immediately followed by a dot or by a carriage return and line feed.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut eu sem nisl.
Nulla elementum consectetur leo nec consequat. Vestibulum quis libero sit amet arcu euismod bibendum a.

Nulla elementum:    1389-89-1443

Praesent a nibh sed augue mollis vehicula.
Vestibulum nisl elit, eleifend a tristique nec, faucibus a sem.
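If you do not have Expresso at hand, the same search can be sketched in a few lines of Python. Two things here are my assumptions rather than part of the original setup: Python's re module spells named groups (?P<name>...) instead of (?<name>...), and the sample text is assumed to use Windows line endings (\r\n), so the sketch builds it that way explicitly. The function name last_words is only illustrative.

```python
import re

# Same pattern as above; only the named-group syntax differs in Python.
LAST_WORD = re.compile(r"(?P<myNamedGroup>\b\w+)(?:\.|\r\n)")

# The sample text, joined with \r\n (assumed Windows line endings).
SAMPLE_TEXT = "\r\n".join([
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut eu sem nisl.",
    "Nulla elementum consectetur leo nec consequat. Vestibulum quis libero "
    "sit amet arcu euismod bibendum a.",
    "",
    "Nulla elementum:    1389-89-1443",
    "",
    "Praesent a nibh sed augue mollis vehicula.",
    "Vestibulum nisl elit, eleifend a tristique nec, faucibus a sem.",
])


def last_words(text):
    """Return every word that sits right before a dot or a CRLF."""
    return [m.group("myNamedGroup") for m in LAST_WORD.finditer(text)]


if __name__ == "__main__":
    print(last_words(SAMPLE_TEXT))
    # ['elit', 'nisl', 'consequat', 'a', '1443', 'vehicula', 'sem']
```

Note that 1443 shows up in the result as well: \w also matches digits, and that line ends with a carriage return and line feed.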

Explanation

  • The first parenthesis matches a word and, when the match succeeds, stores it in a named group called myNamedGroup.
  • \b\w+ matches any word that has at least one character. Keep in mind that \w also matches digits and the underscore. The \b anchor matches a word boundary, that is, either the beginning or the end of a word; here it pins the match to the beginning of the word.
  • (?:\.|\r\n) matches either a dot or the combination of carriage return and line feed. The (?: prefix makes the group non-capturing: the terminator is still part of the overall match, but it is not stored in a group of its own. To get the word alone, read it from the named group.
  • Taken together, the expression returns every word (the first part) that is immediately followed by one of the terminators in the second part.
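The difference between the overall match and the named group is easiest to see by inspecting a single match. The snippet below is a small Python sketch, again assuming Python's (?P<name>...) spelling for named groups. The second pattern, pattern_with_end, is a hypothetical variant of mine that adds the $ anchor as a third alternative, so that a last word without a trailing dot or line break is found too.

```python
import re

pattern = re.compile(r"(?P<myNamedGroup>\b\w+)(?:\.|\r\n)")
match = pattern.search("Ut eu sem nisl.")

whole_match = match.group(0)               # the word plus its terminator: "nisl."
word_only = match.group("myNamedGroup")    # the word alone: "nisl"
group_count = pattern.groups               # 1, the (?:...) group is not counted

# Hypothetical variant: also accept the end of the string as a terminator.
pattern_with_end = re.compile(r"(?P<myNamedGroup>\b\w+)(?:\.|\r\n|$)")
end_words = [
    m.group("myNamedGroup")
    for m in pattern_with_end.finditer("No dot at the end")
]

if __name__ == "__main__":
    print(whole_match, word_only, group_count, end_words)
```

So the terminator is consumed by the match but never stored in a group, which is why the examples read the word from myNamedGroup instead of using the whole match.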