Don’t forget to account for newlines in your iOS NSRegularExpression

The NSRegularExpression class is handy for searching and parsing strings. That is, it is handy until it starts failing on seemingly random inputs.

I had a regex that worked fine on my small initial set of test data, but when I ran it against a larger and more varied data set, the weather suddenly turned a bit cloudy: some of the matches I expected were simply not found.

After some digging on Stack Overflow and in Apple's documentation, I realized that some of the new data contained newline characters, and that by default the "." metacharacter in a pattern does not match line separators. Adding the NSRegularExpressionDotMatchesLineSeparators option to my NSRegularExpression fixed it, and the skies turned bright and sunny again.

Here is an example of what I am talking about. In this instance, I want to extract the text in an NSString called theString that lies between the sentinel literals =BeginTag= and =EndTag=:

NSString *pattern = @"=BeginTag=(.*?)=EndTag=";  // lazy (.*?) stops at the first =EndTag=
// DotMatchesLineSeparators lets "." also match newlines, so a tagged block may span several lines.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
    options:(NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators)
    error:nil];  // pass an NSError ** here to diagnose an invalid pattern
[regex enumerateMatchesInString:theString options:0 range:NSMakeRange(0, [theString length])
    usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop)
    {
        // Capture group 1 holds the text between the sentinels.
        NSString *s = [theString substringWithRange:[match rangeAtIndex:1]];
        NSLog(@"Enumerated matching string: %@", s);
    }];

If you leave off the NSRegularExpressionDotMatchesLineSeparators flag and theString contains newline characters between the sentinels, that pair of tags produces no match at all, so the block is never called for it and no text is enumerated.
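To see the difference directly, here is a minimal sketch that counts matches of the same pattern with and without the flag. It assumes a standalone macOS command-line program linked against Foundation; the helper name CountTaggedBlocks and the sample text are hypothetical, chosen only to illustrate the behavior.

```objc
// Assumed build command (macOS): clang -fobjc-arc -framework Foundation dotall.m -o dotall
#import <Foundation/Foundation.h>

// Hypothetical helper: counts =BeginTag=...=EndTag= blocks in `text` using the given options.
static NSUInteger CountTaggedBlocks(NSString *text, NSRegularExpressionOptions options) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"=BeginTag=(.*?)=EndTag="
                                                  options:options
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return 0;
    }
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, text.length)];
}

int main(void) {
    @autoreleasepool {
        // Hypothetical sample: one single-line block, then one block spanning two lines.
        NSString *text = @"=BeginTag=one line=EndTag=\n=BeginTag=first line\nsecond line=EndTag=";

        // Without the flag, "." cannot cross the newline, so only the first block matches (1).
        NSUInteger withoutFlag = CountTaggedBlocks(text, NSRegularExpressionCaseInsensitive);

        // With the flag, "." also matches the newline, so both blocks match (2).
        NSUInteger withFlag = CountTaggedBlocks(text,
            NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators);

        NSLog(@"without flag: %lu, with flag: %lu",
              (unsigned long)withoutFlag, (unsigned long)withFlag);
    }
    return 0;
}
```

Note that the single-line block still matches without the flag; only the block that spans a line break silently disappears, which is why the failures can look random when only some of the input contains newlines.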

BTW, Happy Birthday to Steven Spielberg, one of the true geniuses of our time.
