Monday, April 21, 2014

Looping Over Regular Expressions in JavaScript

Much as JavaScript has literal forms for strings & numbers, it also has a literal Regular Expression (henceforth regex) form. So you can wrap characters in single or double quotes to make a string literal, & you can wrap characters in forward slashes ("/") to make a regex literal.

So regexes are literals in JavaScript…except JavaScript is sort of a broken language with regard to literals. The typeof operator is nearly useless for telling a regex apart from any other object:
typeof /foo/
// returns "object"…damn you, typeof
/foo/ instanceof RegExp
// true! hurrah for instanceof
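Because typeof lumps regexes in with every other object, here's a minimal sketch of a sturdier check. instanceof works fine within one page, but each frame (e.g. an iframe) has its own RegExp constructor, so a regex created in another frame fails instanceof. Object.prototype.toString reports "[object RegExp]" either way. The helper name isRegExp is hypothetical, not a built-in.

```javascript
// Hypothetical helper: true for any RegExp, including ones created in another
// frame (where instanceof RegExp is false, since each frame has its own RegExp)
function isRegExp(value) {
    return Object.prototype.toString.call(value) === '[object RegExp]';
}

console.log(typeof /foo/);            // "object"
console.log(/foo/ instanceof RegExp); // true
console.log(isRegExp(/foo/));         // true
console.log(isRegExp('/foo/'));       // false: a string that merely looks like a regex
```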
Because regexes have a literal form of sorts, you can do nice things with them like put them in an array & loop over the array:
var tests = [/Foo/i, /BAR/, /baz/i],
    str = 'This is a sentence. Foo, says the sentence.';

// check if each regex has a match in sentence
tests.forEach(function(re) {
    if (re.test(str)) {
        console.log(re + ' is a match!');
    }
});
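If you want the matching patterns themselves rather than a log line, the same array works with filter() & some(). This is a sketch, not the code from any particular project: filter() collects every regex that matches, while some() stops at the first match, which suits a yes/no check.

```javascript
var tests = [/Foo/i, /BAR/, /baz/i],
    str = 'This is a sentence. Foo, says the sentence.';

// keep only the regexes that match somewhere in str
var matches = tests.filter(function(re) {
    return re.test(str);
});

// some() stops looping at the first regex that matches
var anyMatch = tests.some(function(re) {
    return re.test(str);
});

console.log(matches.map(String)); // [ '/Foo/i' ]
console.log(anyMatch);            // true
```

One caveat when reusing regexes in a loop: none of these use the g flag. A regex with g remembers its lastIndex between test() calls, so testing the same global regex against several strings can give surprising results.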
This is the approach I use in my Wordpress Spam Clicker bookmarklet: I have an array of regexes matching known spammer patterns which I loop over, testing each comment against them.

BUT what if you want to slightly modify each regex in an array? For instance, what if you want to loop over regexes but only match strings that have a space at the beginning of the match? You can save some typing & potentially (depending on how big the array of regexes is) a lot of bytes by storing a truncated version of the regexes & then modifying them later. Except it doesn't work:
var tests = [/Foo/i, /BAR/, /baz/i],
    str = 'This is a sentence. Foo, says the sentence.';

// check if each regex has a match in sentence
tests.forEach(function(re) {
    if ((/\s/ + re).test(str)) { // TypeError: /\s/ + re is a string, which has no test()
        console.log(re + ' is a match!');
    }
});
I'm trying to take each regex & prepend the special character for whitespace ("\s"), so /Foo/i should become /\sFoo/i. But the addition operator doesn't work here: JavaScript doesn't know how to add 2 regular expressions, so instead it converts both to strings & concatenates them (typeof (/\s/ + /foo/i) === 'string'). Strings have no test() method, so the loop throws a TypeError. What do?
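To see exactly what + produces, here's a short sketch: each regex is converted with its toString(), slashes & flags included, & the two strings are simply glued together.

```javascript
var re = /foo/i;

// + converts both regexes to strings (via toString()) & concatenates them
var combined = /\s/ + re;

console.log(typeof combined);      // "string"
console.log(combined);             // /\s//foo/i
console.log(typeof combined.test); // "undefined": strings have no test() method
```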

Well, JavaScript also has constructor functions for its literal types, such as String(), Number(), & RegExp(). Generally, you should not use these. I repeat, if you're writing code like var count = new Number(0), you can stop it, stop it right now. One reason is that typeof count will return "object" if count was created with new. Also, it's just unnecessary typing.
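A quick sketch of the typeof problem, plus one more reason to avoid wrapper objects: an object is always truthy, even when it wraps 0. Calling Number() without new is different; it just converts its argument to a primitive number.

```javascript
var wrapped = new Number(0),  // a Number *object* wrapping 0
    plain = 0,                // a primitive number
    converted = Number('0');  // Number() without new converts to a primitive

console.log(typeof wrapped);   // "object"
console.log(typeof plain);     // "number"
console.log(typeof converted); // "number"

// wrapper objects are always truthy, even when they wrap 0
console.log(!!wrapped); // true
console.log(!!plain);   // false
```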

BUT it turns out the RegExp constructor is the exception: it can compile a regex from a string, which is exactly what's needed here. So to achieve my earlier goal I can write:
var tests = ['Foo', 'BAR', 'baz'],
    str = 'This is a sentence. Foo, says the sentence.';

// check if each regex has a match in sentence
tests.forEach(function(re) {
    if (RegExp('\\s' + re).test(str)) {
        console.log(re + ' is a match!');
    }
});
Instead of storing regexes, which get converted to strings as soon as I try to add to them, I can store strings & then essentially cast them to regexes using the RegExp constructor. I have to escape the backslash ('\\s') because the backslash is also the escape character in string literals; a bare '\s' would become just 's' before the RegExp constructor ever sees it. With that, it works. The RegExp constructor takes the regex flags as a second argument too, so I could write RegExp('\\s' + re, 'i') to make all my regexes case insensitive. This, too, could be very handy & save a lot of bytes/typing.
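If you'd rather keep regex literals in the array (so each one keeps its own flags), another option is to build the new regex from each literal's source property, which holds the pattern text without the slashes or flags. A minimal sketch, assuming an ES2015+ engine for the flags property (browsers of 2014 only had source, global, ignoreCase & multiline); prefixWithSpace is a hypothetical helper name. The second half shows the flags argument applied to a plain array of strings.

```javascript
// Hypothetical helper: returns a new regex matching re preceded by whitespace,
// keeping re's own flags. re.source is the pattern text without slashes or flags;
// re.flags (ES2015+) is the flag string, e.g. "i".
function prefixWithSpace(re) {
    return new RegExp('\\s' + re.source, re.flags);
}

var tests = [/Foo/i, /BAR/, /baz/i],
    str = 'This is a sentence. Foo, says the sentence.';

tests.forEach(function(re) {
    var spaced = prefixWithSpace(re);
    if (spaced.test(str)) {
        console.log(spaced + ' is a match!'); // logs "/\sFoo/i is a match!"
    }
});

// With an array of strings, the second argument applies the same flags to all of them
var words = ['foo', 'bar', 'baz'];
var found = words.filter(function(word) {
    return RegExp('\\s' + word, 'i').test(str);
});
console.log(found); // [ 'foo' ]
```

Either way, keep in mind that stored strings are treated as regex syntax: a string containing characters like ".", "?" or "(" needs those characters escaped if you mean them literally.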