Hey guys,
I’m trying to implement a fuzzy string match regex that will:
- match only whole words, not sentences
- accept typos
- accept missing chars
I found a regex that works OK: /a[^b]*b[^c]*c/gi
It matches 'abc' and 'a123 b123 c',
i.e. it matches regardless of length, word boundaries (white space) and the chars in between, but all chars must be present, in order.
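For reference, here is a minimal sketch of how that pattern could be generated for any search term: every char after the first is preceded by a [^x]* run that skips anything up to it. It assumes plain alphanumeric search terms (so no regex escaping), and buildOrderedPattern is a hypothetical helper name, not a library function.

```javascript
// Sketch: generalise /a[^b]*b[^c]*c/gi to any search term.
// buildOrderedPattern is a hypothetical helper; it assumes alphanumeric
// terms, so no regex escaping is done.
function buildOrderedPattern(term) {
  if (!/^[a-z0-9]+$/i.test(term)) {
    throw new Error("sketch only supports alphanumeric terms");
  }
  const chars = [...term];
  let source = chars[0];
  for (let i = 1; i < chars.length; i++) {
    // Any run of chars that are not the next wanted char, then that char.
    source += `[^${chars[i]}]*${chars[i]}`;
  }
  return new RegExp(source, "gi");
}

const orderedRe = buildOrderedPattern("abc");
console.log(orderedRe.source);               // a[^b]*b[^c]*c
console.log("a123 b123 c".match(orderedRe)); // [ 'a123 b123 c' ]
console.log("ab".match(orderedRe));          // null, the c is missing
```

Because of the g flag, calling orderedRe.test() on several strings in a row carries lastIndex over between calls; String.prototype.match resets it, which is why the sketch uses match.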
I didn’t want white space and unlimited chars in between, so I modified it to /a[^ ]?b[^ ]?c/gi
It matches 'a1bc' but not 'a12bc'.
- Changing * to ? allows at most one arbitrary char in between (? means zero or one).
- Changing [^b] and [^c] to [^ ] allows any char except a space.
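A minimal sketch of the modified pattern generated from any search term, joining the chars with [^ ]? so at most one non-space char may sit between them. It assumes alphanumeric search terms (no escaping), and buildBoundedPattern is a hypothetical helper name.

```javascript
// Sketch: generalise /a[^ ]?b[^ ]?c/gi to any search term.
// buildBoundedPattern is a hypothetical helper; it assumes alphanumeric
// terms, so no regex escaping is done.
function buildBoundedPattern(term) {
  if (!/^[a-z0-9]+$/i.test(term)) {
    throw new Error("sketch only supports alphanumeric terms");
  }
  // [^ ]? = zero or one char that is not a space, between each wanted char.
  return new RegExp([...term].join("[^ ]?"), "gi");
}

const boundedRe = buildBoundedPattern("abc");
console.log(boundedRe.source);         // a[^ ]?b[^ ]?c
console.log("a1bc".match(boundedRe));  // [ 'a1bc' ]
console.log("a12bc".match(boundedRe)); // null
```

If tabs and newlines should also count as separators, \S? (any non-whitespace char) could be used in place of [^ ]?.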
The remaining issue is that all chars must still be present for a match.
The only way I can see around this would be to use 2 regexes, splitting the search term at every second letter, and test each word against both.
e.g. the search term 'Matching' would become 2 regexes: M_t_h_n_ and _a_c_i_g.
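A rough sketch of the two-regex idea, assuming '_' stands for exactly one non-space char, that each half is anchored with ^...$ and tested against one word at a time, and that a word counts as a match if either half matches. buildInterleavedPatterns and interleavedMatch are hypothetical names, and alphanumeric terms are assumed.

```javascript
// Sketch of the split idea: "Matching" -> M_t_h_n_ and _a_c_i_g.
// Assumptions: "_" = one non-space char ([^ ]), whole-word anchors,
// alphanumeric terms. Helper names are hypothetical.
function buildInterleavedPatterns(term) {
  if (!/^[a-z0-9]+$/i.test(term)) {
    throw new Error("sketch only supports alphanumeric terms");
  }
  const chars = [...term];
  // keepEven = true keeps chars at even indexes and wildcards the odd ones.
  const make = (keepEven) =>
    new RegExp(
      "^" + chars.map((c, i) => ((i % 2 === 0) === keepEven ? c : "[^ ]")).join("") + "$",
      "i"
    );
  return [make(true), make(false)];
}

// A word is accepted if either half matches it.
function interleavedMatch(word, term) {
  return buildInterleavedPatterns(term).some((re) => re.test(word));
}

const [evenHalf, oddHalf] = buildInterleavedPatterns("Matching");
console.log(evenHalf.source); // ^M[^ ]t[^ ]h[^ ]n[^ ]$
console.log(oddHalf.source);  // ^[^ ]a[^ ]c[^ ]i[^ ]g$
console.log(interleavedMatch("Matchinf", "Matching")); // true
```

A single wrong char is tolerated because it always lands in a wildcard slot of one of the two halves.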
The problem with this is that a short search term like "air" would match almost any word with an 'i' in it, and the word length would have to be exact.
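To make the problem concrete, here are the two halves the split would produce for "air" (a_r and _i_), written out by hand and assuming '_' means one non-space char and each half must match a whole word. matchesAirHalf is a hypothetical helper name.

```javascript
// The two halves for "air": a_r and _i_.
// Assumption: "_" is one non-space char; ^...$ forces a whole-word match.
const airHalves = [/^a[^ ]r$/i, /^[^ ]i[^ ]$/i];
const matchesAirHalf = (word) => airHalves.some((re) => re.test(word));

for (const word of ["air", "aim", "bit", "fix", "hair"]) {
  console.log(word, matchesAirHalf(word));
}
// air true, aim true, bit true, fix true, hair false
```

The _i_ half lets through any 3-letter word with an 'i' in the middle, while "hair", which actually contains "air", fails because it is one char too long.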
If I went back to * instead of ?, even a four-letter search term would match any word that happens to contain 2 of its chars.
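Reading this as referring to the split approach, a hypothetical four-letter term "bolt" shows the effect: each half keeps only 2 real chars, so once '_' is widened to [^ ]* (any run of non-space chars) a word only needs to line up with those 2 chars to get through. The term and the matchesBoltHalf helper are illustrative assumptions.

```javascript
// Hypothetical term "bolt", split into b_l_ and _o_t, with each "_"
// widened from one char to [^ ]* (any run of non-space chars).
const boltHalves = [/^b[^ ]*l[^ ]*$/i, /^[^ ]*o[^ ]*t$/i];
const matchesBoltHalf = (word) => boltHalves.some((re) => re.test(word));

for (const word of ["bolt", "belly", "shot", "cat"]) {
  console.log(word, matchesBoltHalf(word));
}
// bolt true, belly true, shot true, cat false
```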
I’m not really seeing a way around this, short of writing some monster regex that performs badly.
Does anybody have any out of the box ideas?