I’m trying to use a regex to extract the first author’s name from a thread such as: http://www.threadless.com/profile/470607/wotto/blog/247841/percentage_Blowwwwg
The relevant HTML is:
Aug 01 '07 by
<a class="lavendar" href="/profile/470607/wotto">wotto</a>
Here is what I have so far; it matches the date:
preg_match_all('/[\n][\t]{3}[A-Z]{1}[a-z]{2} [0-9]{2} \'[0-9]{2} by [\r]/', $html, $match);
The next step is to add the tab part, [\t]{7}, to the end of the regex, but doing so makes the match fail. It is strange that they used both newline [\n] and return [\r] in the same document. Maybe both Linux and Windows programmers worked on the thread page code. Perhaps Linux has a symbol analogous to [\t] that I don’t know of?
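For reference, here is a minimal sketch of a whitespace-tolerant version of the pattern. It assumes (I have not verified this against the raw page) that the line after "by" ends in \r\n rather than a bare \r, which would explain why appending the tab part directly after [\r] fails. The $html sample string and the variable names $pattern and $firstAuthor are made up for illustration.

```php
<?php
// Hypothetical sample mirroring the snippet above. The exact whitespace
// (a CRLF line ending followed by seven tabs) is an assumption about the page.
$html = "\n\t\t\tAug 01 '07 by \r\n\t\t\t\t\t\t\t"
      . "<a class=\"lavendar\" href=\"/profile/470607/wotto\">wotto</a>";

// \s+ matches any run of spaces, tabs, \r and \n, so the pattern does not
// depend on the exact kind or count of whitespace between "by" and the link.
$pattern = '/[A-Z][a-z]{2} [0-9]{2} \'[0-9]{2} by\s+'
         . '<a [^>]*href="\/profile\/[0-9]+\/[^"]+"[^>]*>([^<]+)<\/a>/';

$firstAuthor = null;

// preg_match stops at the first match, which is the first author in the thread.
if (preg_match($pattern, $html, $match)) {
    $firstAuthor = $match[1]; // capture group 1 is the link text
    echo $firstAuthor, "\n";
}
```

Using preg_match instead of preg_match_all returns only the first occurrence, which is all that is needed for the first author. If the real page uses different whitespace, \s+ should still cover it as long as only whitespace sits between "by" and the link.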