file_get_contents() gives slightly altered file contents

NeoDreamer · December 11, 2008, 6:22pm

I have a simple echo of a website’s source:


echo file_get_contents('http://www.threadless.com/blogs/blogs');

A small portion of the output is a little fishy:


<a class="pagea selected" href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,1">1</a> <a class="pagea " href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,2">2</a>  <a class="pagea " href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,3">3</a>  <a class="pagea " href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,4">4</a> <a class="pageelipsa" href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,2826">...</a>  <a class="pagea " href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,2827">2,827</a>  <a class="pagea " href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06&uuid=5abf8a35510975e77f4618b544f7fe65/page,2828">2,828</a> 		<div class="pagecontext grey">(84,833 results!)</div>

When viewing the page source of the actual page in a browser, that same area is:


<a class="pagea selected" href="/blogs/blogs/page,1">1</a> <a class="pagea " href="/blogs/blogs/page,2">2</a>  <a class="pagea " href="/blogs/blogs/page,3">3</a>  <a class="pagea " href="/blogs/blogs/page,4">4</a> <a class="pageelipsa" href="/blogs/blogs/page,2826">...</a>  <a class="pagea " href="/blogs/blogs/page,2827">2,827</a>  <a class="pagea " href="/blogs/blogs/page,2828">2,828</a> 		<div class="pagecontext grey">(84,833 results!)</div>

file_get_contents() actually returns slightly different content than the actual page! It looks as if “?token=randomString&uuid=randomString” has been added to all of the page-traversal URLs. Since I’m working on a web crawler, this is a big no-no. What’s going on?
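
One possible explanation, and this is only a guess that nothing above confirms, is that the server itself writes these identifiers into links when a client sends no cookies, in which case file_get_contents() is not altering anything. Either way, a crawler can normalize the links after fetching. Here is a minimal sketch, assuming the token and uuid values are always 32 hex characters as in the output above; `strip_session_params` is a made-up helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper: removes the "?token=...&uuid=..." segment seen in the
// fetched markup so the links match what the browser shows.
// Assumption: both values are 32 lowercase hex characters, as in the sample.
function strip_session_params(string $html): string
{
    // Also accepts "&amp;" in case the separator is HTML-escaped.
    $pattern = '/\?token=[0-9a-f]{32}&(?:amp;)?uuid=[0-9a-f]{32}/';

    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '', $html) ?? $html;
}

// One link copied from the fetched output above.
$fetched = '<a class="pagea " href="/blogs/blogs?token=ccaea4f99cbadd8262c148c86e1d8b06'
         . '&uuid=5abf8a35510975e77f4618b544f7fe65/page,2">2</a>';

echo strip_session_params($fetched), "\n";
```

This only cleans up the symptom. If the cookie guess is right, the same request sent with a cookie header might return clean links in the first place, but I have not checked that against this site.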
