Retrieving the source code from a webpage?

Is there any way I can use Flash MX 2004 to retrieve the source code from a webpage? For example, I give Flash the string “”, and Flash gives me the source code for google’s home page.

If not, does anyone know what programming language I can use to do this?
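For the second question, any language with an HTTP client can do this. Here is a minimal sketch in Python 3 using only the standard library. `fetch_source` is a hypothetical helper name, not something from this thread. It downloads the HTML the server sends back, which is the finished page a browser's view-source shows, not the server-side code behind it:

```python
import urllib.request


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Download `url` and return the HTML the server sent, decoded to text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

For example, `print(fetch_source("https://example.com/")[:200])` prints the start of that page's HTML.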

No. You can’t view the server side coding on webpages.

But you can see the finished source code. There’s a PHP function that’ll look up the source code of pages; search for it on the PHP site :wink:

something like

webcode=new LoadVars()

I thought you were kidding :P That actually kind of works! Other than tracing about a million times, there does seem to be HTML in there; I saw cellspacing.

I don’t know if this would help, but when you type ‘view-source:’ (minus the quotes) into the address bar (on IE at least), it pops up the source code. I don’t know if this might help in Flash…


getURL ("view-source:");

placed in the first keyframe of a blank movie seems to work (it pops up the source code in Notepad for me). Maybe you could make some use of this after all?

Hope it helps! :beer:

Thank you, McGiver, that was exactly what I was looking for.

By the way, I modified the code so that it only traces once and decodes all the escape sequences.

webcode = new LoadVars();
webcode.onLoad = function() {
	trace(unescape(this.toString()));
};

This traces readable source code for the webpage.

Thanks again!

Wow - this is really neat! I had no idea that could be done. Would either of you mind if I write a small article about this on the site? I’ll definitely give both of you credit for asking and answering :slight_smile:

Hmm, concerning the article:
I looked over it again, and it has some problems:

- At the end of every result there is a “&onLoad=%5Btype%20Function%5D” (== “&onLoad=[type Function]”). => Easy to solve, no problem at all.

- The code is full of %20 and the like. => No problem at all, since you can easily decode it to its original form ("%20" == " ").

- Things like "&nbsp;" (another way to write " ", i.e. an encoded space) cause problems (to be precise, the “&” causes the problems). I guess this happens because “&” is the standard variable delimiter in server strings (e.g. &aaa=hallo&bbb=peops are stored as… and not given back in the correct order).
  So if you “read” a source (e.g. Google), the code gets mixed up and some parts are even missing. For example, in "&nbsp;<a href=“abc”>hallo</a>&nbsp;<a href=“abc”>at all</a>" the two links replace each other, because "&nbsp;<a href=" is seen as a variable declaration.

  So you will have to work with the “%”+hex number escapes only (no “&”+something escapes).
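A rough model of the last point, sketched in Python. `parse_form_vars` is a hypothetical helper that approximates form-variable parsing (split on “&”, then on the first “=”, later names overwrite earlier ones); it is not Flash's actual LoadVars code. Fed the example fragment, the first link simply disappears:

```python
from urllib.parse import unquote_plus


def parse_form_vars(text: str) -> dict:
    """Rough model of form-encoded parsing: split on "&", then on the first "=".

    A later pair with the same name overwrites the earlier one, like setting
    the same variable twice.
    """
    variables = {}
    for chunk in text.split("&"):
        if not chunk:
            continue  # skip the empty piece before a leading "&"
        name, _, value = chunk.partition("=")
        variables[unquote_plus(name)] = unquote_plus(value)
    return variables


html = '&nbsp;<a href="abc">hallo</a>&nbsp;<a href="abc">at all</a>'
print(parse_form_vars(html))  # {'nbsp;<a href': '"abc">at all</a>'}
```

Both links end up under the same bogus name `nbsp;<a href`, so only the second survives, which matches the "replace each other" symptom.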

Here is the improved code:

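String.prototype.tochars = function() {
	// work on a copy: "this" cannot be assigned to
	var current = this;
	var indpos = current.indexOf("%", 0);
	while (indpos != -1) {
		// replace "%XX" with the character whose hex code is XX
		current = current.substr(0, indpos)+String.fromCharCode(Number("0x"+current.charAt(indpos+1)+current.charAt(indpos+2)))+current.substr(indpos+3);
		// keep searching after the new character, so a decoded "%" isn't decoded again
		indpos = current.indexOf("%", indpos+1);
	}
	// LoadVars.toString() also lists the onLoad handler; cut it out after decoding
	var indpos2 = current.indexOf("&onLoad=[type Function]", 0);
	if (indpos2 != -1) {
		current = current.substr(0, indpos2)+current.substr(indpos2+23);
	}
	return current;
};
webcode = new LoadVars();
webcode.onLoad = function() {
	strangestring = webcode.toString();
	_root.createTextField("showhtml", 10, 5, 5, 890, 590);
	// decode only once; a second tochars() call would decode "%25" twice
	htmlstring = strangestring.tochars();
	_root.showhtml.text = htmlstring;
};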
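For comparison, the same hand-rolled decoding loop sketched in Python (`to_chars` and `clean_loadvars_dump` are hypothetical names). It moves past each escape instead of restarting the search at the beginning of the string, so a "%" produced by decoding "%25" is not read as the start of another escape, and it strips the onLoad entry after decoding, when it reads "[type Function]":

```python
import string
from urllib.parse import unquote

ONLOAD_SUFFIX = "&onLoad=[type Function]"


def to_chars(encoded: str) -> str:
    """Replace each %XX sequence with the character whose hex code is XX."""
    out = []
    i = 0
    while i < len(encoded):
        hex_digits = encoded[i + 1:i + 3]
        if (encoded[i] == "%" and len(hex_digits) == 2
                and all(c in string.hexdigits for c in hex_digits)):
            out.append(chr(int(hex_digits, 16)))
            i += 3  # skip the whole escape; the decoded character is never re-read
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


def clean_loadvars_dump(encoded: str) -> str:
    """Decode first, then cut out the serialized onLoad handler."""
    return to_chars(encoded).replace(ONLOAD_SUFFIX, "")


sample = "%3Cp%3E100%25%20sure%3C%2Fp%3E&onLoad=%5Btype%20Function%5D"
print(clean_loadvars_dump(sample))          # <p>100% sure</p>
print(to_chars(sample) == unquote(sample))  # True
```

Like `String.fromCharCode`, this maps each %XX to one character code, so it agrees with the built-in decoder for ASCII text but not for multi-byte UTF-8 escapes, which `unquote` decodes as UTF-8 by default.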

Cool, I wouldn’t mind if you wrote an article, Kirupa.

McGiver: you could do that, or you could just use the built-in unescape function (see the code in my other post in this thread). It does all that for you.

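And a suggestion for the article: it should talk about the Flash security issue. That is, this technique works fine while you test the movie locally, but it stops working once you put it into a webpage or upload it to the internet, because the Flash Player security sandbox won't let a movie load data from another domain unless that domain allows it with a cross-domain policy file. Maybe the article should include how to write a server-side proxy to get around this?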

Following is Macromedia’s documented ASP proxy. I’ve used it, and it works well.

<%@ LANGUAGE=VBScript%>
<%
	Dim MyConnection, TheURL, TheData
	' Specifying the URL
	TheURL = ""
	Set MyConnection = Server.CreateObject("Microsoft.XMLHTTP")
	' Connecting to the URL (False = wait for the response)
	MyConnection.Open "GET", TheURL, False
	' Sending the request and getting the data
	MyConnection.Send
	TheData = MyConnection.responseText

	' Set the appropriate content type
	Response.ContentType = MyConnection.getResponseHeader("Content-Type")
	Response.Write (TheData)

	Set MyConnection = Nothing
%>
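The same proxy idea, sketched with Python's standard library instead of ASP. `TARGET_URL`, `SourceProxyHandler` and `make_server` are hypothetical names; the assumption is that the proxy is hosted on the same domain as the SWF, which is what lets the movie load from it. Like the ASP script, it fetches one fixed page server-side and relays the body and Content-Type:

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical: the page the proxy fetches on the Flash movie's behalf
TARGET_URL = "https://example.com/"


class SourceProxyHandler(BaseHTTPRequestHandler):
    """Fetch target_url server-side and relay its body and Content-Type."""

    target_url = TARGET_URL

    def do_GET(self):
        try:
            with urllib.request.urlopen(self.target_url, timeout=10) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get("Content-Type", "text/html")
        except (OSError, ValueError) as exc:  # network errors, bad URLs
            self.send_error(502, "Upstream fetch failed: %s" % exc)
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but don't start) a proxy server listening on localhost."""
    return HTTPServer(("127.0.0.1", port), SourceProxyHandler)
```

Run it with `make_server().serve_forever()` and point the movie's LoadVars at the proxy's address instead of the remote page. Keeping the target fixed, as the ASP version does, avoids turning the script into an open proxy that fetches arbitrary URLs for anyone.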

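I wrote a PHP script that returns the source code with every "&" encoded as "%26", so LoadVars no longer splits the page into variables (unescape() turns them back into "&" on the Flash side):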

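// $content holds the fetched page; encode each "&" so LoadVars won't split on it
echo str_replace("&", "%26", $content);

webcode = new LoadVars();
sendvar = new LoadVars();
webcode.onLoad = function() {
	strangestring = webcode.toString();
	_root.createTextField("showhtml", 10, 5, 5, 890, 590);
	// unescape() turns %26 (and every other escape) back into normal characters
	htmlstring = unescape(strangestring);
	// cut out the serialized onLoad handler (23 characters long)
	var indpos = htmlstring.indexOf("&onLoad=[type Function]", 0);
	if (indpos != -1) {
		htmlstring = htmlstring.substr(0, indpos)+htmlstring.substr(indpos+23);
	}
	_root.showhtml.text = htmlstring;
};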
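The PHP-plus-unescape round trip above boils down to two string operations. A Python sketch: `escape_ampersands` is a hypothetical name, and `urllib.parse.unquote` stands in for ActionScript's `unescape`:

```python
from urllib.parse import unquote


def escape_ampersands(html: str) -> str:
    """What the PHP line does: turn every "&" into "%26"."""
    return html.replace("&", "%26")


html = 'a&nbsp;b <a href="x?p=1&q=2">link</a>'
safe = escape_ampersands(html)
print("&" in safe)            # False: nothing left to split variables on
print(unquote(safe) == html)  # True: unescaping restores the original HTML
```

One caveat with this design: only "&" is encoded, so a page that already contains a literal "%" followed by two hex digits would also get decoded on the Flash side. Encoding "%" as "%25" before encoding "&" would keep such text intact.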