Clean
Starting Member
34 Posts |
Posted - 24 Aug 2012 : 11:39:09
|
Alessio,
I was fooling around with a "http://www.google.com/search?" query. The engine took some of the returned web page but omitted a lot. Have you already looked into this? If not, what would you need from me to experiment with (temp files, script copies, etc.)? |
|
JDommi
Administrator
Germany
4653 Posts |
Posted - 24 Aug 2012 : 13:02:04
|
??? |
In order to achieve what is possible, you have to try the impossible over and over again. Hermann Hesse |
|
|
Clean
Starting Member
34 Posts |
Posted - 25 Aug 2012 : 03:11:54
|
#WEBQUERY#=http://www.google.com/search?site:http://Web Address Here
A Google Site Search. Very useful for sites that, while no longer stocking the DVD, have not removed the web page of the product but just do not include it in their own search results. |
|
|
JDommi
Administrator
Germany
4653 Posts |
Posted - 25 Aug 2012 : 09:00:46
|
And how will you get the info by script? I think the structures of these sites don't fit any of the existing scripts, and they aren't even compatible with each other. |
|
|
Clean
Starting Member
34 Posts |
Posted - 26 Aug 2012 : 03:49:12
|
quote: Originally posted by JDommi
And how will you get the info by script? I think the structures of these sites don't fit any of the existing scripts, and they aren't even compatible with each other.
Getting the info from the site is not the problem. The current problem is that the info returned by the query and stored in the temp txt file is not the same as what Google actually returns. It is missing a lot of info. |
|
|
JDommi
Administrator
Germany
4653 Posts |
Posted - 26 Aug 2012 : 09:54:25
|
If you are using the existing Google script: it's only for images, so I think it will show only the header. But I have never used that script. Or do you mean that results are missing? Then I would say that the script doesn't loop to get info from more than one page. Sorry, but from your initial post alone it's difficult to understand your problem. As some like to say: Do I have holes in my hands? Or can I walk on water? |
Edited by - JDommi on 26 Aug 2012 09:54:52 |
|
|
Prinz
Senior Member
Germany
1522 Posts |
Posted - 26 Aug 2012 : 12:55:00
|
Magic Script isn't a browser and doesn't render a page; it only loads the raw HTML code from the site. To load files embedded in the HTML, the page would need to be rendered, and that is not practical. |
|
|
Clean
Starting Member
34 Posts |
Posted - 26 Aug 2012 : 16:38:22
|
quote: Originally posted by JDommi
If you are using the existing Google script: it's only for images, so I think it will show only the header. But I have never used that script. Or do you mean that results are missing? Then I would say that the script doesn't loop to get info from more than one page. Sorry, but from your initial post alone it's difficult to understand your problem. As some like to say: Do I have holes in my hands? Or can I walk on water?
Much of what is returned is missing. MS produces a temp file of 24K but the page is 214K.
Also, I am not using any existing scripts. I write my own. |
|
|
Clean
Starting Member
34 Posts |
Posted - 26 Aug 2012 : 16:57:40
|
quote: Originally posted by Prinz
Magic Script isn't a browser and doesn't render a page; it only loads the raw HTML code from the site. To load files embedded in the HTML, the page would need to be rendered, and that is not practical.
Not looking for it to render a page. Just trying to figure out why the info returned is so different.
For example, try this (#WEBQUERY#=http://www.google.com/search?q=site:http://www.dvdempire.com/#MOVIE#) in a script and use 12 Monkeys as the movie to search for in the MagicScript Editor. Now search for "site:http://www.dvdempire.com/ 12 Monkeys" in Google. Check the temphtml1.txt file: it will be approx. 24K, while the page returned is 214K.
Why search this way? If you search for 12 Monkeys at DVD Empire and check the results against those returned by Google, you will notice DVD Empire does not list "12 Monkeys: Collector's Edition" (the fifth result from Google, I believe, and the copy I own). But through Google you can get to this DVD. The reason is that it is discontinued, and DVD Empire removes it from their own search results, but the page and info are still there.
So, any ideas what may be causing the hiccup in MS on the Google searches?
EDIT: One thing I do notice is that MS brings the page in as one line. There are line breaks (<wbr>) all over the page, but they do not show up in the temphtml1.txt file. Curious whether MS has a problem with HTML5. |
Edited by - Clean on 26 Aug 2012 17:06:07 |
|
|
Prinz
Senior Member
Germany
1522 Posts |
Posted - 26 Aug 2012 : 18:24:13
|
You are comparing the rendered page to the unrendered page, like I said. If you load a page in a browser, the browser loads additional content into the page (= rendering).
And a server always sends data formatted for the specific browser, so the data differs depending on the browser or downloader used, your location, and other things. Sites like Google aren't static pages; they are generated depending on the request.
Therefore never use the source from your browser for Magic Script parsing; use the source in the .txt temp files. Magic Script is a file downloader, not a browser, and in some cases gets different pages. |
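To illustrate the point about servers answering differently depending on who asks, here is a minimal Python sketch (not MagicScript) that prepares the same request under two different User-Agent headers. Both User-Agent strings are made up for illustration; what Magic Script actually sends is not stated in this thread, and whether a given server really varies its response has to be checked by comparing the downloaded sizes.

```python
from urllib.request import Request, urlopen

# Example search URL; "+" stands for a space in the query string.
URL = "http://www.google.com/search?q=site:http://www.dvdempire.com/+12+Monkeys"

# Hypothetical User-Agent strings, chosen only for illustration.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1"
DOWNLOADER_UA = "ExampleDownloader/1.0"


def build_request(url, user_agent):
    """Prepare (but do not send) a GET request that identifies as user_agent."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_size(request, timeout=10):
    """Send the request and return the size of the response body in bytes."""
    with urlopen(request, timeout=timeout) as response:
        return len(response.read())
```

Calling fetch_size(build_request(URL, BROWSER_UA)) and fetch_size(build_request(URL, DOWNLOADER_UA)) and comparing the two numbers shows whether the server treats the two clients differently. The call needs network access, and the server may refuse or redirect either request.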
|
|
Clean
Starting Member
34 Posts |
Posted - 26 Aug 2012 : 20:12:45
|
quote: Originally posted by Prinz
You are comparing the rendered page to the unrendered page, like I said. If you load a page in a browser, the browser loads additional content into the page (= rendering).
And a server always sends data formatted for the specific browser, so the data differs depending on the browser or downloader used, your location, and other things. Sites like Google aren't static pages; they are generated depending on the request.
Therefore never use the source from your browser for Magic Script parsing; use the source in the .txt temp files. Magic Script is a file downloader, not a browser, and in some cases gets different pages.
Not sure what you mean by using the source from my browser. If you are referring to the actual rendered page that is displayed in firefox, that is not what I use. I do, however, use the page source (ctrl U) in firefox and have been for the last 6 or 7 years I've been writing scripts for MS and it has always worked. This is the first time I have come across a web site whose MS download differs so dramatically from what the actual page source is. |
|
|
Prinz
Senior Member
Germany
1522 Posts |
Posted - 26 Aug 2012 : 20:27:00
|
quote: Originally posted by Clean
Not sure what you mean by using the source from my browser. If you are referring to the actual rendered page that is displayed in firefox, that is not what I use. I do, however, use the page source (ctrl U) in firefox and have been for the last 6 or 7 years
That is the rendered source, since you used Firefox to load it and any browser renders the page. And with more advanced browsers and servers you get very different pages. The server knows which browser requested the page and sends the best page for it. For example, HTML5 if the browser supports it and HTML4 if it doesn't... and many more things. |
|
|
Clean
Starting Member
34 Posts |
Posted - 27 Aug 2012 : 03:05:56
|
quote: Originally posted by Prinz
quote: Originally posted by Clean
Not sure what you mean by using the source from my browser. If you are referring to the actual rendered page that is displayed in firefox, that is not what I use. I do, however, use the page source (ctrl U) in firefox and have been for the last 6 or 7 years
That is the rendered source, since you used Firefox to load it and any browser renders the page. And with more advanced browsers and servers you get very different pages. The server knows which browser requested the page and sends the best page for it. For example, HTML5 if the browser supports it and HTML4 if it doesn't... and many more things.
Thanks for the input, it is fixed. MS reads google fine but my script was wrong. It had to be #WEBQUERY#=http://www.google.com/search?q=site:http://www.dvdempire.com/ #MOVIE#... the space before #MOVIE# is necessary. |
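For anyone who hits the same thing, here is a minimal Python sketch (not MagicScript) of why the space matters. It assumes #MOVIE# is replaced by plain text substitution, which this thread does not confirm; the function names are hypothetical helpers. Without the space, the first word of the title is glued onto the site: address, so the search is restricted to a path that does not exist.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def site_search_url(site, title, space_before_title=True):
    """Build a Google site-search URL; optionally omit the space before the title."""
    separator = " " if space_before_title else ""
    query = f"site:{site}{separator}{title}"
    return "http://www.google.com/search?" + urlencode({"q": query})


def site_operator(url):
    """Return the first whitespace-delimited token of the decoded q parameter."""
    query = parse_qs(urlsplit(url).query)["q"][0]
    return query.split()[0]


good = site_search_url("http://www.dvdempire.com/", "12 Monkeys")
bad = site_search_url("http://www.dvdempire.com/", "12 Monkeys", space_before_title=False)

print(site_operator(good))  # site:http://www.dvdempire.com/
print(site_operator(bad))   # site:http://www.dvdempire.com/12
```

In the second case the title words left for the search are just "Monkeys", and the site restriction points at http://www.dvdempire.com/12.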
|
|
|