
PHP Script To Get Google Search Results Pages (SERPs) : 5ubliminal's TellinYa

<a href="http://www.tellinya.com/art2/100/">PHP Script To Get Google Search Results Pages (SERPs) : 5ubliminal's TellinYa</a>
Update: When I first pasted this script, some backslashes got lost along the way. I did not notice, and most people who used it fixed them on their own, but I have now replaced the script with a working version.

Treat this as an educational script; it should not be used in ways that fall outside Google.com guidelines! There is also an MSN version, and a separate Google results counter script.

Warning: This article relies on the PHP Script cURL Class (it provides the eHttpClient class used below), which is mandatory for the following script to work!

Why would one need to parse the Google Results?

You should not need to do this. This script is educational and not for unethical use!

But … hypothetically speaking … this could be used for several reasons: to find your ranking in the SERPs, to see how many pages you have indexed, to check out the competition, to watch certain keyphrases for new sites or sites that drop out, and so on …

If you can't help it and decide to use this, use common sense and do not hammer Google. You will get a temporary ban, which is no fun. Keep at least 30-60 seconds between queries and do not run more than 25 in a row. Simulate human behaviour or be kicked. Even if you save Google bandwidth by not loading the search page's dependencies, you cost it money by not clicking the ads!
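A minimal sketch of the pacing advice above. The helper name `runThrottled()` and all of its parameters are hypothetical (they are not part of the original script); the `$sleep` argument exists only so the pause can be swapped out while testing.

```php
<?php
// Hypothetical helper, not part of the original script: runs at most
// $maxQueries queries and pauses a random $minDelay-$maxDelay seconds
// between consecutive ones.
function runThrottled(array $queries, callable $fetch, $maxQueries = 25,
                      $minDelay = 30, $maxDelay = 60, $sleep = 'sleep')
{
    $results = array();
    // Never run more than $maxQueries in a row.
    $queries = array_values(array_slice($queries, 0, $maxQueries));
    foreach ($queries as $i => $query) {
        if ($i > 0) {
            // Randomised pause so requests do not arrive on a fixed beat.
            call_user_func($sleep, mt_rand($minDelay, $maxDelay));
        }
        $results[$query] = call_user_func($fetch, $query);
    }
    return $results;
}
```

Assuming `googleLinks()` from this article is loaded, a call such as `runThrottled($queries, 'googleLinks')` would fetch the links for each query with the pauses in between.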

Detect supplemental pages with this script.

There is a trick (I don't know how long it will last) that lets you append a * to the search to find non-supplemental pages.

So site:yoursite.com/ will show all indexed pages, and site:yoursite.com/* will list those in the main index (non-supplemental). Do the math and get the list of supplemental pages by diffing the two arrays. PHP coders are always lucky on this site:

<?php
// Easy way to spot your supplemental pages
$AllLinks = googleLinks("site:yoursite.com/",1,100);
$MainIndex = googleLinks("site:yoursite.com/*",1,100);
$Supplementals = array_diff($AllLinks,$MainIndex);
?>
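To see what the array difference does without querying Google, here is a small sketch with made-up URLs standing in for the two result lists. The URLs are invented for illustration only.

```php
<?php
// Made-up URLs standing in for what googleLinks() might return.
$AllLinks = array(
    "http://yoursite.com/",
    "http://yoursite.com/about",
    "http://yoursite.com/old-page",
);
$MainIndex = array(
    "http://yoursite.com/",
    "http://yoursite.com/about",
);

// array_diff() keeps the values of the first array that are absent from
// the second. It preserves the original keys, so array_values() is used
// to reindex the result from 0.
$Supplementals = array_values(array_diff($AllLinks, $MainIndex));
print_r($Supplementals);
```

Only `http://yoursite.com/old-page` is left in `$Supplementals`, since it is the one URL missing from the main-index list.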

Urban legend has it that site:yoursite.com/& returns the supplementals directly, but the numbers do not add up, so I do not consider this query functional.

The PHP Script To Parse Google Results

Enough with the chit-chat! Let's get on to the code …

The $page parameter is the page number and starts from 1. Pages run 1-10, never 0-9!

<?php
function googleResults(
    $query,$page=1,$perpage=10,
    $dc="www.google.com",$filter=true
){
    if($page) $page--;
    $url=sprintf("http://%s/ie?q=%s&num=%d&start=%d&hl=en&ie=UTF-8&filter=%d&c2coff=1&safe=off",
        $dc,urlencode($query),$perpage,$page*$perpage,$filter);
    $hc=new eHttpClient();
    $hc->setReferer("http://".$dc."/");
    $html=$hc->get($url);
    if(!preg_match_all( "/<nobr>(.+?)<\/nobr>/is", $html, $matches))
        return false;
    $matches=$matches[1];
    $results=array();
    for($i=0;$i<count($matches);$i++){
        $match=trim($matches[$i]);
        if(!preg_match_all( "/(.+?)\.\s<a title=[\"](.+?)[\"] href=(.+?)>(.+?)<\/a>/i",
            $match, $parts)) continue;
        $parts[4][0]=strip_tags($parts[4][0]);
        array_splice($parts,0,1);
        $LinkTitle    =trim($parts[3][0],"\r\n\t \"");
        $LinkDesc    =trim($parts[1][0],"\r\n\t \"");
        $Rank        =trim($parts[0][0]);
        $LinkUrl    =trim($parts[2][0],"\r\n\t \"");
        if(!strstr($LinkUrl,"://"))
            continue;
        if(!preg_match("/^([^:]+):\/\/([^\/]+)[\/]?(.*)$/",$LinkUrl,$Dom)){
            continue;
        }
        $Http=$Dom[1];
        $Rel="/".$Dom[3];
        $Dom=$Dom[2];
        $serp=array(
            "Rank"            => $Rank,
            "Url"            => $LinkUrl,
            "Title"            => trim(html_entity_decode(strip_tags($LinkTitle))),
            "Host"            => $Dom,
            "Protocol"        => $Http,
            "Path"            => $Rel,
            "Summary"        => trim(html_entity_decode(strip_tags($LinkDesc))),
        );
        array_push($results,$serp);
    }
    return $results;
}
// --
function googleLinks(
    $query,$page=1,$perpage=10,
    $dc="www.google.com",$filter=true
){
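    $res=googleResults($query,$page,$perpage,$dc,$filter);
    // googleResults() returns false when nothing matched; treat that as no links
    if(!$res) return array();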
    $links=array();
    for($i=0;$i<count($res);$i++){
        $link=$res[$i]['Url'];
        array_push($links,$link);
    }
    return $links;
}
?>
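The way the 1-based $page turns into Google's 0-based start offset can be checked in isolation. The sketch below mirrors the sprintf() call in googleResults(); the helper name `buildSerpUrl()` is hypothetical, and it clamps the page with max(), which differs slightly from the original `if($page) $page--;` for negative input.

```php
<?php
// Hypothetical helper that mirrors the URL building in googleResults().
function buildSerpUrl($query, $page = 1, $perpage = 10,
                      $dc = "www.google.com", $filter = true)
{
    $page  = max(1, (int)$page);       // pages are 1-based
    $start = ($page - 1) * $perpage;   // Google's offset is 0-based
    return sprintf(
        "http://%s/ie?q=%s&num=%d&start=%d&hl=en&ie=UTF-8&filter=%d&c2coff=1&safe=off",
        $dc, urlencode($query), $perpage, $start, $filter
    );
}

// Page 2 with 10 results per page gives start=10.
echo buildSerpUrl("site:tellinya.com/", 2, 10), "\n";
```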

googleLinks is a helper function that returns only the URLs of the results.
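To follow what the two patterns inside googleResults() capture, here is a sketch that runs them on a single fragment. The fragment is invented to fit the script's pattern; it is not captured Google output, and the real markup may differ or change at any time.

```php
<?php
// Invented fragment shaped to fit the script's pattern (an assumption,
// not real Google markup).
$match = '1. <a title="Bits of wisdom" href=http://www.tellinya.com/>(I\'m)<b>TellinYa</b></a>';

// Same pattern as in googleResults(), written single-quoted so the
// backslashes reach the regex engine untouched.
$pattern = '/(.+?)\.\s<a title="(.+?)" href=(.+?)>(.+?)<\/a>/i';

if (preg_match($pattern, $match, $m)) {
    $Rank      = trim($m[1]);                // position in the result list
    $LinkDesc  = trim($m[2]);                // title attribute, used as Summary
    $LinkUrl   = trim($m[3], "\r\n\t \"");   // href value, stray quotes trimmed
    $LinkTitle = trim(strip_tags($m[4]));    // anchor text without markup

    // Split the URL into protocol, host and path, as the script does.
    preg_match('/^([^:]+):\/\/([^\/]+)[\/]?(.*)$/', $LinkUrl, $Dom);
    print_r(array(
        "Rank"     => $Rank,
        "Url"      => $LinkUrl,
        "Title"    => $LinkTitle,
        "Host"     => $Dom[2],
        "Protocol" => $Dom[1],
        "Path"     => "/" . $Dom[3],
        "Summary"  => $LinkDesc,
    ));
}
```

If the second pattern feels fragile, PHP's built-in parse_url() is an alternative for splitting the URL; the script itself does not use it.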

And finally a sample output …

This is the first element of the 10-element array output by:

<?php print_r(googleResults("site:tellinya.com/",1,10)); ?>

Array
(
    [0] => Array
        (
            [Rank] => 1
            [Url] => http://www.tellinya.com/
            [Title] => (I'm)TellinYa - Bits of wisdom published by regular people
            [Host] => www.tellinya.com
            [Protocol] => http
            [Path] => /
            [Summary] => (I'm)TellinYa - Bits of wisdom published by regular people.
        )

)
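One of the uses mentioned earlier, finding where a site ranks, takes only a loop over this structure. A minimal sketch, assuming a hypothetical helper `findRank()` (not part of the original script) and sample data shaped like the output above, with the second entry invented:

```php
<?php
// Hypothetical helper: returns the Rank of the first result whose Host
// matches $host (case-insensitive), or false when the host is absent.
function findRank(array $results, $host)
{
    foreach ($results as $serp) {
        if (strcasecmp($serp['Host'], $host) === 0) {
            return (int)$serp['Rank'];
        }
    }
    return false;
}

// Sample data shaped like googleResults() output; the second entry is invented.
$results = array(
    array('Rank' => '1', 'Url' => 'http://www.tellinya.com/',    'Host' => 'www.tellinya.com'),
    array('Rank' => '2', 'Url' => 'http://www.example.com/page', 'Host' => 'www.example.com'),
);

var_dump(findRank($results, 'www.example.com')); // int(2)
var_dump(findRank($results, 'missing.example')); // bool(false)
```

Since googleResults() can return false, check its return value before passing it to a helper like this.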

n-Joy and stay safe! I take no responsibility, ... bla bla bla.

28 Comments Posted By Readers :

#1 Ceros from Philippines
Posted on Wednesday, 19 September, 2007
Could you kindly tell me how do I scrape the top 5 URL links in a Google search, using cURL?
#2 5ubliminal web
Posted on Wednesday, 19 September, 2007
Copy the code from here and the one I refer on top of the page and then use:
$links=googleLinks("your search goes here",1,10);
$links=array_slice($links,0,5);
And you got the 5 links.
#3 Référencement Google from Switzerland web
Posted on Thursday, 06 March, 2008
This script is nice but it doesn't always work on shared hosting, you need to set a proxy. Adding proxies will also let you run this function in a loop to retrieve ranks for multiple requests. Thanks for the script :-)
#4 Small Business SEO from United States web
Posted on Friday, 04 April, 2008
Great script thanks.
#5 Sharry from Great Britain web
Posted on Friday, 11 April, 2008
I've just found your site after spending the last two days looking for this code, I'm getting this error:
"Parse error: syntax error, unexpected ']' in /www/site.com/page.php on line 29"

line 29: if(!preg_match_all( "/(.+?).s(.+?)/i",

Any help would help me greatly. Thank you.
PS - Your site is great. Keep up the great work :D
#6 5ubliminal web
Posted on Friday, 11 April, 2008
I fixed this. I placed a notice on top about the problem.
Thanks for telling me. Everyone who uses it seems to have fixed it themselves but the version on site should work.

Let me know if it's got other issues.
#7 Sharry from Great Britain web
Posted on Friday, 11 April, 2008
Thanks for quick response, much appreciated. Add/Stripslashes of course!! Getting a few more errors now though:
I'm using:
------------------------------------------------
class eHttpClient{
.....
}
function googleLinks(
....
)
$links=googleLinks("mylo forums",1,10);
$links=array_slice($links,0,5);
print "";
print_r($links);
print "";
--------------------------------------------------
But I get these errors:
Warning: Missing argument 2 for eHttpClient::get(), called in /www/site.com/page.php on line 272 and defined in /www/site.com/page.php on line 123
Warning: Missing argument 3 for eHttpClient::get(), called in /www/site.com/page.php on line 272 and defined in /www/site.com/page.php on line 123
Warning: Missing argument 4 for eHttpClient::_prepare(), called in /www/site.com/page.php on line 131 and defined in /www/site.com/page.php on line 74
Warning: Missing argument 1 for eHttpClient::getInfo(), called in /www/site.com/page.php on line 197 and defined in /www/site.com/page.php on line 226

Thank you 5ubliminal
#8 5ubliminal web
Posted on Friday, 11 April, 2008
Please read the comments to understand the difference between errors and warnings and learn of variable parameter functions.
10 more people asked me these questions.

Cheers.
#9 Sharry from Great Britain web
Posted on Friday, 11 April, 2008
All sorted. Thank you for your skills & patience (with my lack of reading skills), hope you have good weekend.
#10 5ubliminal web
Posted on Friday, 11 April, 2008
I'm glad it worked. U2!
#11 bobobelix from Switzerland web
Posted on Saturday, 19 April, 2008
Hi, the script is great but I get a 403 Forbidden error when running it on my server. I've tried changing the request but still get this error. What can I do to avoid getting the error?
#12 5ubliminal web
Posted on Saturday, 19 April, 2008
You're being too violent and Google blocked you.
You need to wait or enter the Captcha. Or change IP. Which you can't so you just wait it out.

Use a decent rate of request to stay unblocked.:) Don't abuse.
#13 bobobelix from Switzerland web
Posted on Sunday, 20 April, 2008
Well, it's shared hosting, so I don't know what others are doing (maybe this is the problem). Personally, I had never used the script before and got the 403 error the very first time I ran it. If you have any other suggestions... thanks anyway, it works fine locally though.
#14 5ubliminal web
Posted on Sunday, 20 April, 2008
Others may be abusing Google with queries too.
It works on local… so it's the server's fault. Can't help you in any way here.

Try to get some proxies in between… see if it works.
#15 silvermario from Poland
Posted on Saturday, 26 April, 2008
Hello, first thx for those scripts.
I wanted to find or to make a script that gives me the number of results for a given keyword, so I tried to modify your script. But when I var_dump the $html variable it seems that the page that i'm getting is not the same whole search engine results page that i'm getting while searching google through a browser. Could you be so kind and tell me what I am doing wrong, please?
#16 5ubliminal web
Posted on Sunday, 27 April, 2008
Hold your horses. I'll have one posted in a few days as soon as Easter is over :)
Subscribe to RSS and you'll be notified.

Regards.
#17 Ali Usman from Pakistan
Posted on Wednesday, 04 June, 2008
Hi, This is a great script. Thanks for serving the humanity :-)
#18 ninep from Thailand
Posted on Friday, 11 July, 2008
Oh! Thank u.
#19 marchionni from Switzerland
Posted on Sunday, 03 August, 2008
great script, thank you for publishing it!

i noticed i get different results when querying google http://www.google.com/search?q= rather than http://www.google.com/ie?q=
any thoughts on that?

cheers

ps: your anti spam captcha doesn't work in my firefox, i had to use IE for this post.
#20 5ubliminal web
Posted on Sunday, 03 August, 2008
I'll look at the captcha but didn't have problems so far.

Google results may change on refresh and depend on your gl parameter (results location, e.g. gl=US) and on whatever restrictions you may have in your browser settings.
I haven't noticed different results if all location/language related variables are similar.

Different results are normal, as every request may be redirected to a different datacenter with a slightly different result set. Look this up; it's normal behavior.

If you see things that are way too different, send me your queries by email and I'll look into them, but in a few days, as right now I'm away and 'off duty' :)

Cheers and enjoy summer.
#21 reinversion from United States
Posted on Wednesday, 06 August, 2008
Brilliant is an understatement for your work.

When you return I am curious if you think this can be adapted for a unique purpose of mine. I am attempting to setup a script that will allow me to perform google searches via email.

IE: I will have a fixed inbox monitored by a perl script automated as a cron job, the script will read the subject of the email sent by a requester and interpret it as the search query passing it along to your function. Your function will return the results to the perl script in order that it may send the data back to the requester.

The main purpose of the script will be to provide locale data more than anything else. For instance if the query is "starbucks+on+water+street+city+state" it would return the locations of interest as well as their address and phone number. The reason I am leaning towards searches done using google maps is simply because they seem to already be truncated, eliminating superfluous data.

The practical application in the long run will be to allow myself and other users access to google searches via email enabled cell phones for ease of use and ability to quickly forward information to others. Mainly it is a fun project I came up with and am trying to execute it in the best manner possible!

I hope you're having a great vacation, thanks for your assistance!
#22 5ubliminal web
Posted on Thursday, 07 August, 2008
Thanks ... u2. I'll get back to you when I return.
It won't be as easy as it seems.
#23 Dave from United States
Posted on Wednesday, 13 August, 2008
If you were to create a script that checked the number of results in Google for a large number of keywords, would this be a problem? I'm new to scraping and am not sure what is and isn't allowed. Thanks.
#24 5ubliminal web
Posted on Thursday, 14 August, 2008
I already have the google results count script.
But you need proxies as you get blocked after several quick requests. With proxies it works, without ... not for many.
#25 Anthony from United States
Posted on Tuesday, 30 September, 2008
Can you provide information on how to set up proxies?
#26 5ubliminal web
Posted on Wednesday, 01 October, 2008
Check curl_setopt. Search proxy on page. You'll find the option flags you need to set.
#27 Ryan from United States
Posted on Saturday, 11 October, 2008
Thanks so much for this!!! Do you have a rough estimate as to how much use will get your IP blocked? I wonder for example if I use it to do two queries every 15 minutes if I'll be safe.

Thank you so much!
#28 5ubliminal web
Posted on Sunday, 12 October, 2008
Do some tests but one every 10 seconds is quite safe.
2 every 15 minutes is ... super safe.
5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise .
Rendered On : [Friday, 21 November, 2008 - 10:08:39 GMT]   No Ajax / Flash Used Here