
Parsing the Google Suggests in PHP : 5ubliminal's TellinYa

Warning: This article relies on the PHP cUrl class (aka eHttpClient), which the following script needs in order to work! Include it before calling the function, or swap in your own page retrieval for the URL the script builds.
What is Google Suggests?

This is not an official name, but it is what I call it. It's the JavaScript that suggests keywords as you type your search. The following script lets you reuse the suggestions Google gives you.

The PHP code for scraping the Google Suggests:

I first made it using JS as the $type, but Blackhat-seo from blackhat-seo.com told me that JSON and XML types are also available. He also mentioned his own cUrl class but, without modesty, I have to say mine looks more friendly.

So I decided to make a one-size-fits-all function: you choose the type you like and the output is the same. The json type relies on the json_decode function; if that is unavailable, the function switches to a type that older PHP can handle too. The code is below:

<?php
//$type can be one of xml,json,js
function googleSuggests($keyword,$justKeys=0,$lang="en",$type="xml"){
    if(!is_string($keyword)) return false;
    //Invalid type: use one of xml,json,js
    if(!in_array($type,array("xml","json","js"),true)) return false;
    //Older PHP without json_decode ... SWITCH to xml!
    if($type=="json" && !function_exists("json_decode")){ $type="xml"; }
    //Build the URL based on the parameters
    $url=sprintf(
        "http://www.google.com/complete/search?hl=%s&%s=true&qu=%s",
        urlencode($lang),$type,urlencode($keyword)
    );
    //We load the page (eHttpClient is the cUrl class mentioned in the warning)
    $hc=new eHttpClient();
    $html=$hc->get($url);
    //We make it all single-line
    $html=preg_replace("/\s+/"," ",$html);
    //Decode data based on type
    if($type=="json"){
        //Decode JSON
        $results=json_decode($html);
        //Not the expected JSON layout = invalid
        if(!is_array($results) || !isset($results[1],$results[2])) return false;
        $results=array_combine($results[1],$results[2]);
    }elseif($type=="js"){
        //It all starts with the following. Not found = invalid
        if(!strstr($html,"new Array(2, ")) return false;
        $html=substr(strstr($html,"new Array(2, "),strlen("new Array(2, "));
        //We match all results
        if(!preg_match_all("/\"([^\"]+)\", \"([0-9,]+)\\s[^\"]+\"/i",
            $html,$matches)) return false;
        //We build result array
        $results=array_combine($matches[1],$matches[2]);
    }elseif($type=="xml"){
        //I used this .*? in case they insert spaces or stuff between
        if(!preg_match_all(
            "/<suggestion data=\"([^\"]+)\"\/>.*?".
            "<num_queries int=\"([0-9]+)\"\/>/i",
            $html,$matches)
        ) return false;
        $results=array_combine($matches[1],$matches[2]);
    }
    //Keyphrases and counts could not be paired = invalid
    if(!is_array($results)) return false;
    //We fix numbers
    if($type!="xml"){
        //XML outputs numbers directly! no point for this!
        //I used regexp in case they change the word results into smth else
        foreach($results as $keyphrase => $count){
            //Drop the trailing word (e.g. " results"), then anything that is not a digit
            $count=preg_replace("/\s([^\s]+)$/i","",$count);
            $count=preg_replace("/[^0-9]/i","",$count);
            $results[$keyphrase]=(int)$count;
        }
    }
    //We return it. If just keys we dump results count!
    if($justKeys) return array_keys($results);
    return $results;
}
?>
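
If you don't have the eHttpClient class at hand, the page retrieval can be swapped for PHP's cURL extension. The following is only a rough sketch under a few assumptions: buildSuggestUrl and fetchPage are hypothetical helper names (they are not part of the script above), the URL format is simply the one the script uses, and there is no guarantee Google still answers on it.

```php
<?php
// Hypothetical helper (not part of the original script): builds the
// suggest URL in the same format googleSuggests() uses.
function buildSuggestUrl($keyword, $lang = "en", $type = "xml") {
    return sprintf(
        "http://www.google.com/complete/search?hl=%s&%s=true&qu=%s",
        urlencode($lang), $type, urlencode($keyword)
    );
}

// Hypothetical helper: fetches a page with the cURL extension.
// Returns the body as a string, or false on failure.
function fetchPage($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    return curl_exec($ch);
}

// Only the URL is built here; no request is sent until fetchPage() is called.
$url = buildSuggestUrl("google suggests", "en", "json");
echo $url, "\n";
```

Inside googleSuggests, the two eHttpClient lines would then become a single $html=fetchPage($url); with a return false when the fetch fails.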

How do I use it?

No matter which of the three types (xml, json, js) you use, the output is exactly the same, so everything below applies to all of them.

Just assign the result of the function to an array and output it. Remember: by setting $justKeys to 1 you no longer get the result counts, just an array of keyphrases.

<?php
print_r(googleSuggests("google suggests"));
?>
Will return:
Array
(
    [google suggests] => 12200000
    [google suggests labs] => 2240000
    [google suggests lab] => 2060000
)

<?php
//Here I ask just for keyphrases by setting $justKeys to 1(true)
print_r(googleSuggests("google suggests",1));
?>
Will return:
Array
(
    [0] => google suggests
    [1] => google suggests labs
    [2] => google suggests lab
)
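
To try the js parsing without hitting Google at all, the same regular expression can be run on a saved response. This is a minimal sketch, not the function above: parseJsSuggest is a hypothetical name, and the sample string is the output quoted in comment #4 below, so it only reflects the format Google used at that time.

```php
<?php
// Hypothetical helper: the js branch of googleSuggests() in isolation,
// working on an already downloaded response instead of a live request.
function parseJsSuggest($html) {
    $marker = "new Array(2, ";
    $pos = strpos($html, $marker);
    if ($pos === false) return false; // marker missing: not a suggest response
    $html = substr($html, $pos + strlen($marker));
    // Pairs of "keyphrase", "1,234 results"
    if (!preg_match_all('/"([^"]+)", "([0-9,]+)\s[^"]+"/', $html, $m)) return false;
    // Drop the thousands separators and cast the counts to integers
    $counts = array_map('intval', str_replace(",", "", $m[2]));
    return array_combine($m[1], $counts);
}

// Sample response as quoted in the comments
$sample = 'window.google.ac.Suggest_apply(frameElement, "google suggests", '
    . 'new Array(2, "google suggests", "12,200,000 results", '
    . '"google suggests labs", "2,240,000 results", '
    . '"google suggests lab", "2,060,000 results", '
    . '"google suggests beta", "2,200,000 results"), new Array(""));';

$parsed = parseJsSuggest($sample);
print_r($parsed);
```

The result has the same keyphrase => count shape as the arrays printed above, which makes it handy for checking the regular expression whenever the format changes.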

In case the format changes, as it did a few weeks ago, lemme know and I'll rebuild it, or maybe I'll just notice myself.

11 Comments Posted By Readers :

#1 taky from United States web
Posted on Thursday, 13 December, 2007
great post, thanks a bunch. i did something similar ;) more black though.
#2 5ubliminal web
Posted on Thursday, 13 December, 2007
Ur welcome.

Damn … taky … u said something nice to me?
PS: I'm seodude from syndk8 … ur supposed to h8 me! This world is coming to an end.
#3 blackhat seo from Greece web
Posted on Thursday, 13 December, 2007
Try json_decode instead of regex.
#4 5ubliminal web
Posted on Thursday, 13 December, 2007
I'm not really sure how json_decode works on the output from Google. I tried it but got no valid output.
It's way easier with regexp for me :)

PS: If you could show an example …
This is a sample output:


window.google.ac.Suggest_apply(frameElement, "google suggests", new Array(2, "google suggests", "12,200,000 results", "google suggests labs", "2,240,000 results", "google suggests lab", "2,060,000 results", "google suggests beta", "2,200,000 results"), new Array(""));



Thanks.
#5 blackhat seo from Greece web
Posted on Friday, 14 December, 2007
Here's my code:
// Had to censor the long code, as its dependencies are not available
But it all cuts down to this: in the URL of the Google Suggests we can replace js=true with json=true. And the output will be in JSON format.
#6 5ubliminal web
Posted on Friday, 14 December, 2007
I've posted a new section in the article with new code based on blackhat-seo's suggestions.

Thanks. (Efcharisto)
#7 blackhat seo from Greece web
Posted on Friday, 14 December, 2007
Should have just posted the urls, though you got the count cleaning wrong. Replace ',' and ' results' with ''.

Anyway here's the facts:

js=true returns google specific javascript
json=true returns json with counts
output=firefox returns json without counts
xml=true returns xml with counts

and you can also change hl=en to other languages and get different suggestions. On a sidenote the old Curl class is here:
http://webuildspam.com/code/class.Curl.phps but the code I posted is for the newest, much improved version (http://webuildspam.com/code/class.curl2.phps) that will be published shortly, after some testing and documenting.
#8 5ubliminal web
Posted on Friday, 14 December, 2007
Shame on me for not checking code :)
I actually rebuilt the function and now, one can choose the output type he enjoys most. It works with all formats and I TESTED IT :)

Thanks.
PS: Your cUrl class looks a bit scary.
#9 Dimi from Bulgaria web
Posted on Friday, 27 June, 2008
Just to add you can find the eHttpClient class here
http://www.tellinya.com/read/2007/08/03/39.html

Thanks for this article.

Regards
Dimi
#10 5ubliminal web
Posted on Friday, 27 June, 2008
What? o_O
#11 Prakash.R.R from India web
Posted on Saturday, 16 August, 2008
Ooo Pretty good. Anyone can get me the code for Yahoo Search Suggestion? Thanks in advance...
5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise .