PHP Script To Get MSN (Live Search) Search Results Pages (SERPs) : 5ubliminal's TellinYa

Treat this as an educational script; it should not be used outside the Search.Live.com guidelines!

Warning: This script relies on the PHP Script cUrl Class, which must be included for the code below to work!
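If the cUrl class is not at hand, the script below only appears to need an object with a get($url) method that returns the response body. Here is a minimal sketch of such a stand-in, assuming the PHP curl extension is enabled. The class name SimpleHttpClient and its timeout parameter are hypothetical; this is not the author's class and it may behave differently.

```php
<?php
// Hypothetical stand-in, NOT the author's cUrl class.
// It offers only get($url), the one method the script below calls.
class SimpleHttpClient {
    private $timeout;

    public function __construct($timeout = 10) {
        $this->timeout = (int)$timeout;
    }

    // Returns the response body as a string, or false on failure.
    public function get($url) {
        $ch = curl_init($url);
        if ($ch === false) return false;
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,  // follow redirects
            CURLOPT_CONNECTTIMEOUT => $this->timeout,
            CURLOPT_TIMEOUT        => $this->timeout,
        ));
        return curl_exec($ch);
    }
}
```

To try it, the line that creates eHttpClient in the script would have to create SimpleHttpClient instead.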

The version for Google.com is also available!

Why would one need to parse the Search.Live.com Results?

Everything I mentioned when I made this work for Google.com applies here! Still, MSN provides results in RSS form, which suggests they might have a more permissive policy regarding automated access to their search results.

The PHP Script To Parse MSN (Live Search) Results

Enough with the chit-chat! Let's get on to the code …

The $page parameter is the page number and starts from 1. Pass pages 1-10, never 0-9!
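To make the page numbering concrete, here is a small sketch of how a 1-based page maps to the first parameter of the request URL. The helper name msnFirstOffset is hypothetical and not part of the script; it only mirrors the arithmetic the script performs.

```php
<?php
// Hypothetical helper (not part of the original script): maps a 1-based
// page number to the 1-based index of the first result on that page.
function msnFirstOffset($page, $perpage = 10) {
    // Anything below 1 is treated as page 1, as the script does for 0.
    $page = max(1, (int)$page);
    return (($page - 1) * $perpage) + 1;
}

echo msnFirstOffset(1), "\n";     // 1
echo msnFirstOffset(2), "\n";     // 11
echo msnFirstOffset(3, 50), "\n"; // 101
```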

<?php
// --
function msnResults($query,$page=1,$perpage=10,$dc="search.live.com"){
    //Convert the 1-based page number to a 0-based multiplier.
    if($page) $page--;
    $url=sprintf("http://%s/results.aspx?q=%s&count=%d&first=%d&format=rss",
        $dc,urlencode($query),$perpage,($page*$perpage)+1);
    $hc=new eHttpClient();
    $xml=$hc->get($url);
    //Rss Has items. If not terminate!
    if(!preg_match_all( "/<item>(.+?)<\/item>/s", $xml, $matches)) return false;
    $matches=$matches[1];
    $results=array();
    for($i=0;$i<count($matches);$i++){
        $match=trim($matches[$i]);
        $match=str_replace("&amp;","&",$match);
        //If item can't be translated, continue!
        if(!preg_match(
            "/<title>(.+?)<\/title>\s*".
            "<link>(.+?)<\/link>\s*".
            "<description>(.*)<\/description>\s*".
            "<pubDate>(.+?)<\/pubDate>/i",
            $match, $parts
        )) continue;
        //Decode the HTML encodes and strip tags.
        $title=html_entity_decode(strip_tags(trim($parts[1],"  \"")));
        $desc=html_entity_decode(strip_tags(trim($parts[3],"  \"")));
        //Rank?
        $pos=($page*$perpage)+$i+1;
        $link=trim($parts[2],"  \"");
        $tm=strtotime($parts[4]);
        //-- Link invalid ... continue;
        if(!preg_match("/^([^:]+):\/\/([^\/]+)[\/]?(.*)$/",
            $link,$Doms)) continue;
        $Http=$Doms[1];
        $Rel="/".$Doms[3];
        $Dom=$Doms[2];
        //Prepare result
        $serpEntry=array(
            "Rank"            => $pos,
            "Url"            => $link,
            "Title"            => trim($title),
            "Host"            => $Dom,
            "Protocol"        => $Http,
            "Path"            => $Rel,
            "Summary"        => trim($desc),
            "Cached"        => $tm, //UnixTime Stamp
            //Human Readable (strftime() is deprecated since PHP 8.1)
            "CachedOn"        => date("d F Y",$tm),
        );
        array_push($results,$serpEntry);
    }
    return $results;
}
// --
function msnLinks($query,$page=1,$perpage=10,$dc="search.live.com"){
    $res=msnResults($query,$page,$perpage,$dc);
    //msnResults returns false when the feed has no items.
    if(!$res) return array();
    $links=array();
    for($i=0;$i<count($res);$i++){
        $link=$res[$i]['Url'];
        array_push($links,$link);
    }
    return $links;
}
// --
?>
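The parsing step can be tried without touching the network. The sketch below runs the same kind of regular expressions over a made-up RSS fragment. The feed content and the helper name parseRssItems are invented for illustration, and the real feed may be laid out differently.

```php
<?php
// Made-up RSS fragment for illustration only; not real Live Search output.
$xml = "<rss><channel>\n"
     . "<item><title>Example One</title>"
     . "<link>http://example.com/a?x=1&amp;y=2</link>"
     . "<description>First result &amp; more</description>"
     . "<pubDate>Sat, 01 Sep 2007 18:00:00 GMT</pubDate></item>\n"
     . "<item><title>Example Two</title>"
     . "<link>http://example.org/</link>"
     . "<description></description>"
     . "<pubDate>Sun, 02 Sep 2007 18:00:00 GMT</pubDate></item>\n"
     . "</channel></rss>";

// Hypothetical helper: pulls title, link, description and pubDate
// out of every <item> element.
function parseRssItems($xml) {
    $items = array();
    // The "s" modifier lets "." match newlines inside an item.
    if (!preg_match_all("/<item>(.+?)<\/item>/s", $xml, $matches)) return $items;
    foreach ($matches[1] as $raw) {
        // Skip any item that does not have the four expected fields in order.
        if (!preg_match(
            "/<title>(.+?)<\/title>\s*<link>(.+?)<\/link>\s*" .
            "<description>(.*?)<\/description>\s*<pubDate>(.+?)<\/pubDate>/is",
            $raw, $p
        )) continue;
        $items[] = array(
            "Title"   => html_entity_decode(strip_tags(trim($p[1]))),
            "Url"     => html_entity_decode(trim($p[2])),
            "Summary" => html_entity_decode(strip_tags(trim($p[3]))),
            "Cached"  => strtotime($p[4]), // Unix timestamp
        );
    }
    return $items;
}

$items = parseRssItems($xml);
foreach ($items as $it) echo $it["Title"], " -> ", $it["Url"], "\n";
```

Unlike msnResults, this helper returns an empty array rather than false when nothing matches, which saves callers a type check.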

msnLinks is a helper function that returns only the URLs of the results.
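On PHP 5.5 or later the same URL list can be produced with the built-in array_column(). A minimal sketch follows; the helper name urlsOf and the sample rows are made up, and only the "Url" key matches the script's output.

```php
<?php
// Sample rows in the shape returned by msnResults(); the values are made up.
$results = array(
    array("Rank" => 1, "Url" => "http://example.com/",     "Host" => "example.com"),
    array("Rank" => 2, "Url" => "http://example.org/page", "Host" => "example.org"),
);

// Hypothetical helper: array_column() pulls one key out of every row.
// The is_array() check covers the false that msnResults() returns
// when the feed has no items.
function urlsOf($results) {
    return is_array($results) ? array_column($results, "Url") : array();
}

print_r(urlsOf($results));
print_r(urlsOf(false));
```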

And finally a sample output …

It is the fifth element (index 4) of an array of 10 output by:

<?php print_r(msnResults("site:tellinya.com/",1,10)); ?>

    [4] => Array
        (
            [Rank] => 5
            [Url] => http://www.tellinya.com/
            [Title] => I'm)TellinYa - Bits of wisdom published by regular people
            [Host] => www.tellinya.com
            [Protocol] => http
            [Path] => /
            [Summary] => I'm TellinYa - Bits of wisdom published by regular people ...
            [Cached] => 1188669600
            [CachedOn] => 01 September 2007
        )
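The Protocol, Host and Path fields in the sample above come from a regular expression in the script. As an alternative sketch, PHP's built-in parse_url() can produce the same three fields. The helper name splitUrl is hypothetical, and it differs from the script in two ways: a port is not included in Host, and a #fragment is dropped from Path.

```php
<?php
// Hypothetical helper: same three fields as the script, via parse_url().
// Returns false when the link has no scheme or host.
function splitUrl($link) {
    $p = parse_url($link);
    if ($p === false || !isset($p["scheme"], $p["host"])) return false;
    // An empty path is reported as "/", as the script does.
    $path = isset($p["path"]) ? $p["path"] : "/";
    if (isset($p["query"])) $path .= "?" . $p["query"];
    return array(
        "Protocol" => $p["scheme"],
        "Host"     => $p["host"],
        "Path"     => $path,
    );
}

print_r(splitUrl("http://www.tellinya.com/"));
```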

n-Joy and stay safe! I take no responsibility, ... bla bla bla.

PS: Thanks to Ivan from MT-Soft.com for pointing out some errors due to defacement produced by my blog.

3 Comments Posted By Readers :

#1 juust from Netherlands web
Posted on Sunday, 06 July, 2008
thanks, i hadn't thought of the rss option.
here's another:

$first=1;
$query="php+serp";
$count=50;

$xml = @simplexml_load_file("http://search.live.com/results.aspx?q=$query&count=$count&first=$first&format=rss");
foreach($xml->channel->item as $i) echo $i->link."";
#2 5ubliminal web
Posted on Sunday, 06 July, 2008
Thanks. Yeah! This code was written ages ago and now I use the XML stuff more but this code works on any webhosting.
Some crazy people don't enable such extensions.
… or maybe they all do but you can never be too sure when you're sharing code with others. Some don't even know how to enable extensions.

;)
#3 juust from Netherlands web
Posted on Monday, 07 July, 2008
yup :)

I didn't mean to be smug there.

I wrote a line-parser to handle a blog-pipe
for that one your code is a major improvement in speed.
5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise .
Rendered On : [Thursday, 21 August, 2008 - 21:31:47 GMT]