Treat this as an educational script; it should not be used in ways that violate the Search.Live.com guidelines!
Warning: This article relies on the PHP Script cUrl Class, which is required for the following script to work!
The version for Google.com is also available!
Why would one need to parse the Search.Live.com Results?
All that I mentioned when I made this work for Google.com applies here! Still, MSN provides results in RSS form, which suggests they might have a more permissive policy regarding automated access to their search results.
The PHP Script To Parse MSN (Search Live) Results
Enough with the chit-chat! Let's get on to the code …
The $page parameter is the page number and starts from 1. Use pages 1-10, never 0-9!
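To make the page numbering concrete, here is a minimal sketch of how a 1-based page number turns into the "first" value that the script sends in the query string. The helper name msnFirstOffset is hypothetical and not part of the script; it only isolates the arithmetic.

```php
<?php
// Hypothetical helper (not part of the original script): maps a 1-based page
// number to the value sent as the "first" query parameter.
function msnFirstOffset($page, $perpage = 10) {
    // Clamp to 1 so that page 0 behaves like page 1, as it does in the script.
    $page = max(1, (int)$page);
    return (($page - 1) * $perpage) + 1;
}

// Show the mapping for a few pages.
foreach (array(1, 2, 10) as $p) {
    printf("page %d -> first=%d\n", $p, msnFirstOffset($p));
}
```

With the default of 10 results per page, page 1 starts at result 1, page 2 at result 11, and page 10 at result 91.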
<?php
// eHttpClient is expected to come from the cUrl class script this article relies on; load it first.

// Fetch one page of results as RSS and return an array of entries, or false if nothing matched.
function msnResults($query,$page=1,$perpage=10,$dc="search.live.com"){
    // Pages are 1-based for the caller; convert to a 0-based page index.
    if($page) $page--;
    $url=sprintf("http://%s/results.aspx?q=%s&count=%d&first=%d&format=rss",
        $dc,urlencode($query),$perpage,($page*$perpage)+1);
    $hc=new eHttpClient();
    $xml=$hc->get($url);
    // The "s" modifier lets "." match newlines, in case an <item> spans several lines.
    if(!preg_match_all("/<item>(.+?)<\/item>/s",$xml,$matches)) return false;
    $matches=$matches[1];
    $results=array();
    for($i=0;$i<count($matches);$i++){
        $match=trim($matches[$i]);
        // Undo the XML escaping of ampersands.
        $match=str_replace("&amp;","&",$match);
        if(!preg_match(
            "/<title>(.+?)<\/title>\s*".
            "<link>(.+?)<\/link>\s*".
            "<description>(.*)<\/description>\s*".
            "<pubDate>(.+?)<\/pubDate>/is",
            $match,$parts
        )) continue;
        $title=html_entity_decode(strip_tags(trim($parts[1]," \"")));
        $desc=html_entity_decode(strip_tags(trim($parts[3]," \"")));
        $pos=($page*$perpage)+$i+1;
        $link=trim($parts[2]," \"");
        $tm=strtotime($parts[4]);
        // Split the URL into protocol, host and path.
        if(!preg_match("/^([^:]+):\/\/([^\/]+)[\/]?(.*)$/",
            $link,$Doms)) continue;
        $Http=$Doms[1];
        $Rel="/".$Doms[3];
        $Dom=$Doms[2];
        $serpEntry=array(
            "Rank" => $pos,
            "Url" => $link,
            "Title" => trim($title),
            "Host" => $Dom,
            "Protocol" => $Http,
            "Path" => $Rel,
            "Summary" => trim($desc),
            "Cached" => $tm,
            // date() replaces strftime(), which is deprecated since PHP 8.1.
            "CachedOn" => date("d F Y",$tm),
        );
        array_push($results,$serpEntry);
    }
    return $results;
}

// Helper: return only the URLs of the results.
function msnLinks($query,$page=1,$perpage=10,$dc="search.live.com"){
    $res=msnResults($query,$page,$perpage,$dc);
    $links=array();
    // msnResults() returns false when nothing matched; return the empty list then.
    if(!$res) return $links;
    for($i=0;$i<count($res);$i++){
        $link=$res[$i]['Url'];
        array_push($links,$link);
    }
    return $links;
}
?>
msnLinks is a helper function that returns only the URLs of the results.
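As a sketch of what the helper boils down to, the same list of URLs can be pulled out of a result array with PHP's built-in array_column() (PHP 5.5 or newer). The function name linksFromResults and the sample data are made up for illustration, so this runs without any network access.

```php
<?php
// Hand-written sample in the shape msnResults() returns; the values are made up.
$sample = array(
    array("Rank" => 1, "Url" => "http://www.example.com/", "Title" => "Example"),
    array("Rank" => 2, "Url" => "http://www.example.org/page", "Title" => "Another"),
);

// Hypothetical alternative to the loop in msnLinks(); needs PHP 5.5 or newer.
function linksFromResults($results) {
    // msnResults() returns false when nothing matched, so fall back to an empty list.
    if (!is_array($results)) return array();
    // Pick the "Url" field out of every entry.
    return array_column($results, "Url");
}

print_r(linksFromResults($sample));
```

Returning an empty array instead of false keeps callers that loop over the links from having to special-case an empty result page.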
And finally a sample output …
It is one element (index 4) of the array of 10 results returned by:
<?php print_r(msnResults("site:tellinya.com/",1,10)); ?>
[4] => Array
(
[Rank] => 5
[Url] => http://www.tellinya.com/
[Title] => I'm TellinYa - Bits of wisdom published by regular people
[Host] => www.tellinya.com
[Protocol] => http
[Path] => /
[Summary] => I'm TellinYa - Bits of wisdom published by regular people ...
[Cached] => 1188669600
[CachedOn] => 01 September 2007
)
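The Host, Protocol and Path fields come from a regex split of the URL. As a rough alternative, PHP's built-in parse_url() can produce the same three pieces. The helper name splitUrl is hypothetical, and this sketch ignores ports and fragments, which the regex would keep inside Host and Path respectively.

```php
<?php
// Hypothetical helper: splits a URL into the same three fields the script stores.
function splitUrl($url) {
    $p = parse_url($url);
    // Reject anything without both a scheme and a host.
    if ($p === false || !isset($p["scheme"], $p["host"])) return false;
    // A URL with no path gets "/", matching what the script produces.
    $path = isset($p["path"]) ? $p["path"] : "/";
    // The script's regex keeps the query string inside Path, so re-attach it here.
    if (isset($p["query"])) $path .= "?" . $p["query"];
    return array("Protocol" => $p["scheme"], "Host" => $p["host"], "Path" => $path);
}

print_r(splitUrl("http://www.tellinya.com/"));
```

For the sample entry this gives Protocol "http", Host "www.tellinya.com" and Path "/", the same values shown in the output.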
n-Joy and stay safe! I take no responsibility, ... bla bla bla.
PS: Thanks to Ivan from MT-Soft.com for pointing out some errors due to defacement produced by my blog.