GoArticles.com is a huge database of nonsense articles published only for SEO purposes. They currently have around 60k articles and about 6k authors, and the only use one can find for their content is to build a scraper on top of it.
Articles are written not only for the sake of writing but also for the links the authors add to the text. Articles can be downloaded freely, but there is a catch: you are not allowed to alter them, and in their HTML you will find links pointing to the author's sites. Black hat SEO in its finest form.
I personally think less than 1% of the articles provide interesting content; the rest are written with SEO in mind and have no real value. This is my personal opinion, but who am I to judge?
Go to Google, find an article and view the cached version. Then copy the URL of the article, wipe your cookies and visit the page directly, or try to download it with a script. You will bump into a CAPTCHA, which is obviously not shown to the search engines indexing the site.
I do not consider these methods ethical, and I want to make my contribution with a simple example of retrieving pages from the web. I will show you a sample that uses cURL and PHP to download a web page. Some might use this to scrape articles from GoArticles, but I do not allow it to be used that way!
Warning: This article relies on the PHP Script cUrl Class, which is mandatory for the following script to work!
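The script below only ever calls get() on an eHttpClient object, so if you just want to follow along, a tiny stand-in can be sketched with PHP's cURL extension. This is not the cUrl Class itself: the class name MinimalHttpClient, its constructor arguments and the user agent string are all assumptions made for illustration, and only the get() method is mirrored.

```php
<?php
// Hypothetical stand-in for the cUrl class used in this article.
// Only get() is provided, because that is the only method the script calls.
class MinimalHttpClient
{
    private $timeout;
    private $userAgent;

    // Both arguments are illustrative defaults, not values from the original class.
    public function __construct($timeout = 30, $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)')
    {
        $this->timeout = $timeout;
        $this->userAgent = $userAgent;
    }

    // Returns the response body as a string, or false on failure.
    public function get($url)
    {
        $ch = curl_init($url);
        if ($ch === false) {
            return false;
        }
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // seconds allowed for connecting
        curl_setopt($ch, CURLOPT_TIMEOUT, $this->timeout);
        curl_setopt($ch, CURLOPT_USERAGENT, $this->userAgent);
        return curl_exec($ch);
    }
}
```

With something like this in place, every `new eHttpClient()` in the script would have to be swapped for `new MinimalHttpClient()`, or the real class used instead.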
<?php
//--------------------------------------------------------------------------
// This function can be fed an ID like the one you see in this URL:
// http://www.goarticles.com/cgi-bin/showa.cgi?C=566619
// It returns an array with the keys Html and Text (plus Title and Author
// when the heading could be parsed), or false if the article was not found.
//--------------------------------------------------------------------------
function getGoArticle($aid){
    $hc=new eHttpClient();
    // Ask the Google cache for the stripped (strip=1) copy of the article
    $link="http://64.233.183.104/search?q=cache:".
        urlencode("http://www.goarticles.com/cgi-bin/showa.cgi?C=".$aid).
        "&hl=en&strip=1";
    $html=$hc->get($link);
    // Collapse whitespace so the patterns below can match across line breaks
    $html=preg_replace("/\s+/"," ",$html);
    if(!preg_match("/<div class=article align=left>(.*)<\/div>/U",$html,$pcs))
        return false;
    $html=$pcs[1];
    // The first <h1> holds "Title by Author"; remove it from the body if present
    $heading="";
    if(preg_match("/<h1>(.*)<\/h1>/Ui",$html,$h1)){
        $html=str_replace($h1[0],"",$html);
        $heading=$h1[1];
    }
    // Drop everything from the next <h1> onwards
    $html=preg_replace("/<h1>(.*)$/","",$html);
    $info=array();
    $info['Html']=trim($html);
    // Turn closing tags into spaces so words do not run together,
    // then strip the remaining tags and collapse whitespace
    $info['Text']=preg_replace("/<\/([^>]+)>/"," ",$info['Html']);
    $info['Text']=strip_tags($info['Text']);
    $info['Text']=preg_replace("/\s+/"," ",$info['Text']);
    if(preg_match("/^(.*) by (.*)$/",$heading,$inf)){
        $info['Author']=trim($inf[2]);
        $info['Title']=trim($inf[1]);
    }
    return $info;
}
//--------------------------------------------------------------------------
// This function can search Google and find you some IDs like the one in the URL:
// http://www.goarticles.com/cgi-bin/showa.cgi?C=566619
// It returns an array of IDs, or false if none were found.
//--------------------------------------------------------------------------
function getGoArticles($query,$page=1,$count=100){
    $page--;
    $hc=new eHttpClient();
    // start is a result offset, so it advances by $count for each page
    $link="http://www.google.com/ie?q=".
        urlencode("site:goarticles.com $query").
        "&hl=en&start=".($page*$count)."&num=$count";
    $html=$hc->get($link);
    if(!preg_match_all("/showa\.cgi\?C=([0-9]+)/",$html,$pcs))
        return false;
    // The same ID can show up more than once on a results page
    return array_values(array_unique($pcs[1]));
}
//--------------------------------------------------------------------------
?>
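The two text-handling steps inside getGoArticle() can be tried without touching the network. The sketch below isolates them in two helpers, htmlToText() and splitHeading(); both names are made up for this illustration and do not exist in the script above. It shows why closing tags are turned into spaces before the tags are stripped, and how the greedy "Title by Author" pattern behaves when the title itself contains " by ".

```php
<?php
// Hypothetical helper: convert an HTML fragment to plain text.
function htmlToText($html)
{
    // Replace closing tags with a space so adjacent blocks stay separated
    $text = preg_replace('/<\/[^>]+>/', ' ', $html);
    // Remove whatever tags are left
    $text = strip_tags($text);
    // Collapse runs of whitespace and trim the ends
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: split a "Title by Author" heading.
// The first group is greedy, so the split happens at the LAST " by ".
function splitHeading($heading)
{
    if (!preg_match('/^(.*) by (.*)$/', $heading, $m)) {
        return false;
    }
    return array('Title' => trim($m[1]), 'Author' => trim($m[2]));
}

// Sample input invented for this example
$sample = '<p>First paragraph.</p><p>Second <b>bold</b> one.</p>';
echo htmlToText($sample), "\n";
print_r(splitHeading('Stand by Me by Ben King'));
```

Without the first replacement, strip_tags() would glue "paragraph." and "Second" together, because nothing separates the two paragraphs in the markup.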
<?php
//--------------------------------------------------------------------------
// Set timeout for this script to 5 minutes
set_time_limit(5*60);
//--------------------------------------------------------------------------
// What you look for in GoArticles.com
$query = "online casino";
// Grab one page with 10 results (100 max)
$articleIDs = getGoArticles($query,1,10);
// getGoArticles() returns false when no IDs were found
if($articleIDs===false)
    $articleIDs = array();
$articles = array();
foreach($articleIDs as $articleID){
    $article = getGoArticle($articleID);
    // To keep just the text uncomment:
    // $article = $article['Text'];
    $articles[$articleID] = $article;
    // As we query pages from the Google cache,
    // take a small delay between visits.
    // Do not abuse!
    sleep(rand(5,10));
}
// In the end print the results for your educational
// purposes and to see the format of the output.
print_r($articles);
//--------------------------------------------------------------------------
?>
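The loop above stores whatever getGoArticle() returns, including false for articles that could not be found. One possible variation, sketched here under assumptions, moves the loop into a function that skips failed lookups and takes the fetcher as a parameter, so the logic can be exercised with a fake fetcher and no delay. The name collectArticles() and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: fetch each ID with $fetch, skip failures and
// pause a random number of seconds between requests.
function collectArticles(array $ids, $fetch, $minDelay = 5, $maxDelay = 10)
{
    $articles = array();
    $ids = array_values($ids);
    $last = count($ids) - 1;
    foreach ($ids as $i => $id) {
        $article = call_user_func($fetch, $id);
        if ($article !== false) {
            $articles[$id] = $article;
        }
        // No need to wait after the final request
        if ($i < $last && $maxDelay > 0) {
            sleep(rand($minDelay, $maxDelay));
        }
    }
    return $articles;
}

// Fake fetcher for a dry run: pretends that article 2 is missing
$fake = function ($id) {
    return $id === 2 ? false : array('Text' => "article $id");
};
print_r(collectArticles(array(1, 2, 3), $fake, 0, 0));
```

In the real script the second argument would be 'getGoArticle' and the default delays would apply.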

This script demonstrates a concept and has educational purposes. It is not meant for actual use, only to show and help you comprehend the power of Google, cURL and PHP.
Warning! Do not use this script for: