
Content Generation - Markov vs. Slice and Join : 5ubliminal's TellinYa

http://www.tellinya.com/art2/206/
5ubliminal's YAMS
I sometimes play dirty …

First of all, these methods are my favourites because they are among the lazy ones. The algorithms run by themselves and no input is necessary. You don't have to write templates with fill-in words … you just code the algo, feed it lots of content and … voilà.

I will walk you through my favourite content generation methods and the bright and dark sides of each! Anyone who has ever been interested in content generation must have heard of Markov chains and Slice and Join. Both methods are quite simple once you understand them, but Markov can scare off a newcomer with its mathematical formulas.

I'm not a math doctor; I will only explain here how to use these methods to generate content. Any other mathematical applications are way out of my league.

1. Markov content generation!

This is mostly a principle, a way of working, a basis for you to develop on. Markov chains give you the starting point, but they can be adapted in many ways for content generation.

Markov works simply. In any text, words follow one another within a sentence. By analyzing a lot of text you will find patterns and discover the probability of one word following another.

So … by analyzing the text you will notice that one word is followed by 5 different words throughout it, another is followed by only 2, and so on …
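The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code (which, per the comments, is written in C++). It assumes sentences end with '.', '!' or '?', lower-cases the text and splits words on whitespace; `successor_table` and `probabilities` are hypothetical names.

```python
import re
from collections import Counter, defaultdict


def successor_table(text):
    """Map each word to a Counter of the words that directly follow it.

    Assumptions: sentences end with '.', '!' or '?'; text is lower-cased and
    split on whitespace; word pairs never cross a sentence boundary.
    """
    table = defaultdict(Counter)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            table[current][following] += 1
    return table


def probabilities(table, word):
    """Turn the follower counts of `word` into probabilities (empty if unseen)."""
    followers = table.get(word, Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}


if __name__ == "__main__":
    table = successor_table("The cat sat. The cat ran. The dog sat.")
    print(probabilities(table, "the"))  # cat ≈ 0.667, dog ≈ 0.333
```

Here "the" is followed by "cat" twice and "dog" once, so "cat" gets two thirds of the probability.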

In the end you will have 3 types of words: those that begin a sentence, those that end a sentence and those that sit between two other words.
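Sorting words into those three groups can be sketched in Python as below, under the same simplifying assumptions (sentences end with '.', '!' or '?', words are lower-cased and split on whitespace). `classify_words` is a hypothetical name. In real text a word can belong to more than one group, for example a word that ends one sentence and sits in the middle of another, so the sketch uses three independent sets.

```python
import re


def classify_words(text):
    """Sort words into sentence starters, sentence enders and middle words.

    Assumption: sentences end with '.', '!' or '?'. A word lands in every
    set that matches a position it occupies somewhere in the text.
    """
    starters, enders, middle = set(), set(), set()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sentence.split()
        if not words:
            continue  # skip the empty piece after the final period
        starters.add(words[0])
        enders.add(words[-1])
        middle.update(words[1:-1])
    return starters, enders, middle


if __name__ == "__main__":
    print(classify_words("The cat sat. The dog ran home."))
```

For that input the starters are {"the"}, the enders are {"sat", "home"} and the middle words are {"cat", "dog", "ran"}.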

To generate content you can begin with any word, or with a word that starts a sentence. The algo picks that word and gets the list of all the words that follow it. You then choose one of those followers and repeat the process until you reach a word that ends a sentence.
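The generation loop can be sketched in Python as follows. This is a hypothetical illustration, not the author's implementation, and it makes one design choice that differs slightly from the description: instead of stopping at the first word that has ever ended a sentence, it records "the sentence ends here" as one more follower (the `END` sentinel), so a sentence stops with the same probability the source text shows. Followers are stored as lists with repeats, so `random.choice` picks them in proportion to how often they occurred. `build_chain`, `generate_sentence` and `max_words` are assumed names.

```python
import random
import re
from collections import defaultdict

END = None  # sentinel follower meaning "the sentence stops here"


def build_chain(text):
    """Collect sentence starters and, for every word, the list of its followers.

    Assumption: sentences end with '.', '!' or '?'; words are split on
    whitespace and keep their original capitalization.
    """
    starters = []
    followers = defaultdict(list)
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        if not words:
            continue
        starters.append(words[0])
        # Pair each word with the next one; the last word is paired with END.
        for current, following in zip(words, words[1:] + [END]):
            followers[current].append(following)
    return starters, followers


def generate_sentence(starters, followers, rng, max_words=30):
    """Walk the chain from a random starter until END is drawn or the cap is hit."""
    word = rng.choice(starters)
    out = [word]
    while len(out) < max_words:
        word = rng.choice(followers[word])
        if word is END:
            break
        out.append(word)
    return " ".join(out) + "."


if __name__ == "__main__":
    text = "The cat sat on the mat. The dog ran on the grass."
    starters, followers = build_chain(text)
    rng = random.Random(42)  # seeded for reproducible output
    for _ in range(3):
        print(generate_sentence(starters, followers, rng))
```

With that two-sentence input the walk can produce new combinations such as "The cat sat on the grass." because both sentences share "on the". The `max_words` cap is an arbitrary safety limit for texts where the walk could loop for a long time.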

My lamest Markov algorithm sample:

As I said, the algo can be made to suit different purposes and behave differently each time. Here is a demo of my simplest Markov algo. Basic stuff:

Original: link^!

Markoved:

Conventional scientific evidence to change his name they must be prepared well over 10000 oysters would be found in asia. Mr may who were excited about 350. Conventional scientific wisdom has patented the fact market them. Pfizer wants to change his oysters are about 350. Conventional scientific evidence to indulge in which the efforts of consuming the farmer is still prevalent in the results.
He knows that he states it that if pfizer the trademark on the proclivity to market for aphrodite the soaking period. Pfizer wants to change his oysters and knows there is already prepared for them. An australian farmer was inspired to indulge in australia then his oysters and knows that if he states it that he aims to increase libido.
Pfizer wants to friends and that there is new more scientific evidence to the oysters are about 350. In which the soaking period. An attempt to come up viagra to tanks full of the soaking period. Food and has stated that they must be present. Oysters traces of viagra can be prepared for them as the female sexual organs of the world's most famous aphrodisiacs named for significant legal action.
Pfizer wants to get him to change his oysters in asia. Mr may says that there isn't any scientific wisdom has stated that if he knows there isn't any scientific wisdom has consumed many himself and. Food authorities of virile animals such as rabbits or dried tiger's penises is illegal and lace the makers of food authorities of love and south. Their reputation simply by the overseas market in certain foods increases the makers of food and similarity to friends and south korea costs about to.
Mr may who were excited about to suggest that he aims to suggest that indulging in an australian farmer was inspired to grow his oysters. Food authorities of virile animals such as viagra because he is number of nsw state that many himself and that if he states it. Aphrodisiacs gained their appearance and have also threatened the oysters are about the makers of the sexual organs of conduct under large industry marketing. Food authorities of food authorities of bowl of number of food and lace the water in an australian farmer in certain foods increases. The world's most famous aphrodisiacs gained their reputation simply by association.

Once you stop laughing, we carry on! Though somewhat readable, the content is mostly weird and unexpected. If you start playing with a thesaurus and identifying parts of speech, you can go much further with this. But that is your personal task, and it depends on the quality of the original text.

2. Slice and join method!

One day I went to the park to think clearly about my tasks. I needed a better content generation method. This happened before I coded my Markovs. I was staring at a book and finally an idea hit me.

Each sentence is made of words, and many sentences share many words. What if you split sentences into pieces and recombine them randomly? E.g.:

1. Anna arrived at school in the pouring rain hardly wet.
2. John could not come to school as he was seriously sick.

I selected the word "school" as the split marker, and by slicing and joining we get:

1. Anna arrived at school as he was seriously sick.
2. John could not come to school in the pouring rain hardly wet.
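The swap in the example above can be reproduced with a short Python function. `slice_and_join` is a hypothetical name, and the sketch assumes the marker appears as a whole whitespace-separated word (so "school." with a trailing period would not match) and uses only its first occurrence in each sentence.

```python
def slice_and_join(first, second, marker):
    """Swap the tails of two sentences at the first occurrence of `marker`.

    Each sentence keeps its words up to and including the marker and takes
    the other sentence's words after it. Returns None if either sentence
    does not contain the marker as a whole word.
    """
    a, b = first.split(), second.split()
    if marker not in a or marker not in b:
        return None
    i, j = a.index(marker), b.index(marker)
    return " ".join(a[:i + 1] + b[j + 1:]), " ".join(b[:j + 1] + a[i + 1:])


if __name__ == "__main__":
    first = "Anna arrived at school in the pouring rain hardly wet."
    second = "John could not come to school as he was seriously sick."
    for sentence in slice_and_join(first, second, "school"):
        print(sentence)
    # Anna arrived at school as he was seriously sick.
    # John could not come to school in the pouring rain hardly wet.
```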

You will still get funny text but, by choosing the split markers carefully, it will be more readable. When doing slice and join, make sure you don't leave more than 5 words in a row unchanged, as you may trip duplicate filters!
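One way to shortlist candidate split markers is to look for words that appear in several sentences, since two sentences can only be joined at a marker they both contain. This Python sketch is an assumption about how to find candidates, not the author's selection rule; `shared_words` and `min_sentences` are hypothetical names, and words are compared case-insensitively with surrounding punctuation stripped.

```python
from collections import Counter


def shared_words(sentences, min_sentences=2):
    """Return words that occur in at least `min_sentences` different sentences.

    Results are sorted by how many sentences contain them (most first), then
    alphabetically. Words are lower-cased and stripped of edge punctuation.
    """
    counts = Counter()
    for sentence in sentences:
        # A set, so a word repeated inside one sentence is counted once.
        words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
        words.discard("")
        counts.update(words)
    return sorted(
        (w for w, n in counts.items() if n >= min_sentences),
        key=lambda w: (-counts[w], w),
    )


if __name__ == "__main__":
    print(shared_words([
        "Anna arrived at school in the pouring rain hardly wet.",
        "John could not come to school as he was seriously sick.",
    ]))  # ['school']
```

On larger texts very common words such as "the" will also come out on top; filtering those out is left to you.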

Human review and duplicate filters!

Neither method will stand up to a serious semantic check or human review. Markov is also more duplicate-proof than slice and join, unless the latter is done right.

I hope this tutorial helped and that you had some fun and learned a little something! More techniques coming soon.

4 Comments Posted By Readers :

#1 Garcia from Bolivia
Posted on Wednesday, 24 October, 2007
Is this a PHP script or something?
#2 5ubliminal
Posted on Wednesday, 24 October, 2007
The algo can be written in anything. I actually have it in C++.
#3 netizen from Switzerland
Posted on Saturday, 24 November, 2007
y0. had to say something about text generation too. :)
#4 signul9 from United States
Posted on Monday, 03 November, 2008
Best explanation of Markov I've read. Great stuff!!
5ubliminal's TellinYa.com SEM & SEO Blog © 2007 - All rights reserved unless mentioned otherwise.