First of all, these are my favourites because they are some of the lazy methods. The algorithms run by themselves and there's no input necessary. You don't have to write templates with fill-in words … you just code the algo, feed it lots of content and … voila.
I will detail my favourite content generation methods and the bright and dark side of each! Anyone who has ever been interested in content generation must have heard of Markov chains and Slice and Join. Both methods are quite simple once you understand them, but Markov can scare off anyone hearing about it for the first time with its mathematical formulas.
I'm not a math doctor and I will only explain here how to use these methods to generate content. Any other mathematical applications are way out of my league.
This is mostly a principle, a way of working, a basis for you to develop on. Markov chains will give you the starting point, but they can be handled in many ways for content generation.
Markov works in an easy way. If you take a text you will have words that go one after another in a sentence. By analyzing a lot of text you will find patterns and you will discover the probability of one word following another.
So … by analyzing a text you will notice that one word is followed by 5 different ones throughout the text. And another is followed by only 2 others. And so on …
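To make the counting part concrete, here is a minimal sketch in Python. The function name count_followers and the sample sentence are my own made-up illustrations, and the splitting is plain whitespace splitting, nothing smarter.

```python
from collections import defaultdict

def count_followers(text):
    """Map each word to the set of distinct words that follow it in the text."""
    words = text.split()
    followers = defaultdict(set)
    # Walk over every pair of neighbouring words.
    for current, nxt in zip(words, words[1:]):
        followers[current].add(nxt)
    return followers

# Hypothetical sample text, only for illustration.
sample = "the cat sat on the mat and the dog sat on the cat"
table = count_followers(sample)
for word in ("the", "sat"):
    print(word, "->", sorted(table[word]))
# prints:
# the -> ['cat', 'dog', 'mat']
# sat -> ['on']
```

So in this tiny text "the" is followed by 3 different words and "sat" by only 1. With a lot of real text the lists get much longer.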
In the end you will have 3 types of words. Those that begin a sentence, those that end a sentence and those that are between two other words.
To generate content you can begin with any word, or with a word that starts a sentence. The algo picks that word and then gets a list of all the words that follow it. Then it chooses one of those followers, and you repeat and repeat until you reach a word that ends a sentence.
As I said, the algo can be made to suit different purposes and behave differently each time. I will show you a demo of my most simple Markov algo. Basic stuff:
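The steps above could be sketched roughly like this in Python. This is not my production algo, just a bare-bones version under a few assumptions: words are split on whitespace, a word ends a sentence if it ends with ".", "!" or "?", and the names build_chain, generate_sentence and max_words are made up for this example.

```python
import random
from collections import defaultdict

SENTENCE_END = (".", "!", "?")  # assumption: punctuation marks the sentence end

def build_chain(text):
    """Return (sentence starters, followers table) for the given text."""
    starters = []
    followers = defaultdict(list)  # duplicates kept, so common pairs are picked more often
    previous = None
    for word in text.split():
        if previous is None or previous.endswith(SENTENCE_END):
            starters.append(word)  # first word of a sentence
        else:
            followers[previous].append(word)
        previous = word
    return starters, followers

def generate_sentence(starters, followers, rng=random, max_words=30):
    """Start from a sentence starter and follow the chain to a sentence end."""
    word = rng.choice(starters)
    sentence = [word]
    # max_words is a safety cap so a looping chain cannot run forever.
    while not word.endswith(SENTENCE_END) and len(sentence) < max_words:
        options = followers.get(word)
        if not options:  # dead end: the word was never followed by anything
            break
        word = rng.choice(options)
        sentence.append(word)
    return " ".join(sentence)

# Hypothetical source text, only for illustration.
source = "Anna went home. John went out. Anna sat down."
starters, followers = build_chain(source)
print(generate_sentence(starters, followers))
```

Because followers are stored with repeats, a word that follows another one often in the source gets picked more often, which is the probability part. Feed it a lot more text than three sentences if you want anything interesting.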
Original: link^!
Markoved:
After you stop laughing we carry on! Somewhat readable, the content is mostly weird and unexpected. If you start playing with a thesaurus and identifying the parts of speech, you can go much further with this. But that is your personal task and it depends on the quality of the original text.
One day I went into the park to think clearly about my tasks. I needed better content generation. This happened before I coded my Markovs. I was staring at a book and finally an idea hit me.
Each sentence has words and many sentences share many words. What if you split sentences into pieces and recombine them randomly? E.g.:
1. Anna arrived at school in the pouring rain hardly wet.
2. John could not come to school as he was seriously sick.
I selected the word "school" as the split marker, and by splitting and joining we get:
1. Anna arrived at school as he was seriously sick.
2. John could not come to school in the pouring rain hardly wet.
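Here is a small Python sketch of that exact swap. The helper names split_at and slice_and_join are made up for illustration, and it only handles two sentences that both contain the marker word. Picking sentence pairs at random is left out.

```python
def split_at(sentence, marker):
    """Split a sentence into (words up to and including marker, words after it)."""
    words = sentence.split()
    index = words.index(marker)  # raises ValueError if the marker is missing
    return words[: index + 1], words[index + 1 :]

def slice_and_join(first, second, marker):
    """Swap the tails of two sentences at a shared marker word."""
    head_a, tail_a = split_at(first, marker)
    head_b, tail_b = split_at(second, marker)
    return " ".join(head_a + tail_b), " ".join(head_b + tail_a)

a = "Anna arrived at school in the pouring rain hardly wet."
b = "John could not come to school as he was seriously sick."
for line in slice_and_join(a, b, "school"):
    print(line)
# prints:
# Anna arrived at school as he was seriously sick.
# John could not come to school in the pouring rain hardly wet.
```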
You will still get funny text but, by choosing the split markers carefully, it will be more readable. When doing split and join, make sure you don't leave more than 5 words in a row unchanged, as you may trip duplicate filters!
Neither method will stand up to a serious semantic check or human review. And Markov is more duplicate-proof than slice and join, unless the slice and join is done right.
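One possible way to check that rule is to measure the longest run of words a generated sentence shares with an original one. The sketch below uses SequenceMatcher from Python's standard difflib module. The function name longest_shared_run is made up, and how real duplicate filters actually work is not something this code knows about. It only counts words in a row.

```python
from difflib import SequenceMatcher

def longest_shared_run(original, generated):
    """Length of the longest run of consecutive words both sentences share."""
    a, b = original.split(), generated.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size

original = "John could not come to school as he was seriously sick."
generated = "Anna arrived at school as he was seriously sick."
run = longest_shared_run(original, generated)
print(run, "words in a row unchanged")
# prints: 6 words in a row unchanged
```

So the example split from before already leaves 6 words in a row unchanged, which is over the limit of 5. A second split marker inside that tail would fix it.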
I hope this tutorial helped and that you had some fun and learned a little something! More techniques coming soon.