This program uses map/reduce to run a distributed job in which the tasks
do not interact with each other and each task writes a large unsorted
random sequence of words.
To have this program generate data for terasort with 5-10 words per key
and 20-100 words per value, use the following configuration (the
totalbytes value shown is 2^40 bytes, i.e. 1 TiB):
  Property                                    Value
  mapreduce.randomtextwriter.minwordskey      5
  mapreduce.randomtextwriter.maxwordskey      10
  mapreduce.randomtextwriter.minwordsvalue    20
  mapreduce.randomtextwriter.maxwordsvalue    100
  mapreduce.randomtextwriter.totalbytes       1099511627776
Equivalently, RandomTextWriter also supports all the above options and
ones supported by Tool via the command-line.
To run: bin/hadoop jar hadoop-${version}-examples.jar randomtextwriter
           [-outFormat <output format class>] <output>
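
As a rough illustration of the command-line route, the settings above can
be passed as generic -D name=value options, which Hadoop's generic option
parsing for Tool implementations accepts ahead of the program's own
arguments. The sketch below only assembles that argument list with the
Java standard library; it does not call Hadoop, and the class and method
names (RandomTextWriterArgs, terasortSettings, buildArgs) are hypothetical
helpers for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): builds the -D arguments
// that correspond to the configuration listed above.
class RandomTextWriterArgs {

    // The five settings from the configuration above, in listed order.
    static Map<String, String> terasortSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.randomtextwriter.minwordskey", "5");
        settings.put("mapreduce.randomtextwriter.maxwordskey", "10");
        settings.put("mapreduce.randomtextwriter.minwordsvalue", "20");
        settings.put("mapreduce.randomtextwriter.maxwordsvalue", "100");
        // 1L << 40 is 1099511627776 bytes (1 TiB).
        settings.put("mapreduce.randomtextwriter.totalbytes",
                     Long.toString(1L << 40));
        return settings;
    }

    // Emits "-D name=value" for each setting, then the output argument,
    // so the generic options come before the program's own arguments.
    static List<String> buildArgs(Map<String, String> settings,
                                  String output) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            args.add("-D");
            args.add(e.getKey() + "=" + e.getValue());
        }
        args.add(output);
        return args;
    }

    public static void main(String[] unused) {
        // Prints the arguments on one line, ready to append after
        // "randomtextwriter" in the run command.
        System.out.println(
            String.join(" ", buildArgs(terasortSettings(), "output")));
    }
}
```

The printed arguments are meant to follow "randomtextwriter" in the run
command shown above; the optional -outFormat argument is left out of this
sketch.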