
Re: Is TeraGen's generated data deterministic?

Since data generation and sorting are separate Hadoop jobs, you can generate the data once and sort that same data as many times as you want.
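As a rough sketch of that workflow, the Python snippet below builds one teragen command followed by several terasort commands that all read the same input directory. The jar name, the HDFS paths, and the helper names (benchmark_commands, run_all) are assumptions made up for illustration; the examples jar is named differently across Hadoop versions, so adjust it for your install.

```python
import subprocess

# Assumed jar path; the examples jar name varies by Hadoop version and install.
EXAMPLES_JAR = "hadoop-examples.jar"


def benchmark_commands(rows, input_dir, output_prefix, runs):
    """Build one teragen command followed by `runs` terasort commands."""
    base = ["hadoop", "jar", EXAMPLES_JAR]
    # Generate the data exactly once.
    commands = [base + ["teragen", str(rows), input_dir]]
    for i in range(runs):
        # Each sort gets its own output directory; the input is reused as-is.
        commands.append(base + ["terasort", input_dir, f"{output_prefix}-{i}"])
    return commands


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical row count and paths; print the plan instead of running it.
    for cmd in benchmark_commands(1000000, "/bench/tera-in", "/bench/tera-out", 3):
        print(" ".join(cmd))
```

Because every sort reads the same generated input, differences between runs come from the cluster rather than from the data. Call run_all on the returned list to actually submit the jobs.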

I don't think TeraGen is deterministic (or rather, the keys are random but the text is deterministic, if I remember correctly).


>From: David Erickson <[EMAIL PROTECTED]>
>Sent: Saturday, April 14, 2012 1:53 PM
>Subject: Is TeraGen's generated data deterministic?
>
>Hi, we are benchmarking some of our infrastructure using
>TeraGen/TeraSort. I am wondering whether the data generated by TeraGen
>is deterministic: if I repeat the same experiment multiple times with
>the same configuration options, will it generate and sort the exact
>same data each time? And if not, is there an easy mod to make this
>happen?