Hadoop, mail # user - Is TeraGen's generated data deterministic?

Re: Is TeraGen's generated data deterministic?
Raj Vishwanathan 2012-04-14, 21:15

Since data generation and sorting are separate Hadoop jobs, you can generate the data once and sort that same data as many times as you want.
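As a rough sketch of that workflow, the Python below only builds the command lines for one teragen run followed by several terasort runs over the same input. The jar name, the row count, and the HDFS paths are assumptions for illustration; the examples jar is named differently across Hadoop versions, so adjust it for your install.

```python
# Sketch only: the jar name and HDFS directories are assumptions.
EXAMPLES_JAR = "hadoop-examples.jar"  # hypothetical; varies by Hadoop version


def teragen_cmd(rows, out_dir):
    """Build the command that generates `rows` records into out_dir."""
    return ["hadoop", "jar", EXAMPLES_JAR, "teragen", str(rows), out_dir]


def terasort_cmd(in_dir, out_dir):
    """Build the command that sorts in_dir into out_dir."""
    return ["hadoop", "jar", EXAMPLES_JAR, "terasort", in_dir, out_dir]


def benchmark_plan(rows, data_dir, runs):
    """One teragen, then `runs` terasort commands over the same input."""
    plan = [teragen_cmd(rows, data_dir)]
    for i in range(1, runs + 1):
        # Each sort gets its own output directory.
        plan.append(terasort_cmd(data_dir, "%s-sorted-%d" % (data_dir, i)))
    return plan


if __name__ == "__main__":
    for cmd in benchmark_plan(10000000, "/bench/tera-in", 3):
        # Print only; pass cmd to subprocess.check_call to actually run it.
        print(" ".join(cmd))
```

Each sort writes to a separate output directory because Hadoop will not start a job whose output directory already exists.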

I don't think TeraGen is deterministic (or rather, if I remember correctly, the keys are random but the text is deterministic).


> From: David Erickson <[EMAIL PROTECTED]>
>Sent: Saturday, April 14, 2012 1:53 PM
>Subject: Is TeraGen's generated data deterministic?
>Hi, we are benchmarking some of our infrastructure using
>TeraGen/TeraSort. I am wondering whether the data generated by TeraGen
>is deterministic: if I repeat the same experiment multiple times with
>the same configuration options, will it generate and sort exactly the
>same data each time? And if not, is there an easy mod to make this
>happen?