Hadoop >> mail # user >> Is TeraGen's generated data deterministic?


Re: Is TeraGen's generated data deterministic?
David

Since data generation and sorting are separate Hadoop jobs, you can generate the data once and sort the same data as many times as you want.
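A minimal sketch of that workflow, assuming the `teragen` and `terasort` programs from the Hadoop examples jar are driven from Python. The jar path, the HDFS directories, and the helper names (`teragen_cmd`, `terasort_cmd`, `benchmark_plan`, `run_plan`) are hypothetical placeholders, not anything taken from this thread:

```python
import subprocess

# Hypothetical values: adjust the jar path and HDFS directories for your cluster.
EXAMPLES_JAR = "hadoop-examples.jar"
INPUT_DIR = "/benchmarks/terasort-input"


def teragen_cmd(rows, out_dir, jar=EXAMPLES_JAR):
    """Command line for the one-time data generation job."""
    return ["hadoop", "jar", jar, "teragen", str(rows), out_dir]


def terasort_cmd(in_dir, out_dir, jar=EXAMPLES_JAR):
    """Command line for one sort job over previously generated data."""
    return ["hadoop", "jar", jar, "terasort", in_dir, out_dir]


def benchmark_plan(rows, runs, in_dir=INPUT_DIR):
    """One teragen command, then `runs` terasort commands reading the same input."""
    plan = [teragen_cmd(rows, in_dir)]
    for i in range(runs):
        # Each sort writes to its own directory because the output path must not exist yet.
        plan.append(terasort_cmd(in_dir, f"/benchmarks/terasort-output-{i}"))
    return plan


def run_plan(plan):
    """Run the commands in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)


# Example (requires a working Hadoop installation):
# run_plan(benchmark_plan(rows=10_000_000, runs=3))
```

Because every sort run reads the same input directory, the question of whether TeraGen itself is deterministic no longer affects the comparison between sort runs.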

I don't think TeraGen is deterministic (or rather, the keys are random but the text is deterministic, if I remember correctly).
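One way to settle this empirically is to run TeraGen twice with the same options and compare the output byte for byte. The sketch below assumes both output directories have already been copied to local disk (for example with `hadoop fs -get`) and that the data files are named `part-*`; the helper names `dir_digest` and `same_data` are hypothetical:

```python
import hashlib
from pathlib import Path


def dir_digest(directory, pattern="part-*"):
    """SHA-256 over every file matching `pattern`, visited in sorted path order."""
    h = hashlib.sha256()
    root = Path(directory)
    for path in sorted(p for p in root.rglob(pattern) if p.is_file()):
        # Mix in the relative name so data moving between files changes the digest.
        h.update(path.relative_to(root).as_posix().encode())
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def same_data(dir_a, dir_b):
    """True if both directories hold identical data files."""
    return dir_digest(dir_a) == dir_digest(dir_b)
```

Matching only `part-*` skips marker and log files, which can differ between runs even when the generated data is identical.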

Raj

>________________________________
> From: David Erickson <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Sent: Saturday, April 14, 2012 1:53 PM
>Subject: Is TeraGen's generated data deterministic?
>
>Hi, we are doing some benchmarking of some of our infrastructure and
>are using TeraGen/TeraSort to do the benchmarking.  I am wondering if
>the data generated by TeraGen is deterministic: if I repeat
>the same experiment multiple times with the same configuration options,
>will it continue to generate and sort the exact same data?  And if
>not, is there an easy mod to make this happen?
>
>Thanks!
>David
>
>
>