-
Is TeraGen's generated data deterministic?
David Erickson 2012-04-14, 20:53
Hi, we are benchmarking some of our infrastructure using TeraGen/TeraSort. Is the data generated by TeraGen deterministic? That is, if I repeat the same experiment multiple times with the same configuration options, will it generate and sort exactly the same data each time? If not, is there an easy modification to make this happen?
Thanks! David
-
Re: Is TeraGen's generated data deterministic?
Raj Vishwanathan 2012-04-14, 21:15
David
Since data generation and sorting are separate Hadoop jobs, you can generate the data once and sort the same data as many times as you want.
I don't think TeraGen is deterministic (or rather, the keys are random but the text is deterministic, if I remember correctly).
Raj
-
Re: Is TeraGen's generated data deterministic?
David Erickson 2012-04-14, 21:59
Thanks, Raj. Unfortunately I have to tear down Hadoop completely between runs, including the backing data store, so if possible I need a way to generate the same data repeatedly, for example by providing a single seed.
-
Re: Is TeraGen's generated data deterministic?
Owen O'Malley 2012-04-14, 22:15
Yes, both versions of TeraGen are completely deterministic. They each use a random number generator with a fixed seed.
-- Owen
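To illustrate the fixed-seed point, here is a minimal Python sketch of why a seeded generator reproduces the same data on every run. It is not TeraGen's actual generator: the seed value, the 10-byte key plus 10-byte row id record layout, and the names `generate_records` and `digest` are all assumptions made up for this example.

```python
import hashlib
import random

SEED = 42  # hypothetical fixed seed; not the value TeraGen uses


def generate_records(num_rows, seed=SEED):
    """Return num_rows records built from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    records = []
    for row_id in range(num_rows):
        # Assumed layout: 10 pseudo-random key bytes, then a zero-padded row id.
        key = bytes(rng.getrandbits(8) for _ in range(10))
        payload = str(row_id).zfill(10).encode("ascii")
        records.append(key + payload)
    return records


def digest(records):
    """Hash all records in order so two runs can be compared cheaply."""
    h = hashlib.sha256()
    for record in records:
        h.update(record)
    return h.hexdigest()


first_run = generate_records(1000)
second_run = generate_records(1000)

# Same seed and same row count give byte-identical output.
print(digest(first_run) == digest(second_run))
```

The same idea can be used to check a real setup: compute a checksum over the generated output after each run and compare the values across runs.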