Quick Question about Bulk loading of HFiles & Timestamps
Jacques 2011-08-05, 21:10
Can someone confirm that bulk loading HFiles preserves cell timestamps, so that load order does not determine which value wins?
For example: I run mapreduce A job on Monday. I run mapreduce B job on Tuesday.
I then run LoadIncrementalHFiles on job B first, followed by A.
Please confirm that wherever the outputs of A & B intersect, reads will return the values from B.
Thanks, Jacques
Re: Quick Question about Bulk loading of HFiles & Timestamps
Todd Lipcon 2011-08-05, 22:53
Hi Jacques,
Yes, the timestamps are set at the time the MR job runs, not the time they're loaded. So, you'll see the values from the job that wrote its output most recently.
You can also specify timestamps explicitly for each KeyValue, if you prefer.
-Todd
-- Todd Lipcon Software Engineer, Cloudera
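To make the point above concrete, here is a minimal toy model in Python. It is not HBase code: `ToyTable`, `run_job`, the `cf:q` column name, and the integer timestamps are all hypothetical stand-ins. It only illustrates the behaviour described in this reply: cells carry the timestamp assigned when the job ran, the load step assigns none, and a read returns the version with the highest timestamp.

```python
# Toy model of timestamp-based cell versioning; NOT real HBase code.
# All names here (ToyTable, run_job, "cf:q") are illustrative assumptions.
from collections import defaultdict


class ToyTable:
    """Keeps every (timestamp -> value) version for each cell."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def bulk_load(self, hfile):
        """Add pre-stamped cells; the load itself assigns no timestamps."""
        for (row, column, timestamp), value in hfile.items():
            self._cells[(row, column)][timestamp] = value

    def get(self, row, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]


def run_job(name, rows, job_time):
    """Hypothetical MR job: stamps every cell with the time the job ran."""
    return {(row, "cf:q", job_time): f"{name}:{row}" for row in rows}


MONDAY, TUESDAY = 1, 2  # stand-ins for epoch-millisecond timestamps

hfile_a = run_job("A", ["r1", "r2"], MONDAY)   # job A ran Monday
hfile_b = run_job("B", ["r2", "r3"], TUESDAY)  # job B ran Tuesday

table = ToyTable()
table.bulk_load(hfile_b)  # B is loaded first...
table.bulk_load(hfile_a)  # ...then A, as in the question

print(table.get("r2", "cf:q"))  # the row both jobs wrote
```

In this sketch the shared row `r2` resolves to job B's value even though A was loaded last, because only the timestamps are compared. Passing an explicit `job_time` to `run_job` plays the role of setting the timestamp yourself on each KeyValue.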
Re: Quick Question about Bulk loading of HFiles & Timestamps
Jacques 2011-08-05, 23:24
Perfect.
Thanks, Jacques