-
MR job "randomly" scans up thousands of rows less than the it should.
Cosmin Lehene 2012-02-02, 04:46
We have an MR job that runs every few minutes on time series data that is continuously updated (never deleted).

Every so often (every few tens to hundreds of runs), the map task that covers the last region gets fewer input records than expected (off by 500-5000 rows) without any splits happening. The lower number of input records can persist for a few MR runs, but it eventually returns to the "correct" value.

The drop shows up in the "map input records" counter and is also reflected in the metrics computed by the MR job itself (so it's not an MR counter bug).

There are no exceptions in the MR job or on the region server, and the drop doesn't seem to be correlated with any compaction, split or region movement. The only "variable" in this scenario is that new data is injected continuously (plus the MR job itself, which is idempotent).

All of this takes place on HBase 0.90.5-ish (12 Dec 2011) on top of Hadoop cdh3u2.

Cosmin
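Because the data is only ever added to (never deleted), the per-run "map input records" count should never go down from one run to the next. A minimal Python sketch of that check follows, assuming the counter values have already been collected into a list. The function name and the sample numbers are made up for illustration and are not taken from the actual job.

```python
def find_dropped_runs(input_record_counts):
    """Return (run_index, count, expected_minimum) for every run whose
    count is below the highest count seen in any earlier run.

    Assumes rows are only added, never deleted, so a correct full scan
    can never return fewer rows than an earlier one did.
    """
    dropped = []
    highest_so_far = None
    for run_index, count in enumerate(input_record_counts):
        if highest_so_far is not None and count < highest_so_far:
            dropped.append((run_index, count, highest_so_far))
        else:
            highest_so_far = count
    return dropped


# Hypothetical counter values from six consecutive runs.
counts = [100000, 100400, 99100, 99600, 101200, 101900]
for run_index, count, expected in find_dropped_runs(counts):
    print(f"run {run_index}: {count} rows, expected at least {expected}")
```

Comparing against the running maximum rather than only the previous run keeps a drop flagged for as long as it persists, which matches the "persists for a few runs" pattern described above.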
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Jonathan Hsieh 2012-02-02, 07:30
Cosmin,

How many column families do you have in this table? Are you using any filters in your HBase scans? Are you skipping rows that may not have qualifiers present?

There are a few known issues with multi-CF atomicity and a recent one about flushes that may be related to this problem. There is HBASE-2856, a fix having to do with flushes, which is pretty intricate and only in 0.92.

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Cosmin Lehene 2012-02-03, 00:46
(sorry for the damaged subject :))

Hey Jon,

We have two column families. There are no filters and it's a full table scan. We're not skipping rows.
I did, however, see a single time that we had one qualifier "fault" in the job counters (it was missing, and it wasn't supposed to be missing). However, that happened only once, and it doesn't happen when we encounter missing rows.

We're getting this behavior consistently, although I couldn't figure out a way to reproduce it. I'll try running multiple instances of the job in parallel to see whether that affects the outcome. I'll probably have to add more debugging for the affected rows and dig deeper.

HBASE-2856 is a pretty large issue - do you think it could be related to what I'm seeing? If so, it could help me reproduce it.

Thanks,
Cosmin
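One way to approach the "more debugging for the affected rows" step is to dump the row keys seen by a good run and by a short run and diff them. The Python sketch below is only a rough illustration, assuming the keys fit in memory and sort in the same order as the table. The function name and the sample keys are hypothetical.

```python
def summarize_missing_rows(good_run_keys, short_run_keys):
    """Compare the row keys from two runs of the same full-table scan.

    Returns a dict with the sorted missing keys plus the first and last
    missing key, which hints at whether the gap is one contiguous range.
    """
    missing = sorted(set(good_run_keys) - set(short_run_keys))
    return {
        "missing_count": len(missing),
        "first_missing": missing[0] if missing else None,
        "last_missing": missing[-1] if missing else None,
        "missing_keys": missing,
    }


# Hypothetical keys; real keys would come from the job's own logging.
good = ["row-0001", "row-0002", "row-0003", "row-0004", "row-0005"]
short = ["row-0001", "row-0002", "row-0005"]
summary = summarize_missing_rows(good, short)
print(summary["missing_count"], summary["first_missing"], summary["last_missing"])
```

Since new data keeps arriving, the comparison is cleanest when the good run started before the short one: any key the earlier run saw and the later run did not is then a genuinely missed row, not just a newer write.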
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Ted Yu 2012-02-03, 01:03
HBASE-4838 ports HBASE-2856 to 0.92

FYI
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Cosmin Lehene 2012-02-06, 16:25
Thanks Ted!

I wonder if it would make more sense to port it to 0.90.X or upgrade to 0.92.

Cosmin
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Cosmin Lehene 2012-02-14, 20:02
I just got back to this issue. Initially the behavior we've seen (missing rows) wouldn't reproduce on 0.90 using TestAcidGuarantees. However, if the puts in the writer threads include additional rows, the scanners start reading fewer rows. This reproduces consistently on 0.90 and seems to work correctly on 0.92.

HBASE-2856/HBASE-4838 are probably the solution, although there's a chance it's some other fix in 0.92 (ideas?)

We're undecided whether backporting to 0.90 or upgrading the affected clusters to 0.92 would be better. Also, is there interest in this fix on 0.90?

Thanks,
Cosmin
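To make the shape of that reproduction concrete, here is a toy Python sketch of the invariant being checked: writer threads keep adding new rows while a scanner repeatedly counts rows, and the count should never shrink. This is not TestAcidGuarantees and does not touch HBase. `ToyStore`, `run_check` and all parameters are hypothetical stand-ins, and expressing the check as "count never decreases" is an assumption about what the modified test asserts.

```python
import threading


class ToyStore:
    """In-memory stand-in for a table; NOT HBase. Rows are never deleted."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, row_key, value):
        with self._lock:
            self._rows[row_key] = value

    def scan_row_count(self):
        with self._lock:
            return len(self._rows)


def run_check(num_writers=4, rows_per_writer=500, num_scans=200):
    """Write new rows from several threads while one scanner counts rows.

    Returns (violations, final_row_count), where violations lists the
    (previous_count, current_count) pairs in which a scan saw fewer rows
    than the scan before it.
    """
    store = ToyStore()
    violations = []

    def writer(writer_id):
        for i in range(rows_per_writer):
            store.put(f"w{writer_id}-row{i:05d}", i)

    def scanner():
        previous = 0
        for _ in range(num_scans):
            current = store.scan_row_count()
            if current < previous:
                violations.append((previous, current))
            previous = current

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(num_writers)]
    threads.append(threading.Thread(target=scanner))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations, store.scan_row_count()


violations, final_count = run_check()
print(f"violations: {len(violations)}, final row count: {final_count}")
```

The toy store is trivially consistent, so no violations are expected here. Against a real cluster, the same loop would use a full-table scan in place of `scan_row_count`, and any non-empty violation list would correspond to the missing rows described in this thread.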
-
RE: MR job "randomly" scans up thousands of rows less than the it should.
Amitanand Aiyer 2012-02-14, 21:06
Hi Cosmin,

https://issues.apache.org/jira/browse/HBASE-4485 might be applicable.

The patch was included in the fix for 2856.

Cheers,
-Amit
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Cosmin Lehene 2012-02-15, 11:33
Amit, HBASE-4485 describes the behavior I'm seeing, thanks.

Looking over the patches, I'm under the impression that HBASE-4485, which is a subtask of HBASE-2856, was backported to 0.92 through HBASE-4838 by Lars. Am I wrong?

Thanks,
Cosmin
-
Re: MR job "randomly" scans up thousands of rows less than the it should.
Cosmin Lehene 2012-03-02, 21:45
Following up on this.

Backporting HBASE-4485 didn't seem to help. We were a bit under pressure and I didn't have time to investigate deeper (there's a small chance I missed something during the backport). We eventually upgraded to 0.92, which fixed the problem :)

Thanks a lot for helping with this,
Cosmin