HBase >> mail # dev >> MR job "randomly" scans up thousands of rows less than the it should.


Cosmin Lehene 2012-02-02, 04:46
Jonathan Hsieh 2012-02-02, 07:30
Cosmin Lehene 2012-02-03, 00:46
Ted Yu 2012-02-03, 01:03
Cosmin Lehene 2012-02-06, 16:25
Cosmin Lehene 2012-02-14, 20:02
Amitanand Aiyer 2012-02-14, 21:06
Re: MR job "randomly" scans up thousands of rows less than the it should.
Amit, HBASE-4485 describes the behavior I'm seeing, thanks.

Looking over the patches, I'm under the impression that HBASE-4485, which
is a subtask of HBASE-2856, was backported to 0.92 through HBASE-4838 by
Lars.
Am I wrong?

Thanks,
Cosmin
On 2/14/12 11:06 PM, "Amitanand Aiyer" <[EMAIL PROTECTED]> wrote:

>Hi Cosmin,
>  https://issues.apache.org/jira/browse/HBASE-4485 might be applicable.
>
>  The patch was included in the fix for 2856.
>
>Cheers,
>-Amit
>
>________________________________________
>From: Cosmin Lehene [[EMAIL PROTECTED]]
>Sent: Tuesday, February 14, 2012 12:02 PM
>To: [EMAIL PROTECTED]
>Subject: Re: MR job "randomly" scans up thousands of rows less than the
>it should.
>
>I just got back on this issue. Initially the behavior we've seen (missing
>rows) wouldn't reproduce on 0.90 using TestAcidGuarantees.
>However, if the puts in the writer threads include additional rows, the
>scanners start reading fewer rows. This reproduces consistently on
>0.90 and seems to work correctly on 0.92.
>
>HBASE-2856/HBASE-4838 are probably the solution, although there's a chance
>it's some other fix on 0.92 (ideas?)
>
>We're undecided whether backporting to 0.90 or upgrading the affected
>clusters to 0.92 would be better.
>Also, is there interest in this fix on 0.90?
>
>Thanks,
>Cosmin
>
>On 2/6/12 6:25 PM, "Cosmin Lehene" <[EMAIL PROTECTED]> wrote:
>
>>Thanks Ted!
>>
>>I wonder if it would make more sense to port it to 0.90.X or upgrade to
>>0.92.
>>
>>Cosmin
>>
>>On 2/2/12 5:03 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>>
>>>HBASE-4838 ports HBASE-2856 to 0.92
>>>
>>>FYI
>>>
>>>On Thu, Feb 2, 2012 at 4:46 PM, Cosmin Lehene <[EMAIL PROTECTED]> wrote:
>>>
>>>> (sorry for the damaged subject :))
>>>>
>>>>
>>>> Hey Jon,
>>>> We have two column families.
>>>> There are no filters and there's a full table scan. We're not skipping
>>>> rows.
>>>> I did see, however, a single time that we had one qualifier "fault" in
>>>> the job counters (it was missing, and it wasn't supposed to be missing).
>>>> However, that was only once and it doesn't happen when we encounter
>>>> missing rows.
>>>>
>>>> We're getting this behavior consistently, although I couldn't figure out
>>>> a way to reproduce it. I'll try running multiple instances of the job in
>>>> parallel to figure out if that would affect the outcome.
>>>> I'll probably have to add more debugging for the affected rows and dig
>>>> deeper.
>>>>
>>>> HBASE-2856 is a pretty large issue - do you think it could be related
>>>> to what I'm seeing? If so it could help me reproduce it.
>>>>
>>>> Thanks,
>>>> Cosmin
>>>>
>>>>
>>>>
>>>>
>>>> On 2/1/12 11:30 PM, "Jonathan Hsieh" <[EMAIL PROTECTED]> wrote:
>>>>
>>>> >Cosmin,
>>>> >
>>>> >How many column families do you have in this table?  Are you using any
>>>> >filters in your HBase scans?  Are you skipping rows that may not have
>>>> >qualifiers present?
>>>> >
>>>> >There are a few known issues with multi-CF atomicity and a recent one
>>>> >about flushes that may be related to this problem.  There is HBASE-2856,
>>>> >a fix having to do with flushes which is pretty intricate and only in
>>>> >0.92.
>>>> >
>>>> >Jon.
>>>> >
>>>> >On Wed, Feb 1, 2012 at 8:46 PM, Cosmin Lehene <[EMAIL PROTECTED]> wrote:
>>>> >
>>>> >> We have a MR job that runs every few minutes on some time series data
>>>> >> which is continuously updated (never deleted).
>>>> >> Every few runs (in the range of tens to hundreds), the map task that
>>>> >> covers the last region will get fewer input records (off by 500-5000
>>>> >> rows) without any splits happening. This lower number of input records
>>>> >> could persist for a few MR runs, but will eventually get back to the
>>>> >> "correct" value.
>>>> >>
>>>> >> This drop can be seen in the "map input records" metric, and it's
>>>> >> correlated with the metrics that get computed by the MR job (so it's
>>>> >> not a MR counter bug).
>>>> >>
>>>> >> There are no exceptions in the MR job, or in the region server and
Cosmin Lehene 2012-03-02, 21:45
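
The original report in this thread says the data is never deleted and that no splits happened, so the "map input records" count for a given region would not be expected to go down from one run to the next (assuming nothing else, such as a TTL, removes rows). A minimal sketch of a check along those lines, assuming the per-run counts have already been collected into a list. The function name and the sample numbers are hypothetical and are not taken from the thread:

```python
from typing import List, Tuple


def find_suspicious_runs(record_counts: List[int]) -> List[Tuple[int, int, int]]:
    """Flag runs whose input record count dropped below an earlier run's count.

    Assumes rows are only added or updated, never deleted, so the count for
    a fixed region should be non-decreasing across runs.

    Returns a list of (run_index, observed_count, high_water_mark) tuples.
    """
    suspicious = []
    high_water_mark = None  # highest count seen in any earlier run
    for index, count in enumerate(record_counts):
        if high_water_mark is not None and count < high_water_mark:
            # Fewer rows than an earlier run saw: rows appear to be missing.
            suspicious.append((index, count, high_water_mark))
        else:
            high_water_mark = count
    return suspicious


# Hypothetical counts for five consecutive runs over the same region.
counts = [100000, 100250, 98900, 99100, 100600]
for run, observed, expected in find_suspicious_runs(counts):
    print(f"run {run}: saw {observed} rows, expected at least {expected}")
```

A check like this only reports that a drop happened. It says nothing about the cause, which in this thread was eventually tied to HBASE-4485.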