HBase >> mail # dev >> MR job "randomly" scans up thousands of rows less than the it should.


Cosmin Lehene 2012-02-02, 04:46
Jonathan Hsieh 2012-02-02, 07:30
Cosmin Lehene 2012-02-03, 00:46
Ted Yu 2012-02-03, 01:03
Cosmin Lehene 2012-02-06, 16:25
Cosmin Lehene 2012-02-14, 20:02
Amitanand Aiyer 2012-02-14, 21:06
Cosmin Lehene 2012-02-15, 11:33
Re: MR job "randomly" scans up thousands of rows less than the it should.
Following up on this.

Backporting HBASE-4485 didn't seem to help.
We were a bit under pressure and I didn't have time to investigate more deeply
(there's a small chance I missed something during the backport).

We eventually upgraded to 0.92 which fixed the problem :)

Thanks a lot for helping with this,
Cosmin

On 2/15/12 1:33 PM, "Cosmin Lehene" <[EMAIL PROTECTED]> wrote:

>Amit, HBASE-4485 describes the behavior I'm seeing, thanks.
>
>Looking over the patches, I'm under the impression that HBASE-4485, which
>is a subtask of HBASE-2856, was backported to 0.92 through HBASE-4838 by
>Lars.
>Am I wrong?
>
>Thanks,
>Cosmin
>
>
>On 2/14/12 11:06 PM, "Amitanand Aiyer" <[EMAIL PROTECTED]> wrote:
>
>>Hi Cosmin,
>>  https://issues.apache.org/jira/browse/HBASE-4485 might be applicable.
>>
>>  The patch was included in the fix for 2856.
>>
>>Cheers,
>>-Amit
>>
>>________________________________________
>>From: Cosmin Lehene [[EMAIL PROTECTED]]
>>Sent: Tuesday, February 14, 2012 12:02 PM
>>To: [EMAIL PROTECTED]
>>Subject: Re: MR job "randomly" scans up thousands of rows less than the
>>it should.
>>
>>I just got back to this issue. Initially the behavior we'd seen (missing
>>rows) wouldn't reproduce on 0.90 using TestAcidGuarantees.
>>However, if the puts in the writer threads include additional rows, the
>>scanners start reading fewer rows. This reproduces consistently on
>>0.90 and seems to be working correctly on 0.92.
>>
>>HBASE-2856/HBASE-4838 are probably the solution, although there's a
>>chance it's some other fix on 0.92 (ideas?)
>>
>>We're undecided whether backporting to 0.90 or upgrading the affected
>>clusters to 0.92 would be better.
>>Also, is there interest in this fix on 0.90?
>>
>>Thanks,
>>Cosmin
>>
>>On 2/6/12 6:25 PM, "Cosmin Lehene" <[EMAIL PROTECTED]> wrote:
>>
>>>Thanks Ted!
>>>
>>>I wonder if it would make more sense to port it to 0.90.X or upgrade to
>>>0.92.
>>>
>>>Cosmin
>>>
>>>On 2/2/12 5:03 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>>>
>>>>HBASE-4838 ports HBASE-2856 to 0.92
>>>>
>>>>FYI
>>>>
>>>>On Thu, Feb 2, 2012 at 4:46 PM, Cosmin Lehene <[EMAIL PROTECTED]>
>>>>wrote:
>>>>
>>>>> (sorry for the damaged subject :))
>>>>>
>>>>>
>>>>> Hey Jon,
>>>>> We have two column families.
>>>>> There are no filters and there's a full table scan. We're not skipping
>>>>> rows.
>>>>> I did, however, see a single time that we had one qualifier "fault" in
>>>>> the job counters (it was missing, and it wasn't supposed to be missing).
>>>>> However, that was only once, and it doesn't happen when we encounter
>>>>> missing rows.
>>>>>
>>>>> We're getting this behavior consistently, although I couldn't figure out
>>>>> a way to reproduce it. I'll try running multiple instances of the job in
>>>>> parallel to figure out if that would affect the outcome.
>>>>> I'll probably have to add more debugging for the affected rows and dig
>>>>> deeper.
>>>>>
>>>>> HBASE-2856 is a pretty large issue - do you think it could be related to
>>>>> what I'm seeing? If so it could help me reproduce it.
>>>>>
>>>>> Thanks,
>>>>> Cosmin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2/1/12 11:30 PM, "Jonathan Hsieh" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> >Cosmin,
>>>>> >
>>>>> >How many column families do you have in this table? Are you using any
>>>>> >filters in your HBase scans? Are you skipping rows that may not have
>>>>> >qualifiers present?
>>>>> >
>>>>> >There are a few known issues with multi-CF atomicity and a recent one
>>>>> >about flushes that may be related to this problem. There is HBASE-2856,
>>>>> >a fix having to do with flushes, which is pretty intricate and only in
>>>>> >0.92.
>>>>> >
>>>>> >Jon.
>>>>> >
>>>>> >On Wed, Feb 1, 2012 at 8:46 PM, Cosmin Lehene <[EMAIL PROTECTED]>
>>>>> >wrote:
>>>>> >
>>>>> >> We have an MR job that runs every few minutes on some time series data
>>>>> >> which is continuously updated (never deleted).
>>>>> >> Every few (in the range of tens to hundreds) runs the map task