HBase, mail # user - Re: Get on a row with multiple columns


Re: Get on a row with multiple columns
lars hofhansl 2013-02-09, 16:46
The answer is "probably" :)
Nagle's is disabled by default in 0.96. Check out HBASE-7008 (https://issues.apache.org/jira/browse/HBASE-7008) and the discussion there.

Also check out the discussion in HBASE-5943 and HADOOP-8069 (https://issues.apache.org/jira/browse/HADOOP-8069)
-- Lars
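
[Editor's sketch] For readers unsure what "disabling Nagle's" means at the socket level: the tcpnodelay properties discussed in this thread presumably end up setting TCP_NODELAY on the underlying connection. The snippet below is only an illustration using the plain JDK socket API, not HBase's own RPC code; the class name NagleSketch and the method openWithNoDelay are made up for this example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class; not part of HBase or Hadoop.
public class NagleSketch {

    // Opens a loopback connection and turns TCP_NODELAY on for the client
    // socket. TCP_NODELAY = true means Nagle's algorithm is disabled, so
    // small writes are sent immediately instead of being coalesced.
    static boolean openWithNoDelay() throws IOException {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setTcpNoDelay(true);
            // Report what the socket says about the option after setting it.
            return client.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY enabled: " + openWithNoDelay());
    }
}
```

This mirrors the correction made further down in the thread: setting tcpnodelay to true is what turns Nagle's off.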

________________________________
 From: Jean-Marc Spaggiari <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Saturday, February 9, 2013 5:02 AM
Subject: Re: Get on a row with multiple columns
 
Lars, should we always consider disabling Nagle? What's the down side?

JM

2013/2/9, Varun Sharma <[EMAIL PROTECTED]>:
> Yeah, I meant true...
>
> On Sat, Feb 9, 2013 at 12:17 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Should be set to true. If tcpnodelay is set to true, Nagle's is disabled.
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>>  From: Varun Sharma <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> Sent: Saturday, February 9, 2013 12:11 AM
>> Subject: Re: Get on a row with multiple columns
>>
>>
>> Okay I did my research - these need to be set to false. I agree.
>>
>>
>> On Sat, Feb 9, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]>
>> wrote:
>>
>> I have ipc.client.tcpnodelay, ipc.server.tcpnodelay set to false and the
>> hbase one - [hbase].ipc.client.tcpnodelay set to true. Do these induce
>> network latency ?
>> >
>> >
>> >On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >
>> >Sorry.. I meant set these two config parameters to true (not false as I state below).
>> >>
>> >>
>> >>
>> >>
>> >>----- Original Message -----
>> >>From: lars hofhansl <[EMAIL PROTECTED]>
>> >>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> >>Cc:
>> >>Sent: Friday, February 8, 2013 11:41 PM
>> >>Subject: Re: Get on a row with multiple columns
>> >>
>> >>Only somewhat related. Seeing the magic 40ms random read time there. Did you disable Nagle's?
>> >>(set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in hbase-site.xml).
>> >>
>> >>________________________________
>> >>From: Varun Sharma <[EMAIL PROTECTED]>
>> >>To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> >>Sent: Friday, February 8, 2013 10:45 PM
>> >>Subject: Re: Get on a row with multiple columns
>> >>
>> >>The use case is like your Twitter feed: tweets from people you follow.
>> >>When someone unfollows, you need to delete a bunch of his tweets from
>> >>the following feed. So it's frequent, and we are essentially running
>> >>into some extreme corner cases like the one above. We need high write
>> >>throughput for this, since when someone tweets, we need to fan out the
>> >>tweet to all the followers. We need the ability to do fast deletes
>> >>(unfollow) and fast adds (follow) and also be able to do fast random
>> >>gets - when a real user loads the feed. I doubt we will be able to play
>> >>much with the schema here since we need to support a bunch of use cases.
>> >>
>> >>@lars: It does not take 30 seconds to place 300 delete markers. It takes
>> >>30 seconds to first find which of those 300 pins are in the set of
>> >>columns present - this invokes 300 gets - and then place the appropriate
>> >>delete markers. Note that we can have tens of thousands of columns in a
>> >>single row, so a single get is not cheap.
>> >>
>> >>If we were to just place delete markers, that is very fast. But when we
>> >>started doing that, our random read performance suffered because of too
>> >>many delete markers. The 90th percentile on random reads shot up from 40
>> >>milliseconds to 150 milliseconds, which is not acceptable for our use
>> >>case.
>> >>
>> >>Thanks
>> >>Varun
>> >>
>> >>On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Can you organize your columns and then delete by column family?
>> >>>
>> >>> deleteColumn without specifying a TS is expensive, since HBase first
>> >>> has to figure out what the latest TS is.