HBase, mail # user - Re: Get on a row with multiple columns


Re: Get on a row with multiple columns
Jean-Marc Spaggiari 2013-02-09, 13:02
Lars, should we always consider disabling Nagle? What's the downside?

JM
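
For context on what the tcpnodelay settings quoted below control: at the socket level, setting TCP_NODELAY to true turns Nagle's algorithm off, so small writes go out immediately instead of being held back and coalesced. The usual cost is more small packets on the wire. Below is a minimal sketch in plain Java, not HBase code. The class and method names (NagleDemo, connectWithNoDelay) are made up for illustration, and it only shows the flag being set and read back on a loopback connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NagleDemo {

    /**
     * Opens a loopback TCP connection, applies the requested TCP_NODELAY
     * setting to the client socket, and returns the value the socket reports.
     * noDelay = true means Nagle's algorithm is disabled.
     */
    static boolean connectWithNoDelay(boolean noDelay) throws IOException {
        // Port 0 asks the OS for any free port; backlog of 1 is enough here.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setTcpNoDelay(noDelay);
            return client.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("requested true,  socket reports " + connectWithNoDelay(true));
        System.out.println("requested false, socket reports " + connectWithNoDelay(false));
    }
}
```

The naming is the easy part to get backwards, as the thread shows: "nodelay = true" is the setting that disables Nagle's.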

2013/2/9, Varun Sharma <[EMAIL PROTECTED]>:
> Yeah, I meant true...
>
> On Sat, Feb 9, 2013 at 12:17 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Should be set to true. If tcpnodelay is set to true, Nagle's is disabled.
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>>  From: Varun Sharma <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> Sent: Saturday, February 9, 2013 12:11 AM
>> Subject: Re: Get on a row with multiple columns
>>
>>
>> Okay I did my research - these need to be set to false. I agree.
>>
>>
>> On Sat, Feb 9, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>
>> I have ipc.client.tcpnodelay, ipc.server.tcpnodelay set to false and the
>> hbase one - [hbase].ipc.client.tcpnodelay set to true. Do these induce
>> network latency?
>> >
>> >
>> >On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >
>> >Sorry.. I meant set these two config parameters to true (not false as I state below).
>> >>
>> >>
>> >>
>> >>
>> >>----- Original Message -----
>> >>From: lars hofhansl <[EMAIL PROTECTED]>
>> >>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> >>Cc:
>> >>Sent: Friday, February 8, 2013 11:41 PM
>> >>Subject: Re: Get on a row with multiple columns
>> >>
>> >>Only somewhat related. Seeing the magic 40ms random read time there. Did you disable Nagle's?
>> >>(set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in hbase-site.xml).
>> >>
>> >>________________________________
>> >>From: Varun Sharma <[EMAIL PROTECTED]>
>> >>To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> >>Sent: Friday, February 8, 2013 10:45 PM
>> >>Subject: Re: Get on a row with multiple columns
>> >>
>> >>The use case is like your Twitter feed: tweets from people you follow. When
>> >>someone unfollows, you need to delete a bunch of his tweets from the
>> >>following feed. So, it's frequent, and we are essentially running into some
>> >>extreme corner cases like the one above. We need high write throughput for
>> >>this, since when someone tweets, we need to fan out the tweet to all the
>> >>followers. We need the ability to do fast deletes (unfollow) and fast adds
>> >>(follow) and also be able to do fast random gets - when a real user loads
>> >>the feed. I doubt we will be able to play much with the schema here since we
>> >>need to support a bunch of use cases.
>> >>
>> >>@lars: It does not take 30 seconds to place 300 delete markers. It takes 30
>> >>seconds to first find which of those 300 pins are in the set of columns
>> >>present - this invokes 300 gets - and then place the appropriate delete
>> >>markers. Note that we can have tens of thousands of columns in a single row
>> >>so a single get is not cheap.
>> >>
>> >>If we were to just place delete markers, that is very fast. But when we
>> >>started doing that, our random read performance suffered because of too
>> >>many delete markers. The 90th percentile on random reads shot up from 40
>> >>milliseconds to 150 milliseconds, which is not acceptable for our use case.
>> >>
>> >>Thanks
>> >>Varun
>> >>
>> >>On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Can you organize your columns and then delete by column family?
>> >>>
>> >>> deleteColumn without specifying a TS is expensive, since HBase first has
>> >>> to figure out what the latest TS is.
>> >>>
>> >>> Should be better in 0.94.1 or later since deletes are batched like Puts
>> >>> (still need to retrieve the latest version, though).
>> >>>
>> >>> In 0.94.3 or later you can also use the BulkDeleteEndpoint, which basically
>> >>> lets you specify a scan condition and then place specific delete markers for
>> >>> all KVs encountered.
>> >>>
>> >>>
>> >>> If you wanted to get really
>> >>> fancy, you could hook up a coprocessor to the compaction process and