HBase, mail # user - Re: Get on a row with multiple columns


Re: Get on a row with multiple columns
lars hofhansl 2013-02-09, 08:17
These should be set to true. If tcpnodelay is set to true, Nagle's algorithm is disabled.

-- Lars
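
For readers unfamiliar with the flag: a `tcpnodelay` setting of this kind corresponds to the TCP_NODELAY socket option, and turning that option on turns Nagle's algorithm off. Below is a minimal Python sketch of the option at the socket level. It illustrates the option only, not HBase's RPC code, and the helper name `make_nodelay_socket` is hypothetical.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 disables Nagle's algorithm: small writes are sent
    # right away instead of being held back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # A non-zero value means the option is on, i.e. Nagle's is off.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The naming is inverted relative to the algorithm: "nodelay = true" means "Nagle's off", which is the likely source of the true/false confusion in the quoted messages below.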

________________________________
 From: Varun Sharma <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Saturday, February 9, 2013 12:11 AM
Subject: Re: Get on a row with multiple columns
 

Okay I did my research - these need to be set to false. I agree.
On Sat, Feb 9, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

I have ipc.client.tcpnodelay and ipc.server.tcpnodelay set to false, and the hbase one, hbase.ipc.client.tcpnodelay, set to true. Do these induce network latency?
>
>
>On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>Sorry.. I meant set these two config parameters to true (not false as I state below).
>>
>>
>>
>>
>>----- Original Message -----
>>From: lars hofhansl <[EMAIL PROTECTED]>
>>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>Cc:
>>Sent: Friday, February 8, 2013 11:41 PM
>>Subject: Re: Get on a row with multiple columns
>>
>>Only somewhat related. Seeing the magic 40ms random read time there. Did you disable Nagle's?
>>(set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in hbase-site.xml).
>>
>>________________________________
>>From: Varun Sharma <[EMAIL PROTECTED]>
>>To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>>Sent: Friday, February 8, 2013 10:45 PM
>>Subject: Re: Get on a row with multiple columns
>>
>>The use case is like your Twitter feed: tweets from people you follow. When
>>someone unfollows, you need to delete a bunch of his tweets from the
>>following feed. So it's frequent, and we are essentially running into some
>>extreme corner cases like the one above. We need high write throughput for
>>this, since when someone tweets, we need to fan out the tweet to all the
>>followers. We need the ability to do fast deletes (unfollow) and fast adds
>>(follow) and also be able to do fast random gets - when a real user loads
>>the feed. I doubt we will be able to play much with the schema here since we
>>need to support a bunch of use cases.
>>
>>@lars: It does not take 30 seconds to place 300 delete markers. It takes 30
>>seconds to first find which of those 300 pins are in the set of columns
>>present (this invokes 300 gets) and then place the appropriate delete
>>markers. Note that we can have tens of thousands of columns in a single row,
>>so a single get is not cheap.
>>
>>If we were to just place delete markers, that is very fast. But when we
>>started doing that, our random read performance suffered because of too
>>many delete markers. The 90th percentile on random reads shot up from 40
>>milliseconds to 150 milliseconds, which is not acceptable for our use case.
>>
>>Thanks
>>Varun
>>
>>On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>>> Can you organize your columns and then delete by column family?
>>>
>>> deleteColumn without specifying a TS is expensive, since HBase first has
>>> to figure out what the latest TS is.
>>>
>>> Should be better in 0.94.1 or later since deletes are batched like Puts
>>> (still need to retrieve the latest version, though).
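
The cost difference described above can be illustrated with a toy in-memory model. This is a sketch, not HBase code: the `ToyRow` class, its method names, and the `reads` counter are all hypothetical and exist only to show why a single-version delete without a timestamp needs a lookup first, while an all-versions delete only writes a marker.

```python
from collections import defaultdict


class ToyRow:
    """Toy model of one row: column -> {timestamp: value}. Not HBase code."""

    def __init__(self):
        self.cells = defaultdict(dict)
        self.markers = []  # (kind, column, timestamp) delete markers
        self.reads = 0     # lookups performed before a marker could be written

    def put(self, column, ts, value):
        self.cells[column][ts] = value

    def latest_ts(self, column):
        # Stands in for the read needed to find the newest version.
        self.reads += 1
        versions = self.cells.get(column)
        return max(versions) if versions else None

    def delete_column(self, column, ts=None):
        """Delete one version; without ts the latest must be looked up first."""
        if ts is None:
            ts = self.latest_ts(column)
            if ts is None:
                return  # nothing stored for this column
        self.markers.append(("version", column, ts))

    def delete_columns(self, column, now):
        """Delete all versions up to `now`; only a marker is written."""
        self.markers.append(("all_versions", column, now))


if __name__ == "__main__":
    row = ToyRow()
    for i in range(300):
        row.put(f"pin{i}", ts=1, value=b"x")
    for i in range(300):
        row.delete_column(f"pin{i}")          # one lookup per column
    print("lookups for single-version deletes:", row.reads)
    row.reads = 0
    for i in range(300):
        row.delete_columns(f"pin{i}", now=2)  # markers only
    print("lookups for all-version deletes:", row.reads)
```

In this model, passing an explicit timestamp to `delete_column` also skips the lookup, which mirrors the point that the expense comes from not specifying a TS.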
>>>
>>> In 0.94.3 or later you can also use the BulkDeleteEndPoint, which basically
>>> lets you specify a scan condition and then places specific delete markers
>>> for all KVs encountered.
>>>
>>>
>>> If you wanted to get really fancy, you could hook up a coprocessor to the
>>> compaction process and simply filter all KVs you no longer want (without
>>> ever placing any delete markers).
>>>
>>>
>>> Are you saying it takes 15 seconds to place 300 version delete markers?!
>>>
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>>  From: Varun Sharma <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]
>>> Sent: Friday, February 8, 2013 10:05 PM
>>> Subject: Re: Get on a row with multiple columns
>>>
>>> We are given a set of 300 columns to delete. I tested two cases:
>>>
>>> 1) deleteColumns() - with the 's'
>>>
>>> This function simply adds delete markers for 300 columns, in our case,