Re: Get on a row with multiple columns
Okay I did my research - these need to be set to false. I agree.

On Sat, Feb 9, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> I have ipc.client.tcpnodelay and ipc.server.tcpnodelay set to false, and the
> hbase one - hbase.ipc.client.tcpnodelay - set to true. Do these induce
> network latency?
>
> On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Sorry.. I meant set these two config parameters to true (not false as I
>> state below).
>>
>>
>>
>> ----- Original Message -----
>> From: lars hofhansl <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Cc:
>> Sent: Friday, February 8, 2013 11:41 PM
>> Subject: Re: Get on a row with multiple columns
>>
>> Only somewhat related. Seeing the magic 40ms random read time there. Did
>> you disable Nagle's?
>> (set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in
>> hbase-site.xml).
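(For reference, a minimal sketch of what disabling Nagle's algorithm looks like
from the client side, using the property names above with the corrected value of
true; the table name "feed" is made up for illustration, and both properties can
equally be placed in hbase-site.xml.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class TcpNoDelaySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // true = disable Nagle's algorithm on the client-side IPC sockets
            conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
            // The server-side flag, ipc.server.tcpnodelay, has to be set to true
            // in the region servers' hbase-site.xml; setting it here on the
            // client configuration has no effect on the servers.
            HTable table = new HTable(conf, "feed");  // "feed" is a hypothetical table
            table.close();
        }
    }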
>>
>> ________________________________
>> From: Varun Sharma <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>> Sent: Friday, February 8, 2013 10:45 PM
>> Subject: Re: Get on a row with multiple columns
>>
>> The use case is like your Twitter feed: tweets from people you follow. When
>> someone unfollows, you need to delete a bunch of their tweets from the
>> following feed. So it's frequent, and we are essentially running into some
>> extreme corner cases like the one above. We need high write throughput for
>> this, since when someone tweets, we need to fan the tweet out to all the
>> followers. We need the ability to do fast deletes (unfollow) and fast adds
>> (follow), and also to do fast random gets - when a real user loads the
>> feed. I doubt we will be able to play much with the schema here since we
>> need to support a bunch of use cases.
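(As an illustration of the fan-out write described above - not something stated
in the thread - a rough sketch against the 0.94-era client API, assuming a
hypothetical "feed" table keyed by follower id, a column family "t", and one
column per tweet id.)

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FanoutSketch {
        // Write one column per (follower, tweet) pair; batching the Puts keeps
        // write throughput high when a user with many followers tweets.
        public static void fanout(Configuration conf, List<String> followerIds,
                                  String tweetId, byte[] tweetBytes) throws Exception {
            HTable table = new HTable(conf, "feed");              // hypothetical table
            try {
                List<Put> puts = new ArrayList<Put>();
                for (String followerId : followerIds) {
                    Put put = new Put(Bytes.toBytes(followerId)); // row = follower id
                    put.add(Bytes.toBytes("t"), Bytes.toBytes(tweetId), tweetBytes);
                    puts.add(put);
                }
                table.put(puts);                                  // batched fan-out write
            } finally {
                table.close();
            }
        }
    }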
>>
>> @lars: It does not take 30 seconds to place 300 delete markers. It takes 30
>> seconds to first find which of those 300 pins are in the set of columns
>> present - this invokes 300 gets - and then to place the appropriate delete
>> markers. Note that we can have tens of thousands of columns in a single
>> row, so a single get is not cheap.
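(A minimal sketch of the check-then-delete pattern being described, under the
same hypothetical "feed"/"t" schema as above; the 300 narrow Gets against a
very wide row are the expensive part.)

    import java.util.List;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UnfollowSketch {
        private static final byte[] FAMILY = Bytes.toBytes("t");  // hypothetical family

        // Delete only those tweet columns that actually exist in the follower's row.
        public static void unfollow(HTable table, byte[] followerRow,
                                    List<byte[]> tweetIds) throws Exception {
            Delete delete = new Delete(followerRow);
            boolean anythingToDelete = false;
            for (byte[] tweetId : tweetIds) {
                // One narrow Get per candidate column; cheap individually, but 300
                // of them against a row with tens of thousands of columns adds up.
                Get get = new Get(followerRow);
                get.addColumn(FAMILY, tweetId);
                if (table.exists(get)) {
                    delete.deleteColumns(FAMILY, tweetId);  // marker covers all versions
                    anythingToDelete = true;
                }
            }
            if (anythingToDelete) {
                table.delete(delete);   // one RPC placing the accumulated markers
            }
        }
    }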
>>
>> If we were to just place delete markers, that would be very fast. But when
>> we started doing that, our random read performance suffered because of too
>> many delete markers. The 90th percentile on random reads shot up from 40
>> milliseconds to 150 milliseconds, which is not acceptable for our use case.
>>
>> Thanks
>> Varun
>>
>> On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>> > Can you organize your columns and then delete by column family?
>> >
>> > deleteColumn without specifying a TS is expensive, since HBase first has
>> > to figure out what the latest TS is.
>> >
>> > Should be better in 0.94.1 or later since deletes are batched like Puts
>> > (still need to retrieve the latest version, though).
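(Sketched against the 0.94 client API, the distinction reads roughly as below;
the family and qualifier names are made up, and the variants are combined in one
Delete only to show the API side by side.)

    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeleteVariantsSketch {
        public static Delete buildDelete(byte[] row, long knownTs) {
            Delete delete = new Delete(row);

            // Cheap: a single family delete marker covers every column in the family.
            delete.deleteFamily(Bytes.toBytes("t"));

            // Also cheap: one column delete marker covers all versions of the column.
            delete.deleteColumns(Bytes.toBytes("t"), Bytes.toBytes("tweet123"));

            // Expensive without a timestamp: HBase first has to look up the latest
            // version of the column before it can place the version delete marker.
            delete.deleteColumn(Bytes.toBytes("t"), Bytes.toBytes("tweet456"));

            // Supplying the timestamp avoids that extra read.
            delete.deleteColumn(Bytes.toBytes("t"), Bytes.toBytes("tweet789"), knownTs);

            return delete;
        }
    }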
>> >
>> > In 0.94.3 or later you can also use the BulkDeleteEndpoint, which
>> > basically lets you specify a scan condition and then places a specific
>> > delete marker for all KVs encountered.
>> >
>> >
>> > If you wanted to get really
>> > fancy, you could hook up a coprocessor to the compaction process and
>> > simply filter all KVs you no longer want (without ever placing any
>> > delete markers).
>> >
>> >
>> > Are you saying it takes 15 seconds to place 300 version delete markers?!
>> >
>> >
>> > -- Lars
>> >
>> >
>> >
>> > ________________________________
>> >  From: Varun Sharma <[EMAIL PROTECTED]>
>> > To: [EMAIL PROTECTED]
>> > Sent: Friday, February 8, 2013 10:05 PM
>> > Subject: Re: Get on a row with multiple columns
>> >
>> > We are given a set of 300 columns to delete. I tested two cases:
>> >
>> > 1) deleteColumns() - with the 's'
>> >
>> > This function simply adds delete markers for all 300 columns; in our case,
>> > typically only a fraction of these columns - around 10 - are actually
>> > present. After starting to use deleteColumns, we started seeing a drop in
>> > cluster wide