Re: Get on a row with multiple columns
I have ipc.client.tcpnodelay and ipc.server.tcpnodelay set to false and the
hbase one - hbase.ipc.client.tcpnodelay - set to true. Do these induce
network latency?
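
For reference, a minimal sketch of what those settings look like when set programmatically on the client Configuration (equivalent to putting them in hbase-site.xml, and with both values true per lars' correction below); the table name is hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class TcpNoDelaySettings {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Disable Nagle's algorithm for HBase RPC sockets (client side).
        conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
        // Server-side counterpart; normally set in the region servers' hbase-site.xml.
        conf.setBoolean("ipc.server.tcpnodelay", true);
        HTable table = new HTable(conf, "feeds");  // "feeds" is a hypothetical table name
        table.close();
      }
    }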

On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Sorry.. I meant set these two config parameters to true (not false as I
> state below).
>
>
>
> ----- Original Message -----
> From: lars hofhansl <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc:
> Sent: Friday, February 8, 2013 11:41 PM
> Subject: Re: Get on a row with multiple columns
>
> Only somewhat related. Seeing the magic 40ms random read time there. Did
> you disable Nagle's?
> (set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in
> hbase-site.xml).
>
> ________________________________
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Friday, February 8, 2013 10:45 PM
> Subject: Re: Get on a row with multiple columns
>
> The use case is like your twitter feed: tweets from people you follow. When
> someone unfollows, you need to delete a bunch of his tweets from the
> following feed. So it's frequent, and we are essentially running into some
> extreme corner cases like the one above. We need high write throughput for
> this, since when someone tweets, we need to fan out the tweet to all the
> followers. We need the ability to do fast deletes (unfollow) and fast adds
> (follow), and also to do fast random gets - when a real user loads
> the feed. I doubt we will be able to play much with the schema here since we
> need to support a bunch of use cases.
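
For context, one hypothetical layout for such a feed (not necessarily the schema actually in use): one row per follower, one column qualifier per tweet id, so a new tweet fans out as a batch of Puts:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FeedFanout {
      private static final byte[] FAMILY = Bytes.toBytes("f");  // hypothetical column family

      // Fan a single tweet out to the feed row of every follower.
      public static void fanout(HTable feeds, long tweetId, byte[] tweetBytes,
          List<Long> followerIds) throws IOException {
        List<Put> puts = new ArrayList<Put>(followerIds.size());
        for (long followerId : followerIds) {
          Put put = new Put(Bytes.toBytes(followerId));        // row = follower id
          put.add(FAMILY, Bytes.toBytes(tweetId), tweetBytes); // column = tweet id
          puts.add(put);
        }
        feeds.put(puts);  // one batched write per tweet
      }
    }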
>
> @lars: It does not take 30 seconds to place 300 delete markers. It takes 30
> seconds to first find which of those 300 pins are in the set of columns
> present - this invokes 300 gets - and then to place the appropriate delete
> markers. Note that we can have tens of thousands of columns in a single row,
> so a single get is not cheap.
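
A rough sketch of the check-then-delete pattern described here, assuming the 0.94 client API and a hypothetical column family; each candidate column is probed with its own Get before a delete marker is placed:

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CheckThenDelete {
      private static final byte[] FAMILY = Bytes.toBytes("f");  // hypothetical column family

      public static void deletePresentColumns(HTable table, byte[] row,
          List<byte[]> candidateQualifiers) throws IOException {
        Delete delete = new Delete(row);
        boolean anyPresent = false;
        for (byte[] qualifier : candidateQualifiers) {
          Get get = new Get(row);
          get.addColumn(FAMILY, qualifier);   // restrict the get to a single column
          if (!table.get(get).isEmpty()) {    // costly when the row has tens of thousands of columns
            delete.deleteColumns(FAMILY, qualifier);
            anyPresent = true;
          }
        }
        if (anyPresent) {
          table.delete(delete);               // place markers only for columns that exist
        }
      }
    }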
>
> If we were to just place delete markers, that is very fast. But when we
> started doing that, our random read performance suffered because of too
> many delete markers. The 90th percentile on random reads shot up from 40
> milliseconds to 150 milliseconds, which is not acceptable for our use case.
>
> Thanks
> Varun
>
> On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Can you organize your columns and then delete by column family?
> >
> > deleteColumn without specifying a TS is expensive, since HBase first has
> > to figure out what the latest TS is.
> >
> > Should be better in 0.94.1 or later since deletes are batched like Puts
> > (still need to retrieve the latest version, though).
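
For illustration, the delete variants being contrasted here, using the 0.94 client API (family, qualifier, and timestamp are hypothetical, and the calls are shown together only to compare them):

    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeleteVariants {
      public static Delete build(byte[] row) {
        byte[] family = Bytes.toBytes("f");
        byte[] qualifier = Bytes.toBytes("tweet123");

        Delete delete = new Delete(row);
        // Latest version only, no timestamp given: the server must first
        // look up the latest TS, which is the expensive part lars mentions.
        delete.deleteColumn(family, qualifier);
        // Delete the version at an exact, known timestamp: no lookup needed.
        delete.deleteColumn(family, qualifier, 1360310400000L);
        // Delete all versions of the column: also avoids the latest-TS lookup.
        delete.deleteColumns(family, qualifier);
        return delete;
      }
    }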
> >
> > In 0.94.3 or later you can also use the BulkDeleteEndpoint, which basically
> > lets you specify a scan condition and then places the appropriate delete
> > markers for all KVs encountered.
> >
> >
> > If you wanted to get really
> > fancy, you could hook up a coprocessor to the compaction process and
> > simply filter all KVs you no longer want (without ever placing any
> > delete markers).
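
A rough sketch of that compaction-hook idea, assuming the 0.94-era RegionObserver / InternalScanner interfaces (the exact set of next(...) overloads varies between releases) and a hypothetical shouldKeep() predicate supplied by the application:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;
    import org.apache.hadoop.hbase.regionserver.Store;

    public class FeedCompactionObserver extends BaseRegionObserver {

      // Hypothetical application rule: return false for KVs that should
      // silently disappear during compaction (e.g. tweets of unfollowed users).
      private boolean shouldKeep(KeyValue kv) {
        return true;
      }

      @Override
      public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
          Store store, final InternalScanner scanner) {
        // Wrap the compaction scanner and drop unwanted KVs, so they are
        // rewritten out of existence without ever placing delete markers.
        return new InternalScanner() {
          public boolean next(List<KeyValue> results) throws IOException {
            return next(results, -1);
          }

          public boolean next(List<KeyValue> results, String metric) throws IOException {
            return next(results, -1);
          }

          public boolean next(List<KeyValue> results, int limit) throws IOException {
            boolean more = scanner.next(results, limit);
            for (Iterator<KeyValue> it = results.iterator(); it.hasNext();) {
              if (!shouldKeep(it.next())) {
                it.remove();
              }
            }
            return more;
          }

          public boolean next(List<KeyValue> results, int limit, String metric)
              throws IOException {
            return next(results, limit);
          }

          public void close() throws IOException {
            scanner.close();
          }
        };
      }
    }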
> >
> >
> > Are you saying it takes 15 seconds to place 300 version delete markers?!
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Varun Sharma <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Friday, February 8, 2013 10:05 PM
> > Subject: Re: Get on a row with multiple columns
> >
> > We are given a set of 300 columns to delete. I tested two cases:
> >
> > 1) deleteColumns() - with the 's'
> >
> > This function simply adds delete markers for 300 columns; in our case,
> > typically only a fraction of these columns are actually present - about 10.
> > After starting to use deleteColumns, we started seeing a drop in cluster-wide
> > random read performance - the 90th percentile latency worsened, and so did the
> > 99th, probably because of having to traverse delete markers. I attribute this
> > to the profusion of delete markers in the cluster. Major compactions slowed down
> > by almost 50 percent, probably because of having to clean out