

Re: Slow full-table scans
It's possible that there is a bad or slow disk on Gurjeet's machine. I
think iostat output and CPU usage details would clear things up.

On Tue, Aug 21, 2012 at 4:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I get roughly the same (~1.8s) - 100 rows, 200.000 columns, segment size
> 100
>
>
>
> ________________________________
>  From: Gurjeet Singh <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Tuesday, August 21, 2012 11:31 AM
>  Subject: Re: Slow full-table scans
>
> How does that compare with the newScanTable on your build?
>
> Gurjeet
>
> On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
> > Hmm... So I tried in HBase (current trunk).
> > I created 100 rows with 200.000 columns each (using your oldMakeTable).
> > The creation took a bit, but scanning finished in 1.8s. (HBase in
> > pseudo-distributed mode - with your oldScanTable).
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: lars hofhansl <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Monday, August 20, 2012 7:50 PM
> > Subject: Re: Slow full-table scans
> >
> > Thanks Gurjeet,
> >
> > I'll (hopefully) have a look tomorrow.
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Gurjeet Singh <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Monday, August 20, 2012 7:42 PM
> > Subject: Re: Slow full-table scans
> >
> > Hi Lars,
> >
> > Here is a testcase:
> >
> > https://gist.github.com/3410948
> >
> > Benchmarking code:
> >
> > https://gist.github.com/3410952
> >
> > Try running it with numRows = 100, numCols = 200000, segmentSize = 1000
> >
> > Gurjeet
> >
> >
> > On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:
> >> Sure - I can create a minimal testcase and send it along.
> >>
> >> Gurjeet
> >>
> >> On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >>> That's interesting.
> >>> Could you share your old and new schema? I would like to track down
> >>> the performance problems you saw.
> >>> (If you had a demo program that populates your rows with 200.000
> >>> columns in a way where you saw the performance issues, that'd be even
> >>> better, but not necessary).
> >>>
> >>>
> >>> -- Lars
> >>>
> >>>
> >>>
> >>> ________________________________
> >>>  From: Gurjeet Singh <[EMAIL PROTECTED]>
> >>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> >>> Sent: Thursday, August 16, 2012 11:26 AM
> >>> Subject: Re: Slow full-table scans
> >>>
> >>> Sorry for the delay guys.
> >>>
> >>> Here are a few results:
> >>>
> >>> 1. Regions in the table = 11
> >>> 2. The region servers don't appear to be very busy with the query
> >>> (~5% CPU), but with parallelization they are all busy
> >>>
> >>> Finally, I changed the format of my data, such that each cell in HBase
> >>> contains a chunk of a row instead of the single value it had. Stuffing
> >>> each HBase cell with 500 columns of a row gave me a performance boost
> >>> of 1000x. It seems that the underlying issue was IO overhead per byte
> >>> of actual data stored.
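
The chunked layout described above could be sketched roughly as follows. This is only an illustration, not the code from the gists linked in this thread: the class name RowChunker, the helpers packChunks and unpackChunk, and the choice of double values are all assumptions made for the example. It packs the values of one logical row into byte arrays of up to chunkSize values each, so that each byte array could be written as a single HBase cell value instead of one cell per value.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of HBase and not taken from the thread's gists.
public class RowChunker {

    // Split one logical row into chunks of at most chunkSize values,
    // serializing each chunk into a single byte[] (one would-be cell value).
    public static List<byte[]> packChunks(double[] row, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < row.length; start += chunkSize) {
            int len = Math.min(chunkSize, row.length - start);
            ByteBuffer buf = ByteBuffer.allocate(len * Double.BYTES);
            for (int i = 0; i < len; i++) {
                buf.putDouble(row[start + i]);
            }
            chunks.add(buf.array());
        }
        return chunks;
    }

    // Decode one chunk back into the values it holds.
    public static double[] unpackChunk(byte[] chunk) {
        ByteBuffer buf = ByteBuffer.wrap(chunk);
        double[] values = new double[chunk.length / Double.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed shape from the thread: 200.000 columns per row, 500 per cell.
        double[] row = new double[200000];
        for (int i = 0; i < row.length; i++) {
            row[i] = i;
        }
        List<byte[]> chunks = packChunks(row, 500);
        System.out.println(chunks.size() + " cells instead of " + row.length);
    }
}
```

With this layout the per-cell overhead (row key, column family, qualifier, timestamp) is paid once per chunk rather than once per value, which is consistent with the "IO overhead per byte" explanation above. The trade-off is that reading or updating a single value means decoding or rewriting its whole chunk.
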
> >>>
> >>>
> >>> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >>>> Yeah... It looks OK.
> >>>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
> >>>>
> >>>>
> >>>> If you can, I'd like to know how busy your regionservers are during
> >>>> these operations. That would be an indication of whether the
> >>>> parallelization is good or not.
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> From: Stack <[EMAIL PROTECTED]>
> >>>> To: [EMAIL PROTECTED]
> >>>> Cc:
> >>>> Sent: Wednesday, August 15, 2012 3:13 PM
> >>>> Subject: Re: Slow full-table scans
> >>>>
> >>>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:
> >>>>> I am beginning to think that this is a configuration issue on my
> >>>>> cluster. Do the following configuration files seem sane ?