HBase, mail # user - Re: Slow full-table scans


Re: Slow full-table scans
Mohit Anchlia 2012-08-22, 00:56
It's possible that there is a bad or slow disk on Gurjeet's machine. I
think iostat output and CPU utilization figures would clear things up.

On Tue, Aug 21, 2012 at 4:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I get roughly the same (~1.8s) - 100 rows, 200.000 columns, segment size
> 100
>
>
>
> ________________________________
>  From: Gurjeet Singh <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Tuesday, August 21, 2012 11:31 AM
>  Subject: Re: Slow full-table scans
>
> How does that compare with the newScanTable on your build?
>
> Gurjeet
>
> On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
> > Hmm... So I tried in HBase (current trunk).
> > I created 100 rows with 200.000 columns each (using your oldMakeTable).
> > The creation took a bit, but scanning finished in 1.8s. (HBase in pseudo
> > distributed mode - with your oldScanTable).
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: lars hofhansl <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Monday, August 20, 2012 7:50 PM
> > Subject: Re: Slow full-table scans
> >
> > Thanks Gurjeet,
> >
> > I'll (hopefully) have a look tomorrow.
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Gurjeet Singh <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Monday, August 20, 2012 7:42 PM
> > Subject: Re: Slow full-table scans
> >
> > Hi Lars,
> >
> > Here is a testcase:
> >
> > https://gist.github.com/3410948
> >
> > Benchmarking code:
> >
> > https://gist.github.com/3410952
> >
> > Try running it with numRows = 100, numCols = 200000, segmentSize = 1000
> >
> > Gurjeet
> >
> >
> > On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:
> >> Sure - I can create a minimal testcase and send it along.
> >>
> >> Gurjeet
> >>
> >> On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >>> That's interesting.
> >>> Could you share your old and new schema? I would like to track down
> >>> the performance problems you saw.
> >>> (If you had a demo program that populates your rows with 200.000
> >>> columns in a way where you saw the performance issues, that'd be even
> >>> better, but not necessary).
> >>>
> >>>
> >>> -- Lars
> >>>
> >>>
> >>>
> >>> ________________________________
> >>>  From: Gurjeet Singh <[EMAIL PROTECTED]>
> >>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> >>> Sent: Thursday, August 16, 2012 11:26 AM
> >>> Subject: Re: Slow full-table scans
> >>>
> >>> Sorry for the delay guys.
> >>>
> >>> Here are a few results:
> >>>
> >>> 1. Regions in the table = 11
> >>> 2. The region servers don't appear to be very busy with the query
> >>> (~5% CPU), but with parallelization they are all busy
> >>>
> >>> Finally, I changed the format of my data so that each cell in HBase
> >>> contains a chunk of a row instead of the single value it had. Stuffing
> >>> each HBase cell with 500 columns of a row gave me a performance boost
> >>> of 1000x. It seems that the underlying issue was IO overhead per byte
> >>> of actual data stored.
> >>>
> >>>
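A minimal sketch of the chunked layout described above, assuming every value is a 64-bit float and that each segment is keyed by its index. The function names (`pack_row`, `unpack_row`, `cell_count`) and the encoding are illustrative assumptions, not the code from the gists in this thread, and nothing here calls the HBase client. It only shows how grouping 500 values per cell shrinks the number of cells a scan has to return.

```python
import struct


def pack_row(values, segment_size):
    """Group one logical row's values into segments, one segment per cell.

    Returns a dict mapping a hypothetical qualifier (the segment index as
    4 big-endian bytes) to the packed bytes of that segment's values.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    cells = {}
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        qualifier = struct.pack(">I", start // segment_size)
        # Assumption: every value is a 64-bit float, stored big-endian.
        cells[qualifier] = struct.pack(">%dd" % len(chunk), *chunk)
    return cells


def unpack_row(cells):
    """Rebuild the original list of values from the packed cells."""
    values = []
    # Big-endian qualifiers sort in the same order as the segment indices.
    for qualifier in sorted(cells):
        payload = cells[qualifier]
        values.extend(struct.unpack(">%dd" % (len(payload) // 8), payload))
    return values


def cell_count(num_rows, num_cols, segment_size):
    """Number of cells in the table when each cell holds segment_size values."""
    segments_per_row = -(-num_cols // segment_size)  # ceiling division
    return num_rows * segments_per_row
```

With the numbers used in this thread (100 rows, 200,000 columns), `cell_count(100, 200000, 1)` is 20,000,000 cells, while `cell_count(100, 200000, 500)` is 40,000. Since every HBase cell is stored together with its full key (row, family, qualifier, timestamp, type), fewer and larger cells mean far less key data per byte of payload, which fits the per-byte overhead explanation above, though it does not by itself prove where the 1000x came from.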
> >>> On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >>>> Yeah... It looks OK.
> >>>> Maybe 2G of heap is a bit low when dealing with 200.000 column rows.
> >>>>
> >>>>
> >>>> If you can, I'd like to know how busy your regionservers are during
> >>>> these operations. That would be an indication of whether the
> >>>> parallelization is good or not.
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> From: Stack <[EMAIL PROTECTED]>
> >>>> To: [EMAIL PROTECTED]
> >>>> Cc:
> >>>> Sent: Wednesday, August 15, 2012 3:13 PM
> >>>> Subject: Re: Slow full-table scans
> >>>>
> >>>> On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:
> >>>>> I am beginning to think that this is a configuration issue on my
> >>>>> cluster. Do the following configuration files seem sane?