HBase >> mail # dev >> Early comparisons between 0.90 and 0.92


Re: Early comparisons between 0.90 and 0.92
I was hoping to rule out changes in IPC handlers and other upper layers and
narrow it down to the difference between HFileV1 and HFileV2, but it sounds
like you have a lot of moving pieces.
On Thu, Dec 15, 2011 at 12:24 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> Trying this now.
>
> J-D
>
> On Thu, Dec 15, 2011 at 11:35 AM, Lars <[EMAIL PROTECTED]> wrote:
> > Do you see the same slowdown with the default 64k block size?
> >
> > Lars <[EMAIL PROTECTED]> wrote:
> >
> >>I'll be busy today... I'll double-check my scanning-related changes as
> >>soon as I can.
> >>
> >>Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >>
> >>>Yes and yes.
> >>>
> >>>J-D
> >>>On Dec 14, 2011 5:52 PM, "Matt Corgan" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Regions are major compacted and have empty memstores, so no merging of
> >>>> stores when reading?
> >>>>
> >>>>
> >>>> 2011/12/14 Jean-Daniel Cryans <[EMAIL PROTECTED]>
> >>>>
> >>>> > Yes sorry 1.1M
> >>>> >
> >>>> > This is PE, the table is set to a block size of 4KB and block
> >>>> > caching is disabled. Nothing else special in there.
> >>>> >
> >>>> > J-D
> >>>> >
> >>>> > 2011/12/14  <[EMAIL PROTECTED]>:
> >>>> > > Thanks for the info, J-D.
> >>>> > >
> >>>> > > I guess the 1.1 below is in millions.
> >>>> > >
> >>>> > > Can you tell us more about your tables - bloom filters, etc.?
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > On Dec 14, 2011, at 5:26 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >>>> > >
> >>>> > >> Hey guys,
> >>>> > >>
> >>>> > >> I was doing some comparisons between 0.90.5 and 0.92.0, mainly
> >>>> > >> regarding reads. The absolute numbers are kinda irrelevant, but
> >>>> > >> the differences matter. BTW this is on CDH3u3 with random reads.
> >>>> > >>
> >>>> > >> In 0.90.5, scanning 50M rows that are in the OS cache I go up to
> >>>> > >> about 1.7M rows scanned per second.
> >>>> > >>
> >>>> > >> In 0.92.0, scanning those same rows (meaning that I didn't run
> >>>> > >> compactions after migrating so it's picking the same data from
> >>>> > >> the OS cache), I scan about 1.1 rows per second.
> >>>> > >>
> >>>> > >> 0.92 is 50% slower when scanning.
> >>>> > >>
> >>>> > >> In 0.90.5, random reading 50M rows that are OS cached, I can do
> >>>> > >> about 200k reads per second.
> >>>> > >>
> >>>> > >> In 0.92.0, again with those same rows, I can go up to 260k per
> >>>> > >> second.
> >>>> > >>
> >>>> > >> 0.92 is 30% faster when random reading.
> >>>> > >>
> >>>> > >> I've been playing with that data set for a while and the numbers
> >>>> > >> in 0.92.0 when using HFileV1 or V2 are pretty much the same,
> >>>> > >> meaning that something else changed or the code that's generic
> >>>> > >> to both did.
> >>>> > >>
> >>>> > >>
> >>>> > >> I'd like to be able to associate those differences with code
> >>>> > >> changes in order to understand what's going on. I would really
> >>>> > >> appreciate it if others also took some time to test it out or to
> >>>> > >> think about what could cause this.
> >>>> > >>
> >>>> > >> Thx,
> >>>> > >>
> >>>> > >> J-D
> >>>> >
> >>>>
>
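
As a quick check on the percentages quoted in the thread, the relative differences can be recomputed from the reported throughputs. This is a minimal Python sketch, not part of the original benchmark. It assumes the 1.1 figure means 1.1M rows per second (as confirmed in the follow-up), and the helper name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100.0


# Throughputs as reported in the thread (rows or reads per second).
scan_090, scan_092 = 1.7e6, 1.1e6   # scans; 1.1 is assumed to mean 1.1M
read_090, read_092 = 200e3, 260e3   # random reads

# How 0.92 compares to 0.90 for each workload.
scan_change = percent_change(scan_090, scan_092)
read_change = percent_change(read_090, read_092)

# The same scan gap expressed the other way round (0.90 relative to 0.92).
scan_inverse = percent_change(scan_092, scan_090)

print(f"scan, 0.92 vs 0.90:        {scan_change:+.1f}%")
print(f"scan, 0.90 vs 0.92:        {scan_inverse:+.1f}%")
print(f"random read, 0.92 vs 0.90: {read_change:+.1f}%")
```

With these inputs the scan rate in 0.92 comes out roughly 35% lower than in 0.90 (equivalently, 0.90 scans about 55% faster), and random reads come out 30% faster, so the 50% scan figure reads as a rough approximation that depends on which version is taken as the baseline.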