Re: Poor HBase random read performance
bq. let's say you compact a really old file with a new file.

I think stripe compaction is supposed to handle the above scenario. Take a
look at:
https://issues.apache.org/jira/browse/HBASE-7667

Please also refer to Sergey's talk @ HBaseCon.

Cheers

On Mon, Jul 1, 2013 at 4:10 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Going back to LevelDB vs HBase, I am not sure we can come up with a clean
> way to identify the HFiles containing more recent data in the wake of
> compactions.
>
> I do wonder, though, whether this works with minor compactions: let's say
> you compact a really old file with a new file. Now, since this file's most
> recent timestamp is very recent because of the new file, you look into this
> file, but then retrieve something from the "old" portion of it. So you end
> up with older data.
>
> I guess one way would be to just order the files by time ranges.
> Non-intersecting time-range files can be ordered in reverse time order;
> intersecting ones can be seeked together.
>
>      File1
> |-----------------|
>                       File2
>                     |---------------|
>                              File3
>                            |-----------------------------|
>                                                                 File4
>                                                               |--------------------|
>
> So in this case, we seek
>
> [File1], [File2, File3], [File4]
>
> I think for random single key-value lookups ((row, col) -> key), this
> could lead to good savings for time-ordered clients (which are quite
> common). Unless File1 and File4 get compacted, in which case we always
> need to seek into both.
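>
> A minimal sketch of that grouping in Java (FileRange and the helper below
> are made up for illustration, not actual HBase internals):
>
>   import java.util.*;
>
>   class FileRange {
>     final String name;
>     final long minTs, maxTs;            // time range covered by the file
>     FileRange(String name, long minTs, long maxTs) {
>       this.name = name; this.minTs = minTs; this.maxTs = maxTs;
>     }
>   }
>
>   class TimeRangeGroups {
>     // Sort by start of time range, then merge intersecting ranges into
>     // one seek group. Non-intersecting groups can then be visited in
>     // reverse time order, so a time-ordered read can stop at the first
>     // group that yields the key.
>     static List<List<FileRange>> seekGroups(List<FileRange> files) {
>       files.sort(Comparator.comparingLong((FileRange f) -> f.minTs));
>       List<List<FileRange>> groups = new ArrayList<>();
>       long groupMax = Long.MIN_VALUE;
>       for (FileRange f : files) {
>         if (groups.isEmpty() || f.minTs > groupMax) {
>           groups.add(new ArrayList<>());   // gap in time: new group
>         }
>         groups.get(groups.size() - 1).add(f);
>         groupMax = Math.max(groupMax, f.maxTs);
>       }
>       return groups;   // for the diagram above: [File1], [File2, File3], [File4]
>     }
>   }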
>
>
>
> On Mon, Jul 1, 2013 at 12:10 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Sorry. Hit enter too early.
> >
> > Some discussion here:
> > http://apache-hbase.679495.n3.nabble.com/keyvalue-cache-td3882628.html
> > but no actionable outcome.
> >
> > -- Lars
> > ________________________________
> > From: lars hofhansl <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Sent: Monday, July 1, 2013 12:05 PM
> > Subject: Re: Poor HBase random read performance
> >
> >
> > This came up a few times before.
> >
> >
> >
> > ________________________________
> > From: Vladimir Rodionov <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <
> > [EMAIL PROTECTED]>
> > Sent: Monday, July 1, 2013 11:08 AM
> > Subject: RE: Poor HBase random read performance
> >
> >
> > I would like to remind everyone that the original BigTable design includes
> > a scan cache to take care of random reads; this important feature is
> > still missing in HBase.
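> >
> > A scan cache in that sense is a small KV-level cache consulted before the
> > block cache, so a hot cell costs one map lookup instead of a block read.
> > A minimal sketch (purely illustrative, not HBase API):
> >
> >   import java.util.LinkedHashMap;
> >   import java.util.Map;
> >
> >   // LRU cache keyed by row+column, holding individual values rather
> >   // than whole blocks.
> >   class ScanCache extends LinkedHashMap<String, byte[]> {
> >     private final int maxEntries;
> >     ScanCache(int maxEntries) {
> >       super(16, 0.75f, true);            // access-order: true => LRU
> >       this.maxEntries = maxEntries;
> >     }
> >     @Override
> >     protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
> >       return size() > maxEntries;
> >     }
> >   }
> >
> > On a get you would check this cache first and only fall back to the block
> > cache (and a full block read) on a miss.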
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [EMAIL PROTECTED]
> >
> > ________________________________________
> > From: lars hofhansl [[EMAIL PROTECTED]]
> > Sent: Saturday, June 29, 2013 3:24 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Poor HBase random read performance
> >
> > I should also say that random reads done this way are somewhat of a
> > worst-case scenario.
> >
> > If the working set is much larger than the block cache and the reads are
> > random, then each read will likely have to bring in an entirely new block
> > from the OS cache,
> > even when the KVs are much smaller than a block.
> >
> > So in order to read a (say) 1k KV, HBase needs to bring in 64k (the
> > default block size) from the OS cache.
> > As long as the dataset fits into the block cache this difference in size
> > has no performance impact, but as soon as the dataset does not fit, we
> > have to bring in much more data from the OS cache than we're actually
> > interested in.
> >
> > Indeed, in my test I found that HBase brings in about 60x the data size
> > from the OS cache (using PE with ~1k KVs). This can be improved with
> > smaller block sizes, and with a more efficient way to instantiate HFile
> > blocks in Java (which we need to work on).
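> >
> > For the block size part, it can be lowered per column family when the
> > family is defined; something along these lines (the 8k value is just an
> > illustration, not a recommendation):
> >
> >   import org.apache.hadoop.hbase.HColumnDescriptor;
> >
> >   // Family descriptor using 8k blocks instead of the 64k default;
> >   // smaller blocks mean less data pulled from the OS cache per point
> >   // read, at the cost of a larger block index.
> >   HColumnDescriptor cf = new HColumnDescriptor("d");
> >   cf.setBlocksize(8 * 1024);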
> >
> >
> > -- Lars
> >
> > ________________________________