Re: Slow full-table scans
Something to consider is that HBase stores and retrieves the row key (8
bytes in your case) + timestamp (8 bytes) + column qualifier (?) for every
single value.  The schemaless nature of HBase generally means that this
data has to be stored with each cell (certain kinds of newer block-level
compression can reduce this).  So depending on your column qualifiers,
you're potentially looking at a huge amount of overhead when you're
dealing with 200,000 cells in a single row.  I also wonder whether you're
paying a large cost simply on the
serialization/deserialization/instantiation side if you're pulling back
that many values.
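
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python.  It counts only the fields mentioned above; the qualifier lengths
are assumed values (I don't know yours) and the function name is just
illustrative.  The real on-disk format carries a few more bytes per cell
(length fields, column family, type), so treat the result as a lower bound.

```python
# Values taken from this thread.
ROW_KEY_BYTES = 8        # row key length in your case
TIMESTAMP_BYTES = 8      # timestamp stored with every cell
CELLS_PER_ROW = 200_000  # cells in a single row

def per_row_key_overhead(qualifier_bytes, cells=CELLS_PER_ROW):
    """Bytes of key material repeated across all cells of one row.

    qualifier_bytes is an assumption: the average column qualifier length.
    """
    per_cell = ROW_KEY_BYTES + TIMESTAMP_BYTES + qualifier_bytes
    return per_cell * cells

if __name__ == "__main__":
    # Hypothetical qualifier lengths, purely for illustration.
    for q in (4, 16, 32):
        overhead = per_row_key_overhead(q)
        print(f"qualifier={q:>2} B -> {overhead / 1e6:.1f} MB of key overhead per row")
```

Comparing that figure against the total size of your values per row should
show how much of each read is key material rather than data.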

I'm not sure how many people are using that many cells in a single row and
trying to read or write them all at once.

Others may have more thoughts.

Jacques

On Sun, Aug 12, 2012 at 7:23 AM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:

> Hi Ted,
>
> Yes, I am using the Cloudera distribution 3.
>
> Gurjeet
>
> On Aug 12, 2012, at 7:11 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Gurjeet:
> > Can you tell us which HBase version you are using?
> >
> > Thanks
> >
> > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:
> >
> >> Thanks for the reply Stack. My comments are inline.
> >>
> >>> You've checked out the perf section of the refguide?
> >>>
> >>> http://hbase.apache.org/book.html#performance
> >>
> >> Yes. HBase has 8GB RAM on both my cluster and my dev machine.
> >> Both configurations are backed by SSDs and HBase options are set to
> >>
> >> HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
> >>
> >> The data that I am dealing with is static. The table never changes
> >> after the first load.
> >>
> >> Even some of my GET requests are taking up to a full 60 seconds when
> >> the row sizes reach ~10MB. In general, taking 5 seconds to fetch a
> >> single row (~1MB) seems extremely high to me.
> >>
> >> Thanks again for your help.
> >>
>