HBase, mail # user - Re: Slow full-table scans
Re: Slow full-table scans
Jacques 2012-08-12, 21:05
Something to consider is that HBase stores and retrieves the row key (8
bytes in your case) + timestamp (8 bytes) + column qualifier (?) for every
single value.  Because of HBase's schemaless design, this key data has to
be stored with each cell, not once per row (certain newer kinds of
block-level compression can reduce this).  So depending on your column
qualifiers, you could be looking at a huge amount of overhead when you're
dealing with 200,000 cells in a single row.  I also wonder whether you're
paying a large cost simply on the
serialization/deserialization/instantiation side when pulling back that
many values.
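To put a rough number on that overhead, here is a back-of-the-envelope sketch. The family and qualifier lengths are assumptions (the thread leaves the qualifier size open); the point is only that the per-cell key cost multiplies by the cell count, not that these exact figures apply.

```python
# Rough estimate of per-cell key overhead in a wide HBase row.
# HBase stores a full key (row key + column family + qualifier +
# timestamp + type byte) alongside every single cell value, so a row
# with many cells pays the key cost once per cell, not once per row.

ROW_KEY_LEN = 8     # bytes, as stated in the thread
FAMILY_LEN = 1      # assumption: a short family name like "f"
QUALIFIER_LEN = 8   # assumption: the thread does not give this
TIMESTAMP_LEN = 8   # fixed-width timestamp
TYPE_LEN = 1        # KeyValue type byte

def per_cell_key_overhead():
    """Key bytes stored alongside each individual cell value."""
    return ROW_KEY_LEN + FAMILY_LEN + QUALIFIER_LEN + TIMESTAMP_LEN + TYPE_LEN

def row_key_overhead(cells):
    """Total key bytes for one row containing `cells` cells."""
    return cells * per_cell_key_overhead()

overhead = row_key_overhead(200_000)
print(overhead)           # 5_200_000 bytes of key data alone
print(overhead / 2**20)   # roughly 5 MiB before any values are counted
```

Under these assumptions, a 200,000-cell row carries about 5 MiB of key data on top of the values themselves, which is in the same ballpark as the row sizes reported later in the thread.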

I'm not sure how many people are using that many cells in a single row and
trying to read or write them all at once.

Others may have more thoughts.

Jacques

On Sun, Aug 12, 2012 at 7:23 AM, Gurjeet Singh <[EMAIL PROTECTED]> wrote:

> Hi Ted,
>
> Yes, I am using the cloudera distribution 3.
>
> Gurjeet
>
> Sent from my iPad
>
> On Aug 12, 2012, at 7:11 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Gurjeet:
> > Can you tell us which HBase version you are using ?
> >
> > Thanks
> >
> > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh <[EMAIL PROTECTED]>
> wrote:
> >
> >> Thanks for the reply Stack. My comments are inline.
> >>
> >>> You've checked out the perf section of the refguide?
> >>>
> >>> http://hbase.apache.org/book.html#performance
> >>
> >> Yes. HBase has 8GB of RAM on both my cluster and my dev machine.
> >> Both configurations are backed by SSDs, and HBase options are set to
> >>
> >> HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
> >>
> >> The data that I am dealing with is static. The table never changes
> >> after the first load.
> >>
> >> Even some of my GET requests are taking up to a full 60 seconds when
> >> the row sizes reach ~10MB. In general, taking 5 seconds to fetch a
> >> single row (~1MB) seems extremely high to me.
> >>
> >> Thanks again for your help.
> >>
>