HBase >> mail # user >> Hbase scans taking a lot of time


Vibhav Mundra 2013-01-25, 09:10
Luke Lu 2013-01-25, 17:31
Vibhav Mundra 2013-01-25, 17:59
Adrien Mogenet 2013-01-25, 18:04
Jean-Marc Spaggiari 2013-01-25, 18:06
Vibhav Mundra 2013-01-25, 18:14
Jean-Marc Spaggiari 2013-01-25, 18:23
lars hofhansl 2013-01-25, 22:00
lars hofhansl 2013-01-25, 23:56
Alok Kumar 2013-01-26, 06:07
Shashwat Shriparv 2013-01-25, 19:13
Re: Hbase scans taking a lot of time
I did use the following, but it didn't help either.

SET hbase.client.scanner.caching=30000;
SET hive.hbase.client.scanner.caching=30000;

-Vibhav
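
A rough back-of-the-envelope check of the figures quoted in this thread may help frame the problem. The sketch below is only arithmetic on the numbers reported (300 million rows, 2 million rows per minute on load, about 24 hours to scan, caching set to 30000). It assumes that each scanner call returns up to `caching` rows and ignores region boundaries; the function names are illustrative and not part of any HBase API.

```python
import math

# Figures quoted in the thread.
ROWS = 300_000_000                 # "300 million odd rows"
POPULATE_ROWS_PER_MIN = 2_000_000  # load rate reported by the poster
SCAN_HOURS = 24                    # reported full-scan time through Hive
CACHING = 30_000                   # value used in the SET statements


def rows_per_second(rows, seconds):
    """Average throughput in rows per second."""
    return rows / seconds


def scanner_round_trips(rows, caching):
    """Approximate number of scanner calls, assuming each call returns
    up to `caching` rows (region boundaries are ignored)."""
    return math.ceil(rows / caching)


populate_rate = POPULATE_ROWS_PER_MIN / 60
scan_rate = rows_per_second(ROWS, SCAN_HOURS * 3600)

print(f"populate: {populate_rate:,.0f} rows/s")
print(f"scan:     {scan_rate:,.0f} rows/s")
print(f"scan is {populate_rate / scan_rate:.1f}x slower than populate")
print(f"round trips at caching={CACHING}: {scanner_round_trips(ROWS, CACHING):,}")
```

Under these assumptions the load runs at roughly 33,000 rows/s and the scan at roughly 3,500 rows/s, a gap of about 9.6x, while a caching value of 30000 already brings the scan down to about 10,000 round trips. That is consistent with the observation that raising the caching value further did not help: the per-call overhead is probably no longer the dominant cost.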
On Sat, Jan 26, 2013 at 12:43 AM, Shashwat Shriparv <
[EMAIL PROTECTED]> wrote:

>
>
> Try to use caching for query
>
>
> Regards
> §
> Shashwat Shriparv
>
>
> Sent from Samsung Galaxy
>
> Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
> You're better off laying out the data based on the way you will access it.
>
> If you always read data from columns A, B, C and D together, then
> bundle them in a single column. And all of that in a single CF...
>
> JM
>
> 2013/1/25, Vibhav Mundra <[EMAIL PROTECTED]>:
> > This is what I think; sorry for my ignorance.
> >
> > I want to use the columnar property of HBase so that only the
> > required columns are accessed. For this I kept a large number of column
> > families.
> >
> > But I still don't understand what is happening, as there is no disk
> > I/O, only high CPU and some network activity.
> > Why is the scan taking more time than it took to populate HBase?
> >
> > -Vibhav
> >
> >
> > On Fri, Jan 25, 2013 at 11:36 PM, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Hi Vibhav,
> >>
> >> Do you really need 13 different column families? Can't you find a way
> >> to bundle that into 1 or 2 CFs max? Maybe by prefixing the column
> >> name?
> >>
> >> That might help...
> >>
> >> JM
> >>
> >> 2013/1/25, Vibhav Mundra <[EMAIL PROTECTED]>:
> >> > The number of column families I have is 13, which I guess is okie?
> >> >
> >> > -Vibhav
> >> >
> >> >
> >> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> You'll have this problem if you have a large number of column families
> >> >> being scanned/populated at the same time. Make sure the data you
> >> >> scan/populate frequently are in the same column family (you can have many
> >> >> columns in a column family). Unlike BigTable/Hypertable, which have the
> >> >> concept of locality/access groups, HBase always stores column families in
> >> >> separate files, essentially making a column family not only a logical
> >> >> grouping mechanism but also a physical locality group.
> >> >>
> >> >>
> >> >> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> > I am facing a very strange problem with HBase.
> >> >> >
> >> >> > This is what I did:
> >> >> > a) Created a table using pre-partitioned splits.
> >> >> > b) The column families are compressed with LZO.
> >> >> > c) Using the above configuration I am able to populate 2 million rows
> >> >> > per minute into HBase.
> >> >> > d) I have created a table with 300 million odd rows, which took me
> >> >> > roughly 3 hours to populate; the data size is 25 GB.
> >> >> >
> >> >> > e) But when I query the data, the performance I am getting is very bad.
> >> >> >    Basically this is what I am seeing: high CPU, no disk I/O, and
> >> >> >    network I/O at a rate of 6~7 MB/s.
> >> >> >
> >> >> >
> >> >> > Because of this, if I scan the entries of the table using Hive it is
> >> >> > taking ages.
> >> >> > Basically it is taking around 24 hours to scan the table. Any idea
> >> >> > how to debug this?
> >> >> >
> >> >> >
> >> >> > -Vibhav
> >> >> >
> >> >>
> >> >
> >>
> >
>
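
The suggestion quoted above, bundling many column families into one or two by prefixing the column name, can be sketched as a simple naming scheme. This is only an illustration of the idea: the target family name `d`, the `:` separator, and the helper names are all assumptions made for the example, not anything prescribed in the thread or by HBase.

```python
# Hypothetical naming scheme: every column lives in one family, and the
# old family name becomes a prefix of the column qualifier.
TARGET_FAMILY = "d"  # assumed name of the single consolidated family
SEPARATOR = ":"      # assumed separator; old family names must not contain it


def fold_column(old_family, qualifier):
    """Map (old_family, qualifier) to (single family, prefixed qualifier)."""
    return TARGET_FAMILY, f"{old_family}{SEPARATOR}{qualifier}"


def unfold_column(prefixed_qualifier):
    """Recover (old_family, qualifier) from a prefixed qualifier."""
    old_family, _, qualifier = prefixed_qualifier.partition(SEPARATOR)
    return old_family, qualifier


def columns_of(old_family, prefixed_qualifiers):
    """Select the qualifiers that belonged to one of the old families."""
    prefix = old_family + SEPARATOR
    return [q for q in prefixed_qualifiers if q.startswith(prefix)]
```

For example, `fold_column("profile", "age")` gives `("d", "profile:age")`, and `unfold_column("profile:age")` gives back `("profile", "age")`. The trade-off is that reading "only the columns of one old family" becomes a selection on the qualifier prefix instead of on the family; on the HBase side that selection could be done with something like a column-prefix filter, at the cost of all columns sharing the same store files.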
Shashwat Shriparv 2013-01-25, 19:31
Vibhav Mundra 2013-01-25, 19:37