NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase user mailing list: Hbase scans taking a lot of time


Vibhav Mundra 2013-01-25, 09:10
Luke Lu 2013-01-25, 17:31
Vibhav Mundra 2013-01-25, 17:59
Adrien Mogenet 2013-01-25, 18:04
Jean-Marc Spaggiari 2013-01-25, 18:06
Vibhav Mundra 2013-01-25, 18:14
Jean-Marc Spaggiari 2013-01-25, 18:23
lars hofhansl 2013-01-25, 22:00
lars hofhansl 2013-01-25, 23:56
Alok Kumar 2013-01-26, 06:07
Shashwat Shriparv 2013-01-25, 19:13
Vibhav Mundra 2013-01-25, 19:25
Shashwat Shriparv 2013-01-25, 19:31
Re: Hbase scans taking a lot of time
I am new to HBase and Hive.
Am I missing something? It would be great if you could point me to some
documents about caching.

-Vibhav
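Since the question here is about caching, a rough sketch of why scanner caching matters for a full-table scan may help. An HBase scanner pulls rows from the region server in batches, and the batch size (the client-side scanner caching setting, `hbase.client.scanner.caching` in the configuration or `Scan.setCaching(int)` in the Java client) decides how many round trips a full scan needs. We assume, without having checked this cluster, that the releases of that era defaulted to 1 row per call. The functions below are hypothetical arithmetic helpers, not an HBase or Hive API; they only use the 300 million rows and roughly 24 hours reported in this thread.

```python
import math

# Figures reported in the thread.
TOTAL_ROWS = 300_000_000           # "300 million odd rows"
OBSERVED_SCAN_SECONDS = 24 * 3600  # "around 24 hours"


def scan_rpcs(total_rows: int, caching: int) -> int:
    """Round trips a full scan needs when each call returns at most
    `caching` rows (simplified: ignores per-region open/close calls)."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(total_rows / caching)


def rows_per_second(total_rows: int, seconds: float) -> float:
    """Average scan throughput over the whole run."""
    return total_rows / seconds


if __name__ == "__main__":
    rate = rows_per_second(TOTAL_ROWS, OBSERVED_SCAN_SECONDS)
    print(f"observed rate: {rate:,.0f} rows/s")
    for caching in (1, 100, 1000):
        print(f"caching={caching:>5}: {scan_rpcs(TOTAL_ROWS, caching):,} round trips")
```

With 1 row per call, the 300 million-row scan would need 300 million round trips; raising caching to 1,000 cuts that to 300,000, at the cost of holding larger batches in client memory.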
On Sat, Jan 26, 2013 at 1:01 AM, Shashwat Shriparv <[EMAIL PROTECTED]> wrote:

> I would suggest you look into caching techniques.
>
> Regards
> Shashwat Shriparv
>
>
> Sent from Samsung Galaxy
>
> Adrien Mogenet <[EMAIL PROTECTED]> wrote:
> Definitely not; you should keep it to 3 at most. Keep in mind that
> 1 CF == 1 Store == at least that many big files to read.
>
>
> On Fri, Jan 25, 2013 at 6:59 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>
> > I have 13 column families, which I guess is okay?
> >
> > -Vibhav
> >
> >
> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[EMAIL PROTECTED]> wrote:
> >
> > > You'll have this problem if you have a large number of column families
> > > being scanned/populated at the same time. Make sure the data you
> > > scan/populate frequently is in the same column family (you can have
> > > many columns in a column family). Unlike BigTable/Hypertable, which
> > > have the concept of locality/access groups, HBase always stores column
> > > families in separate files, essentially making a column family not only
> > > a logical grouping mechanism but also a physical locality group.
> > >
> > >
> > > On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > >
> > > > I am facing a very strange problem with HBase.
> > > >
> > > > This is what I did:
> > > > a) Created a table using pre-partitioned splits.
> > > > b) Compressed the column families with LZO.
> > > > c) With the above configuration I am able to load 2 million rows
> > > > per minute into HBase.
> > > > d) I created a table with 300 million-odd rows, which took roughly
> > > > 3 hours to populate; the data size is 25 GB.
> > > >
> > > > e) But when I query the data, the performance is very bad.
> > > >    Basically this is what I am seeing: high CPU, no disk I/O, and
> > > >    network I/O of about 6-7 MB/s.
> > > >
> > > >
> > > > Because of this, scanning the table's entries from Hive takes
> > > > ages: around 24 hours for a full scan. Any idea how to debug
> > > > this?
> > > >
> > > >
> > > > -Vibhav
> > > >
> > >
> >
>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>
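Luke's and Adrien's points can be made concrete with a toy model: each column family is its own Store with its own files, so a scan that reads columns from k families has to merge the files of k Stores in every region. This is a hypothetical illustration only; the column names, the two layouts, and the assumed 3 files per Store are invented for the example, and the model ignores the MemStore and compaction state. Only the one-Store-per-family rule comes from the thread.

```python
from typing import Dict, Iterable, Set

FILES_PER_STORE = 3  # assumed store files per column family, for illustration


def families_touched(schema: Dict[str, str], columns: Iterable[str]) -> Set[str]:
    """Column families (and hence Stores) a read of `columns` must open.

    `schema` maps each column qualifier to its column family."""
    return {schema[c] for c in columns}


def files_merged(families: Set[str], files_per_store: int = FILES_PER_STORE) -> int:
    """Store files a full scan merges per region: every file of every
    touched Store (simplified model)."""
    return len(families) * files_per_store


# Hypothetical layout A: 13 families with one column each.
wide = {f"col{i}": f"cf{i}" for i in range(13)}
# Hypothetical layout B: the same 13 columns, frequently scanned ones in "hot".
grouped = {f"col{i}": ("hot" if i < 4 else "cold") for i in range(13)}
# Columns a typical query reads (assumption).
hot_columns = [f"col{i}" for i in range(4)]

if __name__ == "__main__":
    for name, schema in (("13 families", wide), ("2 families", grouped)):
        fams = families_touched(schema, hot_columns)
        print(f"{name}: stores opened={len(fams)}, files per region={files_merged(fams)}")
```

Under these assumptions, reading the four frequently used columns opens four Stores in the 13-family layout but only one when they share a family, which is the grouping Luke recommends; Adrien's limit of three families keeps the count small even when a scan reads every column.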