HBase >> mail # user >> Hbase scans taking a lot of time


Vibhav Mundra 2013-01-25, 09:10
Luke Lu 2013-01-25, 17:31
Vibhav Mundra 2013-01-25, 17:59
Adrien Mogenet 2013-01-25, 18:04
Jean-Marc Spaggiari 2013-01-25, 18:06
Vibhav Mundra 2013-01-25, 18:14
Jean-Marc Spaggiari 2013-01-25, 18:23
lars hofhansl 2013-01-25, 22:00
lars hofhansl 2013-01-25, 23:56
Alok Kumar 2013-01-26, 06:07
Shashwat Shriparv 2013-01-25, 19:13
Vibhav Mundra 2013-01-25, 19:25
Shashwat Shriparv 2013-01-25, 19:31
Re: Hbase scans taking a lot of time
I am new to HBase and Hive.
Am I missing something? It would be great if you could point me to some
documents about caching.

-Vibhav
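
One thing "caching" can mean for scans is client-side scanner caching: how many rows each scanner RPC returns (the `hbase.client.scanner.caching` setting, or `Scan.setCaching` in the Java client). The thread does not say which value is in use or which kind of caching was meant, so the following is only a rough sketch of why the setting matters for a 300-million-row table. The function name and the sample values are illustrative assumptions, not measurements from this cluster.

```python
def scan_round_trips(total_rows: int, rows_per_rpc: int) -> int:
    """Number of scanner RPCs needed to stream total_rows rows
    when each RPC returns at most rows_per_rpc rows."""
    if rows_per_rpc < 1:
        raise ValueError("rows_per_rpc must be at least 1")
    # Integer ceiling division, avoids float rounding on large counts.
    return (total_rows + rows_per_rpc - 1) // rows_per_rpc


TOTAL_ROWS = 300_000_000  # table size quoted in the thread

# Hypothetical caching values, chosen only to show the trend.
for caching in (1, 100, 1000):
    trips = scan_round_trips(TOTAL_ROWS, caching)
    print(f"caching={caching:>5}: {trips:,} round trips")
```

With one row per RPC the scan needs one round trip per row; larger values cut the number of round trips proportionally, at the cost of more client memory per batch.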
On Sat, Jan 26, 2013 at 1:01 AM, Shashwat Shriparv <
[EMAIL PROTECTED]> wrote:

> I would suggest you look into caching techniques.
>
> Regards
> Shashwat Shriparv
>
> Sent from Samsung Galaxy
>
> Adrien Mogenet <[EMAIL PROTECTED]> wrote:
>
> Definitely not, you should keep it under 3 maximum. Keep in mind that
> 1 CF == 1 Store == at least that many big files to read.
>
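
The "1 CF == 1 Store" point can be made concrete with a small estimate. Each region keeps one Store per column family, and each Store holds one or more store files, so a scan that touches every family has to open at least regions × families files. The region count below is hypothetical (the thread does not give it), and `min_store_files` is just an illustrative helper.

```python
def min_store_files(regions: int, column_families: int,
                    files_per_store: int = 1) -> int:
    """Lower bound on the store files a full-table scan of all
    column families has to open."""
    return regions * column_families * files_per_store


REGIONS = 20  # hypothetical; the thread does not state the region count

for families in (13, 3, 1):
    files = min_store_files(REGIONS, families)
    print(f"{families:>2} column families -> at least {files} store files")
```

Under these assumptions, 13 families means at least 260 files versus 60 for 3 families; the real number is higher whenever a Store has more than one file that has not yet been compacted.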
>
> On Fri, Jan 25, 2013 at 6:59 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>
> > The number of column families I have is 13, which I guess is okay?
> >
> > -Vibhav
> >
> >
> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[EMAIL PROTECTED]> wrote:
> >
> > > You'll have this problem if you have a large number of column families
> > > being scanned/populated at the same time. Make sure the data you
> > > scan/populate frequently are in the same column family (you can have many
> > > columns in a column family). Unlike BigTable/Hypertable, which have the
> > > concept of locality/access groups, HBase always stores column families in
> > > separate files, essentially making a column family not only a logical
> > > grouping mechanism but also a physical locality group.
> > >
> > >
> > > On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> > >
> > > > I am facing a very strange problem with HBase.
> > > >
> > > > This is what I did:
> > > > a) Created a table using pre-partitioned splits.
> > > > b) The column families are compressed with LZO.
> > > > c) Using the above configuration I am able to populate 2 million rows
> > > > per minute into HBase.
> > > > d) I have created a table with 300 million odd rows, which took me
> > > > roughly 3 hours to populate; the data size is 25 GB.
> > > >
> > > > e) But when I query the data, the performance I am getting is very bad.
> > > >    Basically this is what I am seeing: high CPU, no disk I/O, and
> > > > network I/O happening at a rate of 6~7 MB/sec.
> > > >
> > > >
> > > > Because of this, scanning the entries of the table using Hive is taking
> > > > ages.
> > > > Basically it takes around 24 hours to scan the table. Any idea how to
> > > > debug this?
> > > >
> > > >
> > > > -Vibhav
> > > >
> > >
> >
>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>
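
The figures in the original question can be cross-checked with a little arithmetic, which helps show how far the scan rate falls below the load rate. The sketch below uses only the numbers quoted in the thread (300 million rows, 2 million rows per minute, 25 GB, 24 hours); treating 1 GB as 1024 MB is an assumption, and the helper names are illustrative.

```python
ROWS = 300_000_000             # rows in the table
LOAD_ROWS_PER_MIN = 2_000_000  # reported load rate
DATA_GB = 25                   # reported data size
SCAN_HOURS = 24                # reported full-scan time via Hive


def load_hours(rows: int, rows_per_min: int) -> float:
    """Hours needed to load `rows` at `rows_per_min`."""
    return rows / rows_per_min / 60


def scan_rows_per_sec(rows: int, hours: float) -> float:
    """Average rows scanned per second over the whole scan."""
    return rows / (hours * 3600)


def scan_mb_per_sec(data_gb: float, hours: float) -> float:
    """Average MB of stored data covered per second (1 GB = 1024 MB)."""
    return data_gb * 1024 / (hours * 3600)


print(f"load time : {load_hours(ROWS, LOAD_ROWS_PER_MIN):.1f} hours")
print(f"scan rate : {scan_rows_per_sec(ROWS, SCAN_HOURS):,.0f} rows/s")
print(f"load rate : {LOAD_ROWS_PER_MIN / 60:,.0f} rows/s")
print(f"scan data : {scan_mb_per_sec(DATA_GB, SCAN_HOURS):.2f} MB/s of stored data")
```

The load works out to about 2.5 hours, consistent with the "roughly 3 hours" reported, while the scan averages about 3,472 rows per second against roughly 33,333 rows per second for the load. That is about a tenfold gap, which fits the suggestion in the thread that the scan is bottlenecked somewhere other than disk.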