HBase >> mail # user >> Hbase scans taking a lot of time


Re: Hbase scans taking a lot of time
I did use the following, but it didn't help either.

SET hbase.client.scanner.caching=30000;
SET hive.hbase.client.scanner.caching=30000;

-Vibhav
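
For reference, the same knob can also be set cluster-wide in configuration rather than per Hive session; a sketch, assuming a 0.94-era `hbase-site.xml` (the value is illustrative, not a recommendation):

```xml
<!-- hbase-site.xml: rows a scanner fetches per RPC round trip.
     Larger values cut RPC count at the cost of client-side memory. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>30000</value>
</property>
```

Note this only changes the default; a per-session setting such as the Hive `SET` above overrides it.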
On Sat, Jan 26, 2013 at 12:43 AM, Shashwat Shriparv <
[EMAIL PROTECTED]> wrote:

>
>
> Try to use caching for query
>
>
> Regards
>
> Shashwat Shriparv
>
>
> Sent from Samsung Galaxy
>
> Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
> You're better off putting the data based on the way you will access it.
>
> If you always read data from columns A, B, C and D together, then
> bundle them in a single column. And all of that in a single CF...
>
> JM
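
JM's suggestion above (collapsing several column families into one by prefixing the column name with the old family name) can be sketched as follows; the target family `d`, the helper name, and the sample cells are all hypothetical:

```python
def bundle_families(cells):
    """Rewrite {'family:qualifier': value} keys so every cell lands in a
    single hypothetical family 'd', with the old family name folded into
    the qualifier as a prefix."""
    return {
        f"d:{fam}_{qual}": value
        for key, value in cells.items()
        for fam, qual in [key.split(":", 1)]
    }

# Three cells spread over two families become three cells in one family.
row = {"meta:title": "x", "stats:views": 42, "stats:clicks": 7}
print(bundle_families(row))
# {'d:meta_title': 'x', 'd:stats_views': 42, 'd:stats_clicks': 7}
```

The same read pattern still works (fetch only the qualifiers you need), but all cells for a row now sit in one store file per region.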
>
> 2013/1/25, Vibhav Mundra <[EMAIL PROTECTED]>:
> > This is what I think; sorry for my ignorance.
> >
> > I want to use the columnar property of HBase so that only the
> > required columns are accessed. For this I kept a large number of column
> > families.
> >
> > But I am still not understanding what is happening, as there is no disk
> > I/O, only high CPU and some network activity.
> > Why is the scan taking more time than the time to populate HBase?
> >
> > -Vibhav
> >
> >
> > On Fri, Jan 25, 2013 at 11:36 PM, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Hi Vibhav,
> >>
> >> Do you really need 13 different column families? Can't you find a way
> >> to bundle them into 1 or 2 CFs max? Maybe by prefixing the column
> >> name?
> >>
> >> That might help...
> >>
> >> JM
> >>
> >> 2013/1/25, Vibhav Mundra <[EMAIL PROTECTED]>:
> >> > The number of column families I have is 13, which I guess is okay?
> >> >
> >> > -Vibhav
> >> >
> >> >
> >> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> You'll have this problem if you have a large number of column families
> >> >> being scanned/populated at the same time. Make sure the data you
> >> >> scan/populate frequently are in the same column family (you can have
> >> >> many columns in a column family). Unlike BigTable/Hypertable, which
> >> >> have the concept of locality/access groups, HBase always stores column
> >> >> families in separate files, essentially making a column family not
> >> >> only a logical grouping mechanism but also a physical locality group.
> >> >>
> >> >>
> >> >> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[EMAIL PROTECTED]>
> >> wrote:
> >> >>
> >> >> > I am facing a very strange problem with HBase.
> >> >> >
> >> >> > This is what I did:
> >> >> > a) Created a table, using pre-partitioned splits.
> >> >> > b) The column families are compressed with LZO.
> >> >> > c) Using the above configuration I am able to populate 2 million
> >> >> > rows per minute into HBase.
> >> >> > d) I have created a table with 300 million odd rows, which took me
> >> >> > roughly 3 hours to populate; the data size is 25 GB.
> >> >> >
> >> >> > e) But when I query for data, the performance I am getting is very
> >> >> > bad. Basically this is what I am seeing: high CPU, no disk I/O, and
> >> >> > network I/O happening at the rate of 6-7 MB/sec.
> >> >> >
> >> >> > Because of this, if I scan the entries of the table using Hive it
> >> >> > is taking ages. Basically it is taking around 24 hours to scan the
> >> >> > table. Any idea how to debug this?
> >> >> >
> >> >> > -Vibhav
> >> >> >
> >> >>
> >> >
> >>
> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
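
Luke Lu's point that each column family is its own physical locality group can be illustrated with a toy cost model (this is not HBase code; the region count and files-per-family figure are hypothetical): a full scan must open the store files of every family in every region, so 13 families multiply that work roughly 13x compared to a single-family layout holding the same columns.

```python
def store_files_touched(regions, families, files_per_family=3):
    """Toy estimate of store files a full table scan must open:
    one set of store files per column family per region."""
    return regions * families * files_per_family

# Hypothetical numbers: the 300M-row table split into 100 regions.
many_cf = store_files_touched(regions=100, families=13)
one_cf = store_files_touched(regions=100, families=1)

print(many_cf)            # 3900
print(one_cf)             # 300
print(many_cf // one_cf)  # 13
```

Real scans add compaction state, block cache hits, and filters on top of this, but the multiplier from extra families is the part the schema controls.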