HBase >> mail # user >> Loading data, hbase slower than Hive?


RE: Loading data, hbase slower than Hive?
Austin,
        Are you using HFileOutputFormat or TableOutputFormat?

-Anoop-
________________________________________
From: Austin Chungath [[EMAIL PROTECTED]]
Sent: Monday, January 21, 2013 11:15 AM
To: [EMAIL PROTECTED]
Subject: Re: Loading data, hbase slower than Hive?

Thank you Tariq.
I will let you know how things go after I implement these suggestions.

Regards,
Austin

On Sun, Jan 20, 2013 at 2:42 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello Austin,
>
>           I am sorry for the late response.
>
> Asaf has made a very valid point. Rowkey design is crucial,
> especially if the data is going to be sequential (time series, for example).
> You may end up with a hotspotting problem. Use pre-split tables
> or hash the keys to avoid that. It'll also allow you to fetch the results
> faster.
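
A minimal sketch of the "hash the keys" idea, in Python for illustration only. The bucket count, the `NN|` prefix layout, and the names `salted_key` and `split_points` are assumptions made for this example, not something given in the thread. The split points it produces would be passed to whatever table-creation call you use so that each bucket starts in its own region.

```python
import hashlib

NUM_BUCKETS = 16  # assumed value; choose it to match the number of regions you want


def salted_key(row_key, num_buckets=NUM_BUCKETS):
    """Prefix the key with a two-digit bucket derived from its MD5 hash.

    Sequential keys (timestamps, counters) get spread across buckets
    instead of all landing in the same region.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}|{row_key}".encode("utf-8")


def split_points(num_buckets=NUM_BUCKETS):
    """Boundaries between consecutive bucket prefixes (num_buckets - 1 keys)."""
    return [f"{b:02d}|".encode("utf-8") for b in range(1, num_buckets)]


if __name__ == "__main__":
    # Hypothetical time-series keys, one per minute.
    for minute in range(3):
        print(salted_key(f"2013-01-17T10:{minute:02d}:00"))
    print(split_points()[:3])
```

The trade-off is that rows adjacent in time now live in different buckets, so a time-range read has to issue one scan per bucket and merge the results.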
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Sun, Jan 20, 2013 at 1:20 AM, Asaf Mesika <[EMAIL PROTECTED]>
> wrote:
>
> > Start by telling us your row key design.
> > Check whether your table's regions are pre-split.
> > I managed to get 25 MB/sec write throughput in HBase using 1 region
> > server. If your data is evenly spread, you can get around 7 times that in
> > a 10 region server environment. That should mean 1 GB takes roughly 6 sec.
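
The estimate above can be worked through explicitly. This is only a back-of-the-envelope sketch: the 25 MB/sec and 7x figures come from the message, while the binary gigabyte (1024 MB) and the helper name `load_seconds` are assumptions.

```python
SINGLE_SERVER_MB_PER_SEC = 25  # single region server figure quoted above
SCALING_FACTOR = 7             # "around 7 times" on 10 region servers
MB_PER_GB = 1024               # assumption: binary gigabyte


def load_seconds(size_gb, mb_per_sec):
    """Time to write size_gb at a sustained rate of mb_per_sec."""
    return size_gb * MB_PER_GB / mb_per_sec


cluster_mb_per_sec = SINGLE_SERVER_MB_PER_SEC * SCALING_FACTOR  # 175 MB/sec

print(f"{load_seconds(1, cluster_mb_per_sec):.1f}")   # 5.9 sec for 1 GB
print(f"{load_seconds(20, cluster_mb_per_sec):.1f}")  # 117.0 sec for 20 GB
# Perfectly linear 10x scaling (250 MB/sec) would be needed for ~4 sec per GB.
print(f"{load_seconds(1, 10 * SINGLE_SERVER_MB_PER_SEC):.1f}")  # 4.1
```

At that rate a 20 GB load would be on the order of two minutes, provided the writes really are spread evenly across region servers.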
> >
> >
> > On Friday, January 18, 2013, praveenesh kumar wrote:
> >
> > > Hey,
> > > Can someone share some pointers on best practices for bulk
> > > imports in HBase?
> > > That would be really helpful.
> > >
> > > Regards,
> > > Praveenesh
> > >
> > > On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> > >
> > > > Just to add to what all the heavyweights have said above, your MR job
> > > > may not be as efficient as the MR job corresponding to your Hive query.
> > > > You can improve performance by setting the mapred config parameters
> > > > wisely and by tuning your MR job.
> > > >
> > > > Warm Regards,
> > > > Tariq
> > > > https://mtariq.jux.com/
> > > > cloudfront.blogspot.com
> > > >
> > > >
> > > > On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > Hive is more for batch processing, and HBase is more for real-time data.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > > In Hive, data insertion just means placing the file under the table
> > > > > > > path in HDFS. HBase needs to read the data and convert it into its
> > > > > > > own format (HFiles). MR is doing this work, so it is clear that HBase
> > > > > > > will be slower. :)  As Michael said the read operation...
> > > > > >
> > > > > >
> > > > > >
> > > > > > -Anoop-
> > > > > >
> > > > > > > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > >   Hi,
> > > > > > > > Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins.
> > > > > > > > It's a 20 GB data set, approx. 230 million records. The data is in
> > > > > > > > HDFS, in a single text file. The cluster is 11 nodes, 8 cores.
> > > > > > > >
> > > > > > > > I loaded this into Hive, partitioned by date, bucketed into 32, and
> > > > > > > > sorted. Time taken was 6 mins.
> > > > > > > >
> > > > > > > > I loaded the same data into HBase, in the same cluster, by writing
> > > > > > > > MapReduce code. It took 1 hr 14 mins. The cluster wasn't running
> > > > > > > > anything else, and assuming that the code I wrote is good enough, what
> > > > > > > > is it that makes HBase slower than Hive in loading the data?
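
For scale, the two load times reported above translate into the following effective rates. This is a rough calculation only; it assumes "20 gb" means 20 x 1024 MB, and the constant and function names are made up for the example.

```python
DATA_MB = 20 * 1024        # "20 gb", assuming binary gigabytes
RECORDS = 230_000_000      # "approx 230 million records"
HIVE_SECONDS = 6 * 60      # 6 mins
HBASE_SECONDS = 74 * 60    # 1 hr 14 mins


def mb_per_sec(seconds):
    """Effective data rate for the whole load."""
    return DATA_MB / seconds


def records_per_sec(seconds):
    """Effective record rate for the whole load."""
    return RECORDS / seconds


print(f"Hive:  {mb_per_sec(HIVE_SECONDS):.1f} MB/sec, "
      f"{records_per_sec(HIVE_SECONDS):,.0f} records/sec")
print(f"HBase: {mb_per_sec(HBASE_SECONDS):.1f} MB/sec, "
      f"{records_per_sec(HBASE_SECONDS):,.0f} records/sec")
print(f"HBase load was {HBASE_SECONDS / HIVE_SECONDS:.1f}x slower")
# Hive:  56.9 MB/sec, 638,889 records/sec
# HBase: 4.6 MB/sec, 51,802 records/sec
# HBase load was 12.3x slower
```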
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Austin
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >