HBase, mail # user - Loading data, hbase slower than Hive?


Austin Chungath 2013-01-17, 16:44
Michael Segel 2013-01-17, 16:48
Anoop John 2013-01-17, 17:00
ramkrishna vasudevan 2013-01-17, 17:09
Mohammad Tariq 2013-01-17, 17:46
praveenesh kumar 2013-01-18, 17:57
Doug Meil 2013-01-18, 18:00
Asaf Mesika 2013-01-19, 19:50
Mohammad Tariq 2013-01-19, 21:12
Doug Meil 2013-01-20, 15:13

Re: Loading data, hbase slower than Hive?
Vikas Jadhav 2013-01-20, 18:04
In my view, HBase needs to store more metadata than Hive: for each value it
separately stores the row key, column family, column name and the value itself,
so the stored data may end up larger than the original HDFS file.
I have also wondered about this. If anyone has got better results with HBase
than with Hive, please let us know.
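To make this overhead concrete, here is a minimal Python sketch that estimates the stored size of one record, assuming the classic HBase `KeyValue` cell layout (key length, value length, row length, row, family length, family, qualifier, 8-byte timestamp, 1-byte key type, value). Besides the row key, column family and column name, every cell also carries a timestamp and a key type. The row key, the family `d` and the column names are hypothetical, and the estimate ignores HFile block/index overhead, compression and replication.

```python
# Estimate the stored size of HBase cells, assuming the classic KeyValue layout:
#   <key length:int32><value length:int32><key><value>
#   key = <row length:int16><row><family length:int8><family><qualifier><timestamp:int64><key type:int8>
INT32, INT16, INT8, INT64 = 4, 2, 1, 8


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Return the serialized size in bytes of one cell (no compression, no HFile overhead)."""
    key_len = INT16 + len(row) + INT8 + len(family) + len(qualifier) + INT64 + INT8
    return INT32 + INT32 + key_len + len(value)


def record_size(row: bytes, family: bytes, columns: dict) -> int:
    """Sum the cell sizes for one logical record stored as one cell per column."""
    return sum(keyvalue_size(row, family, q, v) for q, v in columns.items())


if __name__ == "__main__":
    # Hypothetical record: a 16-byte row key and three short columns in family "d".
    row = b"2013-01-17#00042"
    columns = {b"name": b"alice", b"city": b"pune", b"amount": b"100"}
    raw = sum(len(v) for v in columns.values())
    stored = record_size(row, b"d", columns)
    print(f"raw value bytes: {raw}, stored cell bytes: {stored}")
```

With these hypothetical values, 12 bytes of column data become 137 bytes of cells, because the row key and the fixed 20-byte per-cell overhead are repeated for every column. Short family and column names, and fewer, wider columns, reduce this.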

Thank You
On Sun, Jan 20, 2013 at 8:43 PM, Doug Meil <[EMAIL PROTECTED]> wrote:

>
> Hi there-
>
> On top of what everybody else said, for more info on rowkey design and
> pre-splitting see http://hbase.apache.org/book.html#schema (as well as
> other threads in this dist-list on that topic).
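One common way to pre-split, sketched in Python under stated assumptions: if row keys begin with a fixed-width lowercase hex prefix (for example, part of a hash), evenly spaced split points over that prefix space give each region a similar share of keys. The prefix scheme, the table name `records` and the family `d` are hypothetical; the generated string uses the HBase shell's `create ... SPLITS => [...]` option.

```python
# Compute evenly spaced split points for pre-splitting a table whose row keys
# begin with a fixed-width lowercase hexadecimal prefix (hypothetical key scheme).


def hex_split_keys(num_regions: int, width: int = 2) -> list:
    """Return num_regions - 1 split keys dividing the prefix space [0, 16**width) evenly."""
    if num_regions < 1:
        raise ValueError("num_regions must be >= 1")
    space = 16 ** width
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes")
    # Boundary i is the first prefix that belongs to region i.
    return [format(i * space // num_regions, f"0{width}x") for i in range(1, num_regions)]


def shell_create(table: str, family: str, splits: list) -> str:
    """Render an HBase shell 'create' command that pre-splits the table at `splits`."""
    quoted = ", ".join(f"'{s}'" for s in splits)
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


if __name__ == "__main__":
    splits = hex_split_keys(4)  # illustrative region count
    print(splits)
    print(shell_create("records", "d", splits))
```

Lowercase hex is used because its byte order ('0'-'9' before 'a'-'f') matches numeric order, and HBase compares row keys as bytes. The region count of 4 is only illustrative.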
>
>
>
>
>
> On 1/19/13 4:12 PM, "Mohammad Tariq" <[EMAIL PROTECTED]> wrote:
>
> >Hello Austin,
> >
> >I am sorry for the late response.
> >
> >Asaf has made a very valid point. Rowkey design is crucial, especially if
> >the data is going to be sequential (time-series data, for example). You may
> >end up with a hotspotting problem. Use pre-split tables or hash the keys to
> >avoid that. It'll also allow you to fetch the results faster.
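A minimal Python sketch of the key-hashing idea, assuming a two-hex-digit MD5 prefix followed by `#` and the original key (both choices are hypothetical): consecutive time-based keys get unrelated prefixes, so writes spread across regions instead of all hitting the region that holds the newest keys. Because the prefix is derived from the key itself, a reader can recompute it for a point lookup.

```python
import hashlib


def hashed_row_key(natural_key: str, prefix_len: int = 2) -> bytes:
    """Prefix natural_key with the first prefix_len hex digits of its MD5 digest.

    The prefix is deterministic, so the same natural key always maps to the same row key.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}#{natural_key}".encode("utf-8")


if __name__ == "__main__":
    # Time-ordered keys like these would otherwise all go to the region holding the newest keys.
    for minute in range(44, 49):
        print(hashed_row_key(f"2013-01-17 16:{minute:02d}:00"))
```

The trade-off is that a range scan in the original key order must now be issued once per prefix and the results merged.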
> >
> >Warm Regards,
> >Tariq
> >https://mtariq.jux.com/
> >cloudfront.blogspot.com
> >
> >
> >On Sun, Jan 20, 2013 at 1:20 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
> >
> >> Start by telling us your row key design.
> >> Check whether your table's regions are pre-split.
> >> I managed to get 25 MB/sec write throughput in HBase using 1 region
> >> server. If your data is evenly spread, you can get around 7 times that in a
> >> 10 region server environment. That should mean 1 GB takes about 4 sec.
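A quick check of this arithmetic, as a small Python sketch that takes Asaf's figures at face value and uses decimal units (1 GB = 1,000 MB): 7 × 25 MB/s = 175 MB/s, which puts 1 GB at about 5.7 s; the 4 s figure corresponds to perfectly linear scaling (10 × 25 = 250 MB/s). The `efficiency` parameter is an illustrative assumption, not a measured value.

```python
# Check the scaling estimate: 25 MB/s per region server, ~7x with 10 region servers.
PER_SERVER_MB_S = 25.0


def load_seconds(data_mb: float, servers: int, efficiency: float) -> float:
    """Seconds to write data_mb at PER_SERVER_MB_S per server; efficiency 1.0 means linear scaling."""
    aggregate_mb_s = PER_SERVER_MB_S * servers * efficiency
    return data_mb / aggregate_mb_s


if __name__ == "__main__":
    for label, efficiency in (("7x (as quoted)", 0.7), ("linear 10x", 1.0)):
        secs = load_seconds(1000, 10, efficiency)
        print(f"{label}: {PER_SERVER_MB_S * 10 * efficiency:.0f} MB/s, 1 GB in {secs:.1f} s")
```

At the 7x rate, the 20 GB data set from the original question would take roughly 114 s (about 2 minutes) of raw write time, ignoring parsing and MapReduce overhead.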
> >>
> >>
> >> On Friday, January 18, 2013, praveenesh kumar wrote:
> >>
> >> > Hey,
> >> > Can someone give some pointers on the best practice for bulk imports in
> >> > HBase?
> >> > That would be really helpful.
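For context on bulk imports: HBase's bulk-load path writes HFiles directly (for example with `ImportTsv` and its `importtsv.bulk.output` option, or an MR job configured by `HFileOutputFormat.configureIncrementalLoad`) and then hands them to the region servers with `completebulkload`, bypassing the normal write path (WAL and MemStore). The core requirement is that each output file belongs to one region and is sorted by row key. The Python sketch below models only that partition-and-sort step; the split keys and rows are hypothetical.

```python
import bisect


def plan_bulk_load(rows, split_keys):
    """Group (row_key, value) pairs by target region and sort each group by row key.

    split_keys are region boundaries in ascending byte order; region i holds keys in
    [split_keys[i-1], split_keys[i]), mirroring how a table's regions are defined.
    Returns one sorted list of (row_key, value) pairs per region.
    """
    regions = [[] for _ in range(len(split_keys) + 1)]
    for row_key, value in rows:
        # bisect_right: a key equal to a boundary belongs to the region that starts there.
        regions[bisect.bisect_right(split_keys, row_key)].append((row_key, value))
    for group in regions:
        group.sort(key=lambda kv: kv[0])
    return regions


if __name__ == "__main__":
    # Hypothetical split keys and (row key, value) records.
    split_keys = [b"40", b"80", b"c0"]
    rows = [(b"c3", b"x"), (b"a1", b"y"), (b"80", b"z"), (b"41", b"w"), (b"0f", b"v")]
    for i, group in enumerate(plan_bulk_load(rows, split_keys)):
        print(f"region {i}: {[key for key, _ in group]}")
```

In a real MR bulk load the same roles are played by a total-order partitioner over the region start keys and a sorting reducer, one reducer per region.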
> >> >
> >> > Regards,
> >> > Praveenesh
> >> >
> >> > On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > Just to add to whatever all the heavyweights have said above, your MR job
> >> > > may not be as efficient as the MR job corresponding to your Hive query.
> >> > > You can enhance the performance by setting the mapred config parameters
> >> > > wisely and by tuning your MR job.
> >> > >
> >> > > Warm Regards,
> >> > > Tariq
> >> > > https://mtariq.jux.com/
> >> > > cloudfront.blogspot.com
> >> > >
> >> > >
> >> > > On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > Hive is more for batch processing, and HBase is more for real-time data access.
> >> > > >
> >> > > > Regards
> >> > > > Ram
> >> > > >
> >> > > > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[EMAIL PROTECTED]> wrote:
> >> > > >
> >> > > > > In the case of Hive, data insertion means placing the file under the
> >> > > > > table path in HDFS. HBase needs to read the data and convert it into its
> >> > > > > own format (HFiles), and MR is doing this work. So this makes it clear
> >> > > > > that HBase will be slower. :) As Michael said, the read operation...
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > -Anoop-
> >> > > > >
> >> > > > > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > > Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins.
> >> > > > > > It's a 20 GB data set, approx. 230 million records. The data is in
> >> > > > > > HDFS as a single text file. The cluster is 11 nodes, 8 cores.
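To put the reported load times side by side, here is a small Python sketch of the implied throughput, taking "20 GB" as 20,000 MB and the record count as exactly 230 million (both are approximations from the message above).

```python
# Throughput implied by the reported load times (approximate inputs from the message).
DATA_MB = 20_000          # "20 GB" taken as 20,000 MB (decimal units)
RECORDS = 230_000_000     # "approx 230 million records"


def throughput(seconds: float) -> tuple:
    """Return (MB per second, records per second) for loading the whole data set."""
    return DATA_MB / seconds, RECORDS / seconds


hive_s = 6 * 60                # 6 minutes
hbase_s = 1 * 3600 + 14 * 60   # 1 hour 14 minutes

if __name__ == "__main__":
    print(f"average record size: {DATA_MB * 1e6 / RECORDS:.0f} bytes")
    for name, secs in (("Hive", hive_s), ("HBase", hbase_s)):
        mb_s, rec_s = throughput(secs)
        print(f"{name}: {mb_s:.1f} MB/s, {rec_s:,.0f} records/s")
    print(f"HBase load took {hbase_s / hive_s:.1f}x as long")
```

This works out to records of roughly 87 bytes, about 55.6 MB/s for the Hive load and about 4.5 MB/s for the HBase load, a ratio of about 12.3x.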
> >> > > > > >
> >> > > > > > I loaded this into Hive, partitioned by date, bucketed into 32 buckets
> >> > > > > > and sorted. Time taken: 6 mins.
> >> > > > > >
> >> > > > > > I loaded the same data into HBase, on the same cluster, by writing a
> >> > > > > > MapReduce job. It took 1 hr 14 mins. The cluster wasn't running

Thanks and Regards,
Vikas Jadhav

Austin Chungath 2013-01-21, 05:45
Anoop Sam John 2013-01-21, 05:54
Austin Chungath 2013-01-21, 06:16
Mohammad Tariq 2013-01-21, 06:31
Anoop Sam John 2013-01-21, 06:36
Mohammad Tariq 2013-01-21, 06:39