HBase, mail # user - Loading data, hbase slower than Hive?


Austin Chungath 2013-01-17, 16:44
Michael Segel 2013-01-17, 16:48
Anoop John 2013-01-17, 17:00
ramkrishna vasudevan 2013-01-17, 17:09
Mohammad Tariq 2013-01-17, 17:46
praveenesh kumar 2013-01-18, 17:57
Re: Loading data, hbase slower than Hive?
Doug Meil 2013-01-18, 18:00

Hi there,

See this section of the HBase RefGuide for information about bulk loading.

http://hbase.apache.org/book.html#arch.bulk.load

On 1/18/13 12:57 PM, "praveenesh kumar" <[EMAIL PROTECTED]> wrote:

>Hey,
>Can someone give some pointers on what would be the best practice for
>bulk imports in HBase?
>That would be really helpful.
>
>Regards,
>Praveenesh
>
>On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Just to add to what all the heavyweights have said above, your MR job
>> may not be as efficient as the MR job corresponding to your Hive query.
>> You can improve the performance by setting the mapred config parameters
>> wisely and by tuning your MR job.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
>>
>> > Hive is more for batch processing, and HBase is more for real-time data.
>> >
>> > Regards
>> > Ram
>> >
>> > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[EMAIL PROTECTED]> wrote:
>> >
>> > > In the case of Hive, data insertion means placing the file under the
>> > > table path in HDFS. HBase needs to read the data and convert it into
>> > > its own format (HFiles). MR is doing this work. So this makes it clear
>> > > that HBase will be slower. :)  As Michael said the read operation...
>> > >
>> > > -Anoop-
>> > >
>> > > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Hi,
>> > > > Problem: Hive took 6 mins to load a data set, HBase took 1 hr 14 mins.
>> > > > It's a 20 GB data set, approx 230 million records. The data is in HDFS,
>> > > > in a single text file. The cluster is 11 nodes, 8 cores.
>> > > >
>> > > > I loaded this in Hive, partitioned by date, bucketed into 32, and
>> > > > sorted. Time taken is 6 mins.
>> > > >
>> > > > I loaded the same data into HBase, in the same cluster, by writing
>> > > > MapReduce code. It took 1 hr 14 mins. The cluster wasn't running
>> > > > anything else, and assuming that the code that I wrote is good enough,
>> > > > what is it that makes HBase slower than Hive in loading the data?
>> > > >
>> > > > Thanks,
>> > > > Austin
>> > > >
>> > >
>> >
>>
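
For scale, the load times quoted in Austin's original message can be turned into rough throughput figures. The sketch below is only back-of-the-envelope arithmetic on the numbers stated in the thread; reading "20 gb" as 20 GiB and treating the record count as exact are assumptions, and the constant and function names are made up for illustration.

```python
# Figures quoted in the thread; the unit interpretations are assumptions.
RECORDS = 230_000_000             # "approx 230 million records"
DATA_BYTES = 20 * 1024**3         # "20 gb", read here as 20 GiB (assumption)

HIVE_SECONDS = 6 * 60             # "6 mins"
HBASE_SECONDS = (60 + 14) * 60    # "1 hr 14 mins"


def throughput(seconds):
    """Return (records per second, MiB per second) for a load taking `seconds`."""
    return RECORDS / seconds, DATA_BYTES / seconds / 1024**2


hive_rps, hive_mibps = throughput(HIVE_SECONDS)
hbase_rps, hbase_mibps = throughput(HBASE_SECONDS)

# How many times longer the HBase load took than the Hive load.
slowdown = HBASE_SECONDS / HIVE_SECONDS

print(f"Hive:  {hive_rps:,.0f} records/s, {hive_mibps:.1f} MiB/s")
print(f"HBase: {hbase_rps:,.0f} records/s, {hbase_mibps:.1f} MiB/s")
print(f"HBase load took {slowdown:.1f}x as long as the Hive load")
```

Under these assumptions the HBase load took roughly 12 times as long as the Hive load. That is in line with the explanation given in the thread: the Hive load mostly places files under the table path, while the MapReduce job has to read every record and write it in HBase's own format.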
Asaf Mesika 2013-01-19, 19:50
Mohammad Tariq 2013-01-19, 21:12
Doug Meil 2013-01-20, 15:13
Vikas Jadhav 2013-01-20, 18:04
Austin Chungath 2013-01-21, 05:45
Anoop Sam John 2013-01-21, 05:54
Austin Chungath 2013-01-21, 06:16
Mohammad Tariq 2013-01-21, 06:31
Anoop Sam John 2013-01-21, 06:36
Mohammad Tariq 2013-01-21, 06:39