HBase >> mail # user >> Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
@Jonathan,

As per Anoop and Ram, the WAL is not used with bulk loading, so turning off
the WAL won't have any impact on bulk load performance.
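
For reference, the bulk loading path discussed in this thread is usually a two-step job: ImportTsv writes HFiles instead of issuing Puts, and a second command hands those HFiles to the region servers. The sketch below only assembles the two command lines. The table name, column mapping and HDFS paths are hypothetical placeholders, and the class names are the ones from HBase releases of this era, so check them against your version before running anything.

```python
import shlex

# Class names as found in HBase releases of this era (assumption: newer
# releases may have moved or renamed the loader class).
IMPORT_TSV = "org.apache.hadoop.hbase.mapreduce.ImportTsv"
LOAD_HFILES = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def bulk_load_commands(table, columns, input_path, hfile_dir, separator=None):
    """Return the two argv lists for a two-step ImportTsv bulk load."""
    # Step 1: run ImportTsv so that it writes HFiles to hfile_dir
    # instead of sending Puts to the region servers.
    generate = [
        "hbase", IMPORT_TSV,
        "-Dimporttsv.columns=" + ",".join(columns),
        "-Dimporttsv.bulk.output=" + hfile_dir,
    ]
    if separator is not None:
        # ImportTsv expects tab-separated input unless told otherwise.
        generate.append("-Dimporttsv.separator=" + separator)
    generate += [table, input_path]

    # Step 2: move the generated HFiles into the table's regions.
    load = ["hbase", LOAD_HFILES, hfile_dir, table]
    return [generate, load]


if __name__ == "__main__":
    # Hypothetical table, columns and paths, for illustration only.
    commands = bulk_load_commands(
        table="mytable",
        columns=["HBASE_ROW_KEY", "cf:col1", "cf:col2"],
        input_path="/user/nick/input.csv",
        hfile_dir="/user/nick/hfiles",
        separator=",",
    )
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Since the first step never goes through the normal write path, there is no WAL involved for it to skip.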

On Thu, Oct 25, 2012 at 1:33 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Nicolas,
>
> In my experience you won't get good performance if you run 3 map tasks
> simultaneously on one hard drive. That seems like a lot of I/O on one disk.
>
> HBase performs well when you have at least 5 nodes in the cluster. So running
> HBase on 3 nodes is not something you would do in prod.
>
> Thanks,
> Anil
>
> On Thu, Oct 25, 2012 at 8:57 AM, Jonathan Bishop <[EMAIL PROTECTED]> wrote:
>
>> Nicolas,
>>
>> I just went through the same exercise. There are many ways to get this to
>> go faster, but eventually I decided that bulk loading is the best solution,
>> as run times scaled with the number of machines in my cluster when I used
>> that approach.
>>
>> One thing you can try is to turn off HBase's write-ahead log (WAL). But be
>> aware that a region server failure will cause data loss if you do this.
>>
>> Jon
>>
>> On Tue, Oct 23, 2012 at 8:48 AM, Nick maillard <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Hi everyone
>> >
>> > I'm starting with HBase and testing it for our needs. I have set up a
>> > Hadoop cluster of three machines and an HBase cluster on top of the same
>> > three machines: one master, two slaves.
>> >
>> > I am testing the import of a 5 GB CSV file with the ImportTsv tool. I
>> > copy the file into HDFS and use the ImportTsv tool to import it into HBase.
>> >
>> > Right now it takes a little over an hour to complete. It creates around
>> > 2 million entries in one table with a single family.
>> > If I use bulk uploading it goes down to 20 minutes.
>> >
>> > My Hadoop has 21 map tasks, but they all seem to take a very long time
>> > to finish, and many tasks end up timing out.
>> >
>> > I am wondering what I have missed in my configuration. I have followed
>> > the different prerequisites in the documentation, but I am really unsure
>> > what is causing this slowdown. If I apply the wordcount example to the
>> > same file it takes only minutes to complete, so I am guessing the issue
>> > lies in my HBase configuration.
>> >
>> > Any help or pointers would be appreciated.

--
Thanks & Regards,
Anil Gupta