HBase >> mail # user >> Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
@Jonathan,

As per Anoop and Ram, the WAL is not used during bulk loading, so turning off
the WAL won't have any impact on bulk-load performance.
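For reference, the bulk-load path that sidesteps the WAL is the two-step importtsv / completebulkload run. A rough sketch of the commands, assuming a 0.92/0.94-era install; the jar path, table name, column mapping, and HDFS paths below are placeholders you would adjust for your cluster:

```shell
# Step 1: run importtsv in bulk-output mode so it writes HFiles
# directly instead of issuing puts through the regionservers
# (no WAL, no memstore on the write path).
hadoop jar ${HBASE_HOME}/hbase.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,f:col1,f:col2 \
  -Dimporttsv.bulk.output=/tmp/bulkout \
  mytable /user/me/input.tsv

# Step 2: hand the generated HFiles over to the table's regions.
hadoop jar ${HBASE_HOME}/hbase.jar completebulkload /tmp/bulkout mytable
```

Pre-splitting the table before step 1 also helps, since the reducers that write HFiles are partitioned by region.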

On Thu, Oct 25, 2012 at 1:33 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Nicolas,
>
> In my experience you won't get good performance running 3 map tasks
> simultaneously against a single hard drive. That is a lot of I/O on one disk.
>
> HBase performs well when you have at least 5 nodes in the cluster, so running
> HBase on 3 nodes is not something you would do in production.
>
> Thanks,
> Anil
>
> On Thu, Oct 25, 2012 at 8:57 AM, Jonathan Bishop <[EMAIL PROTECTED]>wrote:
>
>> Nicolas,
>>
>> I just went through the same exercise. There are many ways to make this go
>> faster, but eventually I decided that bulk loading is the best solution,
>> since run times scaled with the number of machines in my cluster when I
>> used that approach.
>>
>> One thing you can try is turning off HBase's write-ahead log (WAL). But be
>> aware that a regionserver failure will cause data loss if you do this.
>>
>> Jon
>>
>> On Tue, Oct 23, 2012 at 8:48 AM, Nick maillard <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Hi everyone
>> >
>> > I'm starting with HBase and testing it for our needs. I have set up a
>> > Hadoop cluster of three machines and an HBase cluster on top of the same
>> > three machines, one master and two slaves.
>> >
>> > I am testing the import of a 5 GB CSV file with the importTsv tool. I
>> > load the file into HDFS and use the importTsv tool to import it into
>> > HBase.
>> >
>> > Right now it takes a little over an hour to complete. It creates around
>> > 2 million entries in one table with a single family.
>> > If I use bulk loading it goes down to 20 minutes.
>> >
>> > My Hadoop job has 21 map tasks, but they all seem to be taking a very
>> > long time to finish, and many tasks end up timing out.
>> >
>> > I am wondering what I have missed in my configuration. I have followed
>> > the prerequisites in the documentation, but I am really unsure as to
>> > what is causing this slowdown. If I apply the wordcount example to the
>> > same file it takes only minutes to complete, so I am guessing the issue
>> > lies in my HBase configuration.
>> >
>> > Any help or pointers would be appreciated.
>> >
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

--
Thanks & Regards,
Anil Gupta
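For reference, turning off the WAL per-Put (the option Jon mentioned upthread, which only matters for the non-bulk write path) looks roughly like this against the 0.94-era Java client API. This is a minimal sketch, not a tested program; the table, row, family, and qualifier names are made up:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // placeholder table name

        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        // Skip the write-ahead log: faster writes, but any edits not yet
        // flushed to HFiles are lost if the regionserver dies.
        p.setWriteToWAL(false);

        table.put(p);
        table.close();
    }
}
```

Note this trades durability for speed on every put; with importTsv's bulk-output mode it is irrelevant, since the WAL is never written in the first place.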