HBase, mail # user - Re: Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
anil gupta 2012-10-25, 20:33
Hi Nicolas,

In my experience, you won't get good performance if you run 3 map tasks
simultaneously on one hard drive. That is a lot of I/O for a single disk.

HBase performs well when you have at least 5 nodes in the cluster. So running
HBase on 3 nodes is not something you would do in production.
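
As a rough sanity check, the figures quoted below (a 5 GB file, around 2
million rows, a little over an hour with importTsv versus 20 minutes with
bulk loading) can be turned into throughput numbers. This is only a minimal
Python sketch: the 60 minute duration is a rounding of "a little over an
hour", and the function name is made up for illustration.

```python
# Figures taken from the original post; durations are rounded assumptions.
FILE_SIZE_MB = 5 * 1024      # "5GB csv file"
ROWS = 2_000_000             # "around 2 million entries"


def throughput(minutes):
    """Return (rows per second, MB per second) for a load taking `minutes`."""
    seconds = minutes * 60
    return ROWS / seconds, FILE_SIZE_MB / seconds


for label, minutes in [("importTsv with puts", 60), ("bulk load", 20)]:
    rows_s, mb_s = throughput(minutes)
    print(f"{label}: {rows_s:.0f} rows/s, {mb_s:.2f} MB/s")
```

With these assumptions the put-based import comes out at roughly 556 rows/s
(about 1.4 MB/s) and the bulk load at roughly 1667 rows/s (about 4.3 MB/s),
which is the 3x difference reported in the original post.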

Thanks,
Anil

On Thu, Oct 25, 2012 at 8:57 AM, Jonathan Bishop <[EMAIL PROTECTED]> wrote:

> Nicolas,
>
> I just went through the same exercise. There are many ways to get this to
> go faster, but eventually I decided that bulk loading is the best solution,
> as run times scaled with the number of machines in my cluster when I used
> that approach.
>
> One thing you can try is to turn off HBase's write-ahead log (WAL). But be
> aware that a RegionServer failure will cause data loss if you do this.
>
> Jon
>
> On Tue, Oct 23, 2012 at 8:48 AM, Nick maillard <[EMAIL PROTECTED]> wrote:
>
> > Hi everyone
> >
> > I'm starting with HBase and testing it for our needs. I have set up a
> > Hadoop cluster of three machines and an HBase cluster on top of the same
> > three machines: one master and two slaves.
> >
> > I am testing the import of a 5 GB CSV file with the importTsv tool. I
> > copy the file into HDFS and then use importTsv to import it into HBase.
> >
> > Right now it takes a little over an hour to complete. It creates around 2
> > million entries in one table with a single family.
> > If I use bulk uploading it goes down to 20 minutes.
> >
> > My Hadoop job has 21 map tasks, but they all seem to take a very long
> > time to finish, and many tasks end up timing out.
> >
> > I am wondering what I have missed in my configuration. I have followed
> > the different prerequisites in the documentation, but I am really unsure
> > what is causing this slowdown. If I run the wordcount example on the same
> > file, it takes only minutes to complete, so I am guessing the issue lies
> > in my HBase configuration.
> >
> > Any help or pointers would be appreciated.
> >
> >
>

--
Thanks & Regards,
Anil Gupta