HBase >> mail # user >> Re: Hbase import Tsv performance (slow import)


anil gupta 2012-10-24, 05:09
Re: Hbase import Tsv performance (slow import)
Hi Anil
On Wed, Oct 24, 2012 at 10:39 AM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Anoop,
>
> As per your last email, did you mean that the WAL is not used when using the
> HBase Bulk Loader? If so, how do we ensure "no data loss" in case of a
> RegionServer failure?
>
> Thanks,
> Anil Gupta
>
> On Tue, Oct 23, 2012 at 9:55 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
>
> > As Kevin suggested, we can make use of bulk load that goes through the WAL
> > and Memstore. Or the second option will be to use the output of the mappers
> > to create HFiles directly.
> >
> > Regards
> > Ram
> >
> > On Wed, Oct 24, 2012 at 8:59 AM, Anoop John <[EMAIL PROTECTED]> wrote:
> >
> > > Hi
> > >     Using the ImportTSV tool you are trying to bulk load your data. Can
> > > you check and tell us how many mappers and reducers there were? Out of
> > > the total time, how much is taken by the mapper phase and how much by the
> > > reducer phase? This seems like an MR-related issue (maybe some
> > > configuration issue). In this bulk load case most of the work is done by
> > > the MR job. It will read the raw data, convert it into Puts, and write
> > > them to HFiles. The MR output is the HFiles themselves. The next part in
> > > ImportTSV will just put the HFiles under the table's region store. There
> > > won't be any WAL usage in this bulk load.
> > >
> > > -Anoop-
> > >
> > > On Tue, Oct 23, 2012 at 9:18 PM, Nick maillard <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi everyone
> > > >
> > > > I'm starting with HBase and testing it for our needs. I have set up a
> > > > Hadoop cluster of three machines and an HBase cluster on top of the
> > > > same three machines, one master and two slaves.
> > > >
> > > > I am testing the import of a 5 GB CSV file with the importTsv tool. I
> > > > copy the file into HDFS and use the importTsv tool to import it into
> > > > HBase.
> > > >
> > > > Right now it takes a little over an hour to complete. It creates
> > > > around 2 million entries in one table with a single family.
> > > > If I use bulk uploading it goes down to 20 minutes.
> > > >
> > > > My Hadoop job has 21 map tasks, but they all seem to be taking a very
> > > > long time to finish, and many tasks end up timing out.
> > > >
> > > > I am wondering what I have missed in my configuration. I have followed
> > > > the different prerequisites in the documentation, but I am really
> > > > unsure as to what is causing this slowdown. If I apply the wordcount
> > > > example to the same file, it takes only minutes to complete, so I am
> > > > guessing the issue lies in my HBase configuration.
> > > >
> > > > Any help or pointers would be appreciated.
> > > >
> > > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
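
To make the two import paths discussed in the quoted messages concrete, here is a minimal sketch in Python that only assembles the command lines and does not run them. The jar name, table name, column mapping and HDFS paths are hypothetical placeholders, not values from this thread. The `importtsv` and `completebulkload` driver names and the `importtsv.columns`, `importtsv.separator` and `importtsv.bulk.output` properties are the ones documented for HBase releases of that period, but please check them against your own version.

```python
import shlex


def importtsv_cmd(hbase_jar, table, input_dir, columns,
                  separator=None, bulk_output=None):
    """Build the argv for an ImportTsv run.

    Without bulk_output the job writes Puts to the region servers
    (the WAL and Memstore path). With bulk_output the MR job writes
    HFiles to that directory instead and the table is not touched yet.
    """
    args = ["hadoop", "jar", hbase_jar, "importtsv",
            "-Dimporttsv.columns=" + ",".join(columns)]
    if separator is not None:
        # ImportTsv expects tab-separated input by default.
        args.append("-Dimporttsv.separator=" + separator)
    if bulk_output is not None:
        args.append("-Dimporttsv.bulk.output=" + bulk_output)
    args += [table, input_dir]
    return args


def completebulkload_cmd(hbase_jar, hfile_dir, table):
    """Build the argv for the second step: hand the HFiles to the table."""
    return ["hadoop", "jar", hbase_jar, "completebulkload", hfile_dir, table]


if __name__ == "__main__":
    # All of the values below are hypothetical placeholders.
    jar = "hbase-VERSION.jar"
    cols = ["HBASE_ROW_KEY", "d:c1", "d:c2"]

    # Path 1: direct import through Puts.
    print(shlex.join(importtsv_cmd(jar, "mytable", "/user/me/input",
                                   cols, separator=",")))

    # Path 2: write HFiles first, then load them into the table.
    print(shlex.join(importtsv_cmd(jar, "mytable", "/user/me/input", cols,
                                   separator=",",
                                   bulk_output="/user/me/hfiles")))
    print(shlex.join(completebulkload_cmd(jar, "/user/me/hfiles", "mytable")))
```

The second path is the one Anoop describes: the MR job produces HFiles and the final step only moves them under the table's regions, which is why no WAL is involved.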