HBase, mail # user - Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
ramkrishna vasudevan 2012-10-24, 04:55
As Kevin suggested, we can make use of bulk load. The default ImportTsv path
issues Puts that go through the WAL and MemStore. The second option is to use
the output of the mappers to create HFiles directly.
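
The two options can be sketched as command lines. This is a minimal Python sketch that only assembles the arguments and does not run anything. The table name, paths, and column mapping are hypothetical placeholders, and the CSV separator setting assumes the input is comma-delimited. The class names and the `importtsv.*` properties are the ones documented for ImportTsv and LoadIncrementalHFiles; check them against your HBase version.

```python
import shlex

# Documented HBase MapReduce entry points (verify against your HBase version).
IMPORT_TSV = "org.apache.hadoop.hbase.mapreduce.ImportTsv"
LOAD_HFILES = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def import_tsv_cmd(table, input_path, columns, separator=None, bulk_output=None):
    """Build the ImportTsv command line as a list of arguments.

    Without bulk_output, the job writes Puts through the normal write path
    (WAL and MemStore). With bulk_output, the job writes HFiles instead.
    """
    cmd = ["hbase", IMPORT_TSV, "-Dimporttsv.columns=" + ",".join(columns)]
    if separator is not None:
        cmd.append("-Dimporttsv.separator=" + separator)
    if bulk_output is not None:
        cmd.append("-Dimporttsv.bulk.output=" + bulk_output)
    cmd += [table, input_path]
    return cmd


def complete_bulk_load_cmd(hfile_dir, table):
    """Build the command that moves the generated HFiles into the table."""
    return ["hbase", LOAD_HFILES, hfile_dir, table]


def render(cmd):
    """Quote the arguments so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    columns = ["HBASE_ROW_KEY", "cf:value"]
    table, src, hfiles = "mytable", "/user/nick/input.csv", "/user/nick/hfiles"

    # Option 1: Puts through the WAL and MemStore.
    print(render(import_tsv_cmd(table, src, columns, separator=",")))

    # Option 2: write HFiles from the MR job, then load them into the table.
    print(render(import_tsv_cmd(table, src, columns, separator=",",
                                bulk_output=hfiles)))
    print(render(complete_bulk_load_cmd(hfiles, table)))
```

The only difference between the two ImportTsv invocations is the `importtsv.bulk.output` property; the second option then needs the separate load step.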

Regards
Ram

On Wed, Oct 24, 2012 at 8:59 AM, Anoop John <[EMAIL PROTECTED]> wrote:

> Hi
>     Using the ImportTsv tool, you are trying to bulk load your data. Can you
> check and tell us how many mappers and reducers there were? Out of the total
> time, how much is taken by the mapper phase and how much by the reducer
> phase? This seems like an MR-related issue (maybe some conf issue). In this
> bulk load case, most of the work is done by the MR job. It will read the raw
> data, convert it into Puts, and write them to HFiles. The MR output is the
> HFiles themselves. The next part of ImportTsv will just put the HFiles under
> the table's region store. There won't be any WAL usage in this bulk load.
>
> -Anoop-
>
> On Tue, Oct 23, 2012 at 9:18 PM, Nick maillard <[EMAIL PROTECTED]> wrote:
>
> > Hi everyone
> >
> > I'm starting out with HBase and testing it for our needs. I have set up a
> > Hadoop cluster of three machines and an HBase cluster on top of the same
> > three machines: one master and two slaves.
> >
> > I am testing the import of a 5 GB CSV file with the ImportTsv tool. I copy
> > the file into HDFS and then use ImportTsv to import it into HBase.
> >
> > Right now it takes a little over an hour to complete. It creates around 2
> > million entries in one table with a single column family.
> > If I use bulk uploading, it goes down to 20 minutes.
> >
> > My Hadoop job has 21 map tasks, but they all seem to take a very long time
> > to finish, and many tasks end up timing out.
> >
> > I am wondering what I have missed in my configuration. I have followed the
> > different prerequisites in the documentation, but I am really unsure what
> > is causing this slowdown. If I run the wordcount example on the same file,
> > it takes only minutes to complete, so I am guessing the issue lies in my
> > HBase configuration.
> >
> > Any help or pointers would be appreciated.
> >
> >
>
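
To put the figures quoted above in perspective, here is a small worked calculation in Python. It is only a rough sketch: it assumes 1 GB = 1024 MB, treats "a little over an hour" as 60 minutes, and takes the 2 million rows and 21 map tasks at face value, so the results are ballpark numbers, not measurements.

```python
# Figures taken from the original message; all of them are approximate.
SIZE_MB = 5 * 1024      # 5 GB input file, assuming 1 GB = 1024 MB
ROWS = 2_000_000        # "around 2 million entries"
MAP_TASKS = 21          # map tasks reported for the job


def rate(amount, minutes):
    """Average amount processed per second over the given wall-clock time."""
    return amount / (minutes * 60.0)


def summarize(label, minutes):
    """Format cluster-wide throughput for one of the two reported timings."""
    return "%s: %.2f MB/s, %.0f rows/s" % (
        label, rate(SIZE_MB, minutes), rate(ROWS, minutes))


if __name__ == "__main__":
    print(summarize("Puts via ImportTsv (about 60 min)", 60))
    print(summarize("Bulk upload (about 20 min)", 20))
    print("Input per map task: %.1f MB" % (SIZE_MB / MAP_TASKS))
```

Under these assumptions the whole cluster averages roughly 1.4 MB/s in the one-hour run and roughly 4.3 MB/s in the 20-minute run, with about 244 MB of input per map task. Rates this low, together with the task timeouts, are consistent with tasks stalling instead of the cluster being limited by raw I/O, which is why the per-phase timings requested above would help.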