HBase >> mail # user >> Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
Hi
    With the ImportTsv tool you are trying to bulk load your data. Can you
check how many mappers and reducers there were? Out of the total time, how
much was taken by the map phase and how much by the reduce phase? This looks
like an MR-related issue (maybe a configuration issue). In the bulk load case
most of the work is done by the MR job: it reads the raw data, converts it
into Puts, and writes them to HFiles. The MR output is the HFiles themselves.
The next step in ImportTsv just places the HFiles under the table's region
stores. The WAL is not used in this bulk load.
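
To make the two bulk load steps concrete, here is a minimal sketch that only builds the command lines and does not run anything. The table name, column mapping, and HDFS paths are made-up placeholders, and the exact invocation may differ between HBase versions, so treat it as an illustration rather than the exact commands for your cluster.

```python
# Sketch only: builds the command lines for the two bulk load steps.
# Nothing is executed. Table name, columns, and paths are hypothetical.


def import_tsv_cmd(table, columns, input_path, hfile_dir, separator=None):
    """Step 1: the MR job that parses the input and writes HFiles to hfile_dir."""
    cmd = [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),
        # With bulk.output set, the job writes HFiles instead of issuing Puts.
        "-Dimporttsv.bulk.output=" + hfile_dir,
    ]
    if separator is not None:
        # The default separator is a tab; a CSV file needs a comma.
        cmd.append("-Dimporttsv.separator=" + separator)
    cmd += [table, input_path]
    return cmd


def complete_bulk_load_cmd(table, hfile_dir):
    """Step 2: hand the generated HFiles over to the table's regions."""
    return [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir,
        table,
    ]


if __name__ == "__main__":
    # Placeholder values, not taken from the original question.
    step1 = import_tsv_cmd(
        table="mytable",
        columns=["HBASE_ROW_KEY", "cf:value"],
        input_path="/user/me/input.csv",
        hfile_dir="/user/me/hfiles",
        separator=",",
    )
    step2 = complete_bulk_load_cmd("mytable", "/user/me/hfiles")
    print(" ".join(step1))
    print(" ".join(step2))
```

Step 1 is where nearly all of the time goes, so the map and reduce timings of that job are the numbers to look at first.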

-Anoop-

On Tue, Oct 23, 2012 at 9:18 PM, Nick maillard <
[EMAIL PROTECTED]> wrote:

> Hi everyone
>
> I'm starting out with HBase and testing it for our needs. I have set up a
> Hadoop cluster of three machines and an HBase cluster on top of the same
> three machines: one master and two slaves.
>
> I am testing the import of a 5 GB CSV file with the ImportTsv tool. I put
> the file into HDFS and then use the ImportTsv tool to import it into HBase.
>
> Right now it takes a little over an hour to complete. It creates around 2
> million entries in one table with a single family.
> If I use bulk uploading it goes down to 20 minutes.
>
> The Hadoop job has 21 map tasks, but they all seem to take a very long time
> to finish, and many tasks end up timing out.
>
> I am wondering what I have missed in my configuration. I have followed the
> prerequisites in the documentation, but I am really unsure what is causing
> this slowdown. If I run the wordcount example on the same file, it takes
> only minutes to complete, so I am guessing the issue lies in my HBase
> configuration.
>
> Any help or pointers would be appreciated.
>
>