HBase, mail # user - Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
ramkrishna vasudevan 2012-10-24, 13:47
'Yeah, we never used HBase client api(puts) for loading a batch of millions
of records. Can you tell me by default where the o/p HFile(s) from MR job
are stored in HDFS?'
Hi Anil
The o/p HFiles are stored in the path created for the corresponding HBase
table:
/table_name/region_name/store_name/file_name
(the store directory is named after the column family).
This is the same location that is used when a normal flush through
HBase happens.
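
To make the layout concrete, here is a minimal Python sketch that splits such a path into its parts. It assumes the HBase root directory is /hbase (the hbase.rootdir setting may differ on your cluster) and that each region directory holds one directory per store (column family). The function name and the region, family and file names in the usage example are made up for illustration; only the table name conf2_events comes from the task log quoted further down.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class HFileLocation(NamedTuple):
    table: str
    region: str
    family: str
    hfile: str


def parse_hfile_path(path: str, root: str = "/hbase") -> HFileLocation:
    """Split <root>/<table>/<region>/<family>/<hfile> into its parts.

    The layout and the default root are assumptions; adjust them to
    match the cluster being inspected.
    """
    # relative_to raises ValueError if the path is not under root.
    rel = PurePosixPath(path).relative_to(root)
    if len(rel.parts) != 4:
        raise ValueError(
            f"expected table/region/family/file under {root}: {path}"
        )
    return HFileLocation(*rel.parts)


# Hypothetical region, family and file names, for illustration only.
loc = parse_hfile_path("/hbase/conf2_events/region_abc/cf/hfile_001")
print(loc.table, loc.region, loc.family, loc.hfile)
```

Feeding it the paths from a recursive HDFS listing of the table directory is one way to count HFiles per region and family.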

Hope this helps.
Regards
Ram
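
On the gap in the quoted task log: a quick way to see where a task went silent is to pull out the timestamps and look for the largest jump between consecutive entries. The sketch below does that in Python. The function name is made up, and it assumes log4j-style timestamps in the form shown in the quoted log (date, time, comma, milliseconds).

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Matches timestamps such as "2012-10-24 12:25:24,615".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def largest_gap(
    log_text: str,
) -> Optional[Tuple[datetime, datetime, timedelta]]:
    """Return (start, end, gap) for the biggest jump between
    consecutive timestamps, or None if fewer than two are found."""
    stamps = [
        datetime.strptime(m, "%Y-%m-%d %H:%M:%S,%f")
        for m in TIMESTAMP.findall(log_text)
    ]
    if len(stamps) < 2:
        return None
    start, end = max(zip(stamps, stamps[1:]), key=lambda p: p[1] - p[0])
    return start, end, end - start


# Three entries taken from the task log quoted in this thread.
sample = """\
2012-10-24 12:25:24,461 INFO org.apache.hadoop.mapred.Task:
2012-10-24 12:25:24,615 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy
2012-10-24 13:08:03,738 INFO org.apache.hadoop.hbase.client.HConnectionManager
"""

result = largest_gap(sample)
if result is not None:
    start, end, gap = result
    print(f"silent from {start} to {end} ({gap})")
```

For these three entries the silent stretch runs from 12:25:24,615 to 13:08:03,738, a little under 43 minutes, which matches the gap Nick describes.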

On Wed, Oct 24, 2012 at 5:10 PM, Nick maillard <
[EMAIL PROTECTED]> wrote:

> Looking at my task logs, there is a big gap in time that I do not understand.
> The task connects to ZooKeeper, creates the entries, and then from
> 2012-10-24 12:25:24 to 2012-10-24 13:08:03 logs nothing.
> I guess it is doing the map reduce work.
>
>
> 2012-10-24 12:25:23,323 INFO org.apache.zookeeper.ClientCnxn:
> Session establishment complete on server
> 2012-10-24 12:25:24,266 INFO
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat:
> Created table instance for conf2_events
> 2012-10-24 12:25:24,361 INFO org.apache.hadoop.util.ProcessTree:
> setsid exited with exit code 0
> 2012-10-24 12:25:24,461 INFO org.apache.hadoop.mapred.Task:
> Using ResourceCalculatorPlugin
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@13394344
> 2012-10-24 12:25:24,615 WARN
> org.apache.hadoop.io.compress.snappy.LoadSnappy
> Snappy native library not loaded
> 2012-10-24 13:08:03,738 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Closed zookeeper sessionid=0x13a91f1e41000c0
> 2012-10-24 13:08:03,751 INFO org.apache.zookeeper.ZooKeeper:
>  Session:0x13a91f1e41000c0 closed
> 2012-10-24 13:08:03,751 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2012-10-24 13:08:03,751 INFO org.apache.hadoop.mapred.Task:
> Task:attempt_201210241044_0005_m_000000_0 is done. And is in the process of
> commiting
>
> On the MapReduce side, the job is being run:
>
> 2012-10-24 12:25:19,212 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_..
>  given task: attempt_201210241044_0005_m_000002_0
> 2012-10-24 12:25:19,308 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_..
>  given task: attempt_201210241044_0005_m_000012_0
> 2012-10-24 12:25:19,347 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_..
> given task: attempt_201210241044_0005_m_000003_0
>
> 2012-10-24 12:25:19,510 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_..
> given task: attempt_201210241044_0005_m_000010_0
> 2012-10-24 12:25:19,525 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_899418193
> given task: attempt_201210241044_0005_m_000007_0
> 2012-10-24 12:25:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_-1509383641
> given task: attempt_201210241044_0005_m_000001_0
> 2012-10-24 12:25:19,708 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_-19778997
> given task: attempt_201210241044_0005_m_000004_0
> 2012-10-24 12:25:19,822 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_-4189743
> given task: attempt_201210241044_0005_m_000009_0
> 2012-10-24 12:25:19,980 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_-661677671
> given task: attempt_201210241044_0005_m_000005_0
> 2012-10-24 12:25:20,044 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_1898916331
> given task: attempt_201210241044_0005_m_000000_0
>
> 2012-10-24 12:25:20,167 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_1123667416
> given task: attempt_201210241044_0005_m_000008_0
> 2012-10-24 12:25:20,392 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM with ID: jvm_201210241044_0005_m_1621934208
> given task: attempt_201210241044_0005_m_000006_0
> 2012-10-24 12:25:20,500 INFO org.apache.hadoop.mapred.TaskTracker:
>  JVM with ID: jvm_201210241044_0005_m_-538140840
> given task: attempt_201210241044_0005_m_000013_0
> 2012-10-24 12:25:20,602 INFO org.apache.hadoop.mapred.TaskTracker: