HBase, mail # user - Re: Hbase import Tsv performance (slow import)


Other messages in this thread:
Nick maillard 2012-10-24, 11:40
ramkrishna vasudevan 2012-10-24, 13:47
Nick maillard 2012-10-24, 10:15
Sonal Goyal 2012-10-24, 11:18
Nick maillard 2012-10-24, 10:05
Nick maillard 2012-10-24, 09:23
Nick maillard 2012-10-24, 14:35
Kevin Odell 2012-10-24, 16:18
anil gupta 2012-10-24, 16:30
Nick maillard 2012-10-24, 16:29
nick maillard 2012-10-24, 19:08
Nick maillard 2012-10-23, 17:13
Re: Hbase import Tsv performance (slow import)
Nicolas Liochon 2012-10-23, 17:32
Thanks. Checking the schema itself is still worthwhile (cf. the link sent).
Also, with 3 machines and a replication factor of 3, every machine
participates in every write. And since HBase writes each entry to a
write-ahead log for safety, the number of writes is doubled again. So your
machines may simply be dying under the load. In any case, your cluster runs
at the speed of its least powerful machine, and that machine has a workload
roughly 6x that of a single-machine config (i.e. just writing a file locally).
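The 6x figure follows from simple arithmetic; a minimal sketch (the factor of 2 for the WAL is an approximation, since WAL appends and store writes are not byte-for-byte identical):

```python
# Rough write-amplification estimate for this 3-node setup.
# Assumptions: replication factor 3, and each entry written both to the
# write-ahead log and to the store, roughly doubling the I/O.
replication = 3
wal_factor = 2
write_amplification = replication * wal_factor
print(write_amplification)  # 6x the I/O of writing a local file once
```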

On Tue, Oct 23, 2012 at 7:13 PM, Nick maillard <
[EMAIL PROTECTED]> wrote:

> Thanks for the help!
>
> My conf files are : Hadoop:
> hdfs-site
>
> <configuration>
>  <property>
>   <name>dfs.replication</name>
>   <value>3</value>
>   <description>Default block replication.
>   The actual number of replications can be specified when the file is
> created.
>   The default is used if replication is not specified at create time.
>   </description>
> </property>
> <property>
>   <name>dfs.data.dir</name>
>   <value>/home/runner/app/hadoop/dfs/data</value>
>   <description>Determines where on the local filesystem a DFS data
> node should store its blocks.
>   </description>
> </property>
> <property>
>         <name>dfs.datanode.max.xcievers</name>
>         <value>4096</value>
>       </property>
> </configuration>
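One consequence of dfs.replication=3 on a 3-node cluster is that every block lives on every node, so usable capacity is a third of raw capacity. A minimal sketch (the per-node 1000 GB figure is hypothetical; only the replication factor comes from the config above):

```python
# Hypothetical per-node disk size; replication factor from hdfs-site.xml.
node_capacity_gb = 1000
nodes = 3
replication = 3
raw_gb = node_capacity_gb * nodes     # total raw disk across the cluster
effective_gb = raw_gb // replication  # usable capacity with 3 copies per block
print(effective_gb)  # 1000
```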
>
>
> Mapred-site.xml
>
> <configuration>
>  <property>
>   <name>mapred.job.tracker</name>
>   <value>master:54311</value>
>   <description>The host and port that the MapReduce job tracker runs
>   at.  If "local", then jobs are run in-process as a single map
>   and reduce task.
>   </description>
> </property>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>14</value>
>   <description>The maximum number of map tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>14</value>
>   <description>The maximum number of reduce tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
> <property>
> <name>mapred.child.java.opts</name>
>   <value>-Xmx400m</value>
>   <description>Java opts for the task tracker child processes.
>   The following symbol, if present, will be interpolated: @taskid@ is
> replaced
>   by current TaskID. Any other occurrences of '@' will go unchanged.
>   For example, to enable verbose gc logging to a file named for the taskid
> in
>   /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
>         -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
>
>   The configuration variable mapred.child.ulimit can be used to control the
>   maximum virtual memory of the child processes.
>   </description>
> </property>
> </configuration>
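One thing worth checking in this mapred-site.xml: with 14 map slots plus 14 reduce slots at -Xmx400m per child JVM, a fully loaded TaskTracker demands a lot of memory. A rough worst-case sketch (ignoring JVM overhead beyond the heap), using only the values from the config above:

```python
# Worst-case child-JVM heap demand per node, from mapred-site.xml above.
map_slots = 14       # mapred.tasktracker.map.tasks.maximum
reduce_slots = 14    # mapred.tasktracker.reduce.tasks.maximum
child_heap_mb = 400  # mapred.child.java.opts = -Xmx400m
worst_case_mb = (map_slots + reduce_slots) * child_heap_mb
print(worst_case_mb)  # 11200 MB if every slot is busy at once
```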
>
>
> core-site.xml
>
> <configuration>
>  <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/home/runner/app/hadoop/tmp</value>
>   <description>A base for other temporary directories.</description>
> </property>
>
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://master:54310</value>
>   <description>The name of the default file system.  A URI whose
>   scheme and authority determine the FileSystem implementation.  The
>   uri's scheme determines the config property (fs.SCHEME.impl) naming
>   the FileSystem implementation class.  The uri's authority is used to
>   determine the host, port, etc. for a filesystem.</description>
> </property>
> </configuration>
>
>
> For Hbase:
> hbase-site:
> <configuration>
>  <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://master:54310/hbase</value>
>  </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>     <description>The mode the cluster will be in. Possible values are
>       false: standalone and pseudo-distributed setups with managed
> Zookeeper
>       true: fully-distributed with unmanaged Zookeeper Quorum (see
Other messages in this thread:
Kevin Odell 2012-10-23, 17:47
lars hofhansl 2012-10-25, 04:10
Nick maillard 2012-10-23, 15:48
Anoop John 2012-10-24, 03:29
ramkrishna vasudevan 2012-10-24, 04:55
anil gupta 2012-10-24, 05:09
Anoop John 2012-10-24, 05:11
Anoop John 2012-10-24, 05:14
anil gupta 2012-10-24, 05:28
Anoop John 2012-10-24, 06:07
anil gupta 2012-10-24, 06:14
Anoop John 2012-10-24, 06:31
anil gupta 2012-10-24, 06:43
ramkrishna vasudevan 2012-10-24, 05:52
anil gupta 2012-10-24, 06:11
Jonathan Bishop 2012-10-25, 15:57
anil gupta 2012-10-25, 20:33
anil gupta 2012-10-25, 20:35
Anoop Sam John 2012-10-26, 04:07
Nicolas Liochon 2012-10-23, 16:46