HBase >> mail # user >> Re: Best Hbase Storage for PIG


Re: Best Hbase Storage for PIG
32 cores with 32GB of RAM?

Pig isn't fast, but I have to question what you are using for hardware.
Who makes a 32-core box?
I assume you mean 16 physical cores.

7 drives? Not enough spindles for the number of cores.
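
For scale, here is a rough back-of-envelope check of the figures in the quoted message below (175 million records, 4 hours 45 minutes, 5 machines). It is only a sketch: the class and method names are made up for illustration, and the per-machine number assumes the writes were spread evenly across the cluster, which the original message does not say.

```java
public class PutThroughput {
    // Average records written per second over the given wall-clock time.
    static double recordsPerSecond(long records, int hours, int minutes) {
        long seconds = hours * 3600L + minutes * 60L;
        return (double) records / seconds;
    }

    public static void main(String[] args) {
        long records = 175_000_000L; // from the quoted message
        int machines = 5;            // from the quoted message

        double clusterRate = recordsPerSecond(records, 4, 45);
        // Assumption: load is evenly distributed over all machines.
        double perMachineRate = clusterRate / machines;

        System.out.printf("cluster: %.0f puts/s, per machine: %.0f puts/s%n",
                clusterRate, perMachineRate);
    }
}
```

That works out to roughly ten thousand puts per second for the whole cluster, or about two thousand per machine under the even-spread assumption.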

Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 26, 2012, at 6:38 AM, Rajgopal Vaithiyanathan <[EMAIL PROTECTED]> wrote:

> Hey all,
>
> The default, HBaseStorage(), takes a hell of a lot of time for puts.
>
> In a cluster of 5 machines, insertion of 175 million records took 4 hours
> 45 minutes.
> Question: is this good enough?
> Each machine has 32 cores and 32GB RAM with 7 x 600GB hard disks. HBase's
> heap has been configured to 8GB.
> If the put speed is low, how can I improve it?
>
> I tried tweaking the TableOutputFormat by increasing the WriteBufferSize to
> 24MB, and adding the multi-put feature (by adding 10,000 puts to an ArrayList
> and putting it as a batch). After doing this, it started throwing
>
> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
> Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>
> I assume this is because the clients took too long to put.
>
> The detailed log from one of the reduce jobs is as follows.
>
> I've 'censored' some of the details, which I assume is okay! :P
> 2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader:
> Loaded the native-hadoop library
> 2012-04-23 20:07:13,097 WARN
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
> exists!
> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:host.name=*****.*****
> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_22
> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.class.path=****************************
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.library.path=**********************
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=***************************
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.version=2.6.38-8-server
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.name=raj
>
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.home=*********
> 2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.dir=**********************:
> 2012-04-23 20:07:13,790 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=master:2181 sessionTimeout=180000
> watcher=hconnection
> 2012-04-23 20:07:13,822 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server /172.21.208.180:2181
> 2012-04-23 20:07:13,823 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
> this process is [EMAIL PROTECTED]e1
> 2012-04-23 20:07:13,825 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to master/172.21.208.180:2181, initiating session