Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # user >> Re: Best Hbase Storage for PIG


+
Michel Segel 2012-04-26, 11:48
+
Rajgopal Vaithiyanathan 2012-04-26, 12:09
Copy link to this message
-
Re: Best Hbase Storage for PIG
Ok...
5 machines...
Total cluster? Is that 5 DN?
Each machine 1quad core, 32gb ram, 7 x600GB not sure what types of drives.
so let's assume 1control node running NN, JT, HM, ZK
And 4 DN running DN,TT,RS.

We don't know your Schema, row size, or network. ( 10GBe, 1GBe, 100MBe?)

We also don't know if you've tuned GC implemented MSLABS ... Etc.

So 4 hours for 175Million rows? Could be ok.
Write your insert using a java M/R and see how long it takes.

Nor do we know how many. Slots you have on each box.
10k rows in a batch put() not really a good idea.
What's your region size?
Lots to think about before you can ask if you are doing the right thing, or if PIG is the bottleneck.
Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 26, 2012, at 7:09 AM, Rajgopal Vaithiyanathan <[EMAIL PROTECTED]> wrote:

> My bad.
>
> I had used cat /proc/cpuinfo | grep "processor"  | wc -l
> cat /proc/cpuinfo | grep “physical id” | sort | uniq | wc -l   => 4
>
> so its 4 physical cores then!
>
> and free -m gives me this.
>             total       used       free     shared    buffers     cached
> Mem:         32174      31382        792          0        123      27339
> -/+ buffers/cache:       3918      28256
> Swap:        24575          0      24575
>
>
>
> On Thu, Apr 26, 2012 at 5:18 PM, Michel Segel <[EMAIL PROTECTED]>wrote:
>
>> 32 cores w 32GB of Ram?
>>
>> Pig isn't fast, but I have to question what you are using for hardware.
>> Who makes a 32 core box?
>> Assuming you mean 16 physical cores.
>>
>> 7 drives? Not enough spindles for the number of cores.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Apr 26, 2012, at 6:38 AM, Rajgopal Vaithiyanathan <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hey all,
>>>
>>> The default - HBaseStorage() takes hell lot of time for puts.
>>>
>>> In a cluster of 5 machines, insertion of 175 Million records took 4Hours
>> 45
>>> minutes
>>> Question -  Is this good enough ?
>>> each machine has 32 cores and 32GB ram with 7*600GB harddisks. HBASE's
>> heap
>>> has been configured to 8GB.
>>> If the put speed is low, how can i improve them..?
>>>
>>> I tried tweaking the TableOutputFormat by increasing the WriteBufferSize
>> to
>>> 24MB, and adding the multi put feature (by adding 10,000 puts in
>> ArrayList
>>> and putting it as a batch).  After doing this,  it started throwing
>>>
>>> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
>>> Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
>>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected
>>> local=/172.21.208.176:41135remote=slave1/
>>> 172.21.208.176:60020]
>>>
>>> Which i assume is because, the clients took too long to put.
>>>
>>> The detailed log is as follows from one of the reduce job is as follows.
>>>
>>> I've 'censored' some of the details. which i assume is Okay.! :P
>>> 2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader:
>>> Loaded the native-hadoop library
>>> 2012-04-23 20:07:13,097 WARN
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
>> already
>>> exists!
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46
>> GMT
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:host.name=*****.*****
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.version=1.6.0_22
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Sun Microsystems Inc.
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.class.path=****************************
+
Rajgopal Vaithiyanathan 2012-04-27, 06:13
+
Raghu Angadi 2012-04-27, 16:38
+
Rajgopal Vaithiyanathan 2012-04-28, 07:08
+
M. C. Srivas 2012-04-28, 15:16
+
Subir S 2012-05-12, 08:56
+
Doug Meil 2012-04-26, 13:04
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB