HBase, mail # user - Re: Best Hbase Storage for PIG


Re: Best Hbase Storage for PIG
Michel Segel 2012-04-26, 13:41
Ok...
5 machines...
Total cluster? Is that 5 DN?
Each machine: 1 quad-core CPU, 32GB RAM, 7 x 600GB drives; not sure what type of drives.
So let's assume 1 control node running NN, JT, HM, ZK
and 4 DNs running DN, TT, RS.

We don't know your schema, row size, or network (10GbE, 1GbE, 100MbE?).

We also don't know if you've tuned GC, implemented MSLAB, etc.

So 4 hours for 175Million rows? Could be ok.
Write your insert using a java M/R and see how long it takes.
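
As a back-of-the-envelope check on the figures quoted in this thread, the load works out to roughly 10,200 rows per second across the cluster. The sketch below only does that arithmetic; the split across 4 region servers is the assumption made above (1 control node, 4 data nodes), not something the original poster confirmed.

```java
import java.time.Duration;

class PutThroughput {
    // Average rows written per second over the whole job.
    static double rowsPerSecond(long rows, Duration elapsed) {
        return (double) rows / elapsed.getSeconds();
    }

    public static void main(String[] args) {
        long rows = 175_000_000L;                                // from the original post
        Duration elapsed = Duration.ofHours(4).plusMinutes(45);  // from the original post
        int regionServers = 4;                                   // assumption: 4 of the 5 nodes run an RS

        double total = rowsPerSecond(rows, elapsed);
        System.out.printf("cluster: %.0f rows/s%n", total);
        System.out.printf("per region server: %.0f rows/s%n", total / regionServers);
    }
}
```

Whether that rate is acceptable still depends on row size, schema, and network, none of which are known here.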

Nor do we know how many slots you have on each box.
10k rows in a batch put() is not really a good idea.
What's your region size?
Lots to think about before you can ask if you are doing the right thing, or if PIG is the bottleneck.
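
One way to avoid a single 10k-row batch is to hand the rows to the client in smaller chunks. This is a minimal sketch of that idea only: `writeInChunks` and the chunk size of 500 are hypothetical, and the `sink` stands in for whatever call actually writes the batch (for example a batched put on the table). The right chunk size depends on row size and cluster tuning, which this thread does not establish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkedWriter {
    // Passes items to the sink in sublists of at most chunkSize elements
    // and returns the number of sink calls made.
    static <T> int writeInChunks(List<T> items, int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int calls = 0;
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            sink.accept(items.subList(from, to)); // a view, not a copy
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            rows.add(i); // placeholder rows; a real job would build puts here
        }
        // Hypothetical sink: a real job would write the chunk to HBase here.
        int calls = writeInChunks(rows, 500, chunk -> { });
        System.out.println(calls + " batches written");
    }
}
```

Smaller batches keep each RPC short, which should make it less likely that a single call runs past the socket timeout.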
Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 26, 2012, at 7:09 AM, Rajgopal Vaithiyanathan <[EMAIL PROTECTED]> wrote:

> My bad.
>
> I had used cat /proc/cpuinfo | grep "processor"  | wc -l
> cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l   => 4
>
> so it's 4 physical cores then!
>
> and free -m gives me this.
>             total       used       free     shared    buffers     cached
> Mem:         32174      31382        792          0        123      27339
> -/+ buffers/cache:       3918      28256
> Swap:        24575          0      24575
>
>
>
> On Thu, Apr 26, 2012 at 5:18 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> 32 cores w 32GB of Ram?
>>
>> Pig isn't fast, but I have to question what you are using for hardware.
>> Who makes a 32 core box?
>> Assuming you mean 16 physical cores.
>>
>> 7 drives? Not enough spindles for the number of cores.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Apr 26, 2012, at 6:38 AM, Rajgopal Vaithiyanathan <[EMAIL PROTECTED]> wrote:
>>
>>> Hey all,
>>>
>>> The default HBaseStorage() takes a hell of a lot of time for puts.
>>>
>>> In a cluster of 5 machines, insertion of 175 million records took
>>> 4 hours 45 minutes.
>>> Question: is this good enough?
>>> Each machine has 32 cores and 32GB RAM with 7 x 600GB hard disks. HBase's
>>> heap has been configured to 8GB.
>>> If the put speed is low, how can I improve it?
>>>
>>> I tried tweaking the TableOutputFormat by increasing the WriteBufferSize
>>> to 24MB, and adding the multi-put feature (by adding 10,000 puts to an
>>> ArrayList and putting them as a batch). After doing this, it started
>>> throwing
>>>
>>> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
>>> Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
>>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected
>>> local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>>>
>>> Which I assume is because the clients took too long to put.
>>>
>>> The detailed log from one of the reduce jobs is as follows.
>>>
>>> I've 'censored' some of the details, which I assume is okay! :P
>>> 2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader:
>>> Loaded the native-hadoop library
>>> 2012-04-23 20:07:13,097 WARN
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
>>> already exists!
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:host.name=*****.*****
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.version=1.6.0_22
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Sun Microsystems Inc.
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
>>> 2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.class.path=****************************