HBase >> mail # user >> coprocessor enabled put very slow, help please~~~


Re: coprocessor enabled put very slow, help please~~~
To follow up: I was asking why he was using an HTablePool, not saying that it was wrong.

Still, I think the writes made through the pool shouldn't have to go to the WAL.
On Feb 19, 2013, at 10:01 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Good question..
>
> You create a class MyRO.
>
> How many instances of MyRO exist per RS?
>
> How many queries can access the instance MyRO at the same time?
>
>
>
>
> On Feb 19, 2013, at 9:15 AM, Wei Tan <[EMAIL PROTECTED]> wrote:
>
>> A side question: if using HTablePool is discouraged, how should we
>> handle thread safety when using HTable? Is any replacement for
>> HTablePool planned?
>> Thanks,
>>
>>
>> Best Regards,
>> Wei
>>
>>
>>
>>
>> From:   Michel Segel <[EMAIL PROTECTED]>
>> To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
>> Date:   02/18/2013 09:23 AM
>> Subject:        Re: coprocessor enabled put very slow, help please~~~
>>
>>
>>
>> Why are you using an HTable Pool?
>> Why are you closing the table after each iteration through?
>>
>> Try using one HTable object. Turn off the WAL.
>> Instantiate it in start().
>> Close it in stop().
>> Surround its use in a try / catch.
>> If an exception is caught, re-instantiate the HTable connection.
>>
>> You may also want to flush the connection after the puts.
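
The advice above (one long-lived handle, opened once, closed once, and re-created when a call fails) can be pictured with the plain-Java sketch below. It is only an illustration under assumptions: `ReconnectingHandle`, `Action`, and the factory argument are hypothetical names, a generic `Closeable` stands in for the HTable, and no HBase API is used. The methods are synchronized on the assumption that a single shared handle may be called from several threads at once.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical holder for one long-lived handle that is re-created after a failure. */
final class ReconnectingHandle<T extends Closeable> implements Closeable {

    /** An operation to run against the handle (hypothetical helper interface). */
    interface Action<H> {
        void run(H handle) throws IOException;
    }

    private final Callable<T> factory; // knows how to open a fresh handle
    private T handle;

    ReconnectingHandle(Callable<T> factory) throws Exception {
        this.factory = factory;
        this.handle = factory.call(); // opened once, e.g. from start()
    }

    /** Runs the action; on IOException, replaces the handle and retries once. */
    synchronized void use(Action<T> action) throws Exception {
        try {
            action.run(handle);
        } catch (IOException first) {
            try {
                handle.close(); // best effort: the old handle may already be broken
            } catch (IOException ignored) {
                // nothing more to do with the old handle
            }
            handle = factory.call(); // re-instantiate
            action.run(handle);      // a second failure propagates to the caller
        }
    }

    @Override
    public synchronized void close() throws IOException {
        handle.close(); // closed once, e.g. from stop()
    }
}
```

In a coprocessor, the constructor call would sit in start(), close() in stop(), and each write would go through use(...). Retrying exactly once is a choice made for this sketch, not something the thread prescribes.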
>>
>>
>> Again, I'm not sure why you are using checkAndPut on the base table. Your
>> count could be off.
>>
>> As an example, look at the poem/rhyme 'Mary had a little lamb'.
>> Then check your word count.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Feb 18, 2013, at 7:21 AM, prakash kadel <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Thank you guys for your replies,
>>> Michael,
>>> I think I didn't make it clear. Here is my use case:
>>>
>>> I have text documents to insert into HBase (with possible duplicates).
>>> Suppose I have a document such as: " I am working. He is not working"
>>>
>>> I want to insert this document into a table in HBase, say table "doc".
>>>
>>> doc table
>>> ---------
>>> rowKey : doc_id
>>> cf: doc_content
>>> value: "I am working. He is not working"
>>>
>>> Now, I want to create another table that stores the word counts, say "doc_idx".
>>>
>>> doc_idx table
>>> -------------
>>> rowKey : I, cf: count, value: 1
>>> rowKey : am, cf: count, value: 1
>>> rowKey : working, cf: count, value: 2
>>> rowKey : He, cf: count, value: 1
>>> rowKey : is, cf: count, value: 1
>>> rowKey : not, cf: count, value: 1
>>>
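>>> My MR job code:
>>> ===============
>>> // Insert the document only if the row has no content yet (expected value null).
>>> if (doc.checkAndPut(rowKey, Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc)) {
>>>  // Here doc_content is the document text as a String.
>>>  for (String word : doc_content.split("\\s+")) {
>>>     Increment inc = new Increment(Bytes.toBytes(word));
>>>     inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
>>>     doc_idx.increment(inc); // assumed: the original snippet built inc but never applied it
>>>  }
>>> }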
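
As a quick check of the doc_idx counts listed above, here is a minimal in-memory sketch in plain Java (no HBase calls; `WordCountSketch` and `countWords` are hypothetical names). It assumes each token should be trimmed of surrounding punctuation. Splitting the sample on "\\s+" alone would produce an empty first token, because of the leading space, and would treat "working." and "working" as different words, so the count for working would not reach 2.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WordCountSketch {

    /** Counts words, ignoring empty tokens and leading/trailing punctuation. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String token : text.trim().split("\\s+")) {
            // Assumption of this sketch: "working." and "working" are the same word.
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (word.isEmpty()) {
                continue; // skip tokens that were only punctuation or empty input
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {I=1, am=1, working=2, He=1, is=1, not=1}
        System.out.println(countWords(" I am working. He is not working"));
    }
}
```

Aggregating per document like this would also allow one increment per distinct word, with the summed amount, rather than one increment per occurrence.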
>>>
>>> Now, i wanted to do some experiments with coprocessors. So, i modified
>>> the code as follows.
>>>
>>> My MR job code:
>>> ===============
>>> doc.checkAndPut(rowKey, Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc);
>>>
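>>> Coprocessor code:
>>> =================
>>>  // Open the table pool once, when the coprocessor is loaded.
>>>  public void start(CoprocessorEnvironment env) {
>>>      pool = new HTablePool(env.getConfiguration(), 100);
>>>  }
>>>
>>>  public boolean postCheckAndPut(ObserverContext<RegionCoprocessorEnvironment> c,
>>>          byte[] row, byte[] family, byte[] qualifier, CompareOp compareOp,
>>>          WritableByteArrayComparable comparator, Put put, boolean result)
>>>          throws IOException {
>>>
>>>      if (!result) return result; // the put did not happen, so there is nothing to index
>>>
>>>      HTableInterface table_idx = pool.getTable("doc_idx");
>>>      try {
>>>          for (KeyValue contentKV : put.get(Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
>>>              String content = Bytes.toString(contentKV.getValue());
>>>              for (String word : content.split("\\s+")) {
>>>                  Increment inc = new Increment(Bytes.toBytes(word));
>>>                  inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
>>>                  table_idx.increment(inc); // one call to doc_idx per word
>>>              }
>>>          }
>>>      } finally {
>>>          table_idx.close(); // hands the table back to the pool
>>>      }
>>>      return result;
>>>  }
>>>
>>>  public void stop(CoprocessorEnvironment env) throws IOException {
>>>      pool.close();
>>>  }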
>>>
>>> I am a newbie to HBase. I am not sure this is the right way to do it.