HBase >> mail # user >> coprocessor enabled put very slow, help please~~~


Re: coprocessor enabled put very slow, help please~~~
Good question.

You create a class MyRO.

How many instances of MyRO exist per RegionServer (RS)?

How many queries can access the MyRO instance at the same time?
On Feb 19, 2013, at 9:15 AM, Wei Tan <[EMAIL PROTECTED]> wrote:

> A side question: if using HTablePool is discouraged, how should we
> handle thread safety when using HTable? Is a replacement for
> HTablePool planned?
> Thanks,
>
>
> Best Regards,
> Wei
>
>
>
>
> From:   Michel Segel <[EMAIL PROTECTED]>
> To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
> Date:   02/18/2013 09:23 AM
> Subject:        Re: coprocessor enabled put very slow, help please~~~
>
>
>
> Why are you using an HTablePool?
> Why are you closing the table after each iteration?
>
> Try using 1 HTable object. Turn off the WAL.
> Instantiate it in start().
> Close it in stop().
> Surround the use in a try / catch.
> If an exception is caught, re-instantiate the HTable connection.
>
> You may also want to flush the connection after the puts.
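
The lifecycle described above (one handle opened in start(), closed in stop(), re-created when an operation throws) could be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexTable` and `SingleTableHolder` are hypothetical stand-ins rather than HBase classes, so the sketch runs without a cluster, and the `synchronized` methods assume the shared handle is not safe for concurrent use.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for a table handle; NOT an HBase type.
interface IndexTable extends Closeable {
    void increment(String word) throws IOException;
}

// Holds one long-lived handle instead of borrowing one from a pool per call.
class SingleTableHolder {
    private final Supplier<IndexTable> factory; // assumed way to open a new handle
    private IndexTable table;

    SingleTableHolder(Supplier<IndexTable> factory) {
        this.factory = factory;
    }

    // Open the handle once, when the coprocessor starts.
    synchronized void start() {
        table = factory.get();
    }

    // Use the shared handle; on failure, replace it and let the caller decide
    // whether to retry (a blind retry of an increment could double count).
    synchronized void increment(String word) throws IOException {
        try {
            table.increment(word);
        } catch (IOException e) {
            closeQuietly(table);
            table = factory.get();
            throw e;
        }
    }

    // Close the handle once, when the coprocessor stops.
    synchronized void stop() {
        closeQuietly(table);
        table = null;
    }

    private static void closeQuietly(Closeable c) {
        try {
            if (c != null) {
                c.close();
            }
        } catch (IOException ignored) {
            // Nothing useful can be done with a failure while closing.
        }
    }
}
```

In a real coprocessor the factory would open the actual table handle; how that is best done depends on the HBase version in use.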
>
>
> Again, I'm not sure why you are using checkAndPut on the base table. Your
> count could be off.
>
> As an example, look at the nursery rhyme 'Mary Had a Little Lamb'.
> Then check your word count.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 18, 2013, at 7:21 AM, prakash kadel <[EMAIL PROTECTED]>
> wrote:
>
>> Thank you guys for your replies.
>> Michael,
>>  I think I didn't make it clear. Here is my use case:
>>
>> I have text documents to insert into HBase (with possible duplicates).
>> Suppose I have a document such as: "I am working. He is not working"
>>
>> I want to insert this document into a table in HBase, say table "doc".
>>
>> doc table
>> -----
>> rowKey : doc_id
>> cf: doc_content
>> value: "I am working. He is not working"
>>
>> Now, I want to create another table that stores the word count, say "doc_idx".
>>
>> doc_idx table
>> ---
>> rowKey : I, cf: count, value: 1
>> rowKey : am, cf: count, value: 1
>> rowKey : working, cf: count, value: 2
>> rowKey : He, cf: count, value: 1
>> rowKey : is, cf: count, value: 1
>> rowKey : not, cf: count, value: 1
>>
>> My MR job code:
>> =============
>> if(doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>>   for(String word : doc_content.split("\\s+")) {
>>      Increment inc = new Increment(Bytes.toBytes(word));
>>      inc.addColumn("count", "", 1);
>>   }
>> }
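
One detail worth checking against the doc_idx table above: splitting on whitespace alone leaves punctuation attached to the token, so for the sample document "working." and "working" become two different row keys with a count of 1 each, not a single row with a count of 2. The sketch below applies the same `\\s+` split locally to show this; the class and method names are made up for illustration, and the counts are kept in an in-memory map instead of an HBase table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordCountCheck {
    // Counts tokens using the same whitespace-only regex as the loop above.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling `WordCountCheck.countWords("I am working. He is not working")` gives seven entries, with "working." and "working" each counted once. Whether to strip punctuation or normalize case before counting is a separate decision for the indexing scheme.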
>>
>> Now, I wanted to do some experiments with coprocessors, so I modified
>> the code as follows.
>>
>> My MR job code:
>> ==============
>> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
>>
>> Coprocessor code:
>> ==============
>>   public void start(CoprocessorEnvironment env)  {
>>       pool = new HTablePool(conf, 100);
>>   }
>>
>>   public boolean postCheckAndPut(c, row, family, byte[] qualifier,
>>           compareOp, comparator, put, result) {
>>
>>       // The put was not applied, so there is nothing to index.
>>       // Pass the result through instead of reporting success.
>>       if (!result) return false;
>>
>>       HTableInterface table_idx = pool.getTable("doc_idx");
>>
>>       try {
>>           for (KeyValue contentKV : put.get("doc_content", "")) {
>>               // getValue() returns byte[]; convert it before splitting.
>>               for (String word : Bytes.toString(contentKV.getValue()).split("\\s+")) {
>>                   Increment inc = new Increment(Bytes.toBytes(word));
>>                   inc.addColumn("count", "", 1);
>>                   table_idx.increment(inc);
>>               }
>>           }
>>       } finally {
>>           table_idx.close();
>>       }
>>       return true;
>>   }
>>
>>   public void stop(env) {
>>       pool.close();
>>   }
>>
>> I am a newbie to HBase. I am not sure this is the right way to do it.
>> Given that, why is the coprocessor-enabled version much slower than
>> the one without?
>>
>>
>> Sincerely,
>> Prakash Kadel
>>
>>
>> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> The issue I was talking about was the use of checkAndPut.
>>> The OP wrote:
>>>>>>> each map inserts to doc table.(checkAndPut)