HBase >> mail # user >> coprocessor enabled put very slow, help please~~~


Re: coprocessor enabled put very slow, help please~~~
Why are you using an HTable Pool?
Why are you closing the table after each iteration through?

Try using a single HTable object, and turn off the WAL.
Instantiate it in start().
Close it in stop().
Surround its use in a try / catch.
If an exception is caught, re-instantiate a new HTable connection.
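
A rough sketch of that lifecycle, for illustration only. It uses no HBase classes: `IndexTable` and `IndexTableFactory` are hypothetical stand-ins for the table handle and for whatever opens it, and `ReusableIndexWriter` is a made-up name. The single retry after reopening is also an assumption, not something the advice above prescribes.

```java
import java.io.IOException;

/** Hypothetical stand-in for a table handle; not an HBase class. */
interface IndexTable {
    void increment(String word) throws IOException;
    void close() throws IOException;
}

/** Hypothetical factory that opens a fresh handle to the index table. */
interface IndexTableFactory {
    IndexTable open() throws IOException;
}

/** Keeps one long-lived handle and reopens it if a call fails. */
class ReusableIndexWriter {
    private final IndexTableFactory factory;
    private IndexTable table;

    ReusableIndexWriter(IndexTableFactory factory) {
        this.factory = factory;
    }

    /** Intended to be called once from the coprocessor's start(). */
    void start() throws IOException {
        table = factory.open();
    }

    /** Reuses the handle; on failure, reopens it and retries once. */
    void increment(String word) throws IOException {
        try {
            table.increment(word);
        } catch (IOException e) {
            try {
                table.close();
            } catch (IOException ignored) {
                // The old handle is being discarded anyway.
            }
            table = factory.open();
            table.increment(word);
        }
    }

    /** Intended to be called once from the coprocessor's stop(). */
    void stop() throws IOException {
        if (table != null) {
            table.close();
            table = null;
        }
    }
}
```

The point of the shape is that the handle is opened and closed once per coprocessor lifetime rather than once per put.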

You may want to flush the connection after the puts.
Again, I'm not sure why you are using checkAndPut on the base table. Your count could be off.

As an example, look at the nursery rhyme 'Mary Had a Little Lamb'.
Then check your word count.
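
One way to run such a check locally, sketched in plain Java with no HBase involved. It applies the same `split("\\s+")` used in the quoted job to the sample sentence from the quoted message; the class name `WordCountCheck` is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordCountCheck {
    /** Counts tokens produced by splitting on whitespace only. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String word : text.split("\\s+")) {
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {I=1, am=1, working.=1, He=1, is=1, not=1, working=1}
        System.out.println(count("I am working. He is not working"));
    }
}
```

Splitting on whitespace alone keeps punctuation attached, so "working." and "working" land in separate rows with a count of 1 each rather than a single row with a count of 2.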

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 18, 2013, at 7:21 AM, prakash kadel <[EMAIL PROTECTED]> wrote:

> Thank you guys for your replies,
> Michael,
>   I think I didn't make it clear. Here is my use case:
>
> I have text documents to insert into HBase (with possible duplicates).
> Suppose I have a document such as: "I am working. He is not working"
>
> I want to insert this document into a table in HBase, say table "doc".
>
> doc table
> ---
> rowKey : doc_id
> cf: doc_content
> value: "I am working. He is not working"
>
> Now, I want to create another table that stores the word counts, say "doc_idx".
>
> doc_idx table
> ---
> rowKey : I, cf: count, value: 1
> rowKey : am, cf: count, value: 1
> rowKey : working, cf: count, value: 2
> rowKey : He, cf: count, value: 1
> rowKey : is, cf: count, value: 1
> rowKey : not, cf: count, value: 1
>
> My MR job code:
> ==============
> if(doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>    for(String word : doc_content.split("\\s+")) {
>       Increment inc = new Increment(Bytes.toBytes(word));
>       inc.addColumn("count", "", 1);
>    }
> }
>
> Now, I wanted to do some experiments with coprocessors, so I modified
> the code as follows.
>
> My MR job code:
> ==============
> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
>
> Coprocessor code:
> ==============
>    public void start(CoprocessorEnvironment env)  {
>        pool = new HTablePool(conf, 100);
>    }
>
>    public boolean postCheckAndPut(c, row, family, byte[] qualifier, compareOp, comparator, put, result) {
>
>        if (!result) return true; // check if the put succeeded
>
>        HTableInterface table_idx = pool.getTable("doc_idx");
>
>        try {
>
>            for (KeyValue contentKV : put.get("doc_content", "")) {
>                for (String word : Bytes.toString(contentKV.getValue()).split("\\s+")) {
>                    Increment inc = new Increment(Bytes.toBytes(word));
>                    inc.addColumn("count", "", 1);
>                    table_idx.increment(inc);
>                }
>            }
>        } finally {
>            table_idx.close();
>        }
>        return true;
>    }
>
>    public void stop(env) {
>        pool.close();
>    }
>
> I am a newbie to HBase. I am not sure this is the right way to do it.
> Given that, why is the coprocessor-enabled version much slower than
> the one without?
>
>
> Sincerely,
> Prakash Kadel
>
>
> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>>
>> The issue I was talking about was the use of a checkAndPut.
>> The OP wrote:
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>> a index table.
>>
>> My question is: why does the OP use a checkAndPut, and the RegionObserver's postCheckAndPut?
>>
>>
>> Here's a good example... http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>>
>> The OP doesn't really get into the use case, so we don't know why the checkAndPut is in the M/R job.
>> He should just be using put() and then a postPut().
>>
>> Another issue... since he's writing to a different HTable... how? Does he create an HTable instance in the start() method of his RO object and then reference it later? Or does he create the HTable instance on the fly in each postCheckAndPut()?