HBase user mailing list: coprocessor enabled put very slow, help please~~~


Prakash Kadel 2013-02-18, 00:48
Prakash Kadel 2013-02-18, 00:52
lars hofhansl 2013-02-18, 01:07
Prakash Kadel 2013-02-18, 01:13
lars hofhansl 2013-02-18, 01:17
Prakash Kadel 2013-02-18, 01:26
lars hofhansl 2013-02-18, 02:31
Michael Segel 2013-02-18, 09:31
Michael Segel 2013-02-18, 01:31
Prakash Kadel 2013-02-18, 02:01
Prakash Kadel 2013-02-18, 01:32
Wei Tan 2013-02-18, 05:52
Prakash Kadel 2013-02-18, 09:01
Michael Segel 2013-02-18, 09:35
yonghu 2013-02-18, 10:57
Michael Segel 2013-02-18, 12:11
yonghu 2013-02-18, 12:22
Michael Segel 2013-02-18, 12:45
prakash kadel 2013-02-18, 13:21
Michel Segel 2013-02-18, 14:13
Re: coprocessor enabled put very slow, help please~~~
A side question: if HTablePool is not encouraged to be used... how do we
handle thread safety when using HTable? Is any replacement for HTablePool
planned?
Thanks,
Best Regards,
Wei
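
A minimal sketch of one pattern that sidesteps HTablePool (not code from this
thread, and the "doc_idx" table name is just the example used here): since
HTable is not thread-safe, each thread builds its own short-lived HTable from a
shared Configuration; in the 0.9x client, instances built from the same
Configuration share the underlying HConnection.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PerThreadTable {
        // One Configuration for the whole client; HTables built from it share a connection.
        private static final Configuration CONF = HBaseConfiguration.create();

        public static void incrementWordCount(String word) throws IOException {
            // Each thread opens its own HTable, uses it, and closes it.
            HTable table = new HTable(CONF, "doc_idx");
            try {
                Increment inc = new Increment(Bytes.toBytes(word));
                inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
                table.increment(inc);
            } finally {
                table.close();
            }
        }
    }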
From:   Michel Segel <[EMAIL PROTECTED]>
To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
Date:   02/18/2013 09:23 AM
Subject:        Re: coprocessor enabled put very slow, help please~~~

Why are you using an HTablePool?
Why are you closing the table after each iteration?

Try using one HTable object. Turn off the WAL.
Instantiate it in start().
Close it in stop().
Surround its use in a try / catch.
If an exception is caught, re-instantiate a new HTable connection.

You may also want to flush the connection after the puts.
Again, I'm not sure why you are using checkAndPut on the base table. Your
count could be off.

As an example, look at the rhyme 'Mary had a little lamb',
then check your word count.

Sent from a remote device. Please excuse any typos...

Mike Segel
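
A rough sketch of what the advice above could look like in the RegionObserver
(my reading of it, not code posted in the thread; class and table names simply
follow the example discussed below): a single HTable for the index table,
opened once in start() and closed in stop(), with the per-put work wrapped in a
try / catch that re-opens the table on failure.

    import java.io.IOException;
    import org.apache.hadoop.hbase.CoprocessorEnvironment;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;

    public class DocIndexObserver extends BaseRegionObserver {

        private HTable indexTable;

        @Override
        public void start(CoprocessorEnvironment env) throws IOException {
            // One HTable for the lifetime of the observer, instead of a pool.
            indexTable = new HTable(env.getConfiguration(), "doc_idx");
        }

        @Override
        public void stop(CoprocessorEnvironment env) throws IOException {
            if (indexTable != null) {
                indexTable.close();
            }
        }

        // In postCheckAndPut() the word-count increments would go against
        // indexTable, wrapped in a try / catch; if an IOException is caught,
        // re-instantiate the HTable from the environment's configuration
        // before rethrowing.
    }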

On Feb 18, 2013, at 7:21 AM, prakash kadel <[EMAIL PROTECTED]>
wrote:

> Thank you guys for your replies.
> Michael,
>   I think I didn't make it clear. Here is my use case:
>
> I have text documents to insert into HBase (with possible duplicates).
> Suppose I have a document such as: "I am working. He is not working"
>
> I want to insert this document into a table in HBase, say table "doc"
>
> doc table
> -----
> rowKey : doc_id
> cf: doc_content
> value: "I am working. He is not working"
>
> Now, I want to create another table that stores the word counts, say "doc_idx"
>
> doc_idx table
> ---
> rowKey : I, cf: count, value: 1
> rowKey : am, cf: count, value: 1
> rowKey : working, cf: count, value: 2
> rowKey : He, cf: count, value: 1
> rowKey : is, cf: count, value: 1
> rowKey : not, cf: count, value: 1
>
> My MR job code:
> ==============
> if (doc.checkAndPut(rowKey, Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc)) {
>     for (String word : doc_content.split("\\s+")) {
>         Increment inc = new Increment(Bytes.toBytes(word));
>         inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
>         doc_idx.increment(inc);  // doc_idx is the HTable for the "doc_idx" table
>     }
> }
>
> Now, I wanted to do some experiments with coprocessors, so I modified
> the code as follows.
>
> My MR job code:
> ==============
> doc.checkAndPut(rowKey, Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc);
>
> Coprocessor code:
> ==============
>     public void start(CoprocessorEnvironment env) {
>         pool = new HTablePool(env.getConfiguration(), 100);
>     }
>
>     public boolean postCheckAndPut(ObserverContext<RegionCoprocessorEnvironment> c,
>             byte[] row, byte[] family, byte[] qualifier, CompareOp compareOp,
>             WritableByteArrayComparable comparator, Put put, boolean result)
>             throws IOException {
>
>         if (!result) return true; // the base-table put did not happen, nothing to index
>
>         HTableInterface table_idx = pool.getTable("doc_idx");
>         try {
>             for (KeyValue contentKV : put.get(Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
>                 for (String word : Bytes.toString(contentKV.getValue()).split("\\s+")) {
>                     Increment inc = new Increment(Bytes.toBytes(word));
>                     inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
>                     table_idx.increment(inc);
>                 }
>             }
>         } finally {
>             table_idx.close();
>         }
>         return true;
>     }
>
>     public void stop(CoprocessorEnvironment env) {
>         pool.close();
>     }
>
> I am a newbie to HBase, and I am not sure this is the right way to do it.
> Given that, why is the coprocessor-enabled version much slower than
> the one without?
>
>
> Sincerely,
> Prakash Kadel
>
>
> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
>>
>> The issue I was talking about was the use of a check and put.
>> The OP wrote:
>>>>>> each map inserts to doc table. (checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>> an index table.
>>
>> My question is why does the OP use a checkAndPut, and the
>> RegionObserver's postCheckAndPut?
>>
>>
>> Here's a good example:
>> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
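
As a rough illustration of the comparison in that link (my sketch, not code
from the thread; table and column names reuse the example above): checkAndPut
does an atomic read-compare-write under the row lock, while a plain put is a
blind write that simply overwrites a duplicate document.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CheckAndPutVsPut {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable doc = new HTable(conf, "doc");

            Put putDoc = new Put(Bytes.toBytes("doc_id_1"));
            putDoc.add(Bytes.toBytes("doc_content"), Bytes.toBytes(""),
                    Bytes.toBytes("I am working. He is not working"));

            // Slower: writes only if the cell is still absent (value == null check),
            // done atomically under the row lock.
            boolean inserted = doc.checkAndPut(Bytes.toBytes("doc_id_1"),
                    Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc);

            // Faster: unconditional write; a duplicate document overwrites itself.
            doc.put(putDoc);

            System.out.println("checkAndPut inserted: " + inserted);
            doc.close();
        }
    }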

[...] Check and Put in the M/R job.
Does he create an HTable instance in the start() method of his RO object and
then reference it later? Or does he create the instance of the HTable on the
fly in each postCheckAndPut()?
[...] the M/R call to put will wait until the second row is inserted.
[...] write to the index. You can always run a M/R job that rebuilds the index
should something occur to the system where you might lose the data.
Indexes *ARE* expendable. ;-)
Pessimistic code really isn't recommended if you are worried about performance.
[...] co-processor, what would cause the initial write to fail?
Michael Segel 2013-02-19, 16:01
Michael Segel 2013-02-19, 16:29
Andrew Purtell 2013-02-19, 20:05
prakash kadel 2013-02-19, 07:41
Asaf Mesika 2013-02-19, 21:53
Michel Segel 2013-02-20, 13:00
Prakash Kadel 2013-02-20, 13:26
Michel Segel 2013-02-20, 14:14
Prakash Kadel 2013-02-20, 15:10
Wei Tan 2013-02-18, 17:56
Michael Segel 2013-02-18, 18:42
yonghu 2013-02-18, 09:01
yonghu 2013-02-18, 09:02