Re: coprocessor enabled put very slow, help please~~~
1. Try batching your Increment calls into a List<Row> and use batch() to
execute them. That should reduce RPC calls by two orders of magnitude.
2. Combine batching with scanning more words per batch, so that you
aggregate the count for each word locally and issue fewer Increment commands.
3. Enable Bloom filters. That should speed up Increment by a factor of 2 at
least.
4. Don't use keyValue.getValue(). It does a System.arraycopy behind the
scenes. Use getBuffer(), getValueOffset() and getValueLength() and
iterate over the existing array. Write your own split instead of using
String functions, which go through encoding (expensive). Just find
your delimiter by byte comparison.
5. Enable Bloom filters on the doc table. That should speed up the checkAndPut.
6. I wouldn't give up the WAL. It ain't your bottleneck IMO.
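
To make points 1, 2 and 4 concrete, here is a rough sketch that uses only the JDK (Java 8+), so it runs without a cluster. The class and method names (WordCountAggregator, countWords, toRowKey) are made up for illustration. The HBase side is an assumption based on the calls named above: in a real job the arguments would come from getBuffer(), getValueOffset() and getValueLength(), and each map entry would become one Increment in the List<Row> passed to batch(). It splits on ASCII whitespace only and assumes the buffer is not modified while the map is in use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountAggregator {

    /** True for the ASCII whitespace bytes treated as word delimiters. */
    private static boolean isDelimiter(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    /**
     * Counts the words in buf[offset, offset + length) by byte comparison,
     * without copying the value and without decoding it to a String.
     * Keys are ByteBuffer views into buf, so buf must not change afterwards.
     */
    public static void countWords(byte[] buf, int offset, int length,
                                  Map<ByteBuffer, Long> counts) {
        int end = offset + length;
        int start = -1; // start index of the current word, -1 if none
        for (int i = offset; i <= end; i++) {
            boolean delim = (i == end) || isDelimiter(buf[i]);
            if (!delim) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                // ByteBuffer equals/hashCode compare the wrapped bytes,
                // so equal words land on the same map entry.
                counts.merge(ByteBuffer.wrap(buf, start, i - start), 1L, Long::sum);
                start = -1;
            }
        }
    }

    /** Copies one word out of the shared buffer, e.g. to use as a row key. */
    public static byte[] toRowKey(ByteBuffer word) {
        byte[] row = new byte[word.remaining()];
        word.duplicate().get(row);
        return row;
    }

    public static void main(String[] args) {
        byte[] value = "I am working He is not working".getBytes(StandardCharsets.UTF_8);
        Map<ByteBuffer, Long> counts = new LinkedHashMap<>();
        countWords(value, 0, value.length, counts);
        // Each entry would become ONE Increment (row = word, amount = count)
        // added to the list handed to batch(); here we only print it.
        for (Map.Entry<ByteBuffer, Long> e : counts.entrySet()) {
            String word = new String(toRowKey(e.getKey()), StandardCharsets.UTF_8);
            System.out.println(word + " -> " + e.getValue());
        }
    }
}
```

With the sample sentence, seven words collapse into six map entries, so a batch would carry six increments instead of seven single calls. The saving grows as you feed more documents into the same map before flushing.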

On Monday, February 18, 2013, prakash kadel wrote:

> Thank you guys for your replies,
> Michael,
>    I think I didn't make it clear. Here is my use case:
>
> I have text documents to insert into HBase (with possible duplicates).
> Suppose i have a document as : " I am working. He is not working"
>
> I want to insert this document to a table in hbase, say table "doc"
>
> doc table
> ---
> rowKey : doc_id
> cf: doc_content
> value: "I am working. He is not working"
>
> Now, I want to create another table that stores the word counts, say "doc_idx"
>
> doc_idx table
> ---
> rowKey : I, cf: count, value: 1
> rowKey : am, cf: count, value: 1
> rowKey : working, cf: count, value: 2
> rowKey : He, cf: count, value: 1
> rowKey : is, cf: count, value: 1
> rowKey : not, cf: count, value: 1
>
> My MR job code:
> =============
> if(doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>     for(String word : doc_content.split("\\s+")) {
>        Increment inc = new Increment(Bytes.toBytes(word));
>        inc.addColumn("count", "", 1);
>     }
> }
>
> Now, I wanted to do some experiments with coprocessors, so I modified
> the code as follows.
>
> My MR job code:
> ==============
> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
>
> Coprocessor code:
> ==============
>         public void start(CoprocessorEnvironment env)  {
>                 pool = new HTablePool(conf, 100);
>         }
>
>         public boolean postCheckAndPut(c, row, family, byte[] qualifier,
>                         compareOp, comparator, put, result) {
>
>                 if(!result) return true; // check if the put succeeded
>
>                 HTableInterface table_idx = pool.getTable("doc_idx");
>
>                 try {
>
>                         for(KeyValue contentKV : put.get("doc_content", "")) {
>                             for(String word : contentKV.getValue().split("\\s+")) {
>                                 Increment inc = new Increment(Bytes.toBytes(word));
>                                 inc.addColumn("count", "", 1);
>                                 table_idx.increment(inc);
>                             }
>                         }
>                 } finally {
>                         table_idx.close();
>                 }
>                 return true;
>         }
>
>         public void stop(env) {
>                 pool.close();
>         }
>
> I am a newbie to HBase. I am not sure this is the right way to do it.
> Given that, why is the coprocessor-enabled version much slower than
> the one without?
>
>
> Sincerely,
> Prakash Kadel
>
>
> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
> <[EMAIL PROTECTED]> wrote:
> >
> > The issue I was talking about was the use of a check and put.
> > The OP wrote:
> >>>>> each map inserts to doc table.(checkAndPut)
> >>>>> regionobserver coprocessor does a postCheckAndPut and inserts some
> >>>>> rows to an index table.
> >
> > My question is why does the OP use a checkAndPut, and the
> > RegionObserver's postCheckAndPut?
> >
> >
> > Here's a good example...
> > http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
> >
> > The OP doesn't really get in to the use case, so we don't know why the