HBase >> mail # user >> coprocessor enabled put very slow, help please~~~


Re: coprocessor enabled put very slow, help please~~~
I was suggesting removing the write to WAL on your write to the index table only.

The thing you have to realize is that true low-latency systems use databases as a sink. It's the end of the line, so to speak.

So if you're worried about a small latency between the write to your doc table and the write to your index... you are designing the wrong system.

Consider that it takes some time t to write the base record and then to write the indexes.
For that period, you have a Schrödinger's cat problem as to whether the row exists or not. Since HBase lacks transactions and ACID guarantees, trying to build a solution where you require that low latency... you are using the wrong tool.

Remember that HBase was designed as a distributed system for managing very large data sets. The speed you gain from secondary indexes like an inverted table is on the read side, not the write side.

If you had append working, you could create an index if you could create a fixed-size key buffer. Or something down that path... sorry, just thinking out loud...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 19, 2013, at 1:53 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> 1. Try batching your increment calls into a List<Row> and use batch() to
> execute it. Should reduce RPC calls by two orders of magnitude.
> 2. Combine batching with scanning more words, thus aggregating your count
> for a certain word and issuing fewer Increment commands.
> 3. Enable Bloom Filters. Should speed up Increment by a factor of 2 at
> least.
> 4. Don't use keyValue.getValue(). It does a System.arraycopy behind the
> scenes. Use getBuffer(), getValueOffset() and getValueLength() and
> iterate over the existing array. Write your own split without using
> String functions, which go through character encoding (expensive). Just find
> your delimiter by byte comparison.
> 5. Enable BloomFilters on doc table. It should speed up the checkAndPut.
> 6. I wouldn't give up WAL. It ain't your bottleneck IMO.
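Tip 4 above can be sketched in plain Java with no HBase dependency (`splitWords` and the sample buffer are my own illustrative names): scan the backing array between the value offset and length, comparing raw bytes against the delimiter, instead of decoding the whole value into a String and calling split():

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of tip 4: tokenize directly on the backing byte[] (as returned
// by KeyValue.getBuffer()), bounded by getValueOffset()/getValueLength(),
// finding delimiters by byte comparison rather than String.split().
public class ByteSplit {
    static List<String> splitWords(byte[] buf, int offset, int length) {
        List<String> words = new ArrayList<>();
        int start = offset;
        int end = offset + length;
        for (int i = offset; i <= end; i++) {
            if (i == end || buf[i] == ' ') {   // byte comparison, no decoding
                if (i > start) {
                    // The per-word String copy here is only for display; in
                    // real use you'd pass the (buf, start, i - start) slice
                    // straight into Bytes.toBytes-style row-key handling.
                    words.add(new String(buf, start, i - start));
                }
                start = i + 1;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] value = "I am working".getBytes();
        System.out.println(splitWords(value, 0, value.length));
        // → [I, am, working]
    }
}
```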
>
> On Monday, February 18, 2013, prakash kadel wrote:
>
>> Thank you guys for your replies,
>> Michael,
>>   I think I didn't make it clear. Here is my use case:
>>
>> I have text documents to insert into HBase (with possible duplicates).
>> Suppose I have a document such as: "I am working. He is not working"
>>
>> I want to insert this document into a table in HBase, say table "doc"
>>
>> doc table
>> ---------
>> rowKey : doc_id
>> cf: doc_content
>> value: "I am working. He is not working"
>>
>> Now, I want to create another table that stores the word counts, say "doc_idx"
>>
>> doc_idx table
>> ---
>> rowKey : I, cf: count, value: 1
>> rowKey : am, cf: count, value: 1
>> rowKey : working, cf: count, value: 2
>> rowKey : He, cf: count, value: 1
>> rowKey : is, cf: count, value: 1
>> rowKey : not, cf: count, value: 1
>>
>> My MR job code:
>> ==============
>> if (doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>>    for (String word : doc_content.split("\\s+")) {
>>       Increment inc = new Increment(Bytes.toBytes(word));
>>       inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1);
>>       docIdx.increment(inc); // docIdx: handle to "doc_idx" (not shown);
>>                              // the original built the Increment but never sent it
>>    }
>> }
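Applying Asaf's tips 1 and 2, the loop above could first aggregate counts per distinct word client-side, then send a single Increment per word in one batch() round trip. This is only a sketch, not the poster's code: the HBase calls are left as comments since they need a live cluster, and `docIdx` is a hypothetical table handle. The aggregation step itself is plain Java:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of tips 1-2: pre-aggregate word counts so each distinct word
// becomes one Increment, then submit all of them in a single batch()
// call instead of one RPC per word occurrence.
public class AggregatedCounts {
    static Map<String, Long> countWords(String docContent) {
        Map<String, Long> counts = new HashMap<>();
        // Same split as the original loop, so "working." and "working"
        // stay distinct tokens (punctuation is not stripped).
        for (String word : docContent.split("\\s+")) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("I am working. He is not working");
        System.out.println(counts.size());  // → 7 distinct tokens
        // With a table handle (hypothetical names, requires a cluster):
        // List<Row> batch = new ArrayList<>();
        // for (Map.Entry<String, Long> e : counts.entrySet()) {
        //     Increment inc = new Increment(Bytes.toBytes(e.getKey()));
        //     inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), e.getValue());
        //     batch.add(inc);
        // }
        // docIdx.batch(batch);  // one RPC round instead of one per word
    }
}
```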
>>
>> Now, i wanted to do some experiments with coprocessors. So, i modified
>> the code as follows.
>>
>> My MR job code:
>> ==============
>> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
>>
>> Coprocessor code:
>> ==============
>>        public void start(CoprocessorEnvironment env) {
>>                pool = new HTablePool(env.getConfiguration(), 100);
>>        }
>>
>>        public boolean postCheckAndPut(c, row, family, byte[] qualifier,
>>                        compareOp, comparator, put, result) {
>>
>>                if (!result) return true; // only index if the put succeeded
>>
>>                HTableInterface table_idx = pool.getTable("doc_idx");
>>
>>                try {
>>                        // Put.get(family, qualifier) returns a List<KeyValue>,
>>                        // so iterate with for-each rather than for(... = ...)
>>                        for (KeyValue contentKV : put.get(
>>                                        Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
>>                                for (String word : Bytes.toString(
>>                                                contentKV.getValue()).split("\\s+")) {
>>                                        Increment inc = new Increment(Bytes.toBytes(word));
>>                                        inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1);
>>                                        table_idx.increment(inc);
>>                                }
>>                        }
>>                } finally {
>>                        table_idx.close();
>>                }
>>                return true;
>>        }