storing custom bloomfilter/BitSet
Hi,

Is there a way to store a custom BitSet for every row and add new bits to it
while importing? I can't use the bloom filter that is already there, because
every column name contains two data elements.

Here is my scenario:
My table looks like this:
rowKey1 -> cf:<data1,data2>,  cf:<data3,data4>, ...
rowKey2 -> cf:<data234,data5>, ...

The column name includes data1 and data2.
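
For illustration, writing one such row with the normal client API looks
roughly like this (just a sketch; "table" is an already opened table and the
values are the ones from my example):

import java.io.IOException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ExampleRowWriter {
  // sketch only: one row whose column names carry the two data elements
  public static void writeExampleRow(HTableInterface table) throws IOException {
    Put put = new Put(Bytes.toBytes("rowKey1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("<data1,data2>"), HConstants.EMPTY_BYTE_ARRAY);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("<data3,data4>"), HConstants.EMPTY_BYTE_ARRAY);
    table.put(put);
  }
}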

This setup works for me now, but I am trying to improve it. I'm using the
bulk load feature. First I import a CSV file that looks like this:
ROWKEY     COLUMNFAMILY     COLUMNNAME        HASH_INDEX_1     HASH_INDEX_2
rowKey1    cf               <data1,data2>     5                12
rowKey1    cf               <data3,data4>     8                5

For every hash in HASH_INDEX_1/2 I create a new column with the index as its
name, in the column family "bloomfilter1" or "bloomfilter2". I store the
column name as a 4-byte integer. For the example above I would store
bloomfilter1:5 and bloomfilter2:12. This method works fine, but exporting the
columns and transforming them back into a BitSet becomes very slow when the
bloom filter gets too big (> 1 million bits). So a better solution would be
to store only the BitSet itself instead of a 4-byte integer column for every
index.
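
For reference, the mapper I use now looks roughly like this (simplified
sketch, assuming comma-separated input and the column families from my
example; the real job is wired up with HFileOutputFormat for the bulk load):

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// simplified sketch of the current mapper: one extra KeyValue per hash index
public class IndexColumnMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // ROWKEY,COLUMNFAMILY,COLUMNNAME,HASH_INDEX_1,HASH_INDEX_2
    String[] f = line.toString().split(",");
    byte[] row = Bytes.toBytes(f[0]);
    ImmutableBytesWritable key = new ImmutableBytesWritable(row);

    // the data column: the qualifier carries both data elements
    context.write(key, new KeyValue(row, Bytes.toBytes(f[1]),
        Bytes.toBytes(f[2]), new byte[0]));

    // one column per hash index, qualifier = index as 4-byte integer
    context.write(key, new KeyValue(row, Bytes.toBytes("bloomfilter1"),
        Bytes.toBytes(Integer.parseInt(f[3])), new byte[0]));
    context.write(key, new KeyValue(row, Bytes.toBytes("bloomfilter2"),
        Bytes.toBytes(Integer.parseInt(f[4])), new byte[0]));
  }
}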

Does anyone know if it is possible to create this filter while importing the
data?
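
To make it clearer, something like the following is roughly what I have in
mind (just a sketch, assuming Java 7's java.util.BitSet and a reduce step
that sees all hash indexes of one row, e.g. emitted as IntWritable by the
mapper; only one of the two filters is shown and the qualifier "bits" is
just a placeholder name):

import java.io.IOException;
import java.util.BitSet;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

// sketch: collect all hash indexes of one row into a single serialized BitSet cell
public class BitSetReducer
    extends Reducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable, KeyValue> {

  @Override
  protected void reduce(ImmutableBytesWritable row, Iterable<IntWritable> indexes,
      Context context) throws IOException, InterruptedException {
    BitSet bits = new BitSet();
    for (IntWritable index : indexes) {
      bits.set(index.get());            // add each new bit while importing
    }
    // one cell holding the whole filter instead of one column per index
    KeyValue kv = new KeyValue(row.get(), Bytes.toBytes("bloomfilter1"),
        Bytes.toBytes("bits"), bits.toByteArray());
    context.write(row, kv);
  }
}

Reading the filter back would then be a single BitSet.valueOf() call on one
cell value instead of collecting millions of integer columns.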

thanks