NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase user mailing list


Re: Fastest way to find if a row exists?
I want to remove it because I set it up on the wrong column family ;) I
should have used NAME => 'a' instead of NAME => '@' ;)

I have set up the KeyOnlyFilter in the code and redeployed. I have also
added the Bloom filter on the right column family. I will remove the wrong
one later.

As soon as the compaction is done, I will restart my MR job and keep my
fingers crossed...

2013/1/4, Bryan Beaudreault <[EMAIL PROTECTED]>:
> Why do you want to remove the bloom filter?  I think you should keep the
> bloom filter but also use the KeyOnlyFilter to cut down on data transferred
> over the wire.
>
>
> On Fri, Jan 4, 2013 at 3:28 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
>> OK. I have activated them on 2 of my main tables, and I will re-run the
>> job and see.
>>
>> 2 other questions then ;)
>>
>> 1) I activated them this way: alter 'work_proposed', NAME => '@',
>> BLOOMFILTER => 'ROW'. How can I remove them?
>> 2) Should I run major_compact to make sure all the hashes are stored?
>>
>> Thanks,
>>
>> JM
>>
>> 2013/1/4, Adrien Mogenet <[EMAIL PROTECTED]>:
>> > On every Get, the Bloom filter acts as a filter (!) on top of each HFile
>> > and allows checking whether a key is absent from that HFile. So yes, you
>> > will benefit from these filters.
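Adrien's description can be sketched in plain Java. This is a minimal model under stated assumptions, not HBase's implementation: each hypothetical `FileModel` stands in for one HFile and carries its own Bloom filter, whose k bit positions are derived by double hashing from `String.hashCode()` and an FNV-1a hash (both choices of this sketch). A lookup skips every file whose filter answers "definitely absent" and only searches the rest.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PerFileBloomSketch {

    /** Small Bloom filter: k bit positions per key, derived by double hashing. */
    static final class Bloom {
        private final BitSet bits;
        private final int m;
        private final int k;

        Bloom(int m, int k) {
            this.bits = new BitSet(m);
            this.m = m;
            this.k = k;
        }

        // FNV-1a over the UTF-8 bytes, used as the second hash.
        private static int fnv1a(String key) {
            int h = 0x811C9DC5;
            for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h;
        }

        private int position(String key, int i) {
            int h1 = key.hashCode();
            int h2 = fnv1a(key) | 1; // force odd so successive probes differ
            return Math.floorMod(h1 + i * h2, m);
        }

        void add(String key) {
            for (int i = 0; i < k; i++) {
                bits.set(position(key, i));
            }
        }

        /** false = definitely absent; true = only "maybe present". */
        boolean mightContain(String key) {
            for (int i = 0; i < k; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypothetical stand-in for one HFile: its row keys plus the Bloom filter written with it. */
    static final class FileModel {
        final Set<String> keys = new HashSet<>();
        final Bloom bloom = new Bloom(1024, 3);

        FileModel(String... rowKeys) {
            for (String key : rowKeys) {
                keys.add(key);
                bloom.add(key);
            }
        }
    }

    /** How many files still have to be searched after the Bloom check. */
    static int filesSearched(List<FileModel> files, String key) {
        int searched = 0;
        for (FileModel f : files) {
            if (f.bloom.mightContain(key)) {
                searched++;
            }
        }
        return searched;
    }

    /** Exact answer: files rejected by their Bloom filter are never searched. */
    static boolean exists(List<FileModel> files, String key) {
        for (FileModel f : files) {
            if (f.bloom.mightContain(key) && f.keys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<FileModel> files = new ArrayList<>();
        files.add(new FileModel("row-001", "row-002"));
        files.add(new FileModel("row-100"));
        files.add(new FileModel("row-200", "row-201"));
        System.out.println("row-100 exists: " + exists(files, "row-100"));
        System.out.println("files searched: " + filesSearched(files, "row-100"));
        System.out.println("row-999 exists: " + exists(files, "row-999"));
    }
}
```

Because a Bloom filter never gives a false negative, the file that really holds the key is always searched; the other files are usually skipped, which is where the saving on each Get comes from.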
>> >
>> >
>> > On Fri, Jan 4, 2013 at 8:58 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>> >
>> >> Does KeyOnlyFilter use the Bloom filters too?
>> >>
>> >> Here is what I'm doing, in more detail.
>> >>
>> >> A few questions:
>> >> - Can I create one single KeyOnlyFilter and give the same filter to
>> >> all the Gets?
>> >> - Will Bloom filters help in such a scenario? My keys are small, say
>> >> 128 bytes on average.
>> >>
>> >> The goal here is to check about 500 entries at a time to see whether
>> >> they already exist.
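Checking "about 500 entries at a time" boils down to cutting the entry list into fixed-size chunks and issuing one multi-get per chunk. A minimal, HBase-free sketch of the chunking step, assuming a batch size of 500; the `batches` helper is hypothetical and not part of any HBase API.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    /** Splits items into consecutive batches of at most batchSize elements, preserving order. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy so each batch is independent of the source list.
            out.add(new ArrayList<>(items.subList(from, to)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            keys.add("entry-" + i);
        }
        for (List<String> batch : batches(keys, 500)) {
            // In the MR job, each batch would become one table.get(List<Get>) call.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

With 1,234 keys this yields batches of 500, 500 and 234; keeping the batch order identical to the key order is what lets the results be matched back to entries by index.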
>> >>
>> >> In my MR job, I start when I have more than 100K lines to handle, and
>> >> each line can have up to 1K entries. So it can result in up to 100M
>> >> gets... The job initially took 500 minutes to complete. I have added a
>> >> few pretty good nodes and it's now taking less than 300 minutes. But I
>> >> would like to get under 100 minutes if I can...
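A rough back-of-the-envelope check on those numbers, assuming one run covers 100K lines with the maximum of 1K entries each, and that Gets are sent in batches of 500:

```latex
N_{\text{gets}} \le 10^{5}\ \text{lines} \times 10^{3}\ \text{entries/line} = 10^{8}

\frac{10^{8}\ \text{gets}}{300 \times 60\ \text{s}} \approx 5{,}556\ \text{gets/s}
\qquad
\frac{10^{8}\ \text{gets}}{100 \times 60\ \text{s}} \approx 16{,}667\ \text{gets/s}

\frac{10^{8}\ \text{gets}}{500\ \text{gets/batch}} = 2 \times 10^{5}\ \text{multi-get calls}
```

Under these assumptions, going from roughly 300 minutes to under 100 means about tripling the sustained lookup rate.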
>> >>
>> >> Thanks,
>> >>
>> >> JM
>> >>
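>> >> import java.util.ArrayList;
>> >> import java.util.List;
>> >> import org.apache.hadoop.hbase.client.Get;
>> >> import org.apache.hadoop.hbase.client.Result;
>> >> import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
>> >>
>> >>         // Build one Get per entry; KeyOnlyFilter drops the cell values so
>> >>         // only keys come back for rows that exist.
>> >>         List<Get> gets_entry_exist = new ArrayList<Get>();
>> >>         for (Entry entry : entries.getEntries())
>> >>         {
>> >>                 Get entry_exist = new Get(entry.toKey());
>> >>                 entry_exist.setFilter(new KeyOnlyFilter());
>> >>                 gets_entry_exist.add(entry_exist);
>> >>         }
>> >>
>> >>         // Batched multi-get: results are indexed in the same order as the Gets.
>> >>         Result[] result_entry_exist = table_entry.get(gets_entry_exist);
>> >>
>> >>         int index = 0;
>> >>         for (Entry entry : entries.getEntries())
>> >>         {
>> >>                 // An empty Result means the row does not exist.
>> >>                 boolean isEmpty = result_entry_exist[index++].isEmpty();
>> >>                 if (isEmpty)
>> >>                 {
>> >>                         // Process here
>> >>                 }
>> >>         }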
>> >>
>> >>
>> >> 2013/1/4, Damien Hardy <[EMAIL PROTECTED]>:
>> >> > Hello Jean-Marc,
>> >> >
>> >> > Bloom filters are designed exactly for that.
>> >> >
>> >> > But they only tell you that a row does not exist, based on a hash of
>> >> > the key (not the opposite: 2 rowkeys could have the same hash result).
>> >> >
>> >> > If you want to be sure the rowkey exists, you have to search for it in
>> >> > the HFile (the whole mechanism is transparent with get()).
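Damien's point (a Bloom filter "no" is definitive, a "yes" is only a maybe) shows up clearly with a deliberately reduced filter. This hypothetical sketch uses a single hash function, Java's `String.hashCode()`, which is not what HBase uses; it relies on the well-known collision `"Aa".hashCode() == "BB".hashCode()` to produce a false positive, then resolves it with an exact lookup that plays the role of the HFile search.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class BloomFalsePositiveSketch {

    static final int BITS = 1 << 16;

    private final BitSet bits = new BitSet(BITS);
    // Stands in for the actual HFile contents that get() would search.
    private final Set<String> storedKeys = new HashSet<>();

    // Single hash function: a Bloom filter with k = 1, for illustration only.
    private static int slot(String key) {
        return Math.floorMod(key.hashCode(), BITS);
    }

    void put(String key) {
        bits.set(slot(key));
        storedKeys.add(key);
    }

    /** false: definitely absent. true: only "maybe", since different keys can share a hash. */
    boolean mightContain(String key) {
        return bits.get(slot(key));
    }

    /** Authoritative check: consult the filter first, then the stored keys themselves. */
    boolean exists(String key) {
        return mightContain(key) && storedKeys.contains(key);
    }

    public static void main(String[] args) {
        BloomFalsePositiveSketch filter = new BloomFalsePositiveSketch();
        filter.put("Aa");
        System.out.println(filter.mightContain("Aa")); // true
        System.out.println(filter.mightContain("BB")); // true: same hash as "Aa", a false positive
        System.out.println(filter.exists("BB"));       // false once the real lookup is done
        System.out.println(filter.mightContain("zz")); // false: definitely absent, no search needed
    }
}
```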
>> >> >
>> >> > There is also a KeyOnlyFilter
>> >> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html
>> >> > that prevents the whole columns of an existing key from being returned
>> >> > (which could be heavy).
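The effect of KeyOnlyFilter can be modeled without HBase at all: every cell keeps its key parts (row, family, qualifier) and loses its value, so an existing row still comes back non-empty but much lighter. This is a sketch; the `Cell` class and the byte count are hypothetical simplifications, not the real `KeyValue` layout or wire format.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KeyOnlySketch {

    /** Hypothetical, simplified stand-in for an HBase cell. */
    static final class Cell {
        final byte[] row, family, qualifier, value;

        Cell(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        /** Rough payload size: key parts plus the value. */
        int size() {
            return row.length + family.length + qualifier.length + value.length;
        }
    }

    /** Models the filter: keep every key, replace every value with an empty array. */
    static List<Cell> keyOnly(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.row, c.family, c.qualifier, new byte[0]));
        }
        return out;
    }

    static int totalSize(List<Cell> cells) {
        int total = 0;
        for (Cell c : cells) {
            total += c.size();
        }
        return total;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell(b("entry-1"), b("a"), b("q1"), new byte[1024]));
        row.add(new Cell(b("entry-1"), b("a"), b("q2"), new byte[4096]));

        List<Cell> stripped = keyOnly(row);
        System.out.println("full row bytes:     " + totalSize(row));      // 5140
        System.out.println("key-only row bytes: " + totalSize(stripped)); // 20
        // The row is still non-empty, so the existence check gives the same answer.
        System.out.println("row exists: " + !stripped.isEmpty());         // true
    }
}
```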
>> >> >
>> >> > Cheers,
>> >> >
>> >> > --
>> >> > Damien
>> >> >
>> >> >
>> >> > 2013/1/4 Jean-Marc Spaggiari <[EMAIL PROTECTED]>
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> What's the fastest way to know if a row exists?
>> >> >>
>> >> >> Today I'm doing this: