Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Questions about HBase


Pankaj Gupta 2013-06-05, 02:15
Ted Yu 2013-06-05, 03:44
ramkrishna vasudevan 2013-06-05, 03:14
Ted Yu 2013-06-05, 04:29
Re: Questions about HBase
>4. This one is related to what I read in the bloom filter section of
   HBase: The Definitive Guide:
   Given a random row key you are looking for, it is very likely that this
   key will fall in between two block start keys. The only way for HBase to
   figure out if the key actually exists is by loading the block and scanning
   it to find the key.
   The above excerpt seems to imply to me that the search for a key inside a
   block is linear, and I feel I must be reading it wrong. I would expect the
   scan to be a binary search.

Yes, as Ram said, the row key (RK) is used to find the HFile data block where
this key *might* be present. That block is loaded, and then we seek to the
exact RK inside it. This seek within the block is a linear read. You can take
a look at the Prefix Tree encoder, which is available in 0.95; it tries to
avoid this linear read within a block.
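
To make the two steps concrete, here is a minimal, self-contained Java sketch. It is not HBase's HFile reader: the class `BlockLookupSketch`, the in-memory list of blocks, and the use of `Arrays.compareUnsigned` for byte-wise key ordering are assumptions made for illustration. It binary-searches a block index of first keys to pick the one block where the key *might* be, then walks that block record by record, using the `<keylength><valuelength><keybytearray><valuebytearray>` layout Ram describes in the quoted message. The second step is the linear read discussed above; the Prefix Tree encoding is not modeled here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the real HFile reader: a block index (first key of
// each block) plus blocks encoded as <keylength><valuelength><key><value> records.
public class BlockLookupSketch {
    private final List<byte[]> firstKeys = new ArrayList<>(); // block index
    private final List<ByteBuffer> blocks = new ArrayList<>(); // data blocks

    // Appends one block built from key/value pairs that are already sorted by key.
    public void addBlock(String[][] sortedKvs) {
        int size = 0;
        for (String[] kv : sortedKvs) {
            size += 8 + kv[0].getBytes(StandardCharsets.UTF_8).length
                      + kv[1].getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String[] kv : sortedKvs) {
            byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
            byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
            buf.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        buf.flip();
        firstKeys.add(sortedKvs[0][0].getBytes(StandardCharsets.UTF_8));
        blocks.add(buf);
    }

    // Step 1: binary search for the last block whose first key <= target.
    int findBlock(byte[] target) {
        int lo = 0, hi = firstKeys.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(firstKeys.get(mid), target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Step 2: linear scan inside the chosen block, one record at a time.
    public String get(String key) {
        byte[] target = key.getBytes(StandardCharsets.UTF_8);
        int idx = findBlock(target);
        if (idx < 0) return null; // key sorts before every block
        ByteBuffer buf = blocks.get(idx).duplicate(); // independent read position
        while (buf.hasRemaining()) {
            int klen = buf.getInt();
            int vlen = buf.getInt();
            byte[] k = new byte[klen];
            buf.get(k);
            int cmp = Arrays.compareUnsigned(k, target);
            if (cmp == 0) {
                byte[] v = new byte[vlen];
                buf.get(v);
                return new String(v, StandardCharsets.UTF_8);
            }
            if (cmp > 0) return null; // already past where the key would be
            buf.position(buf.position() + vlen); // skip this record's value bytes
        }
        return null; // reached the end of the block without a match
    }

    public static void main(String[] args) {
        BlockLookupSketch store = new BlockLookupSketch();
        store.addBlock(new String[][] {{"apple", "1"}, {"banana", "2"}, {"cherry", "3"}});
        store.addBlock(new String[][] {{"grape", "4"}, {"kiwi", "5"}, {"mango", "6"}});
        System.out.println(store.get("kiwi"));   // 5
        System.out.println(store.get("durian")); // null
    }
}
```

Note that "durian" falls between the start keys of the two blocks, so the sketch has to load the first block and scan it to the end before it can say the key is absent. That is exactly the cost the bloom filter section of the book is talking about.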

On Wed, Jun 5, 2013 at 9:59 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. But I am not very sure if we can control the files getting selected for
> compaction in the older versions.
>
> The same mechanism is available in 0.94.
>
> Take a look at
> src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java,
> where you will find the following methods (and more):
>
>   public void preCompactSelection(final ObserverContext<RegionCoprocessorEnvironment> c,
>       final Store store, final List<StoreFile> candidates, final CompactionRequest request)
>   public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
>       final Store store, final InternalScanner scanner) throws IOException
>
> Cheers
>
> On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
>
> > >>Does minor compaction remove HFiles in which all entries are out of
> >    TTL, or does only major compaction do that?
> > Yes, it applies to minor compactions as well.
> > >>Is there a way of configuring major compaction to compact only files
> >    older than a certain time, or to compact all the files except the
> >    latest few?
> > In the latest trunk version, the compaction algorithm itself can be plugged
> > in. There are also some coprocessor hooks that give control over the scanner
> > that gets created for compaction, with which we can control which KVs are
> > selected. But I am not very sure whether we can control which files get
> > selected for compaction in the older versions.
> > >> The above excerpt seems to imply to me that the search for a key inside
> > a block is linear, and I feel I must be reading it wrong. I would expect the
> > scan to be a binary search.
> > Once the data block is identified for a key, we seek to the beginning of
> > the block and then do a linear search until we reach the exact key that we
> > are looking for. This is because internally the data (KVs) is stored as a
> > byte buffer per block, following this pattern:
> > <keylength><valuelength><keybytearray><valuebytearray>
> > >>Is there a way to warm up the bloom filter and block index cache for
> >    a table?
> > Do you always want the bloom filter and block index to be in the cache?
> >
> >
> > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I have a few small questions regarding HBase. I've searched the forum but
> > > couldn't find clear answers, hence I am asking them here:
> > >
> > >
> > >    1. Does minor compaction remove HFiles in which all entries are out of
> > >    TTL, or does only major compaction do that? I found this JIRA:
> > >    https://issues.apache.org/jira/browse/HBASE-5199 but I don't know if
> > >    the compaction being talked about there is minor or major.
> > >    2. Is there a way of configuring major compaction to compact only files
> > >    older than a certain time, or to compact all the files except the
> > >    latest few? We basically want to use the time-based filtering
> > >    optimization in HBase to get the latest additions to the table, and
> > >    since major compaction bunches everything into one file, it would
> > >    defeat the optimization.
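
Questions 1 and 2 both come down to which store files a compaction picks. The following standalone Java sketch is only an illustration of such a policy, not HBase code. The `FileInfo` class, its `maxTimestamp` field, and the `expired`/`select` methods are hypothetical stand-ins for the `List<StoreFile> candidates` passed to `preCompactSelection` in Ted's message. It assumes the newest cell timestamp of each file is known. A file whose newest cell is already older than the TTL holds only expired data. The policy also leaves the newest `keepLatest` files out of the compaction, so recent writes stay in small files that time-range reads can target.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a compaction-candidate policy; not the HBase API.
public class CompactionSelectionSketch {

    // Stand-in for a store file: a name plus the newest cell timestamp it holds.
    public static final class FileInfo {
        final String name;
        final long maxTimestamp; // millis since epoch

        public FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // True if every cell in the file is older than the TTL.
    static boolean fullyExpired(FileInfo f, long nowMillis, long ttlMillis) {
        return f.maxTimestamp < nowMillis - ttlMillis;
    }

    // Files holding only expired data; these could be dropped without rewriting.
    public static List<String> expired(List<FileInfo> files, long now, long ttl) {
        List<String> out = new ArrayList<>();
        for (FileInfo f : files) {
            if (fullyExpired(f, now, ttl)) out.add(f.name);
        }
        return out;
    }

    // Compaction candidates: live files, excluding the newest keepLatest ones.
    public static List<String> select(List<FileInfo> files, long now, long ttl, int keepLatest) {
        List<FileInfo> live = new ArrayList<>();
        for (FileInfo f : files) {
            if (!fullyExpired(f, now, ttl)) live.add(f);
        }
        // Oldest first, so the newest files end up at the tail of the list.
        live.sort(Comparator.comparingLong(f -> f.maxTimestamp));
        int cut = Math.max(0, live.size() - keepLatest);
        List<String> out = new ArrayList<>();
        for (FileInfo f : live.subList(0, cut)) out.add(f.name);
        return out;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long ttl = 500_000L;
        List<FileInfo> files = List.of(
            new FileInfo("f1", 100_000L), // newest cell older than now - ttl
            new FileInfo("f2", 600_000L),
            new FileInfo("f3", 800_000L),
            new FileInfo("f4", 950_000L));
        System.out.println(expired(files, now, ttl));   // [f1]
        System.out.println(select(files, now, ttl, 2)); // [f2]
    }
}
```

In a real coprocessor, this kind of filtering would be applied to the `candidates` list inside `preCompactSelection`. Whether a given HBase release honors changes to that list is worth checking against its `BaseRegionObserver` source, which is the uncertainty Ram raises for older versions.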
Pankaj Gupta 2013-06-05, 05:09
Asaf Mesika 2013-06-05, 05:27
ramkrishna vasudevan 2013-06-05, 05:43
Asaf Mesika 2013-06-05, 05:52
ramkrishna vasudevan 2013-06-05, 06:05
Pankaj Gupta 2013-06-05, 06:16
Pankaj Gupta 2013-06-05, 06:26
ramkrishna vasudevan 2013-06-05, 07:07
Anoop John 2013-06-05, 08:24
Pankaj Gupta 2013-06-06, 02:52
ramkrishna vasudevan 2013-06-06, 03:15
Anoop John 2013-06-06, 05:38
Pankaj Gupta 2013-06-05, 06:06