HBase >> mail # dev >> reason to do major compaction after split


Re: reason to do major compaction after split
> Sounds like a step toward using a block pool directly and avoiding the
> filesystem layer (Hadoop 2+).

This has come up previously. With federation, we should be able to embed NN
as a first cut, and own all the blocks in the hbase namespace.

Enis
On Fri, Mar 8, 2013 at 11:32 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:

> +1.
> That gives us a lot of freedom to do stuff in many scenarios.
>
> On Thu, Mar 7, 2013 at 5:42 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
> > > also, if instead of files you think about handling blocks directly you
> > > can end up doing more stuff, like a proper compaction that requires less
> > > I/O if N blocks are not changed, some crazy deduplication on tables with
> > > the same content & similar...
> >
> > Sounds like a step toward using a block pool directly and avoiding the
> > filesystem layer (Hadoop 2+).
> >
> >
> > On Fri, Mar 8, 2013 at 7:36 AM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
> >
> > > Sure, having hardlink support
> > > (HDFS-3370 <https://issues.apache.org/jira/browse/HDFS-3370>)
> > > would solve the HFileLink hack,
> > > but you still need to add extra metadata for splits (reference files).
> > >
> > > Also, if instead of files you think about handling blocks directly
> > > you can end up doing more stuff, like a proper compaction that
> > > requires less I/O if N blocks are not changed, some crazy deduplication
> > > on tables with the same content & similar...
> > >
> > > On Thu, Mar 7, 2013 at 11:22 PM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hmm... ranges sound good, but for files, it would be nice if there
> > > > were a hardlink mechanism.
> > > > It should be trivial to do in HDFS if blocks could belong to several
> > > > files.
> > > > Then we don't have to have private cleanup code.
> > > >
> > > > On Thu, Mar 7, 2013 at 2:28 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > This seems to be going in a super messy direction.
> > > > > With HBASE-7806 the idea was to clean up all this crazy stuff
> > > > > (HFileLink, References, ...)
> > > > >
> > > > > Unfortunately, the initial decision to tie together the fs layout
> > > > > and the tables/regions/families is leading to all these workarounds
> > > > > just to have something cool.
> > > > >
> > > > > If you put the files in one place, and the association in another,
> > > > > you can avoid all this complexity.
> > > > >
> > > > > /hbase/data/[file 1, file 2, file 3, file N]
> > > > >
> > > > > table 1/region 1: [file 2]
> > > > > table 1/region 2: [file 1 (from 0 to 50)]
> > > > > table 1/region 3: [file 1 (from 50 to 100)]
> > > > > table 2/region 1: [file 1, file 2]
> > > > >
> > > > > On Thu, Mar 7, 2013 at 10:13 PM, Stack <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Yes.  That is a few trips to the NN listing directory contents and
> > > > > > then some edits/reading of .META.  We would have to introduce a
> > > > > > QuarterHFile to go with our HalfHFile (or rename HalfHFile as
> > > > > > PieceO'HFile).
> > > > > >
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
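
The layout Matteo proposes in the thread (store files kept in one flat
directory, with a separate mapping from table/region to whole files or key
ranges of files) can be sketched roughly as below. This is only an
illustration under assumed names: FileRange, split_refs, and
unreferenced_files are hypothetical and are not HBase APIs, and the integer
bounds 0..100 mirror the example numbers in the message rather than real
row keys.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FileRange:
    """Hypothetical reference to a store file, optionally limited to [start, end)."""
    file: str
    start: Optional[int] = None  # None means "from the beginning of the file"
    end: Optional[int] = None    # None means "to the end of the file"


# All data files live in one place (the /hbase/data/ directory in the message).
store_files: Set[str] = {"file1", "file2", "file3"}

# The association lives elsewhere: (table, region) -> file references.
regions: Dict[Tuple[str, str], List[FileRange]] = {
    ("table1", "region1"): [FileRange("file2")],
    ("table1", "region2"): [FileRange("file1", 0, 50)],
    ("table1", "region3"): [FileRange("file1", 50, 100)],
    ("table2", "region1"): [FileRange("file1"), FileRange("file2")],
}


def split_refs(refs: List[FileRange], lo: int, mid: int, hi: int):
    """Split a region covering [lo, hi) at mid by narrowing references only.

    No file data is copied or rewritten; each daughter region gets a
    reference clipped to its own half of the key range.
    """
    def clip(ref: FileRange, new_lo: int, new_hi: int) -> FileRange:
        start = new_lo if ref.start is None else max(ref.start, new_lo)
        end = new_hi if ref.end is None else min(ref.end, new_hi)
        return FileRange(ref.file, start, end)

    left = [c for c in (clip(r, lo, mid) for r in refs) if c.start < c.end]
    right = [c for c in (clip(r, mid, hi) for r in refs) if c.start < c.end]
    return left, right


def unreferenced_files(files: Set[str],
                       mapping: Dict[Tuple[str, str], List[FileRange]]) -> Set[str]:
    """Files that no region points at; under this layout, cleanup is a lookup."""
    used = {ref.file for refs in mapping.values() for ref in refs}
    return files - used


# Splitting a region that holds all of file1, then splitting one half again.
halves = split_refs([FileRange("file1")], 0, 50, 100)
quarters = split_refs(halves[0], 0, 25, 50)
```

In this sketch a second split simply narrows the range again, which is the
case Stack's QuarterHFile remark points at; whether real row-key ranges would
behave this cleanly is an assumption the thread does not settle.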