HBase dev mailing list: reason to do major compaction after split


Sergey Shelukhin      2013-03-07, 18:50
Stack                 2013-03-07, 18:58
Enis Söztutar         2013-03-07, 19:03
Sergey Shelukhin      2013-03-07, 20:58
Enis Söztutar         2013-03-07, 21:14
Stack                 2013-03-07, 22:13
Matteo Bertozzi       2013-03-07, 22:28
Stack                 2013-03-07, 22:56
Matteo Bertozzi       2013-03-07, 23:09
Sergey Shelukhin      2013-03-07, 23:22
Matteo Bertozzi       2013-03-07, 23:36
Enis Söztutar         2013-03-08, 00:11
Re: reason to do major compaction after split
> also, if instead of files you think about handling blocks directly, you
> can end up doing more stuff, like a proper compaction that requires less I/O
> if N blocks are not changed, some crazy deduplication on tables with the same
> content & similar...

Sounds like a step toward using a block pool directly and avoiding the
filesystem layer (Hadoop 2+).
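
To make that concrete, here is a hypothetical sketch (none of these types exist in HBase or HDFS today) of what a block-addressed store could let a compaction do: re-reference blocks that carry no new edits and rewrite only those that do.

// Hypothetical block-addressed store; BlockStore/BlockId are illustrative, not real HBase/HDFS APIs.
import java.util.ArrayList;
import java.util.List;

final class BlockId { /* opaque handle into a shared block pool */ }

interface BlockStore {
  List<BlockId> blocksOf(String storeFile);                   // blocks currently backing a store file
  boolean hasNewEdits(BlockId block);                         // do any pending edits overlap this block?
  BlockId mergeEdits(BlockId block);                          // write a new block with the edits folded in
  String publish(String newStoreFile, List<BlockId> blocks);  // a "file" is just an ordered list of block refs
}

class BlockLevelCompactor {
  private final BlockStore store;

  BlockLevelCompactor(BlockStore store) { this.store = store; }

  /** Only blocks with new edits are rewritten; untouched blocks are re-referenced, not copied. */
  String compact(String storeFile, String compactedName) {
    List<BlockId> out = new ArrayList<>();
    for (BlockId block : store.blocksOf(storeFile)) {
      out.add(store.hasNewEdits(block) ? store.mergeEdits(block) : block);
    }
    return store.publish(compactedName, out);
  }
}

The deduplication Matteo mentions would fall out of the same indirection: two logical files can simply share some of the same BlockIds.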
On Fri, Mar 8, 2013 at 7:36 AM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:

> Sure, having hardlink support
> (HDFS-3370 <https://issues.apache.org/jira/browse/HDFS-3370>)
> solves the HFileLink hack,
> but you still need to add extra metadata for splits (reference files).
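
As a point of reference, the split metadata Matteo is talking about boils down to roughly this (a simplified, illustrative sketch; the real HBase reference-file format differs in detail):

// Simplified sketch of what a daughter region's "reference file" has to record
// after a split; field names are illustrative, not HBase's actual format.
class SplitReference {
  enum Half { TOP, BOTTOM }   // which side of the split key this daughter may read

  final String parentHFile;   // the parent region's real HFile, which is not rewritten
  final byte[] splitKey;      // row key the parent region was split at
  final Half half;

  SplitReference(String parentHFile, byte[] splitKey, Half half) {
    this.parentHFile = parentHFile;
    this.splitKey = splitKey;
    this.half = half;
  }
}

Only after the daughters are major-compacted does each half become a real HFile, letting the parent's files (and these references) be cleaned up, which is the compaction-after-split this thread started from.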
>
> also, if instead of files you think about handling blocks directly,
> you can end up doing more stuff, like a proper compaction that
> requires less I/O if N blocks are not changed, some crazy deduplication
> on tables with the same content & similar...
>
> On Thu, Mar 7, 2013 at 11:22 PM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
>
> > Hmm... ranges sound good, but for files, it would be nice if there were a
> > hardlink mechanism. It should be trivial to do in HDFS if blocks could
> > belong to several files. Then we wouldn't have to have private cleanup code.
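
To spell out what Sergey is wishing for (hypothetical only: HDFS-3370 was never implemented, and org.apache.hadoop.fs.FileSystem has no hardlink call today):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

class HardLinkSketch {
  // Hypothetical: share the blocks of an existing store file under a second path, no copy.
  static void linkStoreFile(FileSystem fs, Path existing, Path link) throws IOException {
    // fs.hardLink(existing, link);  // does NOT exist today; this is the HDFS-3370 wish
  }

  // With real hardlinks, cleanup would be an ordinary delete: the NameNode would free the
  // blocks only once the last path referencing them is gone, so HBase would not need its
  // own HFileLink / archive bookkeeping.
  static void dropClone(FileSystem fs, Path cloneDir) throws IOException {
    fs.delete(cloneDir, true);  // this call is real
  }
}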
> >
> > On Thu, Mar 7, 2013 at 2:28 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
> >
> > > This seems to be going in a super messy direction.
> > > With HBASE-7806 the idea was to clean up all this crazy stuff
> > > (HFileLink, References, ...)
> > >
> > > Unfortunately the initial decision to tie the fs layout and the
> > > tables/regions/families together is leading to all these workarounds
> > > to have something cool.
> > >
> > > If you put the files in one place, and the association in another, you
> > > can avoid all this complexity.
> > >
> > > /hbase/data/[file 1, file 2, file 3, ..., file N]
> > >
> > > table 1/region 1: [file 2]
> > > table 1/region 2: [file 1 (from 0 to 50)]
> > > table 1/region 3: [file 1 (from 50 to 100)]
> > > table 2/region 1: [file 1, file 2]
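
A rough sketch of the mapping Matteo describes above, with made-up types (this is not how HBase lays out files today):

import java.util.List;

// A slice of a pooled store file; null keys mean "unbounded on that side".
class FileSlice {
  final String poolFile;   // e.g. a file under /hbase/data/
  final byte[] startKey;
  final byte[] endKey;

  FileSlice(String poolFile, byte[] startKey, byte[] endKey) {
    this.poolFile = poolFile;
    this.startKey = startKey;
    this.endKey = endKey;
  }
}

// table/region/family -> the slices it serves. A split or a clone only rewrites
// this association; the pooled files themselves never move, and a file becomes
// garbage once no entry references it.
interface RegionFileMap {
  List<FileSlice> slicesFor(String table, String region, String family);
  void assign(String table, String region, String family, List<FileSlice> slices);
}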
> > >
> > > On Thu, Mar 7, 2013 at 10:13 PM, Stack <[EMAIL PROTECTED]> wrote:
> > >
> > > > Yes.  That is a few trips to the NN listing directory contents and then
> > > > some edits/reading of .META.  We would have to introduce a QuarterHFile
> > > > to go with our HalfHFile (or rename HalfHFile as PieceO'HFile).
> > > >
> > > >
> > > > St.Ack
> > > >
> > >
> >
>
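
On Stack's QuarterHFile quip: without a compaction in between, a daughter of a daughter would read the same physical parent file through an ever narrower key range, roughly like this (illustrative types only):

// Why a second split without a compaction needs a "QuarterHFile": each split only
// narrows the readable key range over the same physical parent file.
class HFileSlice {
  final String parentHFile;
  final byte[] startKey;   // inclusive; null = unbounded
  final byte[] endKey;     // exclusive; null = unbounded

  HFileSlice(String parentHFile, byte[] startKey, byte[] endKey) {
    this.parentHFile = parentHFile;
    this.startKey = startKey;
    this.endKey = endKey;
  }

  // splitting again just narrows the range over the same physical file
  HFileSlice top(byte[] splitKey)    { return new HFileSlice(parentHFile, splitKey, endKey); }
  HFileSlice bottom(byte[] splitKey) { return new HFileSlice(parentHFile, startKey, splitKey); }
}

In practice HBase compacts the daughters so each half is materialized as a real HFile before a region can be split again, which is the major compaction after split the thread subject asks about.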

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
Sergey Shelukhin      2013-03-08, 19:32
Enis Söztutar         2013-03-08, 20:06
Sergey Shelukhin      2013-03-07, 23:20
Jean-Daniel Cryans    2013-03-07, 18:54