-
reason to do major compaction after split
Sergey Shelukhin 2013-03-07, 18:50
Hi.
Is there a reason to do a major compaction after a split, instead of letting the reference files go away gradually as normal compactions happen? I can think of two reasons: a region with reference files currently cannot be split again (it is not clear why not, though; we could just create more references), and avoiding load on the same datanodes from both new regions. Are there other reasons?
-
Re: reason to do major compaction after split
Stack 2013-03-07, 18:58
On Thu, Mar 7, 2013 at 10:50 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> Hi.
> Is there a reason to do major compaction after split, instead of allowing
> the reference files to go away gradually as the normal compactions happen?
> I could think up two reasons - region with reference files currently cannot
> be split again (not clear why not though, could just create more
> references); and avoiding load on the same datanodes from both new regions.
> Are there some other reasons?

We could do references to references, but I was afraid the linkage would be too fragile and would break in hard-to-trace ways.

St.Ack
-
Re: reason to do major compaction after split
Enis Söztutar 2013-03-07, 19:03
I was thinking of allowing regions with refs to split again, but the parent cleanup logic would get a lot messier.

Enis

On Thu, Mar 7, 2013 at 10:58 AM, Stack <[EMAIL PROTECTED]> wrote:
> We could do references to references but was afraid the linkage would be
> too fragile and would break in hard-to-trace ways.
> St.Ack
-
Re: reason to do major compaction after split
Sergey Shelukhin 2013-03-07, 20:58
Can you create same-level references instead of references to references?

On Thu, Mar 7, 2013 at 11:03 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> I was thinking of allowing regions with refs to split again, but the
> cleaning parent logic will get messy a lot.
>
> Enis
-
Re: reason to do major compaction after split
Enis Söztutar 2013-03-07, 21:14
We do not have to create references to references. We can find the original file and create a ref to it directly in the granddaughters. The messy part is the cleanup for the parent region, where we have to recursively search all successors to decide whether we can delete the region and its hfile.

Enis

On Thu, Mar 7, 2013 at 12:58 PM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> Can you create same-level references instead of references to references?
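The parent cleanup check described in this message can be sketched as a recursive walk over the split descendants. This is only an illustration under stated assumptions, not HBase code: the `Region` class, its `referenced_files` and `daughters` fields, and `can_delete_parent_file` are hypothetical names, and a real implementation would have to gather this information from .META. and the filesystem.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Region:
    """Hypothetical in-memory stand-in for a region and its split history."""
    name: str
    # Parent hfiles this region still reads through reference files.
    referenced_files: Set[str] = field(default_factory=set)
    # Regions created by splitting this one (empty if it was never split).
    daughters: List["Region"] = field(default_factory=list)


def still_referenced(region: Region, hfile: str) -> bool:
    """Return True if this region or any of its successors references hfile."""
    if hfile in region.referenced_files:
        return True
    return any(still_referenced(d, hfile) for d in region.daughters)


def can_delete_parent_file(parent: Region, hfile: str) -> bool:
    """The parent's hfile is deletable only when no successor references it."""
    return not any(still_referenced(d, hfile) for d in parent.daughters)


# Granddaughter g1 holds a direct reference to the parent's file "f1",
# so the file must be kept even though neither daughter references it.
g1 = Region("g1", referenced_files={"f1"})
g2 = Region("g2")
d1 = Region("d1", daughters=[g1, g2])
d2 = Region("d2")
parent = Region("parent", daughters=[d1, d2])

blocked = can_delete_parent_file(parent, "f1")
g1.referenced_files.clear()  # g1 compacts and rewrites its own data
allowed = can_delete_parent_file(parent, "f1")
```

The recursion is what makes the cleanup "messy": every additional split level adds another generation that has to be checked before the parent's file can go away.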
-
Re: reason to do major compaction after split
Stack 2013-03-07, 22:13
On Thu, Mar 7, 2013 at 1:14 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> We do not have to created references to references. We can find the
> original file, and directly create a ref at the grand daughters. The messy
> part, is in the cleanup for parent region, where we have to recursively
> search for all successors to decide whether we can delete this region, and
> delete the hfile.

Yes. That is a few trips to the NN listing directory contents and then some edits/reading of .META. We would have to introduce a QuarterHFile to go with our HalfHFile (or rename HalfHFile as PieceO'HFile).

St.Ack
-
Re: reason to do major compaction after split
Matteo Bertozzi 2013-03-07, 22:28
This seems to be going in a super messy direction.
With HBASE-7806 the idea was to clean up all this crazy stuff (HFileLink, References, ...).

Unfortunately, the initial decision to tie the fs layout to the tables/regions/families is what forces all these workarounds whenever we want something cool.

If you put the files in one place and the association in another, you can avoid all this complexity.

/hbase/data/[file 1, file 2, file 3, file N]

table 1/region 1: [file 2]
table 1/region 2: [file 1 (from 0 to 50)]
table 1/region 3: [file 1 (from 50 to 100)]
table 2/region 1: [file 1, file 2]

On Thu, Mar 7, 2013 at 10:13 PM, Stack <[EMAIL PROTECTED]> wrote:
> Yes. That is a few trips to the NN listing directory contents and then
> some edits/reading of .META. We would have to introduce a QuarterHFile to
> go with our HalfHFile (or rename HalfHFile as PieceO'HFile).
>
> St.Ack
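The layout proposed in this message, with files in one shared directory and a separate association of regions to file ranges, can be sketched as below. Everything here is an assumption made for illustration: `FilesTable`, its methods, and the integer key ranges are hypothetical and do not correspond to anything in HBASE-7806.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (file name, start key, end key); None means unbounded on that side.
FileRange = Tuple[str, Optional[int], Optional[int]]


class FilesTable:
    """Hypothetical association table: region -> ranges over shared files."""

    def __init__(self) -> None:
        self._by_region: Dict[str, List[FileRange]] = defaultdict(list)

    def attach(self, region: str, file_name: str,
               start: Optional[int] = None, end: Optional[int] = None) -> None:
        self._by_region[region].append((file_name, start, end))

    def detach_region(self, region: str) -> None:
        self._by_region.pop(region, None)

    def files_for(self, region: str) -> List[FileRange]:
        return list(self._by_region.get(region, []))

    def unreferenced(self, all_files: Iterable[str]) -> List[str]:
        """Files in the shared directory that no region points at."""
        used = {name for ranges in self._by_region.values()
                for name, _, _ in ranges}
        return sorted(set(all_files) - used)


# Mirror the example layout from the message above.
table = FilesTable()
table.attach("table1/region1", "file2")
table.attach("table1/region2", "file1", 0, 50)
table.attach("table1/region3", "file1", 50, 100)
table.attach("table2/region1", "file1")
table.attach("table2/region1", "file2")

data_dir = ["file1", "file2", "file3"]
orphans = table.unreferenced(data_dir)
```

With the association kept in one place, a split only adds two narrower entries for the same file, and cleanup becomes a set difference instead of a walk over region directories.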
-
Re: reason to do major compaction after split
Stack 2013-03-07, 22:56
On Thu, Mar 7, 2013 at 2:28 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
> This is seems to going in a super messy direction.

Smile. I was thinking you'd show up on this thread. Agree.

> With HBASE-7806 the ideas was to cleanup all this crazy stuff (HFileLink,
> References, ...)
>
> unfortunately the initial decision of tight together the fs layout
> and the tables/regions/families is bringing to all this workaround to have
> something cool.
>
> If you put the files in one place, and the association in another you can
> avoid all this complexity.
>
> /hbase/data/[file1, file 2, file 3, file N]
>
> table 1/region 1: [file 2]
> table 1/region 2: [file 1 (from 0 to 50)]
> table 1/region 3: [file 1 (from 50 to 100)]
> table 2/region 1: [file 1, file 2]

Any ideas on what migration from the current format to the above would be like, Matteo? We'd read the current layout, use it to populate a files table, new files would be written to the new /hbase/data/ dir, and for a while we'd span the old and new locations?

St.Ack
-
Re: reason to do major compaction after split
Matteo Bertozzi 2013-03-07, 23:09
On Thu, Mar 7, 2013 at 10:56 PM, Stack <[EMAIL PROTECTED]> wrote:
> Any ideas on what migration from current format to the above would be like
> Matteo? We'd read current layout, use it to populate a files table, new
> files would be written to a the new /hbase/data/ dir, and for a while we'd
> span the old and new locations?

If you can shut down the whole cluster, the way is easy: move all the hfiles into /hbase/data and populate the "files table".

If you can't, you just have to keep the current code able to read the current fs layout and the archive. If there is something in that directory, read from it as today; if not, go to the files table.
On write (flush, compactions), add the new file to the "files table" and to /hbase/data.
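The online migration path described in this message can be sketched on a local filesystem as follows. This is one possible reading, not a design from the thread: the sketch reads from both the legacy region directory and the files table during the transition (the "span the old and new locations" idea), and `list_store_files`, `record_new_file`, and the plain dict used as a files table are all hypothetical.

```python
import os
import tempfile


def list_store_files(legacy_dir, files_table, region):
    """Return legacy-layout files first, then files known to the files table."""
    paths = []
    if os.path.isdir(legacy_dir):
        for name in sorted(os.listdir(legacy_dir)):
            paths.append(os.path.join(legacy_dir, name))
    paths.extend(files_table.get(region, []))
    return paths


def record_new_file(data_dir, files_table, region, name, payload):
    """On flush or compaction, write to the shared dir and register the file."""
    path = os.path.join(data_dir, name)
    with open(path, "wb") as out:
        out.write(payload)
    files_table.setdefault(region, []).append(path)
    return path


with tempfile.TemporaryDirectory() as root:
    legacy_dir = os.path.join(root, "table1", "region1", "cf")
    data_dir = os.path.join(root, "data")
    os.makedirs(legacy_dir)
    os.makedirs(data_dir)
    files_table = {}

    # An old-layout file exists, so reads still find it in the region dir.
    old_file = os.path.join(legacy_dir, "hfile-old")
    with open(old_file, "wb") as out:
        out.write(b"old")
    before = list_store_files(legacy_dir, files_table, "region1")

    # A flush writes to the new location; reads now span both locations.
    new_file = record_new_file(data_dir, files_table, "region1",
                               "hfile-new", b"new")
    during = list_store_files(legacy_dir, files_table, "region1")

    # Once the legacy file is compacted away, only the files table is used.
    os.remove(old_file)
    after = list_store_files(legacy_dir, files_table, "region1")
```

Because new files are only ever written to the shared directory, the legacy directories drain on their own as compactions rewrite old data.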
-
Re: reason to do major compaction after split
Sergey Shelukhin 2013-03-07, 23:22
Hmm... ranges sound good, but for files it would be nice if there were a hardlink mechanism.
It should be trivial to do in HDFS if blocks could belong to several files.
Then we would not need private cleanup code.

On Thu, Mar 7, 2013 at 2:28 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:
> This is seems to going in a super messy direction.
> With HBASE-7806 the ideas was to cleanup all this crazy stuff (HFileLink,
> References, ...)
>
> unfortunately the initial decision of tight together the fs layout
> and the tables/regions/families is bringing to all this workaround to have
> something cool.
>
> If you put the files in one place, and the association in another you can
> avoid all this complexity.
>
> /hbase/data/[file1, file 2, file 3, file N]
>
> table 1/region 1: [file 2]
> table 1/region 2: [file 1 (from 0 to 50)]
> table 1/region 3: [file 1 (from 50 to 100)]
> table 2/region 1: [file 1, file 2]
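The appeal of hardlinks for cleanup can be seen on an ordinary POSIX filesystem. The sketch below uses Python's `os.link` on a local temporary directory; it is not HDFS code and says nothing about how HDFS would implement links. The file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    parent_file = os.path.join(root, "parent-hfile")
    daughter_link = os.path.join(root, "daughter-hfile")

    with open(parent_file, "wb") as out:
        out.write(b"row1,row2,row3")

    # Both names now point at the same underlying data.
    os.link(parent_file, daughter_link)
    links_while_shared = os.stat(parent_file).st_nlink

    # Removing the parent's name does not remove the data; the filesystem
    # frees it only when the last remaining link is deleted.
    os.remove(parent_file)
    parent_gone = not os.path.exists(parent_file)
    with open(daughter_link, "rb") as src:
        surviving = src.read()
```

The filesystem does the reference counting here, which is the "no private cleanup code" property asked for above: the parent region could delete its files immediately after a split.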
-
Re: reason to do major compaction after split
Matteo Bertozzi 2013-03-07, 23:36
Sure, having hardlink support (HDFS-3370 <https://issues.apache.org/jira/browse/HDFS-3370>) would solve the HFileLink hack, but you would still need to add extra metadata for splits (reference files).

Also, if instead of files you think about handling blocks directly, you can end up doing more stuff, like a proper compaction that requires less I/O if N blocks are unchanged, some crazy deduplication on tables with the same content, and similar...

On Thu, Mar 7, 2013 at 11:22 PM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> Hmm... ranges sounds good, but for files, it would be nice if there were a
> hardlink mechanism.
> It should be trivial to do in HDFS if blocks could belong to several files.
> Then we don't have to have private cleanup code.
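The block-level idea in this message, where a compaction skips I/O for unchanged blocks and identical content is deduplicated, can be illustrated with a toy content-addressed block pool. This is a sketch under assumptions, not a description of HDFS or HBase internals: `BlockStore`, `write_file`, and the use of SHA-256 as a block id are all hypothetical choices.

```python
import hashlib
from typing import Dict, List


class BlockStore:
    """Hypothetical content-addressed block pool shared by all files."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.writes = 0  # number of blocks physically written

    def put(self, data: bytes) -> str:
        block_id = hashlib.sha256(data).hexdigest()
        if block_id not in self.blocks:
            self.blocks[block_id] = data
            self.writes += 1
        return block_id


def write_file(store: BlockStore, blocks: List[bytes]) -> List[str]:
    """A 'file' here is just an ordered list of block ids."""
    return [store.put(block) for block in blocks]


store = BlockStore()
old_file = write_file(store, [b"aaa", b"bbb", b"ccc"])
writes_after_flush = store.writes

# A compaction that changes only the middle block writes one new block;
# the two unchanged blocks are shared with the old file.
new_file = write_file(store, [b"aaa", b"bbb-merged", b"ccc"])
writes_for_compaction = store.writes - writes_after_flush
shared = set(old_file) & set(new_file)
```

The same mechanism gives deduplication across tables for free, since two tables writing identical blocks end up holding the same block ids.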
-
Re: reason to do major compaction after split
Enis Söztutar 2013-03-08, 00:11
> /hbase/data/[file1, file 2, file 3, file N]
>
> table 1/region 1: [file 2]
> table 1/region 2: [file 1 (from 0 to 50)]
> table 1/region 3: [file 1 (from 50 to 100)]
> table 2/region 1: [file 1, file 2]

We do not necessarily need a separate dir for files. We can just keep the files in the region dir until there are no more references. The problem comes from the fact that we rely on hdfs ls for regions rather than META being the one and only authoritative source.

Enis
-
Re: reason to do major compaction after split
Andrew Purtell 2013-03-08, 01:42
> also, if instead of files you think about handling blocks directly you
> can end up doing more stuff, like a proper compaction that require less I/O
> if N blocks are not changed, some crazy deduplication on tables with same
> content & similar...

Sounds like a step toward using a block pool directly and avoiding the filesystem layer (Hadoop 2+).

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: reason to do major compaction after split
Sergey Shelukhin 2013-03-08, 19:32
+1.
That gives us a lot of freedom to do stuff in many scenarios.

On Thu, Mar 7, 2013 at 5:42 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> Sounds like a step toward using a block pool directly and avoiding the
> filesystem layer (Hadoop 2+).
-
Re: reason to do major compaction after split
Enis Söztutar 2013-03-08, 20:06
> Sounds like a step toward using a block pool directly and avoiding the
> filesystem layer (Hadoop 2+).

This has come up previously. With federation, we should be able to embed the NN as a first cut, and own all the blocks in the hbase namespace.

Enis

On Fri, Mar 8, 2013 at 11:32 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> +1.
> That gives us a lot of freedom to do stuff in many scenarios.
-
Re: reason to do major compaction after split
Sergey Shelukhin 2013-03-07, 23:20
Hmm... should we have hardlinks (or use HDFS hardlinks, if there are any) to solve this problem?
HalfHFile could be HFileWithRange :)

On Thu, Mar 7, 2013 at 2:13 PM, Stack <[EMAIL PROTECTED]> wrote:
> Yes. That is a few trips to the NN listing directory contents and then
> some edits/reading of .META. We would have to introduce a QuarterHFile to
> go with our HalfHFile (or rename HalfHFile as PieceO'HFile).
>
> St.Ack
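The HFileWithRange idea can be sketched as a read-only view that restricts a sorted file to a key range and can be narrowed again on a later split. This is an illustration only: `FileWithRange`, `narrow`, and `scan` are hypothetical names, and an in-memory list of (key, value) pairs stands in for the hfile.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Row = Tuple[str, int]


def _tighter_start(a: Optional[str], b: Optional[str]) -> Optional[str]:
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def _tighter_end(a: Optional[str], b: Optional[str]) -> Optional[str]:
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


class FileWithRange:
    """Hypothetical view of a sorted file limited to keys in [start, end)."""

    def __init__(self, rows: List[Row],
                 start: Optional[str] = None, end: Optional[str] = None):
        self._rows = rows  # the original file's rows, shared and never copied
        self.start = start
        self.end = end

    def narrow(self, start: Optional[str] = None,
               end: Optional[str] = None) -> "FileWithRange":
        """Return a view of the SAME original file with a tighter range."""
        return FileWithRange(self._rows,
                             _tighter_start(self.start, start),
                             _tighter_end(self.end, end))

    def scan(self) -> List[Row]:
        keys = [key for key, _ in self._rows]
        lo = 0 if self.start is None else bisect_left(keys, self.start)
        hi = len(keys) if self.end is None else bisect_left(keys, self.end)
        return self._rows[lo:hi]


rows = [("a", 1), ("c", 2), ("f", 3), ("k", 4), ("p", 5), ("t", 6)]
parent = FileWithRange(rows)
bottom_half = parent.narrow(end="k")             # daughter: keys < "k"
bottom_quarter = bottom_half.narrow(end="c")     # granddaughter: keys < "c"
top_of_bottom = bottom_half.narrow(start="c")    # granddaughter: "c" <= key < "k"
```

Each narrowed view points straight at the original rows, so a second split produces a same-level reference with a smaller range rather than a reference to a reference.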
-
Re: reason to do major compaction after split
Jean-Daniel Cryans 2013-03-07, 18:54
Cleaning the parent would be another one.

J-D

On Thu, Mar 7, 2013 at 10:50 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> Hi.
> Is there a reason to do major compaction after split, instead of allowing
> the reference files to go away gradually as the normal compactions happen?
> I could think up two reasons - region with reference files currently cannot
> be split again (not clear why not though, could just create more
> references); and avoiding load on the same datanodes from both new regions.
> Are there some other reasons?