Re: Problem In Understanding Compaction Process
Thanks, Sergey.
In my use case, I want to directly analyze the underlying HFiles, so I
can't tolerate duplicate data.

Can you give me some pointers about how to make this procedure atomic?
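
As a side note, here is a minimal sketch of the dedup rule Sergey describes
below, applied when reading HFile contents directly: among entries with the
same row key and version, keep only the one coming from the file with the
higher sequence id. The RawEntry class and the way the file sequence id is
obtained are assumptions for illustration only, not the actual HBase reader
API.

import java.util.HashMap;
import java.util.Map;

// Hypothetical flattened view of a cell read straight out of an HFile,
// tagged with the sequence id of the file it came from.
class RawEntry {
    final String rowKey;
    final long timestamp;   // cell version
    final long fileSeqId;   // sequence id recorded in the HFile's metadata
    final byte[] value;

    RawEntry(String rowKey, long timestamp, long fileSeqId, byte[] value) {
        this.rowKey = rowKey;
        this.timestamp = timestamp;
        this.fileSeqId = fileSeqId;
        this.value = value;
    }
}

class HFileDeduplicator {
    // Keep, for each (rowKey, timestamp) pair, only the entry coming from
    // the file with the highest sequence id -- mirroring how the read path
    // prefers the newer file when a compacted file and its inputs coexist.
    static Map<String, RawEntry> dedup(Iterable<RawEntry> entries) {
        Map<String, RawEntry> newest = new HashMap<>();
        for (RawEntry e : entries) {
            String key = e.rowKey + "/" + e.timestamp;
            RawEntry seen = newest.get(key);
            if (seen == null || e.fileSeqId > seen.fileSeqId) {
                newest.put(key, e);
            }
        }
        return newest;
    }
}

The point is only that a deterministic "newest file wins" rule makes any
leftover pre-compaction files harmless to a direct reader, just as it does
for the region server's own read path.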

On Thu, Feb 21, 2013 at 6:07 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:

> There should be no duplicate records despite the file not being deleted -
> between records with the exact same key/version/etc., the newer file would
> be chosen by logical sequence. If that happens to be the same, a tie-breaking
> choice (by time or name) is made, and still only one file will be chosen.
> Eventually, the file will be compacted again and disappear. Granted, by
> making the move atomic (via some meta/manifest file) we could avoid some
> overhead in this case at the cost of some added complexity, but it should
> be rather rare.
>
> On Tue, Feb 19, 2013 at 7:10 PM, Anty <[EMAIL PROTECTED]> wrote:
>
> > Hi Guys,
> >
> >       I have some problems understanding the compaction process. Can
> > someone shed some light on this? Much appreciated. Here is the problem:
> >
> >       After the Region Server successfully generates the final compacted
> > file, it goes through two steps:
> >        1. move the above compacted file into the region's directory
> >        2. delete the replaced files.
> >
> >        The above two steps are not atomic. If the Region Server crashes
> > after step 1 and before step 2, then there are duplicate records! Is this
> > problem handled in the read process, or is there another mechanism to fix
> > this?
> >
> > --
> > Best Regards
> > Anty Rao
> >
>
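
For what it's worth, below is a minimal sketch of the manifest-based idea
Sergey mentions above: record the intended move and deletions in a small
manifest first, then carry them out, and on restart finish (or discard)
whatever the manifest describes. All names, paths, and the manifest format
are made up for illustration; this is not how HBase itself implements it.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Illustrative two-step commit made crash-recoverable with a manifest file.
class CompactionCommitSketch {

    static void commit(Path compactedFile, Path regionDir, List<Path> replacedFiles)
            throws IOException {
        Path manifest = regionDir.resolve("compaction.manifest");

        // 1. Record the intent durably before touching anything else.
        StringBuilder sb = new StringBuilder();
        sb.append("new=").append(compactedFile.getFileName()).append('\n');
        for (Path p : replacedFiles) {
            sb.append("delete=").append(p.getFileName()).append('\n');
        }
        Files.write(manifest, sb.toString().getBytes(StandardCharsets.UTF_8));

        // 2. Move the compacted file into the region directory (atomic rename
        //    where the filesystem supports it).
        Files.move(compactedFile, regionDir.resolve(compactedFile.getFileName()),
                StandardCopyOption.ATOMIC_MOVE);

        // 3. Delete the replaced files.
        for (Path p : replacedFiles) {
            Files.deleteIfExists(p);
        }

        // 4. Only now discard the manifest; its absence means the commit finished.
        Files.deleteIfExists(manifest);
    }

    // On region open: if a manifest is present, the previous commit did not
    // finish. Finish the deletes only if the move already happened; otherwise
    // the replaced files are still the authoritative data and stay untouched.
    static void recover(Path regionDir) throws IOException {
        Path manifest = regionDir.resolve("compaction.manifest");
        if (!Files.exists(manifest)) {
            return;
        }
        List<String> lines = Files.readAllLines(manifest, StandardCharsets.UTF_8);
        String newFile = lines.get(0).substring("new=".length());
        if (Files.exists(regionDir.resolve(newFile))) {
            for (String line : lines) {
                if (line.startsWith("delete=")) {
                    Files.deleteIfExists(
                            regionDir.resolve(line.substring("delete=".length())));
                }
            }
        }
        Files.deleteIfExists(manifest);
    }
}

The key property is that after a crash the recovery step can always tell
whether the move happened, so the replaced files get deleted exactly when
the new file is already in place and duplicates never linger past reopen.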

--
Best Regards
Anty Rao