In my use case, I want to analyze the underlying HFiles directly, so I
can't tolerate duplicate data.
Can you give me some pointers on how to make this procedure atomic?
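For illustration, here is a minimal sketch of the manifest-style approach mentioned below: publish a manifest with an atomic rename, then do idempotent cleanup. This is a hypothetical scheme (function names and the manifest format are made up, and it assumes a filesystem with atomic rename, such as POSIX or HDFS), not what HBase actually does:

```python
import json
import os


def commit_compaction(region_dir, compacted_file, replaced_files):
    """Atomically publish a compaction via a manifest file.

    The manifest records which files the compacted file replaces, so
    crash recovery can always redo the deletion of the replaced files.
    (Hypothetical sketch; not HBase's actual mechanism.)
    """
    manifest = {"new": compacted_file, "replaces": replaced_files}
    tmp = os.path.join(region_dir, ".compaction.manifest.tmp")
    final = os.path.join(region_dir, "compaction.manifest")
    with open(tmp, "w") as f:
        json.dump(manifest, f)
        f.flush()
        os.fsync(f.fileno())
    # os.rename is atomic on POSIX: the manifest either exists in full
    # or not at all, so this rename is the single commit point.
    os.rename(tmp, final)
    # After the commit point, deleting the replaced files is safe to
    # redo at any time (idempotent cleanup, also run on recovery).
    for path in replaced_files:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
    os.remove(final)
```

On restart, recovery would look for a leftover `compaction.manifest` and rerun the cleanup loop, closing the crash window between the move and the deletes.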
On Thu, Feb 21, 2013 at 6:07 AM, Sergey Shelukhin <[EMAIL PROTECTED]> wrote:
> There should be no duplicate records even though the old file is not yet
> deleted: between records with the exact same key/version/etc., the file
> with the newer logical sequence is chosen. If the sequence happens to be
> the same, a deterministic tie-break (by time, or name) still selects a
> single file.
> Eventually, the file will be compacted again and disappear. Granted, by
> making the move atomic (via some meta/manifest file) we could avoid some
> overhead in this case at the cost of some added complexity, but it should
> be rather rare.
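The tie-break described above can be sketched as a simplified model (not HBase internals; here `seq_id` stands in for a store file's sequence id, and each file is reduced to a dict of key/value pairs):

```python
def read_cell(key, files):
    """Return the winning value for `key` among overlapping store files.

    Each file is modeled as (seq_id, {key: value}). When the same
    key/version exists in both the compacted file and a not-yet-deleted
    input file, the file with the higher sequence id wins, so readers
    never see duplicates. (Simplified model, not HBase's actual code.)
    """
    candidates = [
        (seq_id, kvs[key]) for seq_id, kvs in files if key in kvs
    ]
    if not candidates:
        return None
    # Prefer the logically newer file; max() over seq_id makes the
    # choice deterministic.
    return max(candidates, key=lambda c: c[0])[1]
```

Since the compacted file carries a sequence id at least as high as the files it replaced, the leftover pre-compaction file loses every such tie, which is why the reader sees no duplicates.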
> On Tue, Feb 19, 2013 at 7:10 PM, Anty <[EMAIL PROTECTED]> wrote:
> > Hi guys,
> > I have some trouble understanding the compaction process; can
> > someone shed some light on it? Much appreciated. Here is the problem:
> > After the Region Server successfully generates the final compacted
> > file, it goes through two steps:
> > 1. move the compacted file into the region's directory
> > 2. delete the replaced files.
> > These two steps are not atomic. If the Region Server crashes after
> > step 1 and before step 2, there will be duplicate records! Is this
> > problem handled in the read path, or is there another mechanism for
> > it?
> > --
> > Best Regards
> > Anty Rao