Re: Is append allowed in HDFS?
This isn't possible presently. Once you close the open file stream for a
sequence file, you're done with it. I'd advise not closing it, and using
hflush instead, much like a WAL. Close it only once you've reached some
threshold, and open a new file. The hflush (or sync in 1.x) ensures that
the latest additions are available for immediate reads (to all new
readers).
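
To make that pattern concrete, here is a minimal sketch against the
Hadoop 2.x API (where SequenceFile.Writer implements Syncable; on 1.x
you'd call writer.syncFs() instead of hflush()). The path scheme and
roll threshold are made up for illustration:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// WAL-style writer: keep one SequenceFile open, hflush so new readers
// see the latest records, and roll to a fresh file at a size threshold
// instead of ever reopening a closed file.
public class RollingSeqFileWriter {
  private static final long ROLL_BYTES = 64L * 1024 * 1024; // assumed threshold

  private final Configuration conf = new Configuration();
  private SequenceFile.Writer writer;
  private int part = 0;

  private SequenceFile.Writer open() throws IOException {
    Path path = new Path("/logs/events-" + (part++) + ".seq"); // hypothetical
    return SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
  }

  public synchronized void write(long ts, String record) throws IOException {
    if (writer == null) {
      writer = open();
    }
    writer.append(new LongWritable(ts), new Text(record));
    writer.hflush(); // syncFs() on 1.x; makes the data visible to new readers
    if (writer.getLength() >= ROLL_BYTES) {
      writer.close(); // done with this file for good
      writer = null;  // the next write opens a new part file
    }
  }

  public synchronized void close() throws IOException {
    if (writer != null) {
      writer.close();
    }
  }
}

Readers opened after an hflush() will see everything appended up to
that point.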

The patch at https://issues.apache.org/jira/browse/HADOOP-7139 will
help address this limitation, though. It's under review and needs some
further work.

On Tue, Apr 24, 2012 at 6:47 PM, Florin P <[EMAIL PROTECTED]> wrote:
> Hello!
>   Thank you for your responses. I've read in this posts
> http://stackoverflow.com/questions/5598400/hdfs-using-hdfs-api-to-append-to-a-sequencefile
> also
> https://issues.apache.org/jira/browse/HADOOP-3977
>
> that you cannot add fresh data to an existing SequenceFile. So,
> basically, you have the scenario:
> 1. Write to a SequenceFile
> 2. Close the file
> 3. Reopen the written file
> 4. Add fresh data to it
> 5. Close the file
> At the end you'll have the old data plus the newly added data. Could you
> give an example (code) of how to achieve this scenario with the API?
> Please specify which version you're using.
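
Since reopening a closed SequenceFile for append isn't supported in the
versions discussed here, the closest you can get to that end state with
the public API is a copy-and-rewrite. A minimal sketch against the
Hadoop 2.x API (paths, key/value types, and the sample record are
assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Emulates "reopen and append": copy the old records into a temporary
// file, add the fresh records, then swap the files.
public class SeqFileRewriteAppend {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path old = new Path("/data/events.seq");      // hypothetical path
    Path tmp = new Path("/data/events.seq.tmp");  // hypothetical path

    SequenceFile.Reader reader =
        new SequenceFile.Reader(conf, SequenceFile.Reader.file(old));
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(tmp),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(Text.class));

    LongWritable key = new LongWritable();
    Text value = new Text();
    while (reader.next(key, value)) { // copy every existing record
      writer.append(key, value);
    }
    reader.close();

    writer.append(new LongWritable(42L), new Text("fresh record")); // new data
    writer.close();

    // Swap. Note this is not atomic: a crash between the two calls
    // leaves only the tmp file.
    fs.delete(old, false);
    fs.rename(tmp, old);
  }
}
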
>
> Thank you.
>
> Regards,
>   Florin
>
> ________________________________
> From: Ioan Eugen Stan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Florin P <[EMAIL PROTECTED]>
> Sent: Friday, April 13, 2012 1:23 PM
>
> Subject: Re: Is append allowed in HDFS?
>
> 2012/4/13 Florin P <[EMAIL PROTECTED]>:
>> Hello!
>>  Thank you all for your responses. Is it possible to have a matrix of
>> Hadoop file input formats showing which ones support append, or, if I
>> understood correctly, do all formats now support append?
>> Thanks a lot.
>>   Regards,
>>  Florin
>
> Hi Florin,
>
> Append is a file-system feature, not a file-format feature, although
> some file formats are designed to be immutable (MapFile, HFile). You
> can still append bytes to them, just not through the interface they
> normally provide.
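
For the file-system side of that distinction, a minimal sketch of a raw
append (the path is hypothetical, and whether append is enabled depends
on the HDFS version and, on 1.x, the dfs.support.append setting):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// File-system-level append: raw bytes are added to the end of an
// existing file. The appended bytes must still form valid records if a
// format-aware reader is going to consume the file afterwards.
public class RawAppend {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.append(new Path("/data/raw.log")); // hypothetical
    out.write("one more line\n".getBytes("UTF-8"));
    out.close();
  }
}
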
>
>> ________________________________
>> From: Inder Pall <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Tuesday, April 10, 2012 8:12 AM
>> Subject: Re: Is append allowed in HDFS?
>>
>> Harsh,
>>
>> The idea is to call sync once per configured batch. This is still under
>> implementation, as other parts of the system aren't complete.
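
A sketch of what that batching might look like (the class name,
BATCH_SIZE, and key/value types are assumptions, not the actual
implementation being described):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical batched sync: flush once every BATCH_SIZE records rather
// than per record, trading a window of unflushed data for fewer RPCs.
public class BatchedSync {
  static final int BATCH_SIZE = 100; // assumed; tune per workload
  private long written = 0;

  void append(SequenceFile.Writer writer, LongWritable key, Text value)
      throws IOException {
    writer.append(key, value);
    if (++written % BATCH_SIZE == 0) {
      writer.hflush(); // writer.syncFs() on Hadoop 1.x
    }
  }
}
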
>>
>> > recovery/resume-from-errors-at-DN code around general tail-like
>> This sounds promising; can you please shed some more light on this?
>>
>> - inder
>> On Tue, Apr 10, 2012 at 1:07 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Your approach looks fine to me. I'd throw in some
>> recovery/resume-from-errors-at-DN code around general tail-like
>> consumption but I think you may have already done that :)
>>
>> But just for my curiosity - do you call sync for every record/unit or
>> batch it by a few, for your problem?
>>
>> On Mon, Apr 9, 2012 at 10:34 PM, Inder Pall <[EMAIL PROTECTED]> wrote:
>>> Yes, that makes sense. My use-case is more like producer/consumer, with
>>> the consumer trying to stream data as it arrives.
>>> Has anyone hit this before and, if so, resolved it in a better way?
>>>
>>> Apologies if I am digressing from the subject of this thread; however,
>>> it seems to land in the bucket of append support in HDFS.
>>>
>>> - Inder
>>>
>>>
>>> On Mon, Apr 9, 2012 at 6:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Inder,
>>>>
>>>> Yes, that is a requirement for readers of data that is being sync-ed.
>>>> The new meta entries can only be read by new readers. The read code
>>>> would end up being exactly like the implementation of "fs -tail" at
>>>> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/fs/FsShell.java?view=markup
>>>> (Line 1101)
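
A minimal sketch of that tail loop (path, buffer size, and poll interval
are assumptions; note the length reported for a file still open for
write may lag the last hflush until the block completes):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Tail-style consumption: remember the last read offset, poll the file
// length, and read only the newly visible bytes.
public class HdfsTail {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/logs/events-0.seq"); // hypothetical path
    long offset = 0;

    while (true) {
      long len = fs.getFileStatus(path).getLen(); // grows as the writer flushes
      if (len > offset) {
        FSDataInputStream in = fs.open(path);
        in.seek(offset);
        byte[] buf = new byte[(int) Math.min(len - offset, 64 * 1024)];
        int n = in.read(buf);
        in.close();
        if (n > 0) {
          offset += n;
          System.out.write(buf, 0, n); // hand the new bytes to the consumer
        }
      } else {
        Thread.sleep(1000); // nothing new yet; poll again shortly
      }
    }
  }
}
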
>>>>
>>>> HBase does not read the WAL (HLog) continuously/vigorously as it
>>>> syncs, by the way. It only reads the logs when a specific request is
>>>> made (for splitting, replaying, or debug-printing).

Harsh J