HDFS user mailing list: Is append allowed in HDFS?


Re: Is append allowed in HDFS?
This isn't possible presently. Once you close the open file stream for a
sequence file, you're done with it. I'd advise not closing it, and using
hflush instead, much like a WAL. Close it only when you've reached some
threshold, and then open a new file. The hflush (or sync in 1.x) will
ensure that the latest additions are available for immediate reads (to
all new readers).
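
A minimal sketch of this keep-open-and-flush approach (assuming the
Hadoop 1.x API, where the flush call is SequenceFile.Writer.syncFs();
the paths, key/value types, and roll threshold are all illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class WalStyleWriter {
      // Roll to a new file after this many records instead of ever
      // reopening a closed one.
      private static final long ROLL_THRESHOLD = 100000;

      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        int fileNo = 0;
        long written = 0;
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("/logs/events-" + fileNo),
            LongWritable.class, Text.class);
        for (long i = 0; i < 1000000; i++) {
          writer.append(new LongWritable(i), new Text("record " + i));
          writer.syncFs(); // flush to DataNodes so new readers see this record
          if (++written >= ROLL_THRESHOLD) {
            writer.close(); // done with this file for good...
            writer = SequenceFile.createWriter( // ...so start a fresh one
                fs, conf, new Path("/logs/events-" + ++fileNo),
                LongWritable.class, Text.class);
            written = 0;
          }
        }
        writer.close();
      }
    }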

The patch at https://issues.apache.org/jira/browse/HADOOP-7139 will
help remove this limitation, though. It's under review and needs some
further work.
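
For reference, a hedged sketch of the append option that HADOOP-7139
eventually introduced (it landed well after this thread, in later
Hadoop 2.x releases; the path and key/value classes are illustrative,
and imports are as in the earlier sketch):

    Configuration conf = new Configuration();
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path("/logs/events")),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(Text.class),
        // Reopen and append if the file already exists (the same
        // key/value classes and compression settings are required).
        SequenceFile.Writer.appendIfExists(true));
    writer.append(new LongWritable(1), new Text("fresh data"));
    writer.close();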

On Tue, Apr 24, 2012 at 6:47 PM, Florin P <[EMAIL PROTECTED]> wrote:
> Hello!
>   Thank you for your responses. I've read in these posts
> http://stackoverflow.com/questions/5598400/hdfs-using-hdfs-api-to-append-to-a-sequencefile
> and
> https://issues.apache.org/jira/browse/HADOOP-3977
>
> that you cannot add fresh data to an existing SequenceFile. So,
> basically, you have this scenario:
> 1. Write to a SequenceFile
> 2. Close the file
> 3. Reopen the written file
> 4. Append fresh data to it
> 5. Close the file
> At the end you'll have the old data plus the newly added data. Could you
> give an example (code) of how to achieve this scenario with the API? Please
> specify which version you're using.
>
> Thank you.
>
> Regards,
>   Florin
>
> ________________________________
> From: Ioan Eugen Stan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Florin P <[EMAIL PROTECTED]>
> Sent: Friday, April 13, 2012 1:23 PM
>
> Subject: Re: Is append allowed in HDFS?
>
> 2012/4/13 Florin P <[EMAIL PROTECTED]>:
>> Hello!
>>  Thank you all for your responses. Would it be possible to have a matrix
>> of which Hadoop file input formats support append, or, if I understood
>> correctly, do all formats now support append?
>> Thanks a lot.
>>   Regards,
>>  Florin
>
> Hi Florin,
>
> Append is a file-system feature, not a file-format feature, although
> some file formats are designed to be immutable (MapFile, HFile). You
> can still append to them; just don't use the interface they normally
> provide.
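
A minimal sketch of what file-system-level append looks like (this
assumes an HDFS build and configuration where append is enabled, e.g.
dfs.support.append=true on 1.x; the path is illustrative, and the
extra import needed is org.apache.hadoop.fs.FSDataOutputStream):

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Reopen an existing file for append at the FileSystem level; this
    // knows nothing about the record format of the bytes already there.
    FSDataOutputStream out = fs.append(new Path("/data/raw-log"));
    out.write("new bytes\n".getBytes("UTF-8"));
    out.close();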
>
>> ________________________________
>> From: Inder Pall <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Tuesday, April 10, 2012 8:12 AM
>> Subject: Re: Is append allowed in HDFS?
>>
>> Harsh,
>>
>> The idea is to call sync for a configured batch. This is still under
>> implementation, as other parts of the system aren't complete.
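
A rough sketch of that batched-sync idea (the writer setup, the Record
type, and the batch size are all placeholders, not part of the original
mail):

    final int BATCH_SIZE = 100;       // configured batch size (illustrative)
    int pending = 0;
    for (Record r : records) {        // Record/records are hypothetical
      writer.append(r.key(), r.value());
      if (++pending >= BATCH_SIZE) {
        writer.syncFs();              // one flush amortized over the batch
        pending = 0;
      }
    }
    if (pending > 0) writer.syncFs(); // flush the tail of the last batch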
>>
>>> recovery/resume-from-errors-at-DN code around general tail-like
>> This sounds promising; can you please shed some more light on this?
>>
>> - inder
>> On Tue, Apr 10, 2012 at 1:07 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Your approach looks fine to me. I'd throw in some
>> recovery/resume-from-errors-at-DN code around general tail-like
>> consumption, but I think you may have already done that :)
>>
>> But just out of curiosity - do you call sync for every record/unit, or
>> batch it up by a few, for your problem?
>>
>> On Mon, Apr 9, 2012 at 10:34 PM, Inder Pall <[EMAIL PROTECTED]> wrote:
>>> Yes, that makes sense. My use case is more like a producer/consumer,
>>> with the consumer trying to stream data as it arrives.
>>> Has anyone hit this before and, if so, resolved it in a better way?
>>>
>>> Apologies if I am digressing from the subject of this thread; however,
>>> it seems to land in the bucket of append support in HDFS.
>>>
>>> - Inder
>>>
>>>
>>> On Mon, Apr 9, 2012 at 6:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Inder,
>>>>
>>>> Yes, that is a requirement for readers of sync-ing data. The new meta
>>>> entries can only be read by new readers. The read code would end up
>>>> being exactly like the implementation of the "fs -tail" method at
>>>> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/fs/FsShell.java?view=markup
>>>> (line 1101).
>>>>
>>>> HBase does not read the WAL (HLog) continuously/vigorously as it
>>>> syncs, by the way. It only reads the logs when a specific request is
>>>> made (for splitting, replaying, and debug-printing).
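
A rough sketch of that tail-like consumption, in the spirit of the
FsShell -tail code linked above (poll the file length, then read only
the newly visible bytes; the poll interval is illustrative, and the
extra import needed is org.apache.hadoop.fs.FSDataInputStream):

    static void tail(FileSystem fs, Path path) throws Exception {
      long offset = 0;
      byte[] buf = new byte[4096];
      while (true) {
        // The visible length grows as the writer hflushes/syncs.
        long len = fs.getFileStatus(path).getLen();
        if (len > offset) {
          FSDataInputStream in = fs.open(path);
          in.seek(offset); // skip what was already consumed
          int n;
          while (offset < len && (n = in.read(buf)) > 0) {
            // process buf[0..n) here
            offset += n;
          }
          in.close();
        }
        Thread.sleep(5000); // poll interval, like fs -tail's refresh loop
      }
    }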

Harsh J