|
Florin P
2012-04-07, 18:19
Ioan Eugen Stan
2012-04-09, 10:15
Harsh J
2012-04-09, 10:53
Inder Pall
2012-04-09, 12:35
Harsh J
2012-04-09, 12:57
Inder Pall
2012-04-09, 17:04
Harsh J
2012-04-09, 19:37
Inder Pall
2012-04-10, 05:12
Florin P
2012-04-13, 09:56
Ioan Eugen Stan
2012-04-13, 10:23
Florin P
2012-04-24, 13:17
Harsh J
2012-04-24, 18:32
|
-
Is append allowed in HDFS? Florin P 2012-04-07, 18:19
Hello!

I googled for append support in HDFS files and the results left me puzzled. Can someone say definitively: yes, you can append to a TextFile, a SequenceFile, or whatever format? If so, in which version is this feature supported? Also, where can I find a good example of using the API? I know there has been a long debate on this subject, but it is really a challenge to find the current status of this feature on Google. I look forward to an answer from a trusted source.

Thank you,
Regards,
Florin
-
Re: Is append allowed in HDFS? Ioan Eugen Stan 2012-04-09, 10:15
Hi Florin,

HDFS supports append in the Hadoop 1.0.x branch, and also in 0.22 and 0.23 (the 0.23 line is the one that continues as Hadoop 2.x).

[1] http://hbase.apache.org/book/hadoop.html
[2] http://hbase.apache.org/book/hadoop.html -- search for append in the release notes

Cheers,
--
Ioan Eugen Stan
http://ieugen.blogspot.com/
-
Re: Is append allowed in HDFS? Harsh J 2012-04-09, 10:53
I'd also like to note that there are some unresolved issues with the append implementation in the 1.x (stable) line.

Note that HBase's use of the 0.20-append branch features is limited to "sync" calls alone (described on p. 68, "Coherency Model", Chapter 3 (The Hadoop Distributed File System) of Hadoop: The Definitive Guide, 2nd Edition (O'Reilly)), not the file-reopening "append" calls. The latter is what still has issues in the 1.x releases today. Using the former is alright if it's done in a way similar to HBase's WAL (HLog), or for similar needs.

--
Harsh J
-
Re: Is append allowed in HDFS? Inder Pall 2012-04-09, 12:35
Based on what I have tried, after a sync you need to open a new Reader to see the synced data. Please correct me if that's not how the semantics work.

--
Thanks,
- Inder
Tech Platforms @Inmobi
Linkedin - http://goo.gl/eR4Ub
-
Re: Is append allowed in HDFS? Harsh J 2012-04-09, 12:57
Inder,

Yes, that is a requirement for readers of data that is being synced. The new meta entries can only be read by new readers. The read code would end up being exactly like the implementation of the "fs -tail" method at http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/fs/FsShell.java?view=markup (Line 1101).

By the way, HBase does not read the WAL (HLog) continuously or vigorously as it syncs. It only reads it when a specific request is made (for splitting, replaying and debug-printing).

--
Harsh J
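As a rough illustration of the tail-style read loop described above, here is a minimal sketch in plain Java against a local file, not HDFS. The class `TailReaderSketch` and its helpers `produce` and `newTempFile` are made-up names for this example. Each poll opens a fresh handle, seeks to the last consumed offset and reads up to the length that the new handle sees. On HDFS the analogous calls would presumably be `FileSystem.open` and `FSDataInputStream.seek`; that part is an assumption and is not exercised here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Local-file sketch of tail-like consumption; a stand-in, not HDFS code. */
final class TailReaderSketch {
    private final Path file;
    private long offset = 0; // number of bytes already handed to the consumer

    TailReaderSketch(Path file) {
        this.file = file;
    }

    /** Opens a new handle, returns everything past the saved offset, and advances the offset. */
    String poll() {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long length = in.length(); // length as seen by this fresh handle
            if (length <= offset) {
                return ""; // nothing new since the last poll
            }
            byte[] chunk = new byte[(int) (length - offset)];
            in.seek(offset);
            in.readFully(chunk);
            offset = length;
            // Assumes the producer writes whole UTF-8 records, so a chunk never splits a character.
            return new String(chunk, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical producer side: appends text to the end of the file. */
    static void produce(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates an empty temporary file to experiment with. */
    static Path newTempFile() {
        try {
            return Files.createTempFile("tail-sketch", ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A consumer would call `poll()` on a timer or in a loop. The recovery code around it (retrying when a poll fails) is left out of this sketch.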
-
Re: Is append allowed in HDFS? Inder Pall 2012-04-09, 17:04
Yes, makes sense. My use case is more like a producer/consumer, with the consumer trying to stream data as it arrives. Has anyone hit this before and, if so, resolved it in a better way?

Apologies if I am digressing from the subject of this thread; however, this seems to land in the bucket of append support in HDFS.

--
Thanks,
- Inder
Tech Platforms @Inmobi
Linkedin - http://goo.gl/eR4Ub
-
Re: Is append allowed in HDFS? Harsh J 2012-04-09, 19:37
Your approach looks fine to me. I'd throw in some recovery/resume-from-errors-at-DN (DataNode) code around general tail-like consumption, but I think you may have already done that :)

But just for my curiosity: do you call sync for every record/unit, or batch it by a few, for your problem?

--
Harsh J
-
Re: Is append allowed in HDFS? Inder Pall 2012-04-10, 05:12
Harsh,

The idea is to call sync for a configured batch. This is still under implementation, as other parts of the system aren't complete.

> recovery/resume-from-errors-at-DN code around general tail-like consumption

This sounds promising; can you please shed some more light on this?

--
Thanks,
- Inder
Tech Platforms @Inmobi
Linkedin - http://goo.gl/eR4Ub
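A configured-batch sync could look roughly like the sketch below. It is plain Java over any `OutputStream`, with `flush()` standing in for the HDFS `sync()`/`hflush()` call. The class `BatchedSyncWriter` and its `batchSize` parameter are hypothetical names for illustration, not part of Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Sketch: sync once per batch of records instead of once per record. */
final class BatchedSyncWriter {
    private final OutputStream out;
    private final int batchSize; // records to accumulate before each sync
    private int pending = 0;     // records written since the last sync
    private int syncCount = 0;   // number of syncs actually performed

    BatchedSyncWriter(OutputStream out, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.out = out;
        this.batchSize = batchSize;
    }

    /** Writes one newline-terminated record and syncs when the batch is full. */
    void append(String record) {
        try {
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending++;
        if (pending >= batchSize) {
            sync();
        }
    }

    /** Pushes pending records out; flush() is only a stand-in for sync()/hflush(). */
    void sync() {
        if (pending == 0) {
            return; // nothing to make visible
        }
        try {
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending = 0;
        syncCount++;
    }

    int pending() {
        return pending;
    }

    int syncCount() {
        return syncCount;
    }
}
```

The trade-off is the usual one: a larger batch means fewer sync calls, but readers see new records later and more unsynced records are at risk if the writer dies. Calling `sync()` explicitly before a pause or shutdown covers the partial batch.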
-
Re: Is append allowed in HDFS? Florin P 2012-04-13, 09:56
Hello!

Thank you all for your responses. Is it possible to have a matrix of the Hadoop file formats that support append? Or, if I understood correctly, do all formats now support append?

Thanks a lot.
Regards,
Florin
-
Re: Is append allowed in HDFS? Ioan Eugen Stan 2012-04-13, 10:23
Hi Florin,

Append is a file-system feature, not a file-format feature, although some file formats are designed to be immutable (MapFile, HFile). You can append to them, just not through the interface they normally provide.

--
Ioan Eugen Stan
http://ieugen.blogspot.com/
-
Re: Is append allowed in HDFS? Florin P 2012-04-24, 13:17
Hello!

Thank you for your responses. I've read in these posts:

http://stackoverflow.com/questions/5598400/hdfs-using-hdfs-api-to-append-to-a-sequencefile
https://issues.apache.org/jira/browse/HADOOP-3977

that you cannot add fresh data to an existing SequenceFile. So, basically, the scenario is:

1. Write to a SequenceFile
2. Close the file
3. Reopen the written file
4. Add fresh data to it
5. Close the file

At the end you'll have the old data plus the newly added data. Can you give a code example of how to achieve this scenario with the API? Please specify which version you're using.

Thank you.
Regards,
Florin
-
Re: Is append allowed in HDFS? Harsh J 2012-04-24, 18:32
This isn't possible presently. If you close the open file stream for a sequence file, you're done with it. I'd advise not closing it and using hflush instead, much like a WAL. Close it only when you've reached some threshold, then open a new file. The hflush (or sync in 1.x) will ensure that the latest additions are available for immediate reads (to all new readers).

The patch at https://issues.apache.org/jira/browse/HADOOP-7139 will help solve this limitation, though. It's under review and needs some further work.

--
Harsh J
|
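The "keep the stream open, flush as you go, and roll to a new file at a threshold" advice can be sketched as follows. This is plain Java on the local file system, offered only as an analogy: `flush()` stands in for `hflush()` (or `sync` in 1.x), and `RollingLogWriter`, `maxBytes`, `tempDir` and the `segment-N.log` naming are all hypothetical choices for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch: never reopen a closed file; flush while open and roll to a new segment. */
final class RollingLogWriter implements AutoCloseable {
    private final Path dir;
    private final long maxBytes; // size threshold per segment
    private final List<Path> files = new ArrayList<>();
    private OutputStream out;
    private long written; // bytes written to the current segment

    RollingLogWriter(Path dir, long maxBytes) {
        this.dir = dir;
        this.maxBytes = maxBytes;
        roll(); // open the first segment
    }

    /** Appends one record, rolling first if it would push the segment past the threshold. */
    void append(String record) {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            if (written > 0 && written + bytes.length > maxBytes) {
                roll();
            }
            out.write(bytes);
            out.flush(); // stand-in for hflush()/sync(): make the record readable now
            written += bytes.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the current segment for good and starts a new one. */
    private void roll() {
        try {
            if (out != null) {
                out.close(); // a closed segment is never written to again
            }
            Path next = dir.resolve("segment-" + files.size() + ".log");
            out = Files.newOutputStream(next);
            files.add(next);
            written = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a copy of the segment paths created so far, oldest first. */
    List<Path> files() {
        return new ArrayList<>(files);
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a scratch directory to experiment with. */
    static Path tempDir() {
        try {
            return Files.createTempDirectory("rolling-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Readers then treat the directory as the logical file: closed segments are complete, and only the newest segment needs tail-style reading.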