Hadoop, mail # user - How to update a file which is in HDFS


Re: How to update a file which is in HDFS
Harsh J 2013-07-06, 01:59
Append in 1.x is badly broken. You'll run into very weird states,
and we officially do not support it (the config even calls it out as
broken). I wouldn't recommend using it even if a simple test appears
to work.

On Sat, Jul 6, 2013 at 6:27 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> @Robin East: Thank you for keeping me updated. I was on 1.0.3 when I
> last tried append, and it did not work even though the API had it.
> I tried it with 1.1.2 and it seems to work fine.
>
> @Manickam: Apologies for the incorrect info. The latest stable release
> (1.1.2) supports append. But you should take into account what Harsh
> has said.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Fri, Jul 5, 2013 at 4:24 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> If it is 1k new records at the "end of the file", then you may extract
>> them and append them to the existing file in HDFS. I'd recommend using
>> HDFS from Apache Hadoop 2.x for this purpose.
>>
>> On Fri, Jul 5, 2013 at 4:22 PM, Manickam P <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > Let me explain the question clearly. I have a file which has one
>> > million records, and I moved it into my Hadoop cluster.
>> > After one month I got a new file which has the same one million
>> > records plus 1000 new records added at the end of the file.
>> > Here I just want to move the 1000 new records alone into HDFS instead
>> > of overwriting the entire file.
>> >
>> > Can I use HBase for this scenario? I don't have a clear idea about
>> > HBase. Just asking.
>> >
>> >
>> >
>> >
>> > Thanks,
>> > Manickam P
>> >
>> >
>> >> From: [EMAIL PROTECTED]
>> >> Date: Fri, 5 Jul 2013 16:13:16 +0530
>> >
>> >> Subject: Re: How to update a file which is in HDFS
>> >> To: [EMAIL PROTECTED]
>> >
>> >>
>> >> The short answer to the "delta" part is that HDFS does not presently
>> >> support random writes. You cannot alter a closed file in any way
>> >> other than appending at the end, which I doubt will help you if you
>> >> are also receiving updates (it isn't clear from your question what
>> >> this added data really is).
>> >>
>> >> HBase sounds like something that may solve your requirement though,
>> >> depending on how much of your read/write load is random. You could
>> >> consider it.
>> >>
>> >> P.S. HBase doesn't use the append() API today either (and doesn't
>> >> need it). AFAIK, only Flume makes use of it, if you allow it to.
>> >>
>> >> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <[EMAIL PROTECTED]>
>> >> wrote:
>> >> > Hello Manickam,
>> >> >
>> >> > Append is currently not possible.
>> >> >
>> >> > Warm Regards,
>> >> > Tariq
>> >> > cloudfront.blogspot.com
>> >> >
>> >> >
>> >> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <[EMAIL PROTECTED]>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I have moved my input file into the HDFS location in the cluster
>> >> >> setup.
>> >> >> Now I got a new file which has some new records along with the
>> >> >> old ones.
>> >> >> I want to move the delta part alone into HDFS, because it takes
>> >> >> more time to move the whole file from my local machine to the HDFS
>> >> >> location.
>> >> >> Is it possible, or do I need to move the entire file into HDFS
>> >> >> again?
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> Manickam P
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J
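
Harsh's suggestion in this thread (extract the 1k new records from the end of the file and append only those to the existing HDFS file) is safe only if the new file really is the old file plus extra records at the end; if earlier records were also updated, append cannot express that, which is the caveat he raises about updates. The following is a minimal Python sketch of the local side of that workflow, not an official Hadoop tool. It assumes one newline-terminated record per line and that a local copy of the previously uploaded file is still available. The function name and all file paths are hypothetical.

```python
def extract_appended_records(old_path, new_path, delta_path):
    """Copy the records that new_path has beyond old_path into delta_path.

    Hypothetical helper. Assumes one newline-terminated record per line and
    that new_path should be old_path plus extra records at the end. Raises
    ValueError if any old record changed (or the new file is shorter),
    because an HDFS append cannot express an in-place update.
    Returns the number of records written to delta_path.
    """
    written = 0
    with open(old_path, "rb") as old, open(new_path, "rb") as new, \
            open(delta_path, "wb") as delta:
        # Every old record must reappear unchanged, in the same position.
        for lineno, old_line in enumerate(old, start=1):
            if new.readline() != old_line:
                raise ValueError(
                    f"record {lineno} differs: not an append-only change")
        # Whatever is left in the new file is the delta to append.
        for line in new:
            delta.write(line)
            written += 1
    return written
```

Example: `extract_appended_records("records_june.txt", "records_july.txt", "delta.txt")` writes only the trailing records to `delta.txt`. That file could then be uploaded with the FileSystem shell's `-appendToFile` command on a Hadoop 2.x release that provides it (check your version), e.g. `hadoop fs -appendToFile delta.txt /user/manickam/records.txt`, where both paths are placeholders. Comparing record by record, rather than just checking that the new file has more records, is deliberate: a count check would silently accept a file whose earlier records were edited, which is exactly the update case that appending cannot handle.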