Re: How to update a file which is in HDFS
@Robin East : Thank you for keeping me updated. I was on 1.0.3 when I last
tried append, and it was not working despite the fact that the API
had it. I tried it with 1.1.2 and it seems to work fine.

@Manickam : Apologies for the incorrect info. The latest stable release (1.1.2)
supports append. But you should consider what Harsh has said.
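
For the case Manickam describes below, where the new file is the old file
plus records added at the end, a minimal sketch of pulling out only the new
tail on the local machine might look like this. It is an illustration and not
something from this thread: the function names (`extract_delta`,
`_sha256_prefix`) and the file paths are hypothetical, and it assumes the old
file is still available locally and is a byte-for-byte prefix of the new one.
It uses only the Python standard library and does not talk to HDFS itself.

```python
import hashlib
import os


def _sha256_prefix(path, length, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def extract_delta(old_path, new_path, delta_path, chunk_size=1 << 20):
    """Copy the bytes of new_path that follow old_path's content to delta_path.

    Returns the number of delta bytes written. Raises ValueError if the new
    file does not start with the exact content of the old file, because in
    that case a plain append cannot reproduce the new file.
    """
    old_size = os.path.getsize(old_path)
    new_size = os.path.getsize(new_path)
    if new_size < old_size:
        raise ValueError("new file is shorter than the old file")
    if _sha256_prefix(old_path, old_size) != _sha256_prefix(new_path, old_size):
        raise ValueError("old file is not a prefix of the new file")

    written = 0
    with open(new_path, "rb") as src, open(delta_path, "wb") as dst:
        src.seek(old_size)  # skip everything that is already in HDFS
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

The resulting delta file is what you would then append to the existing HDFS
file, for example through `FileSystem.append()` or, on releases that ship it,
the `hadoop fs -appendToFile` shell command. If the prefix check fails, some
existing records were changed in place, and append alone will not help.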

Warm Regards,
Tariq
cloudfront.blogspot.com
On Fri, Jul 5, 2013 at 4:24 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> If it is 1k new records at the "end of the file" then you may extract
> them and append them to the existing file in HDFS. I'd recommend using
> HDFS from Apache Hadoop 2.x for this purpose.
>
> On Fri, Jul 5, 2013 at 4:22 PM, Manickam P <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Let me explain the question clearly. I have a file which has one million
> > records, and I moved it into my Hadoop cluster.
> > After one month I got a new file which has the same one million records
> > plus 1000 new records added at the end of the file.
> > Here I just want to move the 1000 new records alone into HDFS instead of
> > overwriting the entire file.
> >
> > Can I use HBase for this scenario? I don't have a clear idea about HBase.
> > Just asking.
> >
> >
> >
> >
> > Thanks,
> > Manickam P
> >
> >
> >> From: [EMAIL PROTECTED]
> >> Date: Fri, 5 Jul 2013 16:13:16 +0530
> >> Subject: Re: How to update a file which is in HDFS
> >> To: [EMAIL PROTECTED]
> >>
> >> The answer to the "delta" part is more that HDFS does not presently
> >> support random writes. You cannot alter a closed file for anything
> >> other than appending at the end, which I doubt will help you if you
> >> are also receiving updates (it isn't clear from your question what
> >> this added data really is).
> >>
> >> HBase sounds like something that may solve your requirement though,
> >> depending on how much of your read/write load is random. You could
> >> consider it.
> >>
> >> P.s. HBase too doesn't use the append() APIs today (and doesn't need
> >> it either). AFAIK, only Flume's making use of it, if you allow it to.
> >>
> >> On Thu, Jul 4, 2013 at 5:17 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> >> > Hello Manickam,
> >> >
> >> > Append is currently not possible.
> >> >
> >> > Warm Regards,
> >> > Tariq
> >> > cloudfront.blogspot.com
> >> >
> >> >
> >> > On Thu, Jul 4, 2013 at 4:40 PM, Manickam P <[EMAIL PROTECTED]>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I have moved my input file into the HDFS location in the cluster setup.
> >> >> Now I have a new file which has some new records along with the old
> >> >> ones.
> >> >> I want to move the delta part alone into HDFS because it will take more
> >> >> time to move the whole file from my local machine to the HDFS location.
> >> >> Is it possible, or do I need to move the entire file into HDFS again?
> >> >>
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Manickam P
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
>
>
>
> --
> Harsh J
>