jagaran das 2011-06-17, 00:33
Hi,
We have a requirement where a huge number of small files must be pushed to HDFS and then analyzed with Pig. To get around the classic "small file issue", we merge the files and push one bigger file into HDFS, but this merging step is costing our pipeline time.
If we could append directly to an existing file in HDFS, we could save this merging time.
Can you please suggest whether there is a newer stable version of Hadoop that we can use for appending?
Thanks and Regards, Jagaran
-
Re: HDFS File Appending
Xiaobo Gu 2011-06-17, 01:26
Please refer to FileUtil.copyMerge.
On Fri, Jun 17, 2011 at 8:33 AM, jagaran das <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have a requirement where
>
> There would be huge number of small files to be pushed to hdfs and then use pig
> to do analysis.
> To get around the classic "Small File Issue" we merge the files and push a
> bigger file in to HDFS.
> But we are loosing time in this merging process of our pipeline.
>
> But If we can directly append to an existing file in HDFS we can save this
> "Merging Files" time.
>
> Can you please suggest if there a newer stable version of Hadoop where can go
> for appending ?
>
> Thanks and Regards,
> Jagaran
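
To illustrate the kind of merge suggested in this reply: as far as I know, FileUtil.copyMerge in the Hadoop releases of that period concatenates the files of one source directory into a single destination file. The sketch below shows the same idea on a local file system using only the JDK, so it runs without a cluster. It is not the Hadoop implementation; the class name SmallFileMerger and the method mergeInto are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: merges local small files
// into one larger file before that file is pushed to HDFS.
class SmallFileMerger {

    /**
     * Appends every regular file directly under srcDir to target, in
     * sorted path order, and returns the number of files merged.
     * Assumption: target lives outside srcDir, so it is never its own input.
     */
    static int mergeInto(Path srcDir, Path target) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (Stream<Path> entries = Files.list(srcDir)) {
            entries.filter(Files::isRegularFile).forEach(inputs::add);
        }
        Collections.sort(inputs);

        // CREATE + APPEND: create the target if missing, otherwise add to its end.
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path input : inputs) {
                Files.copy(input, out);
            }
        }
        return inputs.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SmallFileMerger <srcDir> <targetFile>");
            return;
        }
        int merged = mergeInto(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + merged + " files into " + args[1]);
    }
}
```

Because the target is opened in append mode, running the merge again on a new batch of small files extends the same local file, which is roughly the behaviour the original question asks HDFS to provide.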
-
Re: HDFS File Appending
madhu phatak 2011-06-21, 10:11
I think HDFS does not support appending. I'm not sure about Pig, but if you are using Hadoop directly, you can zip the files and use the zip as the input to the jobs.
On Fri, Jun 17, 2011 at 6:56 AM, Xiaobo Gu <[EMAIL PROTECTED]> wrote:
> please refer to FileUtil.CopyMerge
>
> On Fri, Jun 17, 2011 at 8:33 AM, jagaran das <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We have a requirement where
> >
> > There would be huge number of small files to be pushed to hdfs and then use pig
> > to do analysis.
> > To get around the classic "Small File Issue" we merge the files and push a
> > bigger file in to HDFS.
> > But we are loosing time in this merging process of our pipeline.
> >
> > But If we can directly append to an existing file in HDFS we can save this
> > "Merging Files" time.
> >
> > Can you please suggest if there a newer stable version of Hadoop where can go
> > for appending ?
> >
> > Thanks and Regards,
> > Jagaran
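
The zip suggestion in this reply can be sketched with the JDK's java.util.zip classes: pack all small files of a directory into one archive locally, then push that single archive to HDFS. This is only a minimal sketch under stated assumptions. The class name SmallFileZipper and the method zipDirectory are hypothetical, and the sketch covers the packing step only. Reading the archive inside a job would most likely need an input format that understands zip, which is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Hadoop: packs local small files
// into a single zip archive before upload.
class SmallFileZipper {

    /**
     * Writes every regular file directly under srcDir into zipFile, one
     * entry per file named after the file, and returns the entry count.
     * Assumption: zipFile lives outside srcDir.
     */
    static int zipDirectory(Path srcDir, Path zipFile) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (Stream<Path> entries = Files.list(srcDir)) {
            entries.filter(Files::isRegularFile).forEach(inputs::add);
        }
        Collections.sort(inputs);

        try (ZipOutputStream zip =
                new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path input : inputs) {
                zip.putNextEntry(new ZipEntry(input.getFileName().toString()));
                Files.copy(input, zip);   // stream the file body into the entry
                zip.closeEntry();
            }
        }
        return inputs.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SmallFileZipper <srcDir> <zipFile>");
            return;
        }
        int count = zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("zipped " + count + " files into " + args[1]);
    }
}
```

Unlike plain concatenation, the archive keeps the boundaries and names of the original files, at the cost of needing a zip-aware reader on the job side.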