Re: Merging files
Mohit Anchlia 2012-12-22, 20:53
Tried distcp but it fails. Is there a way to merge them? Or else I could
write a pig script to load from multiple paths.

org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources: maprfs:/user/apuser/web-analytics/flume-output/2012/12/20/22/output/appinfo, maprfs:/user/apuser/web-analytics/flume-output/2012/12/21/00/output/appinfo
    at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1419)
    at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1222)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:675)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:910)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:937)

On Sat, Dec 22, 2012 at 11:24 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> The technical term for this is "copying". You may have heard of it.
>
> It is a subject of such long technical standing that many do not consider
> it worthy of detailed documentation.
>
> Distcp effects a similar process and can be modified to combine the input
> files into a single file.
>
> http://hadoop.apache.org/docs/r1.0.4/distcp.html
>
>
> On Sat, Dec 22, 2012 at 10:54 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
>
>> Can you please attach HOW-TO links for the alternatives you mentioned?
>>
>>
>> On Sat, Dec 22, 2012 at 10:46 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Yes, via the simple act of opening a target stream and writing all
>>> source streams into it. Or to save code time, an identity job with a
>>> single reducer (you may not get control over ordering this way).
>>>
>>> On Sat, Dec 22, 2012 at 12:10 PM, Mohit Anchlia <[EMAIL PROTECTED]>
>>> wrote:
>>> > Is it possible to merge files from different locations from HDFS location
>>> > into one file into HDFS location?
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
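
For reference, the "open a target stream and write all source streams into it" approach that Harsh J describes might look roughly like the sketch below. This is only an illustration under stated assumptions: it uses local files through java.nio so that it runs without a cluster, and the class name StreamMerge and the method merge are hypothetical, not part of Hadoop. On HDFS the streams would presumably come from FileSystem.open and FileSystem.create instead; the copy loop would stay the same.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: concatenates several source files into one target file.
// Local files stand in for HDFS paths here (an assumption for the sketch).
public class StreamMerge {

    /**
     * Writes every source, in list order, into target and returns the
     * number of bytes written. An existing target is overwritten.
     */
    public static long merge(List<Path> sources, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[64 * 1024];
        // One output stream stays open for the whole merge.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path source : sources) {
                // Each source is opened, drained into the target, then closed.
                try (InputStream in = Files.newInputStream(source)) {
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                        total += read;
                    }
                }
            }
        }
        return total;
    }
}
```

Because the sources are read one after another, the order of the list fixes the order of the data in the merged file, which is the control over ordering that the single-reducer job does not give. Since the caller chooses the target name, two sources that share a file name such as appinfo do not clash the way they do in the DistCp error above.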