HDFS, mail # user - Re: Merging files


Barak Yaish 2012-12-22, 18:54
Ted Dunning 2012-12-22, 19:24
Re: Merging files
Mohit Anchlia 2012-12-22, 20:53
I tried distcp, but it fails with the exception below. Is there a way to
merge the files? Otherwise I could write a Pig script to load from multiple
paths.

org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources:
  maprfs:/user/apuser/web-analytics/flume-output/2012/12/20/22/output/appinfo,
  maprfs:/user/apuser/web-analytics/flume-output/2012/12/21/00/output/appinfo
    at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1419)
    at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1222)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:675)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:910)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:937)

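
The exception message suggests that both sources end in the same file name
(appinfo), so they would land on the same destination path. A minimal sketch
of checking a source list for such name collisions before running a copy is
shown here. The class and method names (DuplicateNames, baseName, collisions)
are made up for illustration; this is not DistCp's own check, only the same
idea applied to plain path strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateNames {
    // Returns the last component of a slash-separated path string.
    static String baseName(String path) {
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // Groups the sources by base name and keeps only the names that are
    // shared by two or more sources.
    static Map<String, List<String>> collisions(List<String> sources) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String source : sources) {
            byName.computeIfAbsent(baseName(source), k -> new ArrayList<>()).add(source);
        }
        byName.values().removeIf(paths -> paths.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        // The two source paths from the exception message above.
        List<String> sources = Arrays.asList(
            "maprfs:/user/apuser/web-analytics/flume-output/2012/12/20/22/output/appinfo",
            "maprfs:/user/apuser/web-analytics/flume-output/2012/12/21/00/output/appinfo");
        collisions(sources).forEach((name, paths) ->
            System.out.println(name + " appears in " + paths.size() + " sources: " + paths));
    }
}
```

If the check reports a collision, copying each source to a distinct
destination name (or merging the contents instead of copying the files) avoids
the clash.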
On Sat, Dec 22, 2012 at 11:24 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> The technical term for this is "copying".  You may have heard of it.
>
> It is a subject of such long technical standing that many do not consider
> it worthy of detailed documentation.
>
> Distcp effects a similar process and can be modified to combine the input
> files into a single file.
>
> http://hadoop.apache.org/docs/r1.0.4/distcp.html
>
>
> On Sat, Dec 22, 2012 at 10:54 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
>
>> Can you please attach HOW-TO links for the alternatives you mentioned?
>>
>>
>> On Sat, Dec 22, 2012 at 10:46 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Yes, via the simple act of opening a target stream and writing all
>>> source streams into it. Or to save code time, an identity job with a
>>> single reducer (you may not get control over ordering this way).
>>>
>>> On Sat, Dec 22, 2012 at 12:10 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>> > Is it possible to merge files from different HDFS locations
>>> > into one file in a single HDFS location?
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
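
For reference, the approach Harsh J describes above (open one target stream
and write every source stream into it) can be sketched as follows. This is a
minimal illustration using plain java.io streams so that it runs without a
cluster; the class and method names (StreamMerge, mergeInto) are invented for
the example. On HDFS the streams would presumably come from
FileSystem.open(Path) for each source and FileSystem.create(Path) for the
target, which is an assumption about how you would wire it up, not something
shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class StreamMerge {
    // Copies every source stream, in list order, into the target stream.
    // Each source is closed once it has been read; the target is flushed
    // but left open so the caller decides when to close it.
    // Returns the total number of bytes written.
    static long mergeInto(OutputStream target, List<? extends InputStream> sources)
            throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        for (InputStream source : sources) {
            try (InputStream in = source) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    target.write(buffer, 0, n);
                    total += n;
                }
            }
        }
        target.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for two source files and one target file.
        List<InputStream> sources = Arrays.asList(
            new ByteArrayInputStream("first file\n".getBytes(StandardCharsets.UTF_8)),
            new ByteArrayInputStream("second file\n".getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long written = mergeInto(target, sources);
        System.out.print(new String(target.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(written + " bytes written");
    }
}
```

Because the sources are written in the order of the list, the caller controls
the ordering of the merged output, which is the control that the
single-reducer identity job may not give you.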
Ted Dunning 2012-12-22, 22:05
Mohit Anchlia 2012-12-23, 06:20
Edward Capriolo 2012-12-23, 15:30