HDFS >> mail # user >> Re: Merging files
Re: Merging files
I tried distcp, but it fails with the error below. Is there a way to merge
them? Otherwise I could write a Pig script to load from multiple paths.

org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources:
maprfs:/user/apuser/web-analytics/flume-output/2012/12/20/22/output/appinfo,
maprfs:/user/apuser/web-analytics/flume-output/2012/12/21/00/output/appinfo
        at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1419)
        at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1222)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:675)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:910)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:937)
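
The two sources named in the exception both end in the same final name, appinfo, which appears to be what distcp is objecting to. As a rough illustration only (this is not DistCp's own check, and the class and method names are made up for this sketch), one could look for such name collisions in a list of source paths before submitting the copy:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: groups source paths by their
// final path component and reports the names shared by more than one path.
final class SourceNameCheck {
    static Map<String, List<String>> collidingNames(List<String> sourcePaths) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String path : sourcePaths) {
            // Ignore a single trailing slash so "dir/" and "dir" compare equal.
            String trimmed = path.endsWith("/")
                    ? path.substring(0, path.length() - 1)
                    : path;
            String name = trimmed.substring(trimmed.lastIndexOf('/') + 1);
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(path);
        }
        // Keep only the names that occur in two or more source paths.
        byName.values().removeIf(paths -> paths.size() < 2);
        return byName;
    }
}
```

Run against the two paths above, this would report appinfo as a name shared by both sources.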
On Sat, Dec 22, 2012 at 11:24 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> The technical term for this is "copying".  You may have heard of it.
>
> It is a subject of such long technical standing that many do not consider
> it worthy of detailed documentation.
>
> Distcp effects a similar process and can be modified to combine the input
> files into a single file.
>
> http://hadoop.apache.org/docs/r1.0.4/distcp.html
>
>
> On Sat, Dec 22, 2012 at 10:54 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
>
>> Can you please attach HOW-TO links for the alternatives you mentioned?
>>
>>
>> On Sat, Dec 22, 2012 at 10:46 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Yes, via the simple act of opening a target stream and writing all
>>> source streams into it. Or to save code time, an identity job with a
>>> single reducer (you may not get control over ordering this way).
>>>
>>> On Sat, Dec 22, 2012 at 12:10 PM, Mohit Anchlia <[EMAIL PROTECTED]>
>>> wrote:
>>> > Is it possible to merge files from different HDFS locations into one
>>> > file in a single HDFS location?
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
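
For readers looking for the HOW-TO requested above, here is a minimal sketch of Harsh J's first suggestion: open one target stream and write every source stream into it, in order. It is written against plain java.io streams so it runs without a cluster. The class and method names are hypothetical; on HDFS one would presumably pass the streams returned by FileSystem.open(...) as the sources and the stream from FileSystem.create(...) as the target, but that wiring is an assumption and is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical helper: concatenates source streams into one target stream.
final class StreamMerge {
    /**
     * Copies each source, in list order, into the target and returns the
     * total number of bytes written. Each source is closed after it has
     * been copied; the target is flushed but left open for the caller.
     */
    static long merge(List<? extends InputStream> sources, OutputStream target)
            throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        for (InputStream source : sources) {
            try (InputStream in = source) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    target.write(buffer, 0, n);
                    total += n;
                }
            }
        }
        target.flush();
        return total;
    }
}
```

Because the sources are copied in list order, the caller controls the ordering of the merged output, which is the control the single-reducer identity job may not give you.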