I tried distcp, but it fails with the exception below. Is there a way to
merge the files? Otherwise I could write a Pig script that loads from
multiple paths.
org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there
are duplicated files in the sources:
On Sat, Dec 22, 2012 at 11:24 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> The technical term for this is "copying". You may have heard of it.
> It is a subject of such long technical standing that many do not consider
> it worthy of detailed documentation.
> Distcp effects a similar process and can be modified to combine the input
> files into a single file.
> On Sat, Dec 22, 2012 at 10:54 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
>> Can you please attach HOW-TO links for the alternatives you mentioned?
>> On Sat, Dec 22, 2012 at 10:46 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> Yes, via the simple act of opening a target stream and writing all
>>> source streams into it. Or to save code time, an identity job with a
>>> single reducer (you may not get control over ordering this way).
>>> On Sat, Dec 22, 2012 at 12:10 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>> > Is it possible to merge files from different HDFS locations
>>> > into one file at a single HDFS location?
>>> Harsh J
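
For reference, Harsh's first suggestion in the thread (open one target stream and write every source stream into it) could be sketched roughly as below. This is only an illustration under stated assumptions: the class name StreamMerge and the methods concat and merge are hypothetical, and the sketch uses local files via java.nio so that it runs without a cluster. On HDFS the same loop would presumably read from streams returned by FileSystem.open(Path) and write to the stream returned by FileSystem.create(Path).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: concatenates several source
// files into one target by copying each source stream in turn.
public class StreamMerge {

    // Writes every source, in list order, into the already-open target stream.
    // Returns the total number of bytes copied.
    static long concat(List<Path> sources, OutputStream out) throws IOException {
        long total = 0;
        byte[] buf = new byte[64 * 1024];
        for (Path src : sources) {
            // Assumption: local files stand in for HDFS paths here. With Hadoop,
            // this stream would come from FileSystem.open(Path) instead.
            try (InputStream in = Files.newInputStream(src)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    total += n;
                }
            }
        }
        return total;
    }

    // Opens (or overwrites) the target once and appends all sources to it.
    static long merge(List<Path> sources, Path target) throws IOException {
        // With Hadoop, this stream would come from FileSystem.create(Path).
        try (OutputStream out = Files.newOutputStream(target)) {
            return concat(sources, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: StreamMerge <target> <source>...");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        long bytes = merge(sources, Paths.get(args[0]));
        System.out.println("copied " + bytes + " bytes");
    }
}
```

Because the sources are copied one after another, the output order is simply the order of the list. That is the control over ordering which the single-reducer identity job may not give you.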