

Pedro Sá da Costa 2013-04-08, 16:26
Harsh J 2013-04-08, 17:34
Re: Combine data from different HDFS FS
I'm invoking the wordcount example on HOST1 with this command, but I get an
error:
HOST1:$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount
hdfs://HOST2:54310/gutenberg gutenberg-output

13/04/08 22:02:55 ERROR security.UserGroupInformation:
PriviledgedActionException as:ubuntu
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
path does not exist: hdfs://HOST2:54310/gutenberg
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
does not exist: hdfs://HOST2:54310/gutenberg

Can you be more specific about using FileInputFormat? I've configured
MapReduce and HDFS to run on HOST1, and I don't know how to make a
wordcount job read data from the HDFS filesystems on both HOST1 and
HOST2.
On 8 April 2013 19:34, Harsh J <[EMAIL PROTECTED]> wrote:

> You should be able to add fully qualified HDFS paths from N clusters
> into the same job via FileInputFormat.addInputPath(…) calls. Caveats
> may apply for secure environments, but for non-secure mode this should
> work just fine. Did you try this and did it not work?
>
> On Mon, Apr 8, 2013 at 9:56 PM, Pedro Sá da Costa <[EMAIL PROTECTED]>
> wrote:
> > Hi,
> >
> > I want to combine data that lives in different HDFS filesystems so it
> > can be processed in a single job. Is it possible to do this with MR, or
> > is there another Apache tool that allows me to do this?
> >
> > Eg.
> >
> > Hdfs data in Cluster1 ----v
> > Hdfs data in Cluster2 -> this job reads the data from Cluster1, 2
> >
> >
> > Thanks,
> > --
> > Best regards,
>
>
>
> --
> Harsh J
>

--
Best regards,
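[Editor's note: Harsh's suggestion can be sketched roughly as below. This is a minimal sketch against the Hadoop 1.x `org.apache.hadoop.mapreduce` API; HOST1/HOST2 and port 54310 are the placeholder values from this thread, not real endpoints, and the mapper/reducer wiring is omitted.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiClusterWordCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount over two clusters");
    job.setJarByClass(MultiClusterWordCount.class);

    // One fully qualified input path per cluster. Both NameNodes must be
    // reachable from every task node, and the path must exist on the
    // filesystem its URI names (the InvalidInputException above means it
    // did not).
    FileInputFormat.addInputPath(job, new Path("hdfs://HOST1:54310/gutenberg"));
    FileInputFormat.addInputPath(job, new Path("hdfs://HOST2:54310/gutenberg"));

    // A relative output path resolves against the default filesystem of
    // the cluster submitting the job; qualify it to direct output elsewhere.
    FileOutputFormat.setOutputPath(job, new Path("gutenberg-output"));

    // setMapperClass/setReducerClass/output types omitted; they are the
    // same as in the stock WordCount example.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

As Harsh notes, this works in non-secure mode; with Kerberos-secured clusters, cross-cluster reads need additional token setup.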
Pedro Sá da Costa 2013-04-09, 04:07
David Rosenstrauch 2013-04-09, 13:37