Re: Combine data from different HDFS FS
Maybe there is some FileInputFormat class that allows defining input files
from different locations. What I would like to know is whether it's possible
to read input data from different HDFS filesystems, e.g., run the wordcount
with input files from the HDFS filesystems on HOST1 and HOST2 (the
filesystems on HOST1 and HOST2 are distinct). Any suggestion on which
InputFormat I should use?
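
For reference, here is a minimal driver along the lines of what Harsh
suggests below. It is only a sketch: the class name, hostnames, and ports
(HOST1/HOST2:54310) are placeholders, both clusters are assumed to run in
non-secure mode, and it targets the Hadoop 1.x org.apache.hadoop.mapreduce
API:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiClusterWordCount {

  // Standard wordcount mapper: emit (token, 1) for every word in the line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Standard wordcount reducer (also used as combiner): sum the counts.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "multi-cluster wordcount");
    job.setJarByClass(MultiClusterWordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Each input path is fully qualified, so each keeps its own filesystem;
    // addInputPath can simply be called once per cluster.
    FileInputFormat.addInputPath(job, new Path("hdfs://HOST1:54310/gutenberg"));
    FileInputFormat.addInputPath(job, new Path("hdfs://HOST2:54310/gutenberg"));

    // Output lands on whichever filesystem this path resolves against.
    FileOutputFormat.setOutputPath(job,
        new Path("hdfs://HOST1:54310/gutenberg-output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Depending on the Hadoop version there is also MultipleInputs
(org.apache.hadoop.mapred.lib.MultipleInputs in the 1.x old API), which binds
a different InputFormat or Mapper to each path, but for a plain wordcount the
repeated addInputPath calls should be enough.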

On 9 April 2013 00:10, Pedro Sá da Costa <[EMAIL PROTECTED]> wrote:

> I'm invoking the wordcount example on HOST1 with this command, but I get
> an error.
>
>
> HOST1:$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount
> hdfs://HOST2:54310/gutenberg gutenberg-output
>
> 13/04/08 22:02:55 ERROR security.UserGroupInformation:
> PriviledgedActionException as:ubuntu
> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> path does not exist: hdfs://HOST2:54310/gutenberg
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: hdfs://HOST2:54310/gutenberg
>
> Can you be more specific about using FileInputFormat? I've configured
> MapReduce and HDFS to work on HOST1, and I don't know how I can make a
> wordcount that reads data from the HDFS filesystems on both HOST1 and
> HOST2.
>
> On 8 April 2013 19:34, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> You should be able to add fully qualified HDFS paths from N clusters
>> into the same job via FileInputFormat.addInputPath(…) calls. Caveats
>> may apply for secure environments, but for non-secure mode this should
>> work just fine. Did you try this and did it not work?
>>
>> On Mon, Apr 8, 2013 at 9:56 PM, Pedro Sá da Costa <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> > I want to combine data that live in different HDFS filesystems, so
>> > that they can be processed in one job. Is it possible to do this with
>> > MR, or is there another Apache tool that allows me to do this?
>> >
>> > Eg.
>> >
>> > HDFS data in Cluster1 ----v
>> > HDFS data in Cluster2 -> this job reads the data from Clusters 1 and 2
>> >
>> >
>> > Thanks,
>> > --
>> > Best regards,
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> Best regards,
>
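
Independently of the multi-cluster question, the InvalidInputException quoted
above typically means that HOST2's NameNode answered but found no /gutenberg
path. A quick sanity check from HOST1 (assuming the NameNode really listens
on 54310; adjust to your fs.default.name) is:

HOST1:$ bin/hadoop fs -ls hdfs://HOST2:54310/gutenberg

If that listing fails as well, the problem is the path or the NameNode
address, not the job setup.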

--
Best regards,