HDFS, mail # user - hadoop cluster for querying data on mongodb


Re: hadoop cluster for querying data on mongodb
Martinus Martinus 2011-12-26, 04:14
Hi Joey,

I added a new user, ran start-all.sh, and it works now, but when I
tried to run the wordcount example it gave me this:

11/12/26 11:52:51 INFO input.FileInputFormat: Total input paths to process : 1
11/12/26 11:53:01 INFO mapred.JobClient: Running job: job_201112261118_0002
11/12/26 11:53:03 INFO mapred.JobClient:  map 0% reduce 0%
11/12/26 11:56:46 INFO mapred.JobClient:  map 100% reduce 0%
11/12/26 12:11:10 INFO mapred.JobClient: Task Id : attempt_201112261118_0002_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/12/26 12:11:46 WARN mapred.JobClient: Error reading task output: Connection timed out
11/12/26 12:12:07 WARN mapred.JobClient: Error reading task output: Connection timed out

Would you kindly tell me how to fix this?

Thanks.
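[A note on the error above: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES generally means the reducers could not fetch map output from the tasktrackers over HTTP, which in practice is most often a hostname-resolution or firewall problem between nodes. A minimal, self-contained sketch of the kind of check that helps here; the hostnames passed in are assumed to be the entries from your conf/slaves file, and the default below is only a placeholder:]

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    // True if the hostname resolves to some address from this machine.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder; pass the hostnames from conf/slaves on each node.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        for (String host : hosts) {
            System.out.println(host + (resolves(host) ? " resolves" : " UNRESOLVED"));
        }
    }
}
```

[If a slave hostname prints UNRESOLVED on some node, or resolves to 127.0.0.1 on machines other than itself, fixing /etc/hosts (or DNS) on every node is usually the cure; also check that the tasktracker HTTP port is not firewalled.]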

On Mon, Dec 26, 2011 at 10:31 AM, Martinus Martinus
<[EMAIL PROTECTED]> wrote:

> Hi Joey,
>
> Can you explain that in more detail? Do you mean I should create a new
> user and a new group, or just set up ssh?
>
> Thanks.
>
>
> On Mon, Dec 26, 2011 at 3:57 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
>> Don't start your daemons as root. They should be started as a system
>> account. Typically hdfs for the HDFS services and mapred for the
>> MapReduce ones.
>>
>> -Joey
>>
>> On Fri, Dec 23, 2011 at 4:04 AM, Martinus Martinus
>> <[EMAIL PROTECTED]> wrote:
>> > Hi Ayon,
>> >
>> > I tried to set up the hadoop cluster using hadoop-0.20.2 and it seems
>> > to be fine, but when I tried to use another version of Hadoop, such as
>> > hadoop-0.20.3, running start-all.sh gave me an error like this:
>> >
>> > uvm12dk: Unrecognized option: -jvm
>> > uvm12dk: Could not create the Java virtual machine.
>> >
>> > Would you kindly help me with this problem?
>> >
>> > Thanks.
>> >
>> > Martinus
>> >
>> >
>> > On Wed, Dec 21, 2011 at 1:12 PM, Ayon Sinha <[EMAIL PROTECTED]>
>> wrote:
>> >>
>> >> Couple of things:
>> >> 1. Hadoop's strength is in data locality, so you want most of your
>> >> heavy lifting to happen on HDFS, where the computation is shipped to
>> >> the nodes that hold the data.
>> >> 2. Assuming that pulling data from Mongo into Hadoop to crunch and
>> >> writing the results back to Mongo are only the first and last steps
>> >> in your workflow, you are basically looking for a MongoInputFormat
>> >> and MongoOutputFormat (I made up the class names). You are probably
>> >> looking for
>> >> https://jira.mongodb.org/browse/HADOOP/component/10736
>> >>
>> >> Your other option, if you are using Pig or Hive, is to write loader
>> >> UDFs similar to PigStorage, HBaseStorage, etc.
>> >>
>> >> -Ayon
>> >> See My Photos on Flickr
>> >> Also check out my Blog for answers to commonly asked questions.
>> >>
>> >> ________________________________
>> >> From: Martinus Martinus <[EMAIL PROTECTED]>
>> >> To: [EMAIL PROTECTED]
>> >> Sent: Tuesday, December 20, 2011 7:31 PM
>> >> Subject: hadoop cluster for querying data on mongodb
>> >>
>> >> Hi,
>> >>
>> >> I have a hadoop cluster running and my data is in a mongodb database.
>> >> I have already written Java code that queries mongodb using the
>> >> mongodb-java driver. Now I want to use the hadoop cluster to run my
>> >> Java code to get data from and put data into the mongo database. Has
>> >> anyone done this before? Can you explain how to do that?
>> >>
>> >> Thanks.
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
>
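[Ayon's MongoInputFormat/MongoOutputFormat suggestion above hinges on the Hadoop InputFormat contract: the framework asks the format to carve the input (here, a Mongo collection) into splits, one per map task. A toy sketch of that split arithmetic, using illustrative names only, not the connector's real API:]

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of what an InputFormat for Mongo has to do: divide a
// collection of numDocs documents into numSplits roughly equal [start, end)
// ranges, one per map task. Names are hypothetical, for illustration.
public class SplitSketch {
    static List<long[]> computeSplits(long numDocs, int numSplits) {
        List<long[]> splits = new ArrayList<>();
        long base = numDocs / numSplits;
        long remainder = numDocs % numSplits;
        long start = 0;
        for (int i = 0; i < numSplits; i++) {
            // Spread the remainder over the first few splits.
            long size = base + (i < remainder ? 1 : 0);
            splits.add(new long[] {start, start + size});
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : computeSplits(10, 3)) {
            System.out.println(s[0] + ".." + s[1]);  // 0..4, 4..7, 7..10
        }
    }
}
```

[The real connector computes splits against the live collection and is wired up through job configuration rather than code like this; the JIRA component Ayon linked is the place to find the actual classes and properties.]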