HDFS user mailing list: hadoop cluster for querying data on mongodb


Martinus Martinus 2011-12-21, 03:31
Ayon Sinha 2011-12-21, 05:12
Martinus Martinus 2011-12-23, 09:04
Joey Echeverria 2011-12-25, 19:57
Martinus Martinus 2011-12-26, 02:31
Re: hadoop cluster for querying data on mongodb
Hi Joey,

I added a new user and ran start-all.sh, and it works now, but when I
tried to run the wordcount example, it gave me this:

11/12/26 11:52:51 INFO input.FileInputFormat: Total input paths to process : 1
11/12/26 11:53:01 INFO mapred.JobClient: Running job: job_201112261118_0002
11/12/26 11:53:03 INFO mapred.JobClient:  map 0% reduce 0%
11/12/26 11:56:46 INFO mapred.JobClient:  map 100% reduce 0%
11/12/26 12:11:10 INFO mapred.JobClient: Task Id : attempt_201112261118_0002_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/12/26 12:11:46 WARN mapred.JobClient: Error reading task outputConnection timed out
11/12/26 12:12:07 WARN mapred.JobClient: Error reading task outputConnection timed out

Would you be so kind as to tell me how to fix this?

Thanks.

On Mon, Dec 26, 2011 at 10:31 AM, Martinus Martinus <[EMAIL PROTECTED]> wrote:

> Hi Joey,
>
> Can you give more explanation about that? You mean I should make a new
> user and new group or just ssh?
>
> Thanks.
>
>
> On Mon, Dec 26, 2011 at 3:57 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
>> Don't start your daemons as root. They should be started as a system
>> account. Typically hdfs for the HDFS services and mapred for the
>> MapReduce ones.
>>
>> -Joey
>>
>> On Fri, Dec 23, 2011 at 4:04 AM, Martinus Martinus
>> <[EMAIL PROTECTED]> wrote:
>> > Hi Ayon,
>> >
>> > I tried to set up the Hadoop cluster using hadoop-0.20.2 and it seems to be
>> > ok, but when I tried to use another version of Hadoop, such as
>> > hadoop-0.20.3, start-all.sh gave me an error like this:
>> >
>> > uvm12dk: Unrecognized option: -jvm
>> > uvm12dk: Could not create the Java virtual machine.
>> >
>> > Would you be so kind as to help me with this problem?
>> >
>> > Thanks.
>> >
>> > Martinus
>> >
>> >
>> > On Wed, Dec 21, 2011 at 1:12 PM, Ayon Sinha <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Couple of things:
>> >> 1. Hadoop's strength is in data locality, so keep most of your Hadoop
>> >> heavy lifting on the local filesystem (HDFS, where Hadoop computation is
>> >> shipped to the nodes with the data).
>> >> 2. Assuming you are pulling data into Hadoop from Mongo to crunch, and
>> >> putting the resulting data back into Mongo, as only the first and last
>> >> steps in your entire workflow, you are basically looking for a
>> >> MongoInputFormat and MongoOutputFormat (I made up the class names). You
>> >> are probably looking for
>> >> https://jira.mongodb.org/browse/HADOOP/component/10736
>> >>
>> >> Your other option, if you are using Pig or Hive, is to write loader UDFs,
>> >> similar to PigStorage, HBaseStorage, etc.
>> >>
>> >> -Ayon
>> >>
>> >> ________________________________
>> >> From: Martinus Martinus <[EMAIL PROTECTED]>
>> >> To: [EMAIL PROTECTED]
>> >> Sent: Tuesday, December 20, 2011 7:31 PM
>> >> Subject: hadoop cluster for querying data on mongodb
>> >>
>> >> Hi,
>> >>
>> >> I have a Hadoop cluster running and have my data inside a MongoDB
>> >> database. I have already written Java code to query data in MongoDB
>> >> using the mongodb-java driver. Now I want to use the Hadoop cluster to
>> >> run my Java code to get data from and put data into the Mongo database.
>> >> Has anyone done this before? Can you explain to me how to do that?
>> >>
>> >> Thanks.
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
>>
>
>
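
Ayon's second point in the quoted thread describes a workflow where reading from Mongo is only the first step, writing back is only the last, and the crunching happens in between. The following is a minimal plain-Java sketch of that shape, not the API of any connector. In-memory lists and maps stand in for the Mongo collections, and the names MongoRoundTripSketch, readInput, crunch, and writeOutput are hypothetical. It uses no Hadoop or MongoDB classes, and it counts words only to echo the wordcount example mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every name here is made up for illustration.
class MongoRoundTripSketch {

    // First step (the "input format" role): fetch the records to process.
    // Assumption: each string stands in for a text field of one Mongo document.
    static List<String> readInput() {
        List<String> docs = new ArrayList<>();
        docs.add("hadoop reads from mongo");
        docs.add("hadoop writes to mongo");
        return docs;
    }

    // Middle step: the map phase splits each record into words and the
    // reduce phase sums the occurrences of each word. A TreeMap keeps the
    // result sorted by word so the output order is predictable.
    static Map<String, Integer> crunch(List<String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String word : doc.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Last step (the "output format" role): turn results into records to
    // store. Here each record is a tab-separated line, not a Mongo document.
    static List<String> writeOutput(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : writeOutput(crunch(readInput()))) {
            System.out.println(line);
        }
    }
}
```

In a real job, readInput and writeOutput would be replaced by input and output format classes (the thread leaves their exact names open), and crunch would be split into a mapper and a reducer so that Hadoop can distribute the work across nodes.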