-
hadoop cluster for querying data on mongodb
Martinus Martinus, 2011-12-21, 03:31
Hi,
I have a Hadoop cluster running, and my data is in a MongoDB database. I have already written Java code that queries MongoDB using the MongoDB Java driver. Now I want to use the Hadoop cluster to run my Java code to read data from and write data to the Mongo database. Has anyone done this before? Can you explain how to do it? Thanks.
-
Re: hadoop cluster for querying data on mongodb
Ayon Sinha, 2011-12-21, 05:12
A couple of things:
1. Hadoop's strength is data locality, so keep most of your Hadoop heavy lifting on the local filesystem (HDFS, where Hadoop ships the computation to the nodes that hold the data).
2. Assuming you pull data from Mongo into Hadoop to crunch it and put the results back into Mongo only as the first and last steps of your entire workflow, you are basically looking for a MongoInputFormat and a MongoOutputFormat (I made up the class names). You are probably looking for https://jira.mongodb.org/browse/HADOOP/component/10736
Your other option, if you are using Pig or Hive, is to write Loader UDFs, similar to PigStorage, HBaseStorage, etc.
-Ayon
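To make the MongoInputFormat idea above concrete: the first job of any input format is to divide the source data into splits, one per map task. The following is a minimal, self-contained Java sketch of that step only, assuming the documents can be addressed by a numeric key range. The names RangeSplitSketch, Split, and split are hypothetical and are not taken from Hadoop or from any MongoDB connector; a real input format would also have to supply a record reader that fetches the documents in each range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Hadoop or of any MongoDB connector.
class RangeSplitSketch {

    // Half-open key range [start, end) that a single map task would read.
    static final class Split {
        final long start;
        final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Divide [min, max) into at most numSplits contiguous ranges of near-equal size.
    static List<Split> split(long min, long max, int numSplits) {
        if (numSplits <= 0 || max < min) {
            throw new IllegalArgumentException("need numSplits > 0 and min <= max");
        }
        List<Split> splits = new ArrayList<>();
        long total = max - min;
        long base = total / numSplits;   // minimum number of keys per split
        long extra = total % numSplits;  // the first 'extra' splits get one more key
        long start = min;
        for (int i = 0; i < numSplits && start < max; i++) {
            long size = base + (i < extra ? 1 : 0);
            splits.add(new Split(start, start + size));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Ten keys shared among three map tasks.
        for (Split s : split(0, 10, 3)) {
            System.out.println(s);
        }
    }
}
```

Running main prints the three ranges [0, 4), [4, 7), and [7, 10). Each range would then be handed to one map task, which is what lets the read from Mongo run in parallel across the cluster.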
-
Re: hadoop cluster for querying data on mongodb
Martinus Martinus, 2011-12-23, 09:04
Hi Ayon,
I set up the Hadoop cluster using hadoop-0.20.2 and it seems to be OK. But when I tried another version of Hadoop, such as hadoop-0.20.3, running start-all.sh gave me an error like this:
uvm12dk: Unrecognized option: -jvm
uvm12dk: Could not create the Java virtual machine.
Would you be so kind as to help me with this problem? Thanks.
Martinus
-
Re: hadoop cluster for querying data on mongodb
Joey Echeverria, 2011-12-25, 19:57
Don't start your daemons as root. They should be started under a system account: typically hdfs for the HDFS services and mapred for the MapReduce ones.
-Joey
--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
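The advice above can be turned into a small pre-flight check. This is a minimal sketch, assuming the daemon names used by Hadoop 0.20 (namenode, datanode, secondarynamenode, jobtracker, tasktracker) and the hdfs/mapred account split described in the reply. The class DaemonUserCheck and its methods are hypothetical and are not part of Hadoop.

```java
// Hypothetical pre-flight guard; Hadoop's own scripts do not contain this class.
class DaemonUserCheck {

    // Returns true when the given OS user name is the superuser account.
    static boolean isRoot(String userName) {
        return "root".equals(userName);
    }

    // Maps a daemon name to the service account suggested in the reply above.
    static String suggestedAccount(String daemon) {
        switch (daemon) {
            case "namenode":
            case "datanode":
            case "secondarynamenode":
                return "hdfs";
            case "jobtracker":
            case "tasktracker":
                return "mapred";
            default:
                throw new IllegalArgumentException("unknown daemon: " + daemon);
        }
    }

    public static void main(String[] args) {
        // user.name is the standard JVM system property for the current OS user.
        String user = System.getProperty("user.name");
        if (isRoot(user)) {
            System.out.println("Running as root: switch to a service account such as hdfs or mapred.");
        } else {
            System.out.println("Running as " + user);
        }
    }
}
```

A check like this only reports the problem. Creating the accounts and giving them ownership of the Hadoop data and log directories still has to be done on each node.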
-
Re: hadoop cluster for querying data on mongodb
Martinus Martinus, 2011-12-26, 02:31
Hi Joey,
Can you explain that in more detail? Do you mean I should create a new user and a new group, or just use ssh? Thanks.
-
Re: hadoop cluster for querying data on mongodb
Martinus Martinus, 2011-12-26, 04:14
Hi Joey,
I added a new user and ran start-all.sh, and it works now. But when I tried to run the wordcount example, it gave me this:
11/12/26 11:52:51 INFO input.FileInputFormat: Total input paths to process : 1
11/12/26 11:53:01 INFO mapred.JobClient: Running job: job_201112261118_0002
11/12/26 11:53:03 INFO mapred.JobClient: map 0% reduce 0%
11/12/26 11:56:46 INFO mapred.JobClient: map 100% reduce 0%
11/12/26 12:11:10 INFO mapred.JobClient: Task Id : attempt_201112261118_0002_r_000000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
11/12/26 12:11:46 WARN mapred.JobClient: Error reading task outputConnection timed out
11/12/26 12:12:07 WARN mapred.JobClient: Error reading task outputConnection timed out
Would you be so kind as to tell me how to fix this? Thanks.