

Re: Issue with DistributedCache
Hi Denis
       I tried your code without the distributed cache locally and it worked
fine for me. Please find it at
http://pastebin.com/ki175YUx

I echo Mike's words on submitting MapReduce jobs remotely. The remote
machine can be your local PC or any utility server, as Mike specified. What
you need on the remote machine is a replica of the Hadoop jars and
configuration files, the same as those on your Hadoop cluster. (If you don't
have a remote util server set up, you can use your dev machine for the
same.) Just trigger the Hadoop job on the local machine and the actual job
will be submitted to and run on your cluster, based on the NN host and
configuration parameters in your config files.
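
For reference, a rough self-contained sketch of such a remote-submission driver
with the old mapred API is below. The RemoteWordCount class name and the
<namenode-host>/<jobtracker-host> placeholders are illustrative only, and the
stock TokenCountMapper/LongSumReducer classes stand in for your own mapper and
reducer; if the cluster's core-site.xml and mapred-site.xml are on the client's
classpath, the two conf.set(...) calls are not needed at all.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.LongSumReducer;
    import org.apache.hadoop.mapred.lib.TokenCountMapper;

    public class RemoteWordCount {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(RemoteWordCount.class);
            conf.setJobName("remote-wordcount");

            // These two settings are what make the client submit to the cluster
            // instead of running locally; with the cluster's *-site.xml files on
            // the classpath they are picked up automatically and can be omitted.
            conf.set("fs.default.name", "hdfs://<namenode-host>:8020");
            conf.set("mapred.job.tracker", "<jobtracker-host>:8021");

            // Stock word-count mapper/reducer from the old API's lib package,
            // used here only so the sketch is self-contained.
            conf.setMapperClass(TokenCountMapper.class);
            conf.setCombinerClass(LongSumReducer.class);
            conf.setReducerClass(LongSumReducer.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(LongWritable.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submits to the JT and blocks until the job finishes on the cluster.
            JobClient.runJob(conf);
        }
    }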

Hope it helps!..

Regards
Bejoy.K.S

On Thu, Nov 24, 2011 at 7:09 PM, Michel Segel <[EMAIL PROTECTED]> wrote:

> Denis...
>
> Sorry, you lost me.
>
> Just to make sure we're using the same terminology...
> The cluster is comprised of two types of nodes...
> The data nodes, which run the DN, TT, and, if you have HBase, the RS.
> Then there are the control nodes, which run your NN, SN, JT and, if you run
> HBase, the HM and ZKs ...
>
> Outside of the cluster we have machines set up with Hadoop installed but
> not running any of the processes. They are where our users launch their
> jobs. We call them edge nodes. (It's not a good idea to let users directly
> onto the actual cluster.)
>
> Ok, having said all of that... You launch your job from the edge nodes...
> Your data sits in HDFS so you don't need the distributed cache at all. Does
> that make sense?
> Your job will run on the local machine, connect to the JT and then run.
>
> We set up the edge nodes so that all of the jars and config files are already
> set up for the users and we can better control access...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Nov 24, 2011, at 7:22 AM, Denis Kreis <[EMAIL PROTECTED]> wrote:
>
> > Without using the distributed cache I'm getting the same error. It's
> > because I start the job from a remote client / programmatically.
> >
> > 2011/11/24 Michel Segel <[EMAIL PROTECTED]>:
> >> Silly question... Why do you need to use the distributed cache for the
> >> word count program?
> >> What are you trying to accomplish?
> >>
> >> I've only had to play with it for one project where we had to push out
> >> a bunch of C++ code to the nodes as part of a job...
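
For reference, the usual side-file pattern with the old mapred API looks roughly
like the sketch below. The SideFileMapper class name and the
/apps/side-data/lookup.dat path are placeholders, not anything from this thread:
the driver registers a file that already sits in HDFS, and each task picks up
its local copy in configure().

    import java.io.IOException;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SideFileMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        // Driver side (for illustration): register an HDFS file so the framework
        // copies it to every task node before the tasks start.
        //   DistributedCache.addCacheFile(
        //       new java.net.URI("/apps/side-data/lookup.dat"), conf);

        private Path sideFile; // local copy of the cached file on the task node

        @Override
        public void configure(JobConf conf) {
            try {
                Path[] cached = DistributedCache.getLocalCacheFiles(conf);
                if (cached != null && cached.length > 0) {
                    sideFile = cached[0];
                }
            } catch (IOException e) {
                throw new RuntimeException("Could not locate distributed cache files", e);
            }
        }

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // Normally you would open sideFile here (e.g. load a lookup table);
            // this sketch just echoes the local path to show it is available.
            output.collect(value, new Text(sideFile == null ? "" : sideFile.toString()));
        }
    }
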
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On Nov 24, 2011, at 7:05 AM, Denis Kreis <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Bejoy
> >>>
> >>> 1. Old API:
> >>> The Map and Reduce classes are the same as in the example, the main
> >>> method is as follows
> >>>
> >>> public static void main(String[] args) throws IOException,
> >>>         InterruptedException {
> >>>     UserGroupInformation ugi = UserGroupInformation.createProxyUser(
> >>>             "<remote user name>", UserGroupInformation.getLoginUser());
> >>>     ugi.doAs(new PrivilegedExceptionAction<Void>() {
> >>>         public Void run() throws Exception {
> >>>
> >>>             JobConf conf = new JobConf(WordCount.class);
> >>>             conf.setJobName("wordcount");
> >>>
> >>>             conf.setOutputKeyClass(Text.class);
> >>>             conf.setOutputValueClass(IntWritable.class);
> >>>
> >>>             conf.setMapperClass(Map.class);
> >>>             conf.setCombinerClass(Reduce.class);
> >>>             conf.setReducerClass(Reduce.class);
> >>>
> >>>             conf.setInputFormat(TextInputFormat.class);
> >>>             conf.setOutputFormat(TextOutputFormat.class);
> >>>
> >>>             FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
> >>>             FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));
> >>>
> >>>             conf.set("mapred.job.tracker", "<ip:8021>");
> >>>
> >>>             FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"),
> >>>                     new Configuration());