Re: Issue with DistributedCache
Bejoy Ks 2011-11-24, 16:03
My bad, I pasted the wrong file. It is updated now; I made a few tiny
modifications (commented in the code) and it was working fine for me.
http://pastebin.com/RDuZX7Qd

Alex, thanks a lot for pointing that out.

Regards
Bejoy.KS

On Thu, Nov 24, 2011 at 8:31 PM, Alexander C.H. Lorenz <[EMAIL PROTECTED]> wrote:

> Hi,
>
> a typo?
> import com.bejoy.sampels.worcount.WordCountDriver;
> = wor_d_count ?
>
> - alex
>
> On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
> > Hi Denis
> > I tried your code without distributed cache locally and it worked
> > fine for me. Please find it at
> > http://pastebin.com/ki175YUx
> >
> > I echo Mike's words on submitting map reduce jobs remotely. The remote
> > machine can be your local PC or any utility server, as Mike specified.
> > What you need to have on the remote machine is a replica of the hadoop
> > jars and configuration files, the same as those of your hadoop cluster.
> > (If you don't have a remote util server set up then you can use your
> > dev machine for the same.) Just trigger the hadoop job on the local
> > machine and the actual job will be submitted to and run on your
> > cluster, based on the NN host and configuration parameters you have in
> > your config files.
> >
> > Hope it helps!
> >
> > Regards
> > Bejoy.K.S
> >
> > On Thu, Nov 24, 2011 at 7:09 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
> >
> > > Denis...
> > >
> > > Sorry, you lost me.
> > >
> > > Just to make sure we're using the same terminology...
> > > The cluster is comprised of two types of nodes...
> > > The data nodes, which run DN, TT, and if you have HBase, RS.
> > > Then there are control nodes, which run your NN, SN, JT and, if you
> > > run HBase, HM and ZKs...
> > >
> > > Outside of the cluster we have machines set up with Hadoop installed
> > > but not running any of the processes. They are where our users launch
> > > their jobs. We call them edge nodes. (It's not a good idea to let
> > > users directly on the actual cluster.)
> > >
> > > Ok, having said all of that...
> > > You launch your job from the edge nodes...
> > > Your data sits in HDFS so you don't need distributed cache at all.
> > > Does that make sense?
> > > Your job will run on the local machine, connect to the JT and then run.
> > >
> > > We set up the edge nodes so that all of the jars and config files are
> > > already set up for the users and we can better control access...
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Nov 24, 2011, at 7:22 AM, Denis Kreis <[EMAIL PROTECTED]> wrote:
> > >
> > > > Without using the distributed cache I'm getting the same error. It's
> > > > because I start the job from a remote client / programmatically
> > > >
> > > > 2011/11/24 Michel Segel <[EMAIL PROTECTED]>:
> > > >> Silly question... Why do you need to use the distributed cache for
> > > >> the word count program?
> > > >> What are you trying to accomplish?
> > > >>
> > > >> I've only had to play with it for one project where we had to push
> > > >> out a bunch of c++ code to the nodes as part of a job...
> > > >>
> > > >> Sent from a remote device. Please excuse any typos...
> > > >>
> > > >> Mike Segel
> > > >>
> > > >> On Nov 24, 2011, at 7:05 AM, Denis Kreis <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>> Hi Bejoy
> > > >>>
> > > >>> 1. Old API:
> > > >>> The Map and Reduce classes are the same as in the example, the main
> > > >>> method is as follows
> > > >>>
> > > >>> public static void main(String[] args) throws IOException,
> > > >>>         InterruptedException {
> > > >>>     UserGroupInformation ugi =
> > > >>>         UserGroupInformation.createProxyUser("<remote user name>",
> > > >>>             UserGroupInformation.getLoginUser());
> > > >>>     ugi.doAs(new PrivilegedExceptionAction<Void>() {
> > > >>>         public Void run() throws Exception {
> > > >>>
> > > >>>             JobConf conf = new JobConf(WordCount.class);
> > > >>>             conf.setJobName("wordcount");