-
Moving Files to Distributed Cache in MapReduce
Roger Chen, 2011-07-29, 17:26
Hi all,
Does anybody have examples of how to move files from the local file system or HDFS to the distributed cache in MapReduce? A Google search turned up examples in Pig but not in MR.
--
Roger Chen
UC Davis Genome Center
-
Re: Moving Files to Distributed Cache in MapReduce
Roger Chen, 2011-07-29, 18:05
Slight modification: I now know how to add files to the distributed cache, which can be done with this call placed in the main or run method:

    DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);

However, I am still having trouble locating the file in the distributed cache. *How do I get the path of thefile.dat in the distributed cache as a string?* I am using Hadoop 0.20.2.
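The question boils down to recognizing thefile.dat among the local copies the framework hands back to each task. As a small stdlib-only sketch (the class `CacheFileName` and method `baseName` are hypothetical helpers, not Hadoop API), the bare file name to match on can be taken from the same URI string that was passed to `addCacheFile`:

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: extracts the bare file name from a
// cache-file URI so it can later be compared with the localized copies.
final class CacheFileName {

    // Returns the last path segment, e.g. "thefile.dat". URI.create() throws
    // an unchecked IllegalArgumentException on malformed input; getPath()
    // drops any scheme, authority, or "#fragment" part.
    static String baseName(String cacheUri) {
        String path = URI.create(cacheUri).getPath();
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    public static void main(String[] args) {
        // Same URI string as in the addCacheFile call in the message.
        System.out.println(baseName("/user/hadoop/thefile.dat"));
    }
}
```

Matching on the name rather than the full HDFS path matters because the localized copy lives under a task-node directory that differs from the original location.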
-
Re: Moving Files to Distributed Cache in MapReduce
Mapred Learn, 2011-07-29, 18:09
Did you try using the -files option in your hadoop jar command, as in:

    /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>
-
Re: Moving Files to Distributed Cache in MapReduce
Roger Chen, 2011-07-29, 18:11
After moving it to the distributed cache, how would I access it within my MapReduce program?
-
Re: Moving Files to Distributed Cache in MapReduce
Mapred Learn, 2011-07-29, 18:11
OK, for accessing it in mapper code, you can do something like:

    // Local (task-node) paths of all files in the distributed cache
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

    String fileName = "";
    for (Path p : cacheFiles) {
        if (p != null) {
            fileName = p.getName(); // bare file name only, not the full local path
        }
    }
-
Re: Moving Files to Distributed Cache in MapReduce
Mapred Learn, 2011-07-29, 18:18
I hope my previous reply helps...
-
Re: Moving Files to Distributed Cache in MapReduce
Arindam Khaled, 2011-07-29, 18:55
Please unsubscribe me.
-
Re: Moving Files to Distributed Cache in MapReduce
Roger Chen, 2011-07-29, 20:51
Thanks for the response! However, I'm having an issue with this line:

    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

because conf has private access in org.apache.hadoop.conf.Configured.
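That compiler error is ordinary Java visibility. Assuming the class extends `Configured` (as a `ToolRunner` driver usually does), `Configured` keeps its `Configuration` in a private field, so a subclass has to go through the public `getConf()` accessor; inside a new-API mapper the equivalent is `context.getConfiguration()`, as Michael Segel's sample in this thread does. The sketch below reproduces the pattern without Hadoop; `ConfiguredLike`, `MyDriver`, and the `"cache.file"` key are hypothetical stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configured: the settings
// live in a private field and are exposed only through a public getter.
class ConfiguredLike {
    private final Map<String, String> conf = new HashMap<>();

    ConfiguredLike() {
        // Hypothetical key/value, only so the demo has something to read.
        conf.put("cache.file", "/user/hadoop/thefile.dat");
    }

    // Analogous to Configured.getConf().
    public Map<String, String> getConf() {
        return conf;
    }
}

// Hypothetical driver, standing in for a class that extends Configured.
final class MyDriver extends ConfiguredLike {
    String cacheFile() {
        // return conf.get("cache.file");   // compile error: conf has private access
        return getConf().get("cache.file"); // use the inherited public accessor
    }

    public static void main(String[] args) {
        System.out.println(new MyDriver().cacheFile());
    }
}
```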
-
Re: Moving Files to Distributed Cache in MapReduce
Mohit Anchlia, 2011-07-29, 20:59
Is this what you are looking for?
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
Search for jobConf.
-
Re: Moving Files to Distributed Cache in MapReduce
Roger Chen, 2011-07-29, 21:51
jobConf is deprecated in 0.20.2, I believe; you're supposed to be using Configuration for that.
-
Re: Moving Files to Distributed Cache in MapReduce
Roger Chen, 2011-07-29, 23:22
Hi all, I have now resolved my issue by using a try/catch statement. Thanks for all the help!
-
RE: Moving Files to Distributed Cache in MapReduce
Michael Segel, 2011-07-30, 02:43
I could have sworn that I gave an example earlier this week on how to push and pull stuff from the distributed cache.
-
RE: Moving Files to Distributed Cache in MapReduce
Michael Segel, 2011-07-30, 02:49
Here's the meat of my earlier post...

Imports used by the samples:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

Sample code for putting a file in the cache:

    DistributedCache.addCacheFile(new URI(path + "MyFileName"), conf);

Sample code for pulling data off the cache:

    Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    boolean exitProcess = false;
    int i = 0;
    while (!exitProcess) {
        String fileName = localFiles[i].getName();
        if (fileName.equalsIgnoreCase("model.txt")) {
            // Build your input file reader on localFiles[i].toString()
            exitProcess = true;
        }
        i++;
    }

Note that this is SAMPLE code. I didn't trap the exit condition for the case where the file isn't there and you run past the end of the localFiles[] array.
Also, I initialize exitProcess to false because it's easier to read this as "do this loop until the condition exitProcess is true".

When you build your file reader you need the full path, not just the file name. The path will vary each time the job runs.

HTH

-Mike
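Mike's loop stops at the first match but, as he notes, does not handle a missing file. A minimal bounds-safe variant, sketched with plain Java strings so it runs without Hadoop on the classpath (in a real mapper the array would come from `DistributedCache.getLocalCacheFiles(...)` and each `Path` would be converted with `toString()`); `CacheLookup`, `findLocalFile`, and the sample paths in `main` are hypothetical:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: bounds-safe lookup of a localized cache file by name.
final class CacheLookup {

    // Returns the full local path of the first entry whose file name matches
    // wantedName (case-insensitively, as in the sample loop), or null if no
    // entry matches. Never indexes past the end of the array.
    static String findLocalFile(String[] localFiles, String wantedName) {
        if (localFiles == null) {
            return null; // nothing was localized for this task
        }
        for (String local : localFiles) {
            if (local == null) {
                continue;
            }
            Path fileName = Paths.get(local).getFileName();
            if (fileName != null && fileName.toString().equalsIgnoreCase(wantedName)) {
                return local; // full path: this is what the file reader needs
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up local paths, for illustration only.
        String[] localFiles = {
            "/tmp/mapred/local/archive/1/user/hadoop/lookup.tsv",
            "/tmp/mapred/local/archive/2/user/hadoop/model.txt"
        };
        System.out.println(findLocalFile(localFiles, "model.txt"));
        System.out.println(findLocalFile(localFiles, "missing.txt"));
    }
}
```

Returning `null` for a missing file, instead of running off the end of the array, leaves the caller to decide whether to fail the task or carry on without the side data.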
-
Re: Moving Files to Distributed Cache in MapReduce
Allen Wittenauer, 2011-08-01, 02:21
We really need to add a working example to the wiki and link to it from the FAQ page. Any volunteers?
-
RE: Moving Files to Distributed Cache in MapReduce
Michael Segel, 2011-08-01, 12:24
Yeah, I'll write something up and post it on my web site. Definitely not InfoQ material, just simple tips-and-tricks stuff.
-Mike