-
How to Create an effective chained MapReduce program. ilyal levin 2011-09-05, 15:49
Hi,
I'm trying to write a chained MapReduce program. I use a simple loop: in each iteration I create a job and execute it, and the current job's output becomes the next job's input. How can I configure the OutputFormat of the current job and the InputFormat of the next job so that I don't have to use TextOutputFormat and TextInputFormat? If I use them, I need to parse the input file in the map function. In other words, if possible, I want the next job to treat the input file as <key,value> pairs and not as plain Text. Thanks a lot.
-
Re: How to Create an effective chained MapReduce program. Joey Echeverria 2011-09-05, 16:41
Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?
-Joey
-
Re: How to Create an effective chained MapReduce program. ilyal levin 2011-09-05, 19:21
Thanks for the reply.
I tried it, but it creates a binary file that I can't read (I need the result of the first job). Also, how can I use this file in the next chained mapper? That is, how can I retrieve the keys and the values in the map function?
Ilyal
-
Re: How to Create an effective chained MapReduce program. Roger Chen 2011-09-05, 19:50
The binary file will allow you to pass the output from the first reducer to the second mapper. For example, if you output Text, IntWritable from the first job using SequenceFileOutputFormat, then you can read Text, IntWritable as the input of the second mapper. The idea of chaining is that you already know what kind of output the first reducer is going to give, and that you want to perform some secondary operation on it.
One last thing on chaining jobs: it's often worth looking to see if you can consolidate all of your separate map and reduce tasks into a single map/reduce operation. There are many situations where it is more intuitive to write a number of map/reduce operations and chain them together, but more efficient to have just a single operation.
-- Roger Chen, UC Davis Genome Center
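The loop described in this thread, where each job's output directory becomes the next job's input, can be sketched without Hadoop on the classpath. This is only the path bookkeeping, not a driver: ChainPlan, Step and the directory names are hypothetical (the "output" prefix loosely follows the output0 directory mentioned later in the thread). In a real driver, each step would presumably be a job whose output format is SequenceFileOutputFormat and whose successor's input format is SequenceFileInputFormat, as suggested above.

```java
import java.util.ArrayList;
import java.util.List;

/** Plans the input and output directories for a loop of chained jobs (illustrative only). */
final class ChainPlan {

    /** One iteration of the chain: the job reads from input and writes to output. */
    static final class Step {
        final String input;
        final String output;

        Step(String input, String output) {
            this.input = input;
            this.output = output;
        }
    }

    /**
     * Builds the list of (input, output) pairs for the given number of iterations.
     * Iteration i writes to outputPrefix + i, and iteration i + 1 reads from there.
     */
    static List<Step> plan(String firstInput, String outputPrefix, int iterations) {
        List<Step> steps = new ArrayList<>();
        String input = firstInput;
        for (int i = 0; i < iterations; i++) {
            String output = outputPrefix + i;
            steps.add(new Step(input, output));
            input = output; // the next job reads what this job wrote
        }
        return steps;
    }

    public static void main(String[] args) {
        for (Step s : plan("input", "output", 3)) {
            System.out.println(s.input + " -> " + s.output);
        }
    }
}
```

Note that only the jobs after the first one read sequence files; if the very first input is plain text, that job would likely still need a text-based input format.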
-
Re: How to Create an effective chained MapReduce program. ilyal levin 2011-09-05, 22:33
Thanks for the help.
-
Re: How to Create an effective chained MapReduce program. ilyal levin 2011-09-05, 23:53
OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat and it works fine, but the output of the reducer is now a binary file (not text), so I can't read the data. How can I solve this? I need the data of the intermediate stages in the chain in text form.
Thanks
-
Re: How to Create an effective chained MapReduce program. Joey Echeverria 2011-09-06, 00:16
Why do you need to see the intermediate data as text?
What are the types of your keys and values?
-Joey
-
Re: How to Create an effective chained MapReduce program. Niels Basjes 2011-09-06, 05:57
Hi,
In the past I've had the same situation where I needed the data for debugging. Back then I chose to create a second job with simply SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally TextOutputFormat. In my situation that worked great for my purpose.
-- Kind regards, Niels Basjes
-
Re: How to Create an effective chained MapReduce program. ilyal levin 2011-09-06, 07:16
I need it because the intermediate data is also part of the solution to the problem my algorithm solves. I need to log this information somehow. The key is Text and the value is an ArrayWritable (TextArrayWritable).
-
Re: How to Create an effective chained MapReduce program. David Rosenstrauch 2011-09-06, 20:26
I did something similar to Niels at my last job, but rather than writing a second map/reduce job for this, we just wrote a simple command-line app that used the Hadoop Java API to dump the contents of the binary file as text (JSON) to the console.
HTH,
DR
-
Re: How to Create an effective chained MapReduce program. ilyal levin 2011-09-07, 22:10
Can you be more specific about how to do this? In general, is there a way to convert the binary files I have to text files?
-
Re: How to Create an effective chained MapReduce program. David Rosenstrauch 2011-09-07, 22:17
* open a SequenceFile.Reader on the sequence file
* in a loop, call next(key,val) on the reader to read the next key/val pair in the file (see: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.Writable,%20org.apache.hadoop.io.Writable) )
* write code to format the key & val into whatever appropriate format you want, and write them to the console
* when next(key,val) returns false, exit the loop
HTH,
DR
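The shape of that loop can be sketched in plain Java. To keep the sketch runnable without Hadoop, RecordSource below is a hypothetical stand-in for SequenceFile.Reader (it only mimics the "next(...) returns false at the end" contract), and Record stands in for the reusable key/value objects. The tab and comma separators are assumptions too; the thread only says the value is an array of Text. In a real tool the loop would call reader.next(key, val) on Writable instances instead.

```java
import java.io.PrintStream;
import java.util.Arrays;

/** Dumps key/value records as text, one record per line (illustrative only). */
final class RecordDump {

    /** Holder that is refilled for every record, much like reusable Writable objects. */
    static final class Record {
        String key;
        String[] values;
    }

    /** Hypothetical stand-in for SequenceFile.Reader: next() returns false when no records remain. */
    interface RecordSource {
        boolean next(Record holder);
    }

    /** Formats one record as "key<TAB>v1,v2,...". The separators are an arbitrary choice. */
    static String format(Record r) {
        return r.key + "\t" + String.join(",", r.values);
    }

    /** Reads records until next() returns false, printing each one; returns the record count. */
    static int dump(RecordSource source, PrintStream out) {
        Record holder = new Record();
        int count = 0;
        while (source.next(holder)) {
            out.println(format(holder));
            count++;
        }
        return count;
    }

    /** In-memory source for demonstration: rows[i][0] is the key, the rest are the values. */
    static RecordSource inMemory(String[][] rows) {
        int[] pos = {0};
        return holder -> {
            if (pos[0] >= rows.length) {
                return false;
            }
            String[] row = rows[pos[0]++];
            holder.key = row[0];
            holder.values = Arrays.copyOfRange(row, 1, row.length);
            return true;
        };
    }

    public static void main(String[] args) {
        String[][] rows = { {"a", "x", "y"}, {"b", "z"} };
        dump(inMemory(rows), System.out);
    }
}
```

Redirecting the console output of such a tool to a file would give a text copy of each intermediate stage, which is what was asked for earlier in the thread.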
-
Re: How to Create an effective chained MapReduce program. Lance Norskog 2011-09-07, 22:52
You might find this easier to understand if you use one of the low-level job-scripting languages like Oozie or Hamake. They put the whole assemblage of stuff into one file.
-- Lance Norskog
-
Re: How to Create an effective chained MapReduce program. ilyal levin 2011-09-08, 09:24
Did you have any permission problems when you tried reading the file with SequenceFile.Reader? I keep getting this error:
Exception in thread "main" java.io.FileNotFoundException: C:\cygwin\home\closedItemSet\output0 (Access is denied)
and I just can't work around it.
Thanks