|
sharat attupurath
2012-05-04, 16:06
Steve Lewis
2012-05-04, 16:22
sharat attupurath
2012-05-04, 16:54
Steve Lewis
2012-05-04, 20:15
sharat attupurath
2012-05-05, 15:08
Steve Lewis
2012-05-05, 15:20
sharat attupurath
2012-05-05, 15:37
Steve Lewis
2012-05-05, 20:16
sharat attupurath
2012-05-06, 03:11
GUOJUN Zhu
2012-05-07, 14:54
Steve Lewis
2012-05-07, 15:17
GUOJUN Zhu
2012-05-07, 16:03
Steve Lewis
2012-05-07, 16:24
sharat attupurath
2012-05-08, 10:55
Steve Lewis
2012-05-08, 14:47
sharat attupurath
2012-05-08, 15:32
Steve Lewis
2012-05-08, 16:22
sharat attupurath
2012-05-08, 16:25
|
-
Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-04, 16:06
Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. Hope the questions are clear enough. Any help would be greatly appreciated. Thank you Regards Sharat +
sharat attupurath 2012-05-04, 16:06
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-04, 16:22
On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]>wrote:
> Hi, > > We are trying to parallelize the ant colony optimization algorithm for TSP > over hadoop and are facing some issues. We are using TSPLIB as input files. > The input is a text file containing eucledian coordinates of the cities - > first column is city number and the next two columns contain the x and y > coordinates respectively. > > What we intend to do is to take input from this single file, send copies > of it to multiple mappers (each mapper acts like the ant in the algorithm), > each mapper works on its input to find its own TSP solution that it outputs > and finally the reducer outputs the smallest tour found by the mappers. > Hope we are in the right track. Here are the issues: > > 1) Since the input file is small, we need to force hadoop to fire up > multiple map tasks by replicating the input. How can we make an InputSplit > of the whole file and replicate it so that the input can be sent to > multiple mappers? > Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" > > 2) the algorithm uses a shared pheromone array and each mapper needs to > read and write data from this. How can we share the pheromone data across > the mappers. > You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails > > Hope the questions are clear enough. Any help would be greatly > appreciated. > > Thank you > > Regards > > Sharat > -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
Steve Lewis 2012-05-04, 16:22
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-04, 16:54
Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails Hope the questions are clear enough. Any help would be greatly appreciated. Thank you Regards Sharat -- Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com +
sharat attupurath 2012-05-04, 16:54
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-04, 20:15
Look at NShotInputFormat
call setNumberKeys to set the number of ants then change the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > Hi, > > Thanks a lot for the quick reply sir! We are new to Apache Hadoop and > haven't yet understood it properly yet. Can you please elaborate on how we > can have multiple stages of mapreduce jobs for combining the trails as you > have mentioned? > > We have been trying to find out how to write a custom splitter and almost > all online resources tell to subclass the FileInputFormat and write only a > custom RecordReader? Will it be possible to generate our splits in that way? > > Regards > > Sharat > > ------------------------------ > Date: Fri, 4 May 2012 09:22:51 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > > > > On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > > Hi, > > We are trying to parallelize the ant colony optimization algorithm for TSP > over hadoop and are facing some issues. We are using TSPLIB as input files. > The input is a text file containing eucledian coordinates of the cities - > first column is city number and the next two columns contain the x and y > coordinates respectively. > > What we intend to do is to take input from this single file, send copies > of it to multiple mappers (each mapper acts like the ant in the algorithm), > each mapper works on its input to find its own TSP solution that it outputs > and finally the reducer outputs the smallest tour found by the mappers. > Hope we are in the right track. Here are the issues: > > 1) Since the input file is small, we need to force hadoop to fire up > multiple map tasks by replicating the input. How can we make an InputSplit > of the whole file and replicate it so that the input can be sent to > multiple mappers? > > Write a custom splitter sending the same data to all mappers - the only > critical criteria is the number of "Ants" > > > 2) the algorithm uses a shared pheromone array and each mapper needs to > read and write data from this. How can we share the pheromone data across > the mappers. > > You cannot share data across mappers and should not attempt to do so - > Better to use the reducer(s) to combine trails combine first pass trails > and > then pass the combined trails to another mapreduce job - this with the > original problem plus the current pheremone trails > > > Hope the questions are clear enough. Any help would be greatly > appreciated. > > Thank you > > Regards > > Sharat > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 > 206-384-1340 (cell) > Skype lordjoe_com > > > -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
Steve Lewis 2012-05-04, 20:15
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-05, 15:08
I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants thenchange the method getValueFromIndex to return the text representing the original problem to each mapper -what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails Hope the questions are clear enough. Any help would be greatly appreciated. Thank you Regards Sharat -- Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com -- Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com +
sharat attupurath 2012-05-05, 15:08
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-05, 15:20
yes - if you know how you can put it in distributed cache or if it is small
put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > I looked at both the files. in AbstractNShotInputFormat it is mentioned > that this input format does not read from files. My input is in a text > file. I want the whole file as a single record. So is it enough if i just > copy the contents of the file and return it as a string from > getValueFromIndex() ? > > ------------------------------ > Date: Fri, 4 May 2012 13:15:46 -0700 > > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > Look at NShotInputFormat > call setNumberKeys to set the number of ants then > change the method getValueFromIndex to return the text representing the > original problem to each mapper - > what happens in the second and third jobs is an exercise but the saved > test needs to have pheremones and the original problem > > > On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > > Hi, > > Thanks a lot for the quick reply sir! We are new to Apache Hadoop and > haven't yet understood it properly yet. Can you please elaborate on how we > can have multiple stages of mapreduce jobs for combining the trails as you > have mentioned? > > We have been trying to find out how to write a custom splitter and almost > all online resources tell to subclass the FileInputFormat and write only a > custom RecordReader? Will it be possible to generate our splits in that way? > > Regards > > Sharat > > ------------------------------ > Date: Fri, 4 May 2012 09:22:51 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > > > > On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > > Hi, > > We are trying to parallelize the ant colony optimization algorithm for TSP > over hadoop and are facing some issues. We are using TSPLIB as input files. > The input is a text file containing eucledian coordinates of the cities - > first column is city number and the next two columns contain the x and y > coordinates respectively. > > What we intend to do is to take input from this single file, send copies > of it to multiple mappers (each mapper acts like the ant in the algorithm), > each mapper works on its input to find its own TSP solution that it outputs > and finally the reducer outputs the smallest tour found by the mappers. > Hope we are in the right track. Here are the issues: > > 1) Since the input file is small, we need to force hadoop to fire up > multiple map tasks by replicating the input. How can we make an InputSplit > of the whole file and replicate it so that the input can be sent to > multiple mappers? > > Write a custom splitter sending the same data to all mappers - the only > critical criteria is the number of "Ants" > > > 2) the algorithm uses a shared pheromone array and each mapper needs to > read and write data from this. How can we share the pheromone data across > the mappers. > > You cannot share data across mappers and should not attempt to do so - > Better to use the reducer(s) to combine trails combine first pass trails > and > then pass the combined trails to another mapreduce job - this with the > original problem plus the current pheremone trails > > > Hope the questions are clear enough. Any help would be greatly > appreciated. > > Thank you > > Regards > > Sharat > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 > 206-384-1340 (cell) > Skype lordjoe_com > > > > > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 > 206-384-1340 (cell) > Skype lordjoe_com > > > -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
Steve Lewis 2012-05-05, 15:20
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-05, 15:37
Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants thenchange the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails Hope the questions are clear enough. Any help would be greatly appreciated. Thank you Regards Sharat Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com +
sharat attupurath 2012-05-05, 15:37
-
RE: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-05, 20:16
Look at what I sent you.it generates a number of splits three is a static
variable to list how many On May 5, 2012 8:38 AM, "sharat attupurath" <[EMAIL PROTECTED]> wrote: > Since the input files are very small, the default input formats in Hadoop > all generate just a single InputSplit, so only a single map task is > executed, and we wont have much parallelism. > > I was thinking of writing an InputFormat that would read the whole file as > an InputSplit and replicate this input split n times (where n would be the > number of ants in a single stage) so that we'll have n mappers. > Also I want the input format to return the value as the adjacency matrix > of the graph (calculating it from the coordinates in the input file). > > But I can't find a way to do this. Is it possible? Or is it better to just > have the input as Text and create the adjacency matrix in the mappers? > > ------------------------------ > Date: Sat, 5 May 2012 08:20:34 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > yes - if you know how you can put it in distributed cache or if it is > small put in as a String in the config or have all InputFormats read it > from somewhere > > On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > > I looked at both the files. in AbstractNShotInputFormat it is mentioned > that this input format does not read from files. My input is in a text > file. I want the whole file as a single record. So is it enough if i just > copy the contents of the file and return it as a string from > getValueFromIndex() ? > > ------------------------------ > Date: Fri, 4 May 2012 13:15:46 -0700 > > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > Look at NShotInputFormat > call setNumberKeys to set the number of ants then > change the method getValueFromIndex to return the text representing the > original problem to each mapper - > what happens in the second and third jobs is an exercise but the saved > test needs to have pheremones and the original problem > > > On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > > Hi, > > Thanks a lot for the quick reply sir! We are new to Apache Hadoop and > haven't yet understood it properly yet. Can you please elaborate on how we > can have multiple stages of mapreduce jobs for combining the trails as you > have mentioned? > > We have been trying to find out how to write a custom splitter and almost > all online resources tell to subclass the FileInputFormat and write only a > custom RecordReader? Will it be possible to generate our splits in that way? > > Regards > > Sharat > > ------------------------------ > Date: Fri, 4 May 2012 09:22:51 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > > > > On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > > Hi, > > We are trying to parallelize the ant colony optimization algorithm for TSP > over hadoop and are facing some issues. We are using TSPLIB as input files. > The input is a text file containing eucledian coordinates of the cities - > first column is city number and the next two columns contain the x and y > coordinates respectively. > > What we intend to do is to take input from this single file, send copies > of it to multiple mappers (each mapper acts like the ant in the algorithm), > each mapper works on its input to find its own TSP solution that it outputs > and finally the reducer outputs the smallest tour found by the mappers. > Hope we are in the right track. Here are the issues: > > 1) Since the input file is small, we need to force hadoop to fire up > multiple map tasks by replicating the input. How can we make an InputSplit > of the whole file and replicate it so that the input can be sent to +
Steve Lewis 2012-05-05, 20:16
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-06, 03:11
sorry i missed it the first time. Thank you
Date: Sat, 5 May 2012 13:16:47 -0700 Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at what I sent you.it generates a number of splits three is a static variable to list how many On May 5, 2012 8:38 AM, "sharat attupurath" <[EMAIL PROTECTED]> wrote: Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants thenchange the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails Hope the questions are clear enough. Any help would be greatly appreciated. Thank you Regards Sharat Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com Steven M. Lewis PhD4221 105th Ave NEKirkland, WA 98033206-384-1340 (cell) Skype lordjoe_com +
sharat attupurath 2012-05-06, 03:11
-
RE: Ant Colony Optimization for Travelling Salesman Problem in HadoopGUOJUN Zhu 2012-05-07, 14:54
We are using old API of 0.20. I think when you set "mapred.reduce.tasks"
with certain number N and use fileinputformat, the default behavior is that any file will be split into that number, N, each split smaller than the default block size. Of course, other restriction, such as "mapred.min.split.size" cannot be set too large (default is as small as possible I think). Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac sharat attupurath <[EMAIL PROTECTED]> 05/05/2012 11:37 AM Please respond to [EMAIL PROTECTED] To <[EMAIL PROTECTED]> cc Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants then change the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails Hope the questions are clear enough. Any help would be greatly appreciated. Thank you Regards Sharat Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
GUOJUN Zhu 2012-05-07, 14:54
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-07, 15:17
Yes but it is the job of the InputFormat code to implement the behavior -
it is not necessary to do so and in other cases I choose to create more mappers when the mapper has a lot of work On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]>wrote: > > We are using old API of 0.20. I think when you set "mapred.reduce.tasks" > with certain number N and use fileinputformat, the default behavior is that > any file will be split into that number, N, each split smaller than the > default block size. Of course, other restriction, such as > "mapred.min.split.size" cannot be set too large (default is as small as > possible I think). > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > [EMAIL PROTECTED] > Financial Engineering > Freddie Mac > > > *sharat attupurath <[EMAIL PROTECTED]>* > > 05/05/2012 11:37 AM > Please respond to > [EMAIL PROTECTED] > > To > <[EMAIL PROTECTED]> > cc > Subject > RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > Since the input files are very small, the default input formats in Hadoop > all generate just a single InputSplit, so only a single map task is > executed, and we wont have much parallelism. > > I was thinking of writing an InputFormat that would read the whole file as > an InputSplit and replicate this input split n times (where n would be the > number of ants in a single stage) so that we'll have n mappers. > Also I want the input format to return the value as the adjacency matrix > of the graph (calculating it from the coordinates in the input file). > > But I can't find a way to do this. Is it possible? Or is it better to just > have the input as Text and create the adjacency matrix in the mappers? > > ------------------------------ > Date: Sat, 5 May 2012 08:20:34 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > yes - if you know how you can put it in distributed cache or if it is > small put in as a String in the config or have all InputFormats read it > from somewhere > > On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <*[EMAIL PROTECTED]*<[EMAIL PROTECTED]>> > wrote: > I looked at both the files. in AbstractNShotInputFormat it is mentioned > that this input format does not read from files. My input is in a text > file. I want the whole file as a single record. So is it enough if i just > copy the contents of the file and return it as a string from > getValueFromIndex() ? > > ------------------------------ > Date: Fri, 4 May 2012 13:15:46 -0700 > > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: *[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > To: *[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > > Look at NShotInputFormat > call setNumberKeys to set the number of ants then > change the method getValueFromIndex to return the text representing the > original problem to each mapper - > what happens in the second and third jobs is an exercise but the saved > test needs to have pheremones and the original problem > > > On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <*[EMAIL PROTECTED]*<[EMAIL PROTECTED]>> > wrote: > Hi, > > Thanks a lot for the quick reply sir! We are new to Apache Hadoop and > haven't yet understood it properly yet. Can you please elaborate on how we > can have multiple stages of mapreduce jobs for combining the trails as you > have mentioned? > > We have been trying to find out how to write a custom splitter and almost > all online resources tell to subclass the FileInputFormat and write only a > custom RecordReader? Will it be possible to generate our splits in that way? > > Regards > > Sharat > > ------------------------------ > Date: Fri, 4 May 2012 09:22:51 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: *[EMAIL PROTECTED]* <[EMAIL PROTECTED]> Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
Steve Lewis 2012-05-07, 15:17
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopGUOJUN Zhu 2012-05-07, 16:03
The default FileInputformat split the file according to the size. If you
use line text data, the TextFileInputformat respects the line structure for input. We got splits as small as a few KBs. The file split is a tricky business, especially when you want it to respect your logical boundary. It is better to use the existing battle-test code than invent your own wheel. Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac Steve Lewis <[EMAIL PROTECTED]> 05/07/2012 11:17 AM Please respond to [EMAIL PROTECTED] To [EMAIL PROTECTED] cc Subject Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Yes but it is the job of the InputFormat code to implement the behavior - it is not necessary to do so and in other cases I choose to create more mappers when the mapper has a lot of work On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: We are using old API of 0.20. I think when you set "mapred.reduce.tasks" with certain number N and use fileinputformat, the default behavior is that any file will be split into that number, N, each split smaller than the default block size. Of course, other restriction, such as "mapred.min.split.size" cannot be set too large (default is as small as possible I think). Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac sharat attupurath <[EMAIL PROTECTED]> 05/05/2012 11:37 AM Please respond to [EMAIL PROTECTED] To <[EMAIL PROTECTED]> cc Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants then change the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, We are trying to parallelize the ant colony optimization algorithm for TSP over hadoop and are facing some issues. We are using TSPLIB as input files. The input is a text file containing eucledian coordinates of the cities - first column is city number and the next two columns contain the x and y coordinates respectively. What we intend to do is to take input from this single file, send copies of it to multiple mappers (each mapper acts like the ant in the algorithm), each mapper works on its input to find its own TSP solution that it outputs and finally the reducer outputs the smallest tour found by the mappers. Hope we are in the right track. Here are the issues: 1) Since the input file is small, we need to force hadoop to fire up multiple map tasks by replicating the input. How can we make an InputSplit of the whole file and replicate it so that the input can be sent to multiple mappers? Write a custom splitter sending the same data to all mappers - the only critical criteria is the number of "Ants" 2) the algorithm uses a shared pheromone array and each mapper needs to read and write data from this. How can we share the pheromone data across the mappers. You cannot share data across mappers and should not attempt to do so - Better to use the reducer(s) to combine trails combine first pass trails and then pass the combined trails to another mapreduce job - this with the original problem plus the current pheremone trails +
GUOJUN Zhu 2012-05-07, 16:03
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-07, 16:24
Fair enough - I write a lot of InputFormats since for most of my problems a
line of text is not the proper unit - I read fasta files - read lines intil you hit a line starting with > and xml fragments - read until you hit a closing tag On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]>wrote: > > The default FileInputformat split the file according to the size. If you > use line text data, the TextFileInputformat respects the line structure for > input. We got splits as small as a few KBs. The file split is a tricky > business, especially when you want it to respect your logical boundary. It > is better to use the existing battle-test code than invent your own wheel. > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > [EMAIL PROTECTED] > Financial Engineering > Freddie Mac > > > *Steve Lewis <[EMAIL PROTECTED]>* > > 05/07/2012 11:17 AM > Please respond to > [EMAIL PROTECTED] > > To > [EMAIL PROTECTED] > cc > Subject > Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > Yes but it is the job of the InputFormat code to implement the behavior - > it is not necessary to do so and in other cases I choose to create more > mappers when the mapper has a lot of work > > On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <*[EMAIL PROTECTED]*<[EMAIL PROTECTED]>> > wrote: > > We are using old API of 0.20. I think when you set "mapred.reduce.tasks" > with certain number N and use fileinputformat, the default behavior is that > any file will be split into that number, N, each split smaller than the > default block size. Of course, other restriction, such as > "mapred.min.split.size" cannot be set too large (default is as small as > possible I think). > > Zhu, Guojun > Modeling Sr Graduate* > **571-3824370* <571-3824370>* > **[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > Financial Engineering > Freddie Mac > > *sharat attupurath <**[EMAIL PROTECTED]* <[EMAIL PROTECTED]>*>* > > 05/05/2012 11:37 AM > > > Please respond to* > **[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > > To > <*[EMAIL PROTECTED]* <[EMAIL PROTECTED]>> > cc > Subject > RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > > > Since the input files are very small, the default input formats in Hadoop > all generate just a single InputSplit, so only a single map task is > executed, and we wont have much parallelism. > > I was thinking of writing an InputFormat that would read the whole file as > an InputSplit and replicate this input split n times (where n would be the > number of ants in a single stage) so that we'll have n mappers. > Also I want the input format to return the value as the adjacency matrix > of the graph (calculating it from the coordinates in the input file). > > But I can't find a way to do this. Is it possible? Or is it better to just > have the input as Text and create the adjacency matrix in the mappers? > > ------------------------------ > Date: Sat, 5 May 2012 08:20:34 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: *[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > To: *[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > > yes - if you know how you can put it in distributed cache or if it is > small put in as a String in the config or have all InputFormats read it > from somewhere > > On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <*[EMAIL PROTECTED]*<[EMAIL PROTECTED]>> > wrote: > I looked at both the files. in AbstractNShotInputFormat it is mentioned > that this input format does not read from files. My input is in a text > file. I want the whole file as a single record. So is it enough if i just > copy the contents of the file and return it as a string from > getValueFromIndex() ? > > ------------------------------ > Date: Fri, 4 May 2012 13:15:46 -0700 Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
Steve Lewis 2012-05-07, 16:24
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-08, 10:55
Hi Steve, I tried using the NShotInputFormat, but after setting it as the InputFormat class and running my mapreduce job I get the following error : Exception in thread "main" java.lang.RuntimeException: class tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915) at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590) at tsphadoop.TspHadoop.main(TspHadoop.java:40) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Am i missing something? Is there anything else I should know about using NShotInputFormat? Thanks and regards Sharat Date: Mon, 7 May 2012 09:24:05 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Fair enough - I write a lot of InputFormats since for most of my problems a line of text is not the proper unit -I read fasta files - read lines intil you hit a line starting with > and xml fragments - read until you hit a closing tag On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: The default FileInputformat split the file according to the size. If you use line text data, the TextFileInputformat respects the line structure for input. We got splits as small as a few KBs. The file split is a tricky business, especially when you want it to respect your logical boundary. It is better to use the existing battle-test code than invent your own wheel. Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac Steve Lewis <[EMAIL PROTECTED]> 05/07/2012 11:17 AM Please respond to [EMAIL PROTECTED] To [EMAIL PROTECTED] cc Subject Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Yes but it is the job of the InputFormat code to implement the behavior - it is not necessary to do so and in other cases I choose to create more mappers when the mapper has a lot of work On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: We are using old API of 0.20. I think when you set "mapred.reduce.tasks" with certain number N and use fileinputformat, the default behavior is that any file will be split into that number, N, each split smaller than the default block size. Of course, other restriction, such as "mapred.min.split.size" cannot be set too large (default is as small as possible I think). Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac sharat attupurath <[EMAIL PROTECTED]> 05/05/2012 11:37 AM Please respond to [EMAIL PROTECTED] To <[EMAIL PROTECTED]> cc Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants then change the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell to subclass the FileInputFormat and write only a custom RecordReader? Will it be possible to generate our splits in that way? Regards Sharat Date: Fri, 4 May 2012 09:22:51 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <sh +
sharat attupurath 2012-05-08, 10:55
-
RE: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-08, 14:47
Which api are you using? They changed between 0.18 and 0.20.this is the
more recent version On May 8, 2012 3:55 AM, "sharat attupurath" <[EMAIL PROTECTED]> wrote: > Hi Steve, > > I tried using the NShotInputFormat, but after setting it as the > InputFormat class and running my mapreduce job I get the following error : > > Exception in thread "main" java.lang.RuntimeException: class > tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat > at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915) > at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590) > at tsphadoop.TspHadoop.main(TspHadoop.java:40) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > > > Am i missing something? Is there anything else I should know about using > NShotInputFormat? > > Thanks and regards > > Sharat > > ------------------------------ > Date: Mon, 7 May 2012 09:24:05 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > Fair enough - I write a lot of InputFormats since for most of my problems > a line of text is not the proper unit - > I read fasta files - read lines intil you hit a line starting with > and > xml fragments - read until you hit a closing tag > > > On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]>wrote: > > > The default FileInputformat split the file according to the size. If you > use line text data, the TextFileInputformat respects the line structure for > input. We got splits as small as a few KBs. The file split is a tricky > business, especially when you want it to respect your logical boundary. It > is better to use the existing battle-test code than invent your own wheel. > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > [EMAIL PROTECTED] > Financial Engineering > Freddie Mac > > > > *Steve Lewis <[EMAIL PROTECTED]>* 05/07/2012 11:17 AM > Please respond to > [EMAIL PROTECTED] > > To > [EMAIL PROTECTED] > cc > Subject > Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > Yes but it is the job of the InputFormat code to implement the behavior - > it is not necessary to do so and in other cases I choose to create more > mappers when the mapper has a lot of work > > On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <*[EMAIL PROTECTED]*<[EMAIL PROTECTED]>> > wrote: > > We are using old API of 0.20. I think when you set "mapred.reduce.tasks" > with certain number N and use fileinputformat, the default behavior is that > any file will be split into that number, N, each split smaller than the > default block size. Of course, other restriction, such as > "mapred.min.split.size" cannot be set too large (default is as small as > possible I think). > > Zhu, Guojun > Modeling Sr Graduate* > **571-3824370** > **[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > Financial Engineering > Freddie Mac > > > > > *sharat attupurath <**[EMAIL PROTECTED]* <[EMAIL PROTECTED]>*>* > 05/05/2012 11:37 AM > > > Please respond to* > **[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > > To > <*[EMAIL PROTECTED]* <[EMAIL PROTECTED]>> > cc > Subject > RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > > > Since the input files are very small, the default input formats in Hadoop > all generate just a single InputSplit, so only a single map task is > executed, and we wont have much parallelism. > > I was thinking of writing an InputFormat that would read the whole file as > an InputSplit and replicate this input split n times (where n would be the +
Steve Lewis 2012-05-08, 14:47
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-08, 15:32
i am using hadoop 0.20.203 Date: Tue, 8 May 2012 07:47:23 -0700 Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Which api are you using? They changed between 0.18 and 0.20.this is the more recent version On May 8, 2012 3:55 AM, "sharat attupurath" <[EMAIL PROTECTED]> wrote: Hi Steve, I tried using the NShotInputFormat, but after setting it as the InputFormat class and running my mapreduce job I get the following error : Exception in thread "main" java.lang.RuntimeException: class tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915) at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590) at tsphadoop.TspHadoop.main(TspHadoop.java:40) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Am i missing something? Is there anything else I should know about using NShotInputFormat? Thanks and regards Sharat Date: Mon, 7 May 2012 09:24:05 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Fair enough - I write a lot of InputFormats since for most of my problems a line of text is not the proper unit -I read fasta files - read lines intil you hit a line starting with > and xml fragments - read until you hit a closing tag On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: The default FileInputformat split the file according to the size. If you use line text data, the TextFileInputformat respects the line structure for input. We got splits as small as a few KBs. The file split is a tricky business, especially when you want it to respect your logical boundary. It is better to use the existing battle-test code than invent your own wheel. Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac Steve Lewis <[EMAIL PROTECTED]> 05/07/2012 11:17 AM Please respond to [EMAIL PROTECTED] To [EMAIL PROTECTED] cc Subject Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Yes but it is the job of the InputFormat code to implement the behavior - it is not necessary to do so and in other cases I choose to create more mappers when the mapper has a lot of work On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: We are using old API of 0.20. I think when you set "mapred.reduce.tasks" with certain number N and use fileinputformat, the default behavior is that any file will be split into that number, N, each split smaller than the default block size. Of course, other restriction, such as "mapred.min.split.size" cannot be set too large (default is as small as possible I think). Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac sharat attupurath <[EMAIL PROTECTED]> 05/05/2012 11:37 AM Please respond to [EMAIL PROTECTED] To <[EMAIL PROTECTED]> cc Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants then change the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the saved test needs to have pheremones and the original problem On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: Hi, Thanks a lot for the quick reply sir! We are new to Apache Hadoop and haven't yet understood it properly yet. Can you please elaborate on how we can have multiple stages of mapreduce jobs for combining the trails as you have mentioned? We have been trying to find out how to write a custom splitter and almost all online resources tell t +
sharat attupurath 2012-05-08, 15:32
-
Re: Ant Colony Optimization for Travelling Salesman Problem in HadoopSteve Lewis 2012-05-08, 16:22
of course but do your reducers subclass
org.apache.hadoop.mapreduce.Reducer or org.apache.hadoop.mapred.Reducer On Tue, May 8, 2012 at 8:32 AM, sharat attupurath <[EMAIL PROTECTED]>wrote: > i am using hadoop 0.20.203 > > ------------------------------ > Date: Tue, 8 May 2012 07:47:23 -0700 > > Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > > Which api are you using? They changed between 0.18 and 0.20.this is the > more recent version > On May 8, 2012 3:55 AM, "sharat attupurath" <[EMAIL PROTECTED]> wrote: > > Hi Steve, > > I tried using the NShotInputFormat, but after setting it as the > InputFormat class and running my mapreduce job I get the following error : > > Exception in thread "main" java.lang.RuntimeException: class > tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat > at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915) > at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590) > at tsphadoop.TspHadoop.main(TspHadoop.java:40) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > > > Am i missing something? Is there anything else I should know about using > NShotInputFormat? > > Thanks and regards > > Sharat > > ------------------------------ > Date: Mon, 7 May 2012 09:24:05 -0700 > Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in > Hadoop > From: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > > Fair enough - I write a lot of InputFormats since for most of my problems > a line of text is not the proper unit - > I read fasta files - read lines intil you hit a line starting with > and > xml fragments - read until you hit a closing tag > > > On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]>wrote: > > > The default FileInputformat split the file according to the size. If you > use line text data, the TextFileInputformat respects the line structure for > input. We got splits as small as a few KBs. The file split is a tricky > business, especially when you want it to respect your logical boundary. It > is better to use the existing battle-test code than invent your own wheel. > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > [EMAIL PROTECTED] > Financial Engineering > Freddie Mac > > > > *Steve Lewis <[EMAIL PROTECTED]>* 05/07/2012 11:17 AM > Please respond to > [EMAIL PROTECTED] > > To > [EMAIL PROTECTED] > cc > Subject > Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop > > > > > Yes but it is the job of the InputFormat code to implement the behavior - > it is not necessary to do so and in other cases I choose to create more > mappers when the mapper has a lot of work > > On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <*[EMAIL PROTECTED]*<[EMAIL PROTECTED]>> > wrote: > > We are using old API of 0.20. I think when you set "mapred.reduce.tasks" > with certain number N and use fileinputformat, the default behavior is that > any file will be split into that number, N, each split smaller than the > default block size. Of course, other restriction, such as > "mapred.min.split.size" cannot be set too large (default is as small as > possible I think). > > Zhu, Guojun > Modeling Sr Graduate* > **571-3824370** > **[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > Financial Engineering > Freddie Mac > > > > > *sharat attupurath <**[EMAIL PROTECTED]* <[EMAIL PROTECTED]>*>* > 05/05/2012 11:37 AM > > > Please respond to* > **[EMAIL PROTECTED]* <[EMAIL PROTECTED]> > Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com +
Steve Lewis 2012-05-08, 16:22
-
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoopsharat attupurath 2012-05-08, 16:25
i followed an old tutorial it seems, so my reducer is implementing org.apache.hadoop.mapred.Reducer Date: Tue, 8 May 2012 09:22:20 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] of course but do your reducers subclass org.apache.hadoop.mapreduce.Reducer ororg.apache.hadoop.mapred.Reducer On Tue, May 8, 2012 at 8:32 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: i am using hadoop 0.20.203 Date: Tue, 8 May 2012 07:47:23 -0700 Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Which api are you using? They changed between 0.18 and 0.20.this is the more recent version On May 8, 2012 3:55 AM, "sharat attupurath" <[EMAIL PROTECTED]> wrote: Hi Steve, I tried using the NShotInputFormat, but after setting it as the InputFormat class and running my mapreduce job I get the following error : Exception in thread "main" java.lang.RuntimeException: class tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915) at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590) at tsphadoop.TspHadoop.main(TspHadoop.java:40) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Am i missing something? Is there anything else I should know about using NShotInputFormat? Thanks and regards Sharat Date: Mon, 7 May 2012 09:24:05 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Fair enough - I write a lot of InputFormats since for most of my problems a line of text is not the proper unit -I read fasta files - read lines intil you hit a line starting with > and xml fragments - read until you hit a closing tag On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: The default FileInputformat split the file according to the size. If you use line text data, the TextFileInputformat respects the line structure for input. We got splits as small as a few KBs. The file split is a tricky business, especially when you want it to respect your logical boundary. It is better to use the existing battle-test code than invent your own wheel. Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac Steve Lewis <[EMAIL PROTECTED]> 05/07/2012 11:17 AM Please respond to [EMAIL PROTECTED] To [EMAIL PROTECTED] cc Subject Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Yes but it is the job of the InputFormat code to implement the behavior - it is not necessary to do so and in other cases I choose to create more mappers when the mapper has a lot of work On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote: We are using old API of 0.20. I think when you set "mapred.reduce.tasks" with certain number N and use fileinputformat, the default behavior is that any file will be split into that number, N, each split smaller than the default block size. Of course, other restriction, such as "mapred.min.split.size" cannot be set too large (default is as small as possible I think). Zhu, Guojun Modeling Sr Graduate 571-3824370 [EMAIL PROTECTED] Financial Engineering Freddie Mac sharat attupurath <[EMAIL PROTECTED]> 05/05/2012 11:37 AM Please respond to [EMAIL PROTECTED] To <[EMAIL PROTECTED]> cc Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed, and we wont have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this input split n times (where n would be the number of ants in a single stage) so that we'll have n mappers. Also I want the input format to return the value as the adjacency matrix of the graph (calculating it from the coordinates in the input file). But I can't find a way to do this. Is it possible? Or is it better to just have the input as Text and create the adjacency matrix in the mappers? Date: Sat, 5 May 2012 08:20:34 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] yes - if you know how you can put it in distributed cache or if it is small put in as a String in the config or have all InputFormats read it from somewhere On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote: I looked at both the files. in AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file. I want the whole file as a single record. So is it enough if i just copy the contents of the file and return it as a string from getValueFromIndex() ? Date: Fri, 4 May 2012 13:15:46 -0700 Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Look at NShotInputFormat call setNumberKeys to set the number of ants then change the method getValueFromIndex to return the text representing the original problem to each mapper - what happens in the second and third jobs is an exercise but the sa +
sharat attupurath 2012-05-08, 16:25
|