MapReduce, mail # user - Ant Colony Optimization for Travelling Salesman Problem in Hadoop


Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
Steve Lewis 2012-05-07, 16:24
Fair enough - I write a lot of InputFormats since for most of my problems a
line of text is not the proper unit.
I read FASTA files (read lines until you hit a line starting with >) and
XML fragments (read until you hit a closing tag).
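
A minimal sketch of the FASTA case, in plain Java with no Hadoop classes so
it runs on its own. The names FastaRecords and records are made up for
illustration. In a real InputFormat the same loop would sit in the
RecordReader and would also have to handle split boundaries, which this
sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

final class FastaRecords {
    // Groups lines into records: a record starts at a line beginning with '>'
    // and runs until the next such line (or the end of the input).
    static List<String> records(Iterable<String> lines) {
        List<String> out = new ArrayList<String>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (current != null) {
                    out.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null && !line.isEmpty()) {
                // Sequence line belonging to the current record.
                current.append('\n').append(line);
            }
            // Lines before the first header are skipped.
        }
        if (current != null) {
            out.add(current.toString());
        }
        return out;
    }
}
```

Each returned String is one logical record (header plus sequence lines), which
is the unit a mapper would receive as its value.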
On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote:

>
> The default FileInputFormat splits the file according to size.  If you
> use line-oriented text data, TextInputFormat respects the line structure of
> the input.  We got splits as small as a few KB.  File splitting is a tricky
> business, especially when you want it to respect your logical boundaries. It
> is better to use the existing battle-tested code than to reinvent the wheel.
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> [EMAIL PROTECTED]
> Financial Engineering
> Freddie Mac
>
>
> From: Steve Lewis <[EMAIL PROTECTED]>
> Date: 05/07/2012 11:17 AM
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
>
>
>
>
> Yes, but it is the job of the InputFormat code to implement that behavior.
> It is not required to do so, and in other cases I choose to create more
> mappers when each mapper has a lot of work.
>
> On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote:
>
> We are using the old API of 0.20.  I think when you set "mapred.map.tasks"
> to some number N and use FileInputFormat, the default behavior is that
> the input is split into about N pieces, each no larger than the
> default block size. Of course, other restrictions apply: for example,
> "mapred.min.split.size" cannot be set too large (the default is as small as
> possible, I think).
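
A rough sketch of the split-size arithmetic described above, assuming the
old-API FileInputFormat behaves as follows: the requested number of map tasks
gives a goal size, which is capped by the block size and floored by the
minimum split size. SplitSizeSketch and its method names are illustrative
only, and the real code also allows the last split to run slightly over,
which is left out here.

```java
final class SplitSizeSketch {
    // Split size = max(minSize, min(totalSize / requestedMaps, blockSize)).
    static long splitSize(long totalSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(1, requestedMaps);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of splits for one file (ceiling division).
    static long approxSplitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }
}
```

With a 40,000-byte file, 10 requested maps, a minimum split size of 1 and a
64 MB block, the split size comes out at 4,000 bytes, so about 10 splits.
Raising the minimum split size to 100,000 bytes pushes the split size above
the file size and the file falls back to a single split.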
>
>
> From: sharat attupurath <[EMAIL PROTECTED]>
> Date: 05/05/2012 11:37 AM
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
>
>
>
>
>
>
> Since the input files are very small, the default input formats in Hadoop
> all generate just a single InputSplit, so only a single map task is
> executed, and we won't have much parallelism.
>
> I was thinking of writing an InputFormat that would read the whole file as
> an InputSplit and replicate this input split n times (where n would be the
> number of ants in a single stage) so that we'll have n mappers.
> Also I want the input format to return the value as the adjacency matrix
> of the graph (calculating it from the coordinates in the input file).
>
> But I can't find a way to do this. Is it possible? Or is it better to just
> have the input as Text and create the adjacency matrix in the mappers?
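
If the matrix is built in the mappers, the computation itself is small. A
sketch, assuming the input holds one city per line as whitespace-separated
x y coordinates (the thread does not say what the file format is) and that
distances are Euclidean. AdjacencyMatrix and fromCoordinates are hypothetical
names.

```java
import java.util.ArrayList;
import java.util.List;

final class AdjacencyMatrix {
    // Parses "x y" lines and returns the symmetric matrix of pairwise distances.
    static double[][] fromCoordinates(String text) {
        List<double[]> cities = new ArrayList<double[]>();
        for (String line : text.split("\r?\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            cities.add(new double[] {
                Double.parseDouble(parts[0]),
                Double.parseDouble(parts[1])
            });
        }
        int n = cities.size();
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = cities.get(i)[0] - cities.get(j)[0];
                double dy = cities.get(i)[1] - cities.get(j)[1];
                double d = Math.hypot(dx, dy);
                dist[i][j] = d;
                dist[j][i] = d; // distances are symmetric
            }
        }
        return dist;
    }
}
```

Since the files are small, recomputing this once per mapper is cheap compared
with writing a custom value type for the InputFormat.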
>
>  ------------------------------
> Date: Sat, 5 May 2012 08:20:34 -0700
> Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Yes - if you know how, you can put it in the distributed cache; if it is
> small, put it in the config as a String; or have all InputFormats read it
> from somewhere.
>
> On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote:
> I looked at both the files. In AbstractNShotInputFormat it is mentioned
> that this input format does not read from files. My input is in a text
> file. I want the whole file as a single record. So is it enough if I just
> copy the contents of the file and return it as a string from
> getValueFromIndex()?
>
>  ------------------------------
> Date: Fri, 4 May 2012 13:15:46 -0700
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com