MapReduce >> mail # user >> Ant Colony Optimization for Travelling Salesman Problem in Hadoop


Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
Fair enough - I write a lot of InputFormats, since for most of my problems a
line of text is not the proper unit of input -
for FASTA files I read lines until I hit a line starting with >, and for
XML fragments I read until I hit a closing tag.
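The grouping logic described above can be sketched in plain Java, outside the Hadoop RecordReader API (class and method names here are hypothetical, and a real InputFormat would also have to handle split boundaries):

```java
import java.util.ArrayList;
import java.util.List;

public class FastaGrouper {
    // Group lines into records: a record starts at a line beginning with '>'
    // and runs until the next '>' line - the delimiter logic described above.
    public static List<String> groupRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : text.split("\n", -1)) {
            if (line.isEmpty()) continue;
            // A new '>' header closes the previous record, if any.
            if (line.startsWith(">") && current.length() > 0) {
                records.add(current.toString());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) records.add(current.toString());
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (String record : groupRecords(fasta)) {
            System.out.println("--- record ---");
            System.out.print(record);
        }
    }
}
```

In a real RecordReader the same loop would run over a LineReader positioned at the split start, emitting one key/value pair per grouped record.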
On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote:

>
> The default FileInputFormat splits the file according to size.  If you
> use line-oriented text data, TextInputFormat respects the line structure of the
> input.   We got splits as small as a few KB.  File splitting is a tricky
> business, especially when you want it to respect your logical boundaries. It
> is better to use existing battle-tested code than to reinvent the wheel.
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> [EMAIL PROTECTED]
> Financial Engineering
> Freddie Mac
>
>
> Steve Lewis <[EMAIL PROTECTED]>
>
> 05/07/2012 11:17 AM
>
> Please respond to: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> cc:
> Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
>
> Yes, but it is the job of the InputFormat code to implement that behavior -
> it is not required to do so, and in other cases I choose to create more
> mappers when each mapper has a lot of work.
>
> On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <[EMAIL PROTECTED]> wrote:
>
> We are using the old API of 0.20.  I think when you set "mapred.map.tasks"
> to some number N and use FileInputFormat, the default behavior is that
> any file will be split into that number, N, of splits, each smaller than the
> default block size. Of course, other restrictions apply; for example,
> "mapred.min.split.size" cannot be set too large (the default is as small as
> possible, I think).
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> [EMAIL PROTECTED]
> Financial Engineering
> Freddie Mac
>
> sharat attupurath <[EMAIL PROTECTED]>
>
> 05/05/2012 11:37 AM
>
> Please respond to: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> cc:
> Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
>
>
>
>
>
>
>
> Since the input files are very small, the default input formats in Hadoop
> all generate just a single InputSplit, so only a single map task is
> executed, and we won't have much parallelism.
>
> I was thinking of writing an InputFormat that would read the whole file as
> an InputSplit and replicate this input split n times (where n would be the
> number of ants in a single stage) so that we'll have n mappers.
> I also want the input format to return the value as the adjacency matrix
> of the graph (calculated from the coordinates in the input file).
>
> But I can't find a way to do this. Is it possible? Or is it better to just
> have the input as Text and create the adjacency matrix in the mappers?
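The two pieces of that idea - building the adjacency matrix from the coordinates, and replicating one whole-file record n times so that n mappers run - can be sketched in plain Java, outside the Hadoop API. The "one x y pair per line" input layout and all names here are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class AntInputSketch {
    // Parse "x y" coordinate lines (assumed input layout) into points.
    public static double[][] parseCoordinates(String text) {
        List<double[]> pts = new ArrayList<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split("\\s+");
            pts.add(new double[] { Double.parseDouble(parts[0]),
                                   Double.parseDouble(parts[1]) });
        }
        return pts.toArray(new double[0][]);
    }

    // Build the adjacency (Euclidean distance) matrix once from the coordinates.
    public static double[][] adjacencyMatrix(double[][] pts) {
        int n = pts.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                d[i][j] = Math.hypot(pts[i][0] - pts[j][0],
                                     pts[i][1] - pts[j][1]);
        return d;
    }

    // The "replicate the whole file n times" idea: n identical records,
    // one per ant, so the framework schedules n mappers over the same graph.
    public static List<double[][]> replicate(double[][] matrix, int ants) {
        List<double[][]> splits = new ArrayList<>();
        for (int i = 0; i < ants; i++) splits.add(matrix);
        return splits;
    }

    public static void main(String[] args) {
        double[][] m = adjacencyMatrix(parseCoordinates("0 0\n3 4\n"));
        System.out.println(m[0][1]); // distance between (0,0) and (3,4)
        System.out.println(replicate(m, 5).size());
    }
}
```

In an actual InputFormat, getSplits() would return n copies of a split covering the whole file, and the RecordReader would emit the parsed matrix as the value.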
>
>  ------------------------------
> Date: Sat, 5 May 2012 08:20:34 -0700
> Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in
> Hadoop
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Yes - if you know it in advance you can put it in the distributed cache; or, if it is
> small, put it in as a String in the config; or have all InputFormats read it
> from somewhere.
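The "small String in the config" option amounts to encoding the matrix as one String and decoding it again in each mapper. A minimal encode/decode pair, sketched without the actual Hadoop Configuration object (the format here - rows separated by ';', columns by ',' - is an arbitrary choice, and you would wrap these calls around conf.set/conf.get):

```java
public class MatrixConfig {
    // Serialize a small matrix to a single String so it can be stored
    // in the job configuration and re-read in every mapper.
    public static String encode(double[][] m) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < m.length; i++) {
            if (i > 0) sb.append(';');
            for (int j = 0; j < m[i].length; j++) {
                if (j > 0) sb.append(',');
                sb.append(m[i][j]);
            }
        }
        return sb.toString();
    }

    // Inverse of encode(): rebuild the matrix from the String.
    public static double[][] decode(String s) {
        String[] rows = s.split(";");
        double[][] m = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            String[] cols = rows[i].split(",");
            m[i] = new double[cols.length];
            for (int j = 0; j < cols.length; j++)
                m[i][j] = Double.parseDouble(cols[j]);
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] m = { { 0, 5 }, { 5, 0 } };
        String s = encode(m);
        System.out.println(s);
        System.out.println(decode(s)[0][1]);
    }
}
```

This only makes sense for small graphs; past a few hundred KB the distributed cache is the better fit.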
>
> On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <[EMAIL PROTECTED]> wrote:
> I looked at both the files. In AbstractNShotInputFormat it is mentioned
> that this input format does not read from files. My input is in a text
> file. I want the whole file as a single record. So is it enough if I just
> copy the contents of the file and return it as a string from
> getValueFromIndex()?
>
>  ------------------------------
> Date: Fri, 4 May 2012 13:15:46 -0700
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com