Hadoop, mail # user - [Input split] File manipulation


Re: [Input split] File manipulation
Jeff Zhang 2010-08-17, 15:44
What size is your input? If the input is large enough, you do not
need to worry about the splitting: only one split (the last one) has
a different size, and all the other splits have the same size.
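
To illustrate the point about split sizes, here is a rough sketch of
how a file is cut into fixed-size byte ranges. It is a simplification
and not Hadoop's actual code: the class name ByteSplitSketch, the
method computeSplits, and the 64 MB split size are assumptions made
for the example, and it ignores details such as record boundaries.

```java
import java.util.ArrayList;
import java.util.List;

public class ByteSplitSketch {

    /**
     * Cuts a file of fileLength bytes into consecutive ranges of at most
     * splitSize bytes. Each entry is {offset, length}. Hypothetical helper
     * for illustration only; not part of the Hadoop API.
     */
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // Every range is splitSize long except possibly the last one.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed sizes: a 200 MB file and a 64 MB split size.
        for (long[] split : computeSplits(200 * mb, 64 * mb)) {
            System.out.println("offset=" + split[0] + " length=" + split[1]);
        }
    }
}
```

With these assumed sizes the first three ranges are 64 MB each and
only the last one is shorter. A file smaller than the split size,
such as a short list of coordinates, ends up as a single range.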

On Tue, Aug 17, 2010 at 7:50 AM, Erik Test <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm trying to determine how to split a file evenly so each map task has a
> similar work load. The input I will have is a list of coordinates like this:
>
> 2,8
> 3,9
> 4,10
> 5,7
> 6,2
> 7,3
> 8,1
> 9,0
> 10,4
>
> Since there are 9 inputs in this example, I would like to split the records
> so that there would be 3 map tasks.
>
> I've been looking into different text input format classes but I'm still not
> sure how to split the input file the way I would like to.
>
> Does anyone have advice or suggestions on how I can go about manipulating the
> input splits by specifying the number of lines in each input split?
>
> Erik
>
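
If you do want a fixed number of lines per map task, Hadoop ships an
NLineInputFormat class intended for that; whether it is available for
your release and API is worth checking. The grouping it implies can be
sketched in plain Java as below. The class name LineSplitSketch and
the method groupLines are hypothetical names for this example and are
not Hadoop calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineSplitSketch {

    /**
     * Groups input lines into chunks of linesPerSplit lines each.
     * The last chunk may be shorter when the line count is not an
     * exact multiple. Hypothetical helper, for illustration only.
     */
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            groups.add(new ArrayList<>(lines.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // The nine coordinate records from the question.
        List<String> records = Arrays.asList(
            "2,8", "3,9", "4,10", "5,7", "6,2", "7,3", "8,1", "9,0", "10,4");
        List<List<String>> groups = groupLines(records, 3);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("map task " + i + ": " + String.join(" | ", groups.get(i)));
        }
    }
}
```

For the nine records above and three lines per group, this yields
three groups of three records, which matches the three map tasks you
describe.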

--
Best Regards

Jeff Zhang