Re: [Input split] File manipulation
Jeff Zhang 2010-08-18, 01:37
The default split size is 64 MB, which is the block size, and you can
change it through configuration.
What is the file type of your input? If it's gz, it cannot be
split, and you will always get only one mapper task.
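
To make the arithmetic concrete, here is a rough sketch in plain Java of how
the split size and the number of splits are derived for a splittable file. It
is modelled on my reading of FileInputFormat (new API) and is not the actual
Hadoop source: the class name SplitSketch and both method names are made up
for illustration, and the 1.1 slop factor is an assumption taken from that
reading.

```java
// Hypothetical helper, not part of Hadoop: mimics the split arithmetic only.
final class SplitSketch {
    // Assumed slop factor: the last split may be up to 10% larger than the rest.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts how many splits a splittable file of fileSize bytes would produce.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last split, usually smaller than the others
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(64 * mb, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(200 * mb, splitSize)); // prints 4
        System.out.println(countSplits(36L, splitSize));      // prints 1
    }
}
```

With these numbers a 200 MB file gives three 64 MB splits plus one 8 MB split,
while a file of a few dozen bytes (like the 9-line example) always ends up in
a single split. If you need a fixed number of lines per mapper rather than a
byte size, NLineInputFormat (in org.apache.hadoop.mapred.lib) is probably
worth a look.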
On Wed, Aug 18, 2010 at 12:03 AM, Erik Test <[EMAIL PROTECTED]> wrote:
> I'm expecting to come across millions of data points.
> Thanks for the response by the way. I thought that Hadoop set the number of
> splits, regardless of file size, to just 1 by default.
> On 17 August 2010 11:44, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>> What size is your input? If the input is large enough, you do
>> not need to worry about the splitting: only one split (the last one)
>> has a different size; all the other splits have the same size.
>> On Tue, Aug 17, 2010 at 7:50 AM, Erik Test <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> > I'm trying to determine how to split a file evenly so each map task has a
>> > similar work load. The input I will have is a list of coordinates like
>> > 2,8
>> > 3,9
>> > 4,10
>> > 5,7
>> > 6,2
>> > 7,3
>> > 8,1
>> > 9,0
>> > 10,4
>> > Since there are 9 inputs in this example, I would like to split the
>> > file so that there would be 3 map tasks.
>> > I've been looking into different text input format classes but I'm still
>> > not sure how to split the input file the way I would like to.
>> > Does anyone have advice or suggestions on how I can go about manipulating
>> > input splits by specifying how many lines are in an input split?
>> > Erik
>> Best Regards
>> Jeff Zhang