-
Setting number of mappers according to number of TextInput lines
Ondřej Klimpera 2012-06-16, 09:01
Hello,
My input is very small (a few kB), but processing it to produce output takes several minutes. Is there a way to say: the file has 100 lines, I need 10 mappers, and each mapper node should process 10 lines of the input file?
Thanks for any advice.
Ondrej Klimpera
-
Re: Setting number of mappers according to number of TextInput lines
Bejoy KS 2012-06-16, 09:27
Hi Ondrej
You can use NLineInputFormat with n set to 10.
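As a rough illustration of what that setting asks for (a plain-Java sketch, not Hadoop code): NLineInputFormat hands each map task a fixed number of input lines, so a 100-line file with n = 10 should yield 10 map tasks. In a real driver this is usually configured with job.setInputFormatClass(NLineInputFormat.class) and, in the new API as far as I know, NLineInputFormat.setNumLinesPerSplit(job, 10). The names LineGrouping and groupLines below are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models "N lines per map task" without any Hadoop dependency.
final class LineGrouping {

    // Cuts the input into consecutive groups of at most linesPerSplit lines.
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            groups.add(new ArrayList<>(lines.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            lines.add("record-" + i);
        }
        List<List<String>> groups = groupLines(lines, 10);
        System.out.println(groups.size() + " map tasks, "
                + groups.get(0).size() + " lines in the first one");
    }
}
```

The number of map tasks depends only on the line count here, not on the file size in bytes, which is why this fits input that is small but expensive to process.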
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-
Re: Setting number of mappers according to number of TextInput lines
Ondřej Klimpera 2012-06-16, 09:31
I tried this approach, but the job is not distributed among 10 mapper nodes. It seems Hadoop ignores this property :(
My first thought is that the small file size is the problem and Hadoop doesn't bother splitting it properly.
Thanks for any ideas.
-
Re: Setting number of mappers according to number of TextInput lines
Edward Capriolo 2012-06-16, 16:12
No. The number of lines is not known at planning time. All you know is the size of the blocks. You want to look at mapred.max.split.size.
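To make the effect of a maximum split size concrete, here is a small sketch of the arithmetic as I understand it from FileInputFormat in the new API: the split size is max(minSize, min(maxSize, blockSize)), and a remainder of up to 10% of a split is folded into the last split instead of forming its own. The class and method names are invented for the example, and the 64 MB block size is an assumption.

```java
// Hypothetical sketch of byte-based split planning; not the Hadoop source.
final class SplitSizeSketch {

    // Tolerance that lets the final split be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts how many splits a file of the given length would be cut into.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail, possibly slightly larger than splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed HDFS block size
        long fileLength = 10L * 1024;       // a 10 kB input file

        long capped = computeSplitSize(blockSize, 1L, 1024L);
        long uncapped = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);

        System.out.println("with 1 kB cap: " + countSplits(fileLength, capped) + " splits");
        System.out.println("without a cap: " + countSplits(fileLength, uncapped) + " splits");
    }
}
```

Under these assumptions a kB-sized file stays in a single split unless the maximum split size is set well below the file length.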
-
Re: Setting number of mappers according to number of TextInput lines
Shi Yu 2012-06-16, 22:33
How did you try it? I had no problem with NLineInputFormat. It just works exactly as expected.
Shi
-
Re: Setting number of mappers according to number of TextInput lines
Harsh J 2012-06-17, 03:02
Ondřej,
While NLineInputFormat will indeed give you N lines per task, it does not guarantee that the map tasks it produces for a file will all be sent to different nodes. Which do you need exactly: simply N lines per map task, or N maps distributed more widely across the cluster?
-- Harsh J
-
Re: Setting number of mappers according to number of TextInput lines
Ondřej Klimpera 2012-06-17, 07:35
Hi, I made some progress: the combination of NLineInputFormat and mapred.max.split.size seems to work, but it is hard to set the byte value exactly. Input lines are approximately 64 to 1024 bytes long.
What I need is as many mappers as possible (to use the full potential of the cluster), where each receives N input lines.
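The difficulty with a byte value can be shown with a small sketch: when line lengths vary, equal-sized byte ranges contain different numbers of lines. The code below is a simplification that attributes each line to the byte range in which it starts; the real record reader handles lines that cross a boundary in more detail. The class name, method name, and line lengths are made up for the example.

```java
import java.util.Arrays;

// Hypothetical sketch: how many lines start inside each fixed-size byte range.
final class UnevenSplits {

    // lineLengths are in bytes and include the trailing newline.
    static int[] linesPerSplit(int[] lineLengths, long splitSize) {
        long total = 0;
        for (int len : lineLengths) {
            total += len;
        }
        int numSplits = (int) ((total + splitSize - 1) / splitSize); // ceiling division
        int[] counts = new int[numSplits];
        long offset = 0;
        for (int len : lineLengths) {
            counts[(int) (offset / splitSize)]++; // range containing the line's first byte
            offset += len;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Line lengths between 64 and 1024 bytes, as described in the message above.
        int[] lineLengths = {64, 1024, 64, 64, 832};
        System.out.println(Arrays.toString(linesPerSplit(lineLengths, 1024L)));
    }
}
```

With these five lines and 1024-byte ranges, the first range holds two lines and the second holds three, so no single byte value gives every mapper exactly N lines. Splitting by line count avoids the problem.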
-
Re: Setting number of mappers according to number of TextInput lines
Sachin Aggarwal 2012-06-21, 05:05
Use it like this:
    FileInputFormat.setMaxInputSplitSize(job, 2097152);
    FileInputFormat.setMinInputSplitSize(job, 1048576);
The sizes are in bytes. Alternatively, you can write your own split function; search online for examples.
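One possible way to choose the maximum split size for a file of known length is to divide the length by the number of mappers wanted, rounding up. This is only a sketch with invented names, and it assumes the minimum split size is left small enough not to override the result. For kB-sized input, values in the megabyte range like the ones above would, if I read the split-size rule correctly, leave the whole file in one split.

```java
// Hypothetical helper for picking a byte cap from a target mapper count.
final class MaxSplitSizing {

    // Returns ceil(fileLength / desiredMappers), but never less than 1 byte.
    static long maxSplitSizeFor(long fileLength, int desiredMappers) {
        if (desiredMappers <= 0) {
            throw new IllegalArgumentException("desiredMappers must be positive");
        }
        long size = (fileLength + desiredMappers - 1) / desiredMappers;
        return Math.max(1L, size);
    }

    public static void main(String[] args) {
        long fileLength = 10L * 1024; // a 10 kB input file
        long maxSplit = maxSplitSizeFor(fileLength, 10);
        System.out.println("max split size in bytes: " + maxSplit);
    }
}
```

The result would be the value passed to FileInputFormat.setMaxInputSplitSize(job, ...). Because line lengths vary, this balances bytes per mapper, not lines per mapper.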
--
Thanks & Regards
Sachin Aggarwal 7760502772