-
set mapred.map.tasks=1 not work (wd, 2010-06-09, 03:20)
Hi,
I'm using Hive svn rev946854. I tried to set mapred.map.tasks=1 in the Hive CLI, but it doesn't seem to work: the total number of map tasks is still over 300. Is this a problem with the svn version?
-
Re: set mapred.map.tasks=1 not work (wd, 2010-06-09, 07:04)
I've tried Hive 0.5, and the option doesn't work there either.
I also found this page via Google: http://markmail.org/message/k32nrcb2ncsq67ef?q=mapred.map.tasks+#query:mapred.map.tasks%20+page:1+mid:k32nrcb2ncsq67ef+state:results
-
Re: set mapred.map.tasks=1 not work (Edward Capriolo, 2010-06-09, 17:58)
You answered your own question. Look in the link:
"You cannot force *mapred.map.tasks* but can specify mapred.reduce.tasks."
The number of map tasks is based on the number of input files and folders. Even though Hive uses a combine input format, you can still get a number of mappers.
Edward
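As a rough illustration of the distinction Edward quotes, the fragment below is a minimal sketch of a Hive CLI session, not a recommended configuration. It assumes the table `t1` from later in this thread; the reducer count of 1 is only an example value.

```sql
-- Treated as a hint only: the actual number of map tasks follows from the input splits.
set mapred.map.tasks=1;

-- The reducer count, by contrast, can be set for the session.
set mapred.reduce.tasks=1;

-- Hypothetical query against the thread's table t1.
select a, count(1) from t1 group by a;
```

With settings like these the job would still be expected to start one map task per input split, while the reduce stage runs with the requested number of reducers.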
-
Re: set mapred.map.tasks=1 not work (wd, 2010-06-10, 01:55)
I have lots of small files in Hive, and the MapReduce jobs are too slow. Is there a way to improve the speed?
-
Re: set mapred.map.tasks=1 not work (Edward Capriolo, 2010-06-10, 02:07)
With Hadoop 0.20 and the Combine InputFormat you should get fairly decent performance even with many small files. My current employer is about to open source FileCrusher, a standalone and MapReduce application that merges Text and Sequence files into one big one. So if you hang tight for a couple of days, I can point you at a utility that might help.
-
Re: set mapred.map.tasks=1 not work (Alex Kozlov, 2010-06-10, 04:15)
Hi Wd,
Try:
hive.merge.mapfiles=true
hive.merge.size.per.task=1000000 (or some other large number)
Alex K
-
RE: set mapred.map.tasks=1 not work (Namit Jain, 2010-06-10, 05:20)
Use CombineHiveInputFormat.
Check your hive.input.format.
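One way to act on this suggestion from the Hive CLI is sketched below. The `mapred.max.split.size` line is an assumption that is not mentioned in the thread: it is the Hadoop property that caps the size of a combined split, and the value shown is purely illustrative.

```sql
-- A `set` with a key and no value prints the current setting.
set hive.input.format;

-- Switch this session to the combining input format.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Assumption, not from the thread: upper bound in bytes for one combined split.
set mapred.max.split.size=256000000;
```

If the first command already reports CombineHiveInputFormat, the large number of mappers is probably coming from somewhere else, such as the Hadoop version or the table's file format.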
-
Re: set mapred.map.tasks=1 not work (wd, 2010-06-10, 06:21)
Thanks everyone, I'll try CombineHiveInputFormat. :)
-
Re: set mapred.map.tasks=1 not work (wd, 2010-06-10, 06:52)
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
and
set hive.merge.size.per.task=1000000;
set hive.merge.mapfiles=true;
all seem to have no effect here: the time taken to execute 'select a, count(1) from t1 group by a' is almost the same. Have I missed some other settings?
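A possible reason the merge settings made no difference: `hive.merge.mapfiles` and `hive.merge.size.per.task` control how a job's output files are merged when they are written, so they do not change how the existing small input files of `t1` are read. The sketch below shows one way they could be applied, by rewriting the data once into a second table. The table name `t1_compacted` is hypothetical and is assumed to already exist with the same schema as `t1`; the size value is illustrative and is given in bytes.

```sql
-- Merge small output files at the end of a map-only job.
set hive.merge.mapfiles=true;

-- Illustrative target size in bytes for merged files (the thread's 1000000 is about 1 MB).
set hive.merge.size.per.task=256000000;

-- Hypothetical one-off compaction: t1_compacted must already exist with t1's schema.
insert overwrite table t1_compacted select * from t1;
```

Queries would then run against the compacted table, which should be backed by far fewer files. The compaction job itself still has to read every small file once.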
-
Re: set mapred.map.tasks=1 not work (Edward Capriolo, 2010-06-10, 13:09)
Also consider setting up JVM reuse; this will deal with some of the mapper startup penalty.
How long is your query taking? How much data is there? How many nodes?
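For reference, JVM reuse in Hadoop 0.20 is controlled by a job-level property that can be set from the Hive CLI. The fragment below is a sketch only; the property name is not given in the thread, and -1 is just one possible value.

```sql
-- Number of tasks of the same job that one task JVM may run; -1 means no limit.
set mapred.job.reuse.jvm.num.tasks=-1;
```

This only reduces per-task JVM startup cost. It does not reduce the number of map tasks that are scheduled.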
-
Re: set mapred.map.tasks=1 not work (wd, 2010-06-11, 02:26)
I've tried JVM reuse; it made no difference either.
Total time is about 130s. The data is only 10M, all in small files, on 2 nodes. Hive/Hadoop will run 350+ maps ...