Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> secondary sort - number of reducers


Copy link to this message
-
Re: secondary sort - number of reducers
Is the hash code of that key  is negative.?
Do something like this

return groupKey.hashCode() & Integer.MAX_VALUE % numParts;

Regards,
Som Shekhar Sharma
+91-8197243810
On Fri, Aug 30, 2013 at 6:25 AM, Adeel Qureshi <[EMAIL PROTECTED]> wrote:
> okay so when i specify the number of reducers e.g. in my example i m using 4
> (for a much smaller data set) it works if I use a single column in my
> composite key .. but if I add multiple columns in the composite key
> separated by a delimi .. it then throws the illegal partition error (keys
> before the pipe are group keys and after the pipe are the sort keys and my
> partioner only uses the group keys
>
> java.io.IOException: Illegal partition for Atlanta:GA|Atlanta:GA:1:Adeel
> (-1)
>         at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
>         at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>         at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:39)
>         at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:1)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> public int getPartition(Text key, HCatRecord record, int numParts) {
> //extract the group key from composite key
> String groupKey = key.toString().split("\\|")[0];
> return groupKey.hashCode() % numParts;
> }
>
>
> On Thu, Aug 29, 2013 at 8:31 PM, Shekhar Sharma <[EMAIL PROTECTED]>
> wrote:
>>
>> No...partitionr decides which keys should go to which reducer...and
>> number of reducers you need to decide...No of reducers depends on
>> factors like number of key value pair, use case etc
>> Regards,
>> Som Shekhar Sharma
>> +91-8197243810
>>
>>
>> On Fri, Aug 30, 2013 at 5:54 AM, Adeel Qureshi <[EMAIL PROTECTED]>
>> wrote:
>> > so it cant figure out an appropriate number of reducers as it does for
>> > mappers .. in my case hadoop is using 2100+ mappers and then only 1
>> > reducer
>> > .. since im overriding the partitioner class shouldnt that decide how
>> > manyredeucers there should be based on how many different partition
>> > values
>> > being returned by the custom partiotioner
>> >
>> >
>> > On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley <[EMAIL PROTECTED]> wrote:
>> >>
>> >> If you don't specify the number of Reducers, Hadoop will use the
>> >> default
>> >> -- which, unless you've changed it, is 1.
>> >>
>> >> Regards
>> >>
>> >> Ian.
>> >>
>> >> On Aug 29, 2013, at 4:23 PM, Adeel Qureshi <[EMAIL PROTECTED]>
>> >> wrote:
>> >>
>> >> I have implemented secondary sort in my MR job and for some reason if i
>> >> dont specify the number of reducers it uses 1 which doesnt seems right
>> >> because im working with 800M+ records and one reducer slows things down
>> >> significantly. Is this some kind of limitation with the secondary sort
>> >> that
>> >> it has to use a single reducer .. that kind of would defeat the purpose
>> >> of
>> >> having a scalable solution such as secondary sort. I would appreciate
>> >> any
>> >> help.
>> >>
>> >> Thanks
>> >> Adeel
>> >>
>> >>
>> >>
>> >> ---
>> >> Ian Wrigley
>> >> Sr. Curriculum Manager
>> >> Cloudera, Inc
>> >> Cell: (323) 819 4075
>> >>
>> >
>
>