Avro >> mail # user >> Implementing a total sort over avro data


Steven Willis 2012-08-15, 21:33
Re: Implementing a total sort over avro data
Hey Steven,

This is a genuine bug in MapReduce. I've filed
https://issues.apache.org/jira/browse/MAPREDUCE-4574 and will work on
it by the end of this week.

On Thu, Aug 16, 2012 at 3:03 AM, Steven Willis <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I was wondering if it was possible to implement a total sort using the InputSampler.RandomSampler and TotalOrderPartitioner with avro mapreduce? I tried adding the following lines to my job:
>
> InputSampler.Sampler<AvroKey, AvroValue> sampler = new InputSampler.RandomSampler<AvroKey, AvroValue>(0.1, 10000, 10);
> InputSampler.writePartitionFile(jobConf, sampler);
> jobConf.setPartitionerClass(TotalOrderPartitioner.class);
> DistributedCache.addCacheFile(new URI(TotalOrderPartitioner.getPartitionFile(jobConf)), jobConf);
>
> But that just gives me:
>
> 12/08/15 17:23:05 INFO partition.InputSampler: Using 10000 samples
> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.avro.mapred.AvroWrapper
>         at org.apache.avro.mapred.AvroKeyComparator.compare(AvroKeyComparator.java:30)
>         at java.util.Arrays.mergeSort(Arrays.java:1270)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.mergeSort(Arrays.java:1281)
>         at java.util.Arrays.sort(Arrays.java:1210)
>         at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:324)
>         at org.apache.hadoop.mapred.lib.InputSampler.writePartitionFile(InputSampler.java:39)
>         at com.compete.avro.ParallelDataPull.run(ParallelDataPull.java:223)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at com.compete.avro.ParallelDataPull.main(ParallelDataPull.java:55)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> -Steven Willis

--
Harsh J
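
For readers following the stack trace above: the exception is thrown while InputSampler.writePartitionFile sorts the sampled keys with AvroKeyComparator, and the sampled keys are LongWritable rather than AvroWrapper. The sketch below reproduces that kind of mismatch in plain Java with no Hadoop or Avro dependency. The names Wrapper, WRAPPER_COMPARATOR and sortSamples are hypothetical stand-ins invented for illustration, and the assumption that the sampler hands input-format keys to the job's key comparator is inferred from the trace, not confirmed in this thread.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SamplerMismatchSketch {

    // Hypothetical stand-in for a wrapper key type such as AvroWrapper.
    static final class Wrapper {
        final String datum;

        Wrapper(String datum) {
            this.datum = datum;
        }
    }

    // Hypothetical stand-in for a comparator that assumes every key is a Wrapper.
    // It is declared as a raw Comparator so that, as in the trace, the type
    // mismatch only surfaces at run time.
    @SuppressWarnings("rawtypes")
    static final Comparator WRAPPER_COMPARATOR = new Comparator() {
        @Override
        public int compare(Object a, Object b) {
            // The cast fails if the sampled keys are not Wrapper instances.
            return ((Wrapper) a).datum.compareTo(((Wrapper) b).datum);
        }
    };

    // Sorts the sampled keys with the wrapper comparator and reports the outcome.
    @SuppressWarnings("unchecked")
    static String sortSamples(Object[] samples) {
        try {
            Arrays.sort(samples, WRAPPER_COMPARATOR);
            return "sorted";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Keys of the "wrong" type, standing in for sampled input keys.
        Object[] inputKeys = { 42L, 7L, 19L };
        // Keys of the type the comparator expects.
        Object[] wrappedKeys = { new Wrapper("b"), new Wrapper("a") };

        System.out.println(sortSamples(inputKeys));   // prints "ClassCastException"
        System.out.println(sortSamples(wrappedKeys)); // prints "sorted"
    }
}
```

The sort succeeds only when the sampled keys have the type the comparator expects, which is why the failure appears in the client before the job is even submitted.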