MapReduce >> mail # user >> Simple map-only job to create Block Sequence Files compressed with Snappy


Simple map-only job to create Block Sequence Files compressed with Snappy
Hi there,

I am trying to create a map-only job that takes some log files as input and simply converts them into SequenceFiles compressed with Snappy.

Although the job runs without errors, the output file it creates is pretty much the same size as the input file. Really confused!

I've pasted the full source and the Hadoop output below.

The only output file is part-m-00000, the resulting map output file, and it appears to be the same size as the input file.

Thanks!
Peter
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class snappyMapOutput {

  // Identity mapper: writes each (offset, line) pair straight through.
  // Note: the input key type must be LongWritable (not Object) for this
  // map() to actually override Mapper.map().
  public static class MapFunction
      extends Mapper<LongWritable, Text, LongWritable, Text> {

    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

    conf.setBoolean("mapred.compress.map.output", true);
    conf.set("mapred.map.output.compression.codec",
        "org.apache.hadoop.io.compress.SnappyCodec");
    conf.set("mapred.output.compression.type", "BLOCK");

    Job job = new Job(conf, "Convert to BLOCK Sequence File Snappy Compressed");
    job.setJarByClass(snappyMapOutput.class);
    job.setMapperClass(MapFunction.class);
    job.setNumReduceTasks(0);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
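For what it's worth, "mapred.compress.map.output" governs the intermediate map output that gets shuffled to reducers; in a map-only job, the file that SequenceFileOutputFormat writes is governed by the job's *output* compression settings instead. Below is a minimal sketch of what enabling output compression might look like, using the same old-style mapreduce API as the code above. The helper name is hypothetical, and the exact calls should be verified against your Hadoop version:

```java
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class OutputCompressionSketch {
  // Hypothetical helper: compress the job's *final* output (the part-m-*
  // SequenceFiles), rather than the intermediate map output that
  // mapred.compress.map.output controls.
  static void enableSnappyBlockOutput(Job job) {
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
    // BLOCK compression batches many records into each compressed block,
    // which generally compresses much better than per-record compression.
    SequenceFileOutputFormat.setOutputCompressionType(job,
        SequenceFile.CompressionType.BLOCK);
  }
}
```

The equivalent configuration-property spelling in the old API would be "mapred.output.compress" set to true plus "mapred.output.compression.codec" set to the Snappy codec class, alongside the "mapred.output.compression.type" of BLOCK already in the code above.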

13/01/11 19:19:38 INFO input.FileInputFormat: Total input paths to process : 1
13/01/11 19:19:38 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/01/11 19:19:38 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]
13/01/11 19:19:38 WARN snappy.LoadSnappy: Snappy native library is available
13/01/11 19:19:38 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/11 19:19:38 INFO snappy.LoadSnappy: Snappy native library loaded
13/01/11 19:19:39 INFO mapred.JobClient: Running job: job_201301111838_0006
13/01/11 19:19:40 INFO mapred.JobClient:  map 0% reduce 0%
13/01/11 19:19:45 INFO mapred.JobClient:  map 100% reduce 0%
13/01/11 19:19:45 INFO mapred.JobClient: Job complete: job_201301111838_0006
13/01/11 19:19:45 INFO mapred.JobClient: Counters: 19
13/01/11 19:19:45 INFO mapred.JobClient:   Job Counters
13/01/11 19:19:45 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4566
13/01/11 19:19:45 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/11 19:19:45 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/01/11 19:19:45 INFO mapred.JobClient:     Launched map tasks=1
13/01/11 19:19:45 INFO mapred.JobClient:     Data-local map tasks=1
13/01/11 19:19:45 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/01/11 19:19:45 INFO mapred.JobClient:   File Output Format Counters
13/01/11 19:19:45 INFO mapred.JobClient:     Bytes Written=72951075
13/01/11 19:19:45 INFO mapred.JobClient:   FileSystemCounters
13/01/11 19:19:45 INFO mapred.JobClient:     HDFS_BYTES_READ=70983803
13/01/11 19:19:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=24107
13/01/11 19:19:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=72951075
13/01/11 19:19:45 INFO mapred.JobClient:   File Input Format Counters
13/01/11 19:19:45 INFO mapred.JobClient:     Bytes Read=70983680
13/01/11 19:19:45 INFO mapred.JobClient:   Map-Reduce Framework
13/01/11 19:19:45 INFO mapred.JobClient:     Map input records=79756
13/01/11 19:19:45 INFO mapred.JobClient:     Physical memory (bytes) snapshot=109174784
13/01/11 19:19:45 INFO mapred.JobClient:     Spilled Records=0
13/01/11 19:19:45 INFO mapred.JobClient:     CPU time spent (ms)=2040
13/01/11 19:19:45 INFO mapred.JobClient:     Total committed heap usage (bytes)=187105280
13/01/11 19:19:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1084190720
13/01/11 19:19:45 INFO mapred.JobClient:     Map output records=79756
13/01/11 19:19:45 INFO mapred.JobClient:     SPLIT_RAW_BYTES=123
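One way to see what the job actually wrote is to open the part file with SequenceFile.Reader, since a SequenceFile's header records its compression type and codec. A rough sketch along these lines (the class name is made up, and the output path must be supplied as an argument):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckSeqFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Path to the job's part-m-00000 output file, passed on the command line.
    SequenceFile.Reader reader =
        new SequenceFile.Reader(fs, new Path(args[0]), conf);
    try {
      // These come straight from the SequenceFile header: whether values
      // are compressed, whether BLOCK compression is in use, and which
      // codec the writer used.
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
      System.out.println("codec:            " + reader.getCompressionCodec());
    } finally {
      reader.close();
    }
  }
}
```

If the output really carried Snappy BLOCK compression, one would expect both booleans to be true and the codec to report the SnappyCodec class; an uncompressed file should report false for both.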