NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # user >> How to specify delimiters in MultipleInputPaths


How to specify delimiters in MultipleInputPaths
I want to use MultipleInputs with a different mapper for each input file.
Suppose every mapper should read its input with KeyValueTextInputFormat. The
challenge is that the separator for this input format seems to be set at the
job level.

So if I have two files, one comma-separated and the other tab-separated, can
both be handled in the same job?

Example code showing what I am trying to do:

        Configuration configuration = new Configuration();
        // This separator is set once on the job configuration, so it applies
        // to every input path that uses KeyValueTextInputFormat.
        configuration.set("key.value.separator.in.input.line", ",");

        Job job = new Job(configuration, "multiple-inputs-mapper");

        // TODO: how to set different delimiters for KeyValueTextInputFormat
        // for different Mappers
        MultipleInputs.addInputPath(job,
                new Path("src/main/resources/multiinput/input1"),
                KeyValueTextInputFormat.class, Mapper1.class);
        MultipleInputs.addInputPath(job,
                new Path("src/main/resources/multiinput/input2"),
                KeyValueTextInputFormat.class, Mapper2.class);
        job.setReducerClass(ExampleReducer.class);
        job.setNumReduceTasks(2);
        // TODO: How to set delimiter between key and values in the
        // textinputFormat
        job.setOutputFormatClass(TextOutputFormat.class);

        // set the mapper output types for keys and values as we have
        // used TextOutputFormat
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job,
                new Path("/tmp/multi-input-tweet-join"));
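
One possible workaround, offered only as a sketch and not as a confirmed
answer: register both paths with TextInputFormat instead of
KeyValueTextInputFormat, so each mapper receives the whole line as its value,
and let each mapper split the line on its own delimiter. The plain-Java helper
below illustrates the splitting step only. The class name PerInputDelimiter
and the method splitKeyValue are hypothetical and not part of Hadoop. The
helper follows the documented KeyValueTextInputFormat convention: split at the
first separator, and if there is none, treat the whole line as the key with an
empty value.

```java
// Hypothetical helper (not a Hadoop class): splits one input line into a
// key and a value at the first occurrence of a per-mapper separator.
final class PerInputDelimiter {

    // Returns a two-element array: {key, value}.
    // If the separator is absent, the whole line is the key and the value is "".
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // What a mapper for the comma-separated file might do with each line.
        String[] comma = splitKeyValue("user1,hello,world", ',');
        // What a mapper for the tab-separated file might do with each line.
        String[] tab = splitKeyValue("user2\thello world", '\t');

        System.out.println(comma[0] + " -> " + comma[1]); // prints "user1 -> hello,world"
        System.out.println(tab[0] + " -> " + tab[1]);     // prints "user2 -> hello world"
    }
}
```

Under this assumption, Mapper1 would call splitKeyValue(line, ',') and Mapper2
would call splitKeyValue(line, '\t') inside map(), with the input key being the
byte offset that TextInputFormat supplies. The job-level separator setting is
then no longer needed.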
--
Thanks,
- Inder