How to specify delimiters in MultipleInputPaths
I want to use MultipleInputs with a different mapper for each input file.
Let's say every mapper should use KeyValueTextInputFormat. The challenge is
that the separator for this input format seems to be set at the job level.

So if I have two files, one comma-separated and the other tab-separated,
can that be handled?

Example code for what I am trying to do:

        Configuration configuration = new Configuration();
        configuration.set("key.value.separator.in.input.line", ",");

        Job job = new Job(configuration, "multiple-inputs-mapper");

        //TODO: how to set different delimiters for KeyValueTextInputFormat
        // for different Mappers
        MultipleInputs.addInputPath(job,
                new Path("src/main/resources/multiinput/input1"),
                KeyValueTextInputFormat.class, Mapper1.class);
        MultipleInputs.addInputPath(job,
                new Path("src/main/resources/multiinput/input2"),
                KeyValueTextInputFormat.class, Mapper2.class);
        job.setReducerClass(ExampleReducer.class);
        job.setNumReduceTasks(2);
        //TODO: How to set delimiter between key and values in the
        // textinputFormat
        job.setOutputFormatClass(TextOutputFormat.class);

        // set the mapper output types for keys and values, as we have
        // used TextOutputFormat
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job,
                new Path("/tmp/multi-input-tweet-join"));
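
One possible workaround, sketched here under assumptions rather than taken
from the Hadoop documentation: register both paths with TextInputFormat
instead, and let each mapper split its own lines on its own delimiter. The
helper below is hypothetical (the name DelimiterSplit is made up for
illustration and is not a Hadoop class). It splits at the first occurrence
of the separator and, when the separator is missing, treats the whole line
as the key with an empty value, which is assumed to match how
KeyValueTextInputFormat behaves.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class DelimiterSplit {
    private DelimiterSplit() {}

    /**
     * Splits a line into {key, value} at the FIRST occurrence of the separator.
     * If the separator does not occur, the whole line becomes the key and the
     * value is the empty string.
     */
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Comma-separated record, as the first mapper would see it.
        System.out.println(Arrays.toString(splitKeyValue("user1,hello, world", ',')));
        // Tab-separated record, as the second mapper would see it.
        System.out.println(Arrays.toString(splitKeyValue("user2\thi there", '\t')));
    }
}
```

With this approach, Mapper1 would call splitKeyValue(value.toString(), ',')
and Mapper2 would call splitKeyValue(value.toString(), '\t') inside map(),
so no job-level separator setting is needed.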
--
Thanks,
- Inder
"You are average of the 5 people you spend the most time with"