Processing xml documents using StreamXmlRecordReader


Mohammad Tariq 2012-06-18, 21:19
madhu phatak 2012-06-19, 10:58
Mohammad Tariq 2012-06-19, 11:05
Mohammad Tariq 2012-06-19, 11:19

Re: Processing xml documents using StreamXmlRecordReader
Seems like StreamInputFormat has not yet been ported to the new API. That's why you are
not able to set it as the InputFormatClass. You can file a JIRA for this issue.
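
For reference, a minimal driver against the old (org.apache.hadoop.mapred) API, which
StreamInputFormat does support, might look like the sketch below. The class names
XmlDriver and XmlMapper are illustrative, not from the thread; the paths and the
<info>/</info> tags are taken from the code quoted below, and the hadoop-streaming jar
must be on the classpath.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

public class XmlDriver {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(XmlDriver.class);

        // Tell StreamInputFormat which record reader to use and how to
        // delimit records: everything from a begin tag to the matching
        // end tag becomes one record.
        conf.set("stream.recordreader.class",
                "org.apache.hadoop.streaming.StreamXmlRecordReader");
        conf.set("stream.recordreader.begin", "<info>");
        conf.set("stream.recordreader.end", "</info>");

        // Old-API setter; Job.setInputFormatClass() cannot take
        // StreamInputFormat because it is not a new-API InputFormat.
        conf.setInputFormat(StreamInputFormat.class);

        conf.setMapperClass(XmlMapper.class); // sketched after this message
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(conf, new Path("/mapin/demo.xml"));
        FileOutputFormat.setOutputPath(conf, new Path("/mapout/demo"));

        JobClient.runJob(conf);
    }
}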

On Tue, Jun 19, 2012 at 4:49 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> My driver function looks like this -
>
> public static void main(String[] args) throws IOException,
> InterruptedException, ClassNotFoundException {
>                // TODO Auto-generated method stub
>
>                Configuration conf = new Configuration();
>                Job job = new Job();
>                conf.set("stream.recordreader.class",
> "org.apache.hadoop.streaming.StreamXmlRecordReader");
>                conf.set("stream.recordreader.begin", "<info>");
>                conf.set("stream.recordreader.end", "</info>");
>                job.setInputFormatClass(StreamInputFormat.class);
>                job.setOutputKeyClass(Text.class);
>                job.setOutputValueClass(IntWritable.class);
>                FileInputFormat.addInputPath(job, new
> Path("/mapin/demo.xml"));
>                FileOutputFormat.setOutputPath(job, new
> Path("/mapout/demo"));
>                job.waitForCompletion(true);
>        }
>
> Could you please point out my mistake?
>
> Regards,
>     Mohammad Tariq
>
>
> On Tue, Jun 19, 2012 at 4:35 PM, Mohammad Tariq <[EMAIL PROTECTED]>
> wrote:
> > Hello Madhu,
> >
> >             Thanks for the response. Actually I was trying to use the
> > new API (Job). Have you tried that. I was not able to set the
> > InputFormat using the Job API.
> >
> > Regards,
> >     Mohammad Tariq
> >
> >
> > On Tue, Jun 19, 2012 at 4:28 PM, madhu phatak <[EMAIL PROTECTED]>
> wrote:
> >> Hi,
> >>  Set the following properties in driver class
> >>
> >>   jobConf.set("stream.recordreader.class",
> >>       "org.apache.hadoop.streaming.StreamXmlRecordReader");
> >>   jobConf.set("stream.recordreader.begin", "start-tag");
> >>   jobConf.set("stream.recordreader.end", "end-tag");
> >>   jobConf.setInputFormat(StreamInputFormat.class);
> >>
> >>  In the Mapper, the xml record will come as the key, of type Text, so your
> >> mapper will look like
> >>
> >>   public class MyMapper<K,V>  implements Mapper<Text,Text,K,V>
> >>
> >>
> >> On Tue, Jun 19, 2012 at 2:49 AM, Mohammad Tariq <[EMAIL PROTECTED]>
> wrote:
> >>>
> >>> Hello list,
> >>>
> >>>        Could anyone who has written MapReduce jobs to process xml
> >>> documents stored in their cluster using "StreamXmlRecordReader" share
> >>> his/her experience? Or could you provide me some pointers
> >>> addressing that? Many thanks.
> >>>
> >>> Regards,
> >>>     Mohammad Tariq
> >>
> >>
> >>
> >>
> >> --
> >> https://github.com/zinnia-phatak-dev/Nectar
> >>
>

--
https://github.com/zinnia-phatak-dev/Nectar
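
As noted in the advice above, StreamXmlRecordReader hands each <info>...</info> block
to the mapper as the key, with an empty Text value. A minimal old-API mapper along
those lines (the class name XmlMapper and the record-counting logic are illustrative,
not from the thread) could be:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class XmlMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    public void map(Text key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // The whole XML record arrives in 'key'; 'value' is empty.
        // Parse key.toString() as needed; here we simply count records.
        output.collect(new Text("records"), ONE);
    }
}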
Mohammad Tariq 2012-06-19, 12:24
Mohammad Tariq 2012-06-19, 12:28
madhu phatak 2012-06-19, 12:41
madhu phatak 2012-06-21, 07:07
Mohammad Tariq 2012-06-21, 07:12