Hadoop, mail # user - Re: Inputformat


Re: Inputformat
Niels Basjes 2013-06-21, 23:25
If you try to hammer in a nail (a JSON file) with a screwdriver
(XMLInputReader), then perhaps the reason it doesn't work is that you are
using the wrong tool?
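
For JSON with one object per line, the record boundary is simply the newline,
so no start or end tag is needed. Hadoop's stock TextInputFormat already hands
each line to the mapper as a (byte offset, line) pair, and the JSON can then be
parsed inside the mapper. Below is a minimal plain-Java sketch of that idea with
no Hadoop dependency. The class JsonLineRecords and its methods are hypothetical
names made up for illustration, and the sketch assumes no raw newline ever
appears inside a record. It also shows why cutting at the first "}" goes wrong
on nested objects.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop or of the library in question.
public class JsonLineRecords {

    /** One record: the byte offset where the line starts, plus the line itself. */
    public static final class Record {
        public final long offset;
        public final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /**
     * Splits on '\n' only. Braces are never inspected, so nesting is irrelevant.
     * Blank lines are skipped here for convenience (an assumption of this sketch).
     */
    public static List<Record> split(String data) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < data.length()) {
            int end = data.indexOf('\n', start);
            if (end < 0) {
                end = data.length(); // last line without a trailing newline
            }
            String line = data.substring(start, end);
            if (!line.isEmpty()) {
                records.add(new Record(offset, line));
            }
            // Advance by the line's byte length plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    /** The naive alternative: cut from the first "{" to the first "}" after it. */
    public static String naiveFirstRecord(String data) {
        int open = data.indexOf('{');
        if (open < 0) {
            return "";
        }
        int close = data.indexOf('}', open);
        if (close < 0) {
            return "";
        }
        return data.substring(open, close + 1);
    }

    public static void main(String[] args) {
        String data = "{\"id\":1,\"user\":{\"name\":\"a\"}}\n{\"id\":2}\n";
        for (Record r : split(data)) {
            System.out.println(r.offset + "\t" + r.line);
        }
        // The naive cut stops at the inner object's closing brace.
        System.out.println(naiveFirstRecord(data));
    }
}
```

Splitting on the newline keeps each nested object intact, whereas the naive
first-"}" cut returns a truncated, unbalanced fragment of the first record.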
 On Jun 21, 2013 11:38 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:

> Hi,
>
>   I am using a library that relies on InputFormat.
> Right now it reads XML files whose records span multiple lines.
> Currently the input format looks like this:
>
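> import java.io.IOException;
>
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileSplit;
> import org.apache.hadoop.mapred.InputSplit;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.RecordReader;
> import org.apache.hadoop.mapred.Reporter;
>
> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>
>   public static final String START_TAG = "<page>";
>   public static final String END_TAG = "</page>";
>
>   @Override
>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>       JobConf conf, Reporter reporter) throws IOException {
>     // Tell the record reader which tags delimit one record.
>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>     return new XMLRecordReader((FileSplit) split, conf);
>   }
> }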
> So with the above, if the data looks like:
>
> <page>
>
>  something \n
> something \n
>
> </page>
>
> it processes this sort of data.
>
>
> Now I want to use the same framework for JSON files, where each record
> spans just a single line.
>
> So I guess my START_TAG can be "{".
>
> Will my END_TAG be "}\n"?
>
> It can't be "}", since there can be nested JSON in this data?
>
> Any clues?
> Thanks
>