Hadoop, mail # user - Re: Inputformat


Re: Inputformat
jamal sasha 2013-06-22, 23:06
Then how should I approach this issue?
On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:

> If you try to hammer in a nail (json file) with a screwdriver (
> XMLInputReader) then perhaps the reason it won't work may be that you are
> using the wrong tool?
>  On Jun 21, 2013 11:38 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>>   I am using a library that relies on InputFormat.
>> Right now, it reads XML records that span multiple lines.
>> So currently the input format looks like this:
>>
>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>
>>   public static final String START_TAG = "<page>";
>>   public static final String END_TAG = "</page>";
>>
>>   @Override
>>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>>       JobConf conf, Reporter reporter) throws IOException {
>>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>>     return new XMLRecordReader((FileSplit) split, conf);
>>   }
>> }
>> So, with the above, if the data looks like:
>>
>> <page>
>>
>>  something \n
>> something \n
>>
>> </page>
>>
>> then it processes this sort of data.
>>
>>
>> Now, I want to use the same framework for JSON files, where each record
>> occupies just a single line.
>>
>> So I guess my START_TAG can be "{".
>>
>> Will my END_TAG be "}\n"?
>>
>> It can't be "}", as there can be nested JSON in this data.
>>
>> Any clues?
>> Thanks
>>
>
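For the single-line JSON case quoted above, here is a rough sketch of why the choice of end tag matters and why plain line splitting may be the simpler route. It does not use Hadoop at all: splitByTags is a hypothetical stand-in for tag-based matching (the real XMLRecordReader may behave differently), and splitByLine simply treats each non-blank line as one record. The class name and sample data are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class JsonLineRecords {

    // Line-oriented reading: every non-blank line is one record.
    static List<String> splitByLine(String data) {
        return new BufferedReader(new StringReader(data))
                .lines()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for tag matching: a record runs from the
    // next startTag up to and including the first endTag after it.
    static List<String> splitByTags(String data, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = data.indexOf(startTag, pos);
            if (start < 0) {
                break;
            }
            int end = data.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break;
            }
            pos = end + endTag.length();
            records.add(data.substring(start, pos));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two single-line JSON records, each with a nested object.
        String data = "{\"id\": 1, \"user\": {\"name\": \"a\"}}\n"
                    + "{\"id\": 2, \"user\": {\"name\": \"b\"}}\n";

        // One complete JSON object per element.
        System.out.println(splitByLine(data));

        // With "}" as the end tag, each record is cut at the first nested
        // closing brace, so the outer closing brace is lost.
        System.out.println(splitByTags(data, "{", "}"));

        // With "}\n" the full line is captured, but only if every record
        // ends with exactly "}" followed by "\n".
        System.out.println(splitByTags(data, "{", "}\n"));
    }
}
```

In this sketch "}\n" does recover whole records, but it depends on every line ending in exactly that sequence, so a final line without a trailing newline or a file with "\r\n" line endings would be missed. Since each record already sits on its own line, splitting on lines avoids the tag question entirely. In Hadoop terms that is what TextInputFormat provides: the key is the byte offset (LongWritable) and the value is the line contents (Text). Whether the library in question accepts a different InputFormat is something I have not checked.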