Re: Inputformat
You will have to write a JSONInputFormat, or google first to see whether one already exists.
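
If every JSON record really sits on a single line, the newline alone may be enough of a delimiter: valid JSON cannot contain a raw newline inside a string value, and Hadoop's stock TextInputFormat already hands the mapper one line per record. The sketch below is plain Java with no Hadoop dependency, just to illustrate the idea; the class and method names (JsonLineRecords, splitByLine, cutAtFirstClosingBrace) are made up for this example and do not come from any library. It also shows why an END_TAG of "}" would cut a nested record short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class JsonLineRecords {

    // Treat every non-blank line as one complete JSON record.
    // Assumes the input is line-delimited JSON (one object per line).
    static List<String> splitByLine(String data) {
        List<String> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    // Mimic a tag-based reader with START_TAG "{" and END_TAG "}":
    // return the text from the first "{" up to and including the first "}".
    static String cutAtFirstClosingBrace(String data) {
        int start = data.indexOf('{');
        if (start < 0) {
            return "";
        }
        int end = data.indexOf('}', start + 1);
        if (end < 0) {
            return "";
        }
        return data.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String data = "{\"id\":1,\"user\":{\"name\":\"a\"}}\n{\"id\":2}\n";

        // Line-based splitting keeps the nested object intact.
        for (String record : splitByLine(data)) {
            System.out.println("record: " + record);
        }

        // Tag-based matching on "}" stops inside the nested object.
        System.out.println("cut:    " + cutAtFirstClosingBrace(data));
    }
}
```

In a real job the per-line parsing would then happen inside the mapper with whatever JSON parser you prefer; that part is left out here.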

--Sent from my Sony mobile.
On Jun 23, 2013 7:06 AM, "jamal sasha" <[EMAIL PROTECTED]> wrote:

> Then how should I approach this issue?
>
>
> On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
>
>> If you try to hammer in a nail (JSON file) with a screwdriver
>> (XMLInputReader), then perhaps the reason it won't work is that you are
>> using the wrong tool?
>>  On Jun 21, 2013 11:38 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>>   I am using a library that relies on InputFormat.
>>> Right now, it reads XML records spanning multiple lines.
>>> Currently the input format looks like this:
>>>
>>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>>
>>>   public static final String START_TAG = "<page>";
>>>   public static final String END_TAG = "</page>";
>>>
>>>   @Override
>>>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>>>       JobConf conf, Reporter reporter) throws IOException {
>>>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>>>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>>>     return new XMLRecordReader((FileSplit) split, conf);
>>>   }
>>> }
>>> So, with the above, if the data looks like:
>>>
>>> <page>
>>>
>>>  something \n
>>> something \n
>>>
>>> </page>
>>>
>>> it processes this sort of data.
>>>
>>>
>>> Now I want to use the same framework for JSON files, where each record
>>> fits on a single line.
>>>
>>> So I guess my START_TAG can be "{".
>>>
>>> Will my END_TAG be "}\n"?
>>>
>>> It can't be just "}", since there can be nested JSON in this data, right?
>>>
>>> Any clues?
>>> Thanks
>>>
>>
>