Hadoop >> mail # user >> Re: Inputformat


Re: Inputformat
You will have to write a JSONInputFormat, or search first to see whether one already exists.

--Send from my Sony mobile.
On Jun 23, 2013 7:06 AM, "jamal sasha" <[EMAIL PROTECTED]> wrote:

> Then how should I approach this issue?
>
>
> On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <[EMAIL PROTECTED]> wrote:
>
>> If you try to hammer in a nail (json file) with a screwdriver (
>> XMLInputReader) then perhaps the reason it won't work may be that you are
>> using the wrong tool?
>>  On Jun 21, 2013 11:38 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>>   I am using a library that relies on InputFormat.
>>> Right now, it reads XML files whose records span multiple lines.
>>> Currently the input format looks like this:
>>>
>>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>>
>>>   public static final String START_TAG = "<page>";
>>>   public static final String END_TAG = "</page>";
>>>
>>>   @Override
>>>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>>>       JobConf conf, Reporter reporter) throws IOException {
>>>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>>>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>>>     return new XMLRecordReader((FileSplit) split, conf);
>>>   }
>>> }
>>> So, with the above, if the data looks like:
>>>
>>> <page>
>>>
>>>  something \n
>>> something \n
>>>
>>> </page>
>>>
>>> it processes this sort of data.
>>>
>>>
>>> Now I want to use the same framework for JSON files, where each record
>>> occupies just a single line.
>>>
>>> So I guess my START_TAG can be "{".
>>>
>>> Will my END_TAG be "}\n"?
>>>
>>> It can't be "}", since there can be nested JSON in this data.
>>>
>>> Any clues?
>>> Thanks
>>>
>>
>
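
For the single-line JSON case asked about above, here is a minimal plain-Java sketch of why "}" alone fails as an end marker when objects are nested, and why splitting on newlines avoids the problem. It does not use Hadoop or the poster's XMLRecordReader. The class and method names (JsonLineRecords, splitOnFirstClosingBrace, splitOnNewline) are hypothetical, and it assumes every JSON record sits on exactly one line with no raw newline inside it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop or of the library in the thread.
public class JsonLineRecords {

    // Naive tag-style split: a record starts at "{" and ends at the first "}" after it.
    // This mimics START_TAG = "{" and END_TAG = "}" and truncates nested objects.
    static List<String> splitOnFirstClosingBrace(String data) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = data.indexOf('{', pos);
            if (start < 0) break;
            int end = data.indexOf('}', start);
            if (end < 0) break;
            records.add(data.substring(start, end + 1));
            pos = end + 1;
        }
        return records;
    }

    // Line-oriented split: every non-empty line is one complete record.
    // Assumes one JSON object per line (no pretty-printed, multi-line JSON).
    static List<String> splitOnNewline(String data) {
        List<String> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "{\"id\":1,\"user\":{\"name\":\"a\"}}\n{\"id\":2}\n";
        // The first record is cut off at the inner "}" of the nested object.
        System.out.println(splitOnFirstClosingBrace(data));
        // Each line comes back whole, nested object included.
        System.out.println(splitOnNewline(data));
    }
}
```

If the records really are one per line, Hadoop's built-in TextInputFormat already delivers each line as one record (byte offset as key, line text as value), so the JSON parsing could happen in the mapper. Whether the library in question can be pointed at a different InputFormat is not something this thread settles.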