Re: Inputformat
If you try to hammer in a nail (a JSON file) with a screwdriver
(XMLInputReader), then perhaps the reason it won't work is that you are
using the wrong tool?
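
For one JSON object per line, Hadoop's stock line-oriented TextInputFormat already hands each line to the mapper as a Text value, so no start/end tags should be needed. As a rough illustration only, here is a plain-Java sketch (no Hadoop dependency; the class and method names are hypothetical) of why splitting on line breaks is safe with nested objects, while a tag-style scan that stops at the first "}" is not:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: these names are made up and are not part of Hadoop. */
final class JsonLineRecords {

    /** Line-oriented split: every non-blank line is treated as one record. */
    static List<String> byLine(String input) {
        List<String> records = new ArrayList<>();
        for (String line : input.split("\\R")) {   // \R matches any line break
            if (!line.trim().isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    /**
     * Tag-style scan: returns the text from the first startTag through the
     * first endTag that follows it, or null if either tag is missing.
     */
    static String firstTagMatch(String input, String startTag, String endTag) {
        int start = input.indexOf(startTag);
        if (start < 0) {
            return null;
        }
        int end = input.indexOf(endTag, start + startTag.length());
        if (end < 0) {
            return null;
        }
        return input.substring(start, end + endTag.length());
    }

    public static void main(String[] args) {
        // Two single-line records, each containing a nested object.
        String data = "{\"id\":1,\"user\":{\"name\":\"a\"}}\n"
                    + "{\"id\":2,\"user\":{\"name\":\"b\"}}\n";

        // Whole records, nesting intact.
        System.out.println(byLine(data));

        // With "}" as the end tag the record is cut at the inner closing brace.
        System.out.println(firstTagMatch(data, "{", "}"));
    }
}
```

An end tag of "}\n" happens to capture the first record in this sample, but it depends on every record, including the last one, being followed by a newline. Reading line by line and parsing each line as JSON in the mapper avoids that assumption.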
 On Jun 21, 2013 11:38 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:

> Hi,
>
>   I am using a library that relies on InputFormat.
> Right now it reads XML files whose records span multiple lines.
> Currently the input format looks like this:
>
> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>
>   public static final String START_TAG = "<page>";
>   public static final String END_TAG = "</page>";
>
>   @Override
>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>       JobConf conf, Reporter reporter) throws IOException {
>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>     return new XMLRecordReader((FileSplit) split, conf);
>   }
> }
> So, with the above, if the data looks like:
>
> <page>
>
> something \n
> something \n
>
> </page>
>
> it processes this sort of data.
>
>
> Now I want to use the same framework for JSON files, where each record
> occupies just a single line.
>
> So I guess my START_TAG can be "{".
>
> Should my END_TAG be "}\n"?
>
> It can't be "}", since there can be nested JSON in this data.
>
> Any clues?
> Thanks
>