Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HDFS >> mail # user >> Inputformat


Hi,

  I am using one of the libraries which rely on InputFormat.
Right now, it is reading xml files spanning across mutiple lines.
So currently the input format is like:

public class XMLInputReader extends FileInputFormat<LongWritable, Text> {

  public static final String START_TAG = "<page>";
  public static final String END_TAG = "</page>";

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
      JobConf conf, Reporter reporter) throws IOException {
    conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
    conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
    return new XMLRecordReader((FileSplit) split, conf);
  }
}
So, in above if the data is like:

<page>

 soemthing \n
somthing \n

</page>

It process this sort of data..
Now, i want to use the same framework but for json files but lasting just
single line..

So I guess my
my START_TAG can be "{"

Will my END_TAG be "}\n"

it can't be "}" as there can be nested json in this data?

Any clues
Thanks
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB