Re: Deserializing into multiple records
Use Avro or Protobuf support.

On Tuesday, April 8, 2014, Petter von Dolwitz (Hem) <
[EMAIL PROTECTED]> wrote:
solutions for dealing with JSON data in Hive fields but nothing I saw
actually decomposes nested JSON into a set of discrete records. It's super
useful for us.
[EMAIL PROTECTED]> wrote:
org.apache.hadoop.mapred.FileInputFormat) accompanied by a custom
RecordReader (implements org.apache.hadoop.mapred.RecordReader). The
RecordReader will be used to read your documents and from there you can
decide which units you will return as records (returned by the next()
method). You'll probably still need a SerDe that transforms your data into
Hive data types using a 1:1 mapping.
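
A minimal sketch of that idea, under stated assumptions: to keep it runnable without Hadoop on the classpath, it does not implement org.apache.hadoop.mapred.RecordReader itself. SimpleRecordReader is a hypothetical stand-in that mirrors only the next()/close() part of that interface (the key argument is omitted), and the ';'-separated document format is invented purely for illustration. The point it shows is the buffering: one document is read, split into several logical records, and each call to next() hands back one of them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the next()/close() part of
// org.apache.hadoop.mapred.RecordReader (the real next() also takes a key).
interface SimpleRecordReader {
    boolean next(StringBuilder value);
    void close();
}

// Reads whole documents and returns one logical record per next() call.
class DocumentSplittingReader implements SimpleRecordReader {
    private final Iterator<String> documents;
    private final Deque<String> pending = new ArrayDeque<>();

    DocumentSplittingReader(List<String> documents) {
        this.documents = documents.iterator();
    }

    // Assumed document format: records separated by ';'. Blank records are skipped.
    private static List<String> splitDocument(String document) {
        List<String> records = new ArrayList<>();
        for (String part : document.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    @Override
    public boolean next(StringBuilder value) {
        // Refill the buffer from the next document until a record is available.
        while (pending.isEmpty()) {
            if (!documents.hasNext()) {
                return false; // no more documents, so no more records
            }
            pending.addAll(splitDocument(documents.next()));
        }
        // Overwrite the caller's reusable value object, as Hadoop readers do.
        value.setLength(0);
        value.append(pending.poll());
        return true;
    }

    @Override
    public void close() {
        pending.clear();
    }
}
```

In a real job the same buffering would sit inside a class implementing org.apache.hadoop.mapred.RecordReader and be returned from a custom FileInputFormat, with the SerDe then mapping each emitted record to one Hive row.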
runs (and possibly in the results) to avoid JOIN operations but the raw
files will not contain duplicate data.
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
being able to query. Each single document logically breaks down into a set
of individual records. In order to use Hive, we preprocess each input
document into a set of discrete records, which we save on HDFS and create
an external table on top of.
records. It would be much more efficient to deserialize the document into a
set of records when a query is made. That way, we can just save the raw
documents on HDFS.
be a 1:1 relationship. Is there any way to deserialize a record into
multiple records?

Sorry this was sent from mobile. Will do less grammar and spell check than
usual.