Hive >> mail # user >> Trying to write a custom HiveOutputFormat


Re: Trying to write a custom HiveOutputFormat
You could also look at the OrcSerde and how it works.

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java

Basically, OrcSerde on "serialize" just wraps the row and object inspector
in a fake writable. That is passed down to the OutputFormat. On
"deserialize" it does the reverse and just passes back the object from the
InputFormat.
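
A minimal, dependency-free sketch of that pattern follows. It does not use the Hive API: RowInspector, RowHolder, arrayInspector and writeRecord are hypothetical stand-ins for Hive's ObjectInspector, the "fake writable", and the RecordWriter of a custom output format. In real Hive the wrapper would have to implement Writable, and the inspector would be the one passed to the serde's serialize call. Treat this as an illustration of the idea under those assumptions, not as OrcSerde's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowWrapperSketch {

    /** Hypothetical stand-in for a struct object inspector: it knows how to read fields from a row. */
    interface RowInspector {
        List<String> fieldNames();
        Object fieldValue(Object row, int index);
    }

    /** The "fake writable": it is never turned into bytes, it only carries the row and its inspector. */
    static final class RowHolder {
        final Object row;
        final RowInspector inspector;

        RowHolder(Object row, RowInspector inspector) {
            this.row = row;
            this.inspector = inspector;
        }
    }

    /** Serde side: "serialize" does no encoding, it just wraps the row and the inspector. */
    static RowHolder serialize(Object row, RowInspector inspector) {
        return new RowHolder(row, inspector);
    }

    /** Serde side: "deserialize" passes back the object produced by the input format unchanged. */
    static Object deserialize(Object fromInputFormat) {
        return fromInputFormat;
    }

    /** Output-format side: unwrap the holder and read each field by position, with no string splitting. */
    static List<String> writeRecord(RowHolder holder) {
        List<String> out = new ArrayList<>();
        List<String> names = holder.inspector.fieldNames();
        for (int i = 0; i < names.size(); i++) {
            out.add(names.get(i) + "=" + holder.inspector.fieldValue(holder.row, i));
        }
        return out;
    }

    /** Hypothetical inspector for rows represented as Object[]. */
    static RowInspector arrayInspector(String... names) {
        final List<String> fieldNames = Arrays.asList(names);
        return new RowInspector() {
            @Override
            public List<String> fieldNames() {
                return fieldNames;
            }

            @Override
            public Object fieldValue(Object row, int index) {
                return ((Object[]) row)[index];
            }
        };
    }

    public static void main(String[] args) {
        RowInspector inspector = arrayInspector("id", "comment");
        Object[] row = {42, "a string with spaces"};
        // Each field arrives intact, even when a string field contains spaces.
        for (String field : writeRecord(serialize(row, inspector))) {
            System.out.println(field);
        }
    }
}
```

In a protobuf output format, the loop in writeRecord is roughly where each field value would be set on a message builder instead of being formatted as a string.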

-- Owen
On Mon, May 13, 2013 at 6:54 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> You need to use a combination of output format and serde. This might allow
> you to do something like present struct objects to the input format rather
> than Text objects.
>
> You may want to take a look at the protobuf input format we use:
> https://github.com/edwardcapriolo/hive-protobuf/
>
> You could reverse the logic here and design an output format.
>
>
> On Mon, May 13, 2013 at 8:14 AM, Rui Martins <[EMAIL PROTECTED]> wrote:
>
>> Hi guys,
>>
>> I'm currently writing my own HiveOutputFormat, as I would like to write the
>> output of Hive queries into a specific protobuf format my team is using.
>> I have managed to do this; however, the Writable object I get from Hive as
>> a result of a SELECT query is of type Text. This means that I have to split
>> the string to find my fields, but that's very error-prone, especially if some
>> fields are strings that may contain spaces.
>>
>> My question is:
>> 1) How do I get a Hive Writable that gives me each field of each result
>> row?
>>
>> Thank you,
>> rui
>>
>
>