Re: table from sequence file
I think it would be better to take a look at LazySimpleSerDe to see how it
serializes and deserializes Struct types. Your implementation should work
seamlessly with this SerDe.

More specifically, a simple POJO may not work because of the inherent
marshaling/encoding semantics that must be observed to conform to the
BytesWritable contract.
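As a concrete illustration (a hedged sketch, not code from this thread; the
class name and field layout are my assumptions): LazySimpleSerDe lazily splits
the raw bytes of each row on a field separator ('\001' by default), so the
bytes handed to it must follow that layout rather than, say, a plain Java
serialization of a POJO.

    // Sketch: packing struct fields into a BytesWritable in the
    // delimited layout LazySimpleSerDe expects (default separators).
    import org.apache.hadoop.io.BytesWritable;

    public class StructPacker {
      public static BytesWritable pack(long f1, String f2, long f3) {
        // A raw POJO dump would not be parseable; the fields must be
        // joined with the SerDe's field separator, '\001' by default.
        String row = f1 + "\001" + f2 + "\001" + f3;
        byte[] bytes = row.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        return new BytesWritable(bytes);
      }
    }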

Arvind

On Fri, Apr 16, 2010 at 11:04 AM, Sagar Naik <[EMAIL PROTECTED]> wrote:

> Hi Arvind,
> Thanks for explanation.
>
> I am a newbie, so I am not familiar with the terms.
> Is a Struct implementation a POJO, or something else?
>
> My guess is that a struct is a simple POJO. If so, then the POJO
> represented in bytes would be passed to BytesWritable.
> Should that work?
>
>
>
> -Sagar
>
> On Apr 16, 2010, at 9:58 AM, Arvind Prabhakar wrote:
>
> Sagar,
>
> Unfortunately it is more complicated than that. The idea behind the record
> reader implementation is to convert the underlying writable into a type
> that is understood by the SerDe layer. At this time, the SerDe layer
> understands BytesWritable and Text types. So if you could take your custom
> type and emit a BytesWritable that represents a struct implementation of
> the same, it would work.
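> To make that concrete, here is a hedged sketch (the class name and
> structure are my assumptions, not code from this thread) of a mapred
> RecordReader that reads the sequence file and hands Hive the value
> converted to Text; it mirrors what Hadoop's SequenceFileAsTextRecordReader
> does for the value side:
>
>     import java.io.IOException;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.SequenceFile;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.io.Writable;
>     import org.apache.hadoop.mapred.RecordReader;
>     import org.apache.hadoop.util.ReflectionUtils;
>
>     public class CustomToTextRecordReader
>         implements RecordReader<LongWritable, Text> {
>       private final SequenceFile.Reader in;
>       private final Writable key;
>       private final Writable value;
>
>       public CustomToTextRecordReader(SequenceFile.Reader in,
>                                       Configuration conf) {
>         this.in = in;
>         // Instantiate the key/value types recorded in the file header.
>         this.key = (Writable) ReflectionUtils.newInstance(in.getKeyClass(), conf);
>         this.value = (Writable) ReflectionUtils.newInstance(in.getValueClass(), conf);
>       }
>
>       public boolean next(LongWritable k, Text v) throws IOException {
>         if (!in.next(key, value)) {
>           return false;                 // end of file
>         }
>         k.set(in.getPosition());
>         v.set(value.toString());        // convert custom writable to Text
>         return true;
>       }
>
>       public LongWritable createKey() { return new LongWritable(); }
>       public Text createValue() { return new Text(); }
>       public long getPos() throws IOException { return in.getPosition(); }
>       public float getProgress() { return 0.0f; }
>       public void close() throws IOException { in.close(); }
>     }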
>
> Another alternative, which would be simpler to implement, would be the
> following:
>
> 1. Modify your custom writable so that it has a toString() method that
> generates a parsable representation of the fields. For example, you could
> use a JSON representation in your toString() method.
>
> 2. Create the external table with inputformat
> 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and  outputformat
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
> entire value type to a single string column.
>
> 3. Use UDFJson (exposed in HiveQL as get_json_object) to extract the
> individual attributes from the JSON string that is emitted by the select
> query.
>
> You can use this output to populate a new table that now has the parsed
> values separated out in the warehouse.
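> As a hedged sketch of the three steps above (the table names, the location
> path, and the JSON keys f1/f2/f3 are hypothetical; toString() is assumed to
> emit JSON per step 1, and parsed_table is assumed to have been created
> beforehand with the three typed columns):
>
>     CREATE EXTERNAL TABLE json_staging (json_value STRING)
>     STORED AS
>       INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>     LOCATION '/path/to/sequence/files';
>
>     -- UDFJson is exposed in HiveQL as get_json_object.
>     INSERT OVERWRITE TABLE parsed_table
>     SELECT CAST(get_json_object(json_value, '$.f1') AS BIGINT),
>            get_json_object(json_value, '$.f2'),
>            CAST(get_json_object(json_value, '$.f3') AS BIGINT)
>     FROM json_staging;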
>
> Arvind
>
>
> On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <[EMAIL PROTECTED]> wrote:
>
>> Hi Arvind,
>>
>> You guessed it correctly.
>>
>> We have custom writables.
>> I looked at the TextRecordReader implementation to get an idea of how
>> RecordReader works.
>>
>> It looks like createRow() creates an instance and next(...) populates
>> that instance; createRow() returns an instance of Writable.
>>
>> Is this Writable instance the same as the "struct" from your reply?
>>
>> How is this Writable instance mapped to column names?
>> Is there something in the command-line syntax that binds the Writable
>> instance to column names and values?
>> Or will an ObjectInspector do it magically?
>>
>> -Sagar
>> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be
>> straightforward, although it is doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to a Struct type, which can then be
>> queried directly.
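>> For the Struct conversion, the mapping to named columns happens through
>> an ObjectInspector. A hedged sketch follows (the field names and types
>> are my assumptions, based on your example):
>>
>>     import java.util.Arrays;
>>     import java.util.List;
>>     import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
>>     import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
>>     import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
>>
>>     public class StructRowExample {
>>       // A struct ObjectInspector tells Hive how to map a row object
>>       // to named, typed columns.
>>       public static ObjectInspector rowInspector() {
>>         List<String> names =
>>             Arrays.asList("value_field1", "value_field2", "value_field3");
>>         List<ObjectInspector> inspectors = Arrays.<ObjectInspector>asList(
>>             PrimitiveObjectInspectorFactory.javaLongObjectInspector,
>>             PrimitiveObjectInspectorFactory.javaStringObjectInspector,
>>             PrimitiveObjectInspectorFactory.javaLongObjectInspector);
>>         return ObjectInspectorFactory.getStandardStructObjectInspector(
>>             names, inspectors);
>>       }
>>
>>       // The matching row: one Java object per field, in declaration order.
>>       public static List<Object> toRow(long f1, String f2, long f3) {
>>         return Arrays.asList((Object) f1, f2, f3);
>>       }
>>     }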
>>
>> Arvind
>>
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[EMAIL PROTECTED]> wrote:
>>
>>> Hi
>>>
>>> My data is in the value field of a sequence file.
>>> The value field has subfields in it, and I am trying to create a table
>>> using these subfields.
>>> Example:
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEY_FIELD2> forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
>>> So I am trying to create a table from VALUE_FIELD*
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>>
>>> I am planning to write a custom SerDe implementation and a custom
>>> SequenceFileReader.
>>> Please let me know if I am on the right track.
>>>
>>>
>>> -Sagar
>>
>>
>>
>>
>
>