Re: table from sequence file
Sagar,

Unfortunately it is more complicated than that. The idea behind the record
reader implementation is to convert the underlying writable into a type that
the SerDe layer understands. At this time, the SerDe layer seems to
understand the BytesWritable and Text types. So if you could take your
custom type and emit a BytesWritable that represents a struct implementation
of the same, it would work.

An alternative that would be simpler to implement is the following:

1. Modify your custom writable so that it has a toString() method that
generates a parsable representation of the fields. For example you could use
the JSON representation in your toString() method.

2. Create the external table with inputformat
'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and outputformat
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
entire value type to a single string column.

3. Use UDFJson (exposed in queries as the get_json_object function) to
extract the individual attributes from the JSON string that is emitted from
the select query.

You can use this output to populate a new table that now has the parsed
values separated out in the warehouse.
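
Steps 2 and 3 could look roughly like the HiveQL below. This is only a
sketch under a few assumptions: the table names raw_values and
parsed_values, the column name value_json, the jar and LOCATION paths, and
the JSON keys (VALUE_FIELD1 and so on, taken from the field names in the
original question) are all hypothetical, and it presumes the toString()
from step 1 emits one flat JSON object per record.

```sql
-- Hypothetical jar path: the custom writable class has to be available
-- to Hive so the sequence file values can be deserialized.
ADD JAR /path/to/custom-writables.jar;

-- Step 2: expose each record's toString() output as a single string column.
CREATE EXTERNAL TABLE raw_values (value_json STRING)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/path/to/sequence/files';  -- hypothetical path

-- Target table holding the parsed-out columns.
CREATE TABLE parsed_values (
  value_field1 BIGINT,
  value_field2 STRING,
  value_field3 BIGINT
);

-- Step 3: pull each attribute out of the JSON string and populate the table.
-- get_json_object returns a string, hence the casts for the numeric fields.
INSERT OVERWRITE TABLE parsed_values
SELECT
  CAST(get_json_object(value_json, '$.VALUE_FIELD1') AS BIGINT),
  get_json_object(value_json, '$.VALUE_FIELD2'),
  CAST(get_json_object(value_json, '$.VALUE_FIELD3') AS BIGINT)
FROM raw_values;
```

If a key is missing or the string is not valid JSON, get_json_object
returns NULL, so it may be worth checking the parsed table for unexpected
NULLs.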

Arvind
On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <[EMAIL PROTECTED]> wrote:

> Hi Arvind,
>
> You guessed correctly.
>
> We have custom writables.
> I looked at the TextRecordReader implementation to get an idea of how a
> RecordReader works.
>
> It looks like createRow creates an instance and next(...) populates this
> instance.
> The createRow returns an instance of Writable.
>
> Is this Writable instance the same as the "struct" from your reply?
>
> How is this Writable instance mapped to column names?
> Is there something in the command-line syntax that binds the Writable
> instance to column names and values?
> Or will the ObjectInspector do it magically?
>
> -Sagar
> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>
> Hi Sagar,
>
> Looks like your source file has custom writable types in it. If that is the
> case, implementing a SerDe that works with that type may not be that
> straightforward, although doable.
>
> An alternative would be to implement a custom RecordReader that converts
> the value of your custom writable to Struct type which can then be queried
> directly.
>
> Arvind
>
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> My data is in the value field of a sequence file.
>> The value field has subfields in it. I am trying to create table using
>> these subfields.
>> Example:
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEY_FIELD2> forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
>> So I am trying to create a table from VALUE_FIELD*
>>
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>
>> I am planning to write a custom SerDe implementation and a custom
>> SequenceFileReader.
>> Please let me know if I am on the right track.
>>
>>
>> -Sagar