Re: table from sequence file
Sagar,

Unfortunately it is more complicated than that. The idea behind the record
reader implementation is to convert the underlying writable into a type
that the SerDe layer understands. At this time, the SerDe layer seems to
understand the BytesWritable and Text types. So if you could take your
custom type and emit a BytesWritable that represents a struct
implementation of the same, it would work.
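
One possible reading of "a BytesWritable that represents a struct" is a row of delimited text fields, which is the layout Hive's default delimited-text SerDe parses (Ctrl-A, '\001', is the default field delimiter). The sketch below only builds the bytes for such a row. The class name, method name, and the three fields are hypothetical, taken from the VALUE_FIELD1..3 example in the original question, and wrapping the result in a BytesWritable inside a custom RecordReader is left out so the code compiles without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not part of Hive or Hadoop.
final class DelimitedRowEncoder {
    // Ctrl-A is Hive's default field delimiter for delimited text rows.
    static final char FIELD_DELIMITER = '\001';

    private DelimitedRowEncoder() {}

    // Encodes the three example fields as one delimited row in UTF-8.
    // A custom RecordReader could hand these bytes to the SerDe layer,
    // for example by wrapping them in a BytesWritable (assumption).
    static byte[] encodeRow(long valueField1, String valueField2, long valueField3) {
        // A delimiter or newline inside the string field would corrupt the row,
        // so this sketch rejects such values instead of escaping them.
        if (valueField2.indexOf(FIELD_DELIMITER) >= 0 || valueField2.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("field contains a delimiter or newline");
        }
        StringBuilder row = new StringBuilder();
        row.append(valueField1).append(FIELD_DELIMITER)
           .append(valueField2).append(FIELD_DELIMITER)
           .append(valueField3);
        return row.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

With a layout like this, the table columns would be declared in the same order as the encoded fields.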

A simpler alternative to implement would be the following:

1. Modify your custom writable so that it has a toString() method that
generates a parsable representation of the fields. For example you could use
the JSON representation in your toString() method.

2. Create the external table with inputformat
'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat' and outputformat
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', mapping the
entire value type to a single string column.

3. Use UDFJson (the get_json_object function) to extract the individual
attributes from the JSON string that the select query emits.

You can use this output to populate a new table that now has the parsed
values separated out in the warehouse.
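
As a rough illustration of step 1, here is a minimal sketch of a value type whose toString() emits JSON. SequenceFileAsTextInputFormat converts keys and values to text by calling toString(), so this string is what would land in the single string column. The class name ValueRecord and the field names are hypothetical, modeled on the VALUE_FIELD1..3 example. In a real job the class would implement org.apache.hadoop.io.Writable; the interface is omitted here only so the sketch compiles without Hadoop.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical custom value type; would implement Writable in a real job.
final class ValueRecord {
    private long valueField1;
    private String valueField2 = "";
    private long valueField3;

    ValueRecord() {}

    ValueRecord(long valueField1, String valueField2, long valueField3) {
        this.valueField1 = valueField1;
        this.valueField2 = valueField2;
        this.valueField3 = valueField3;
    }

    // Same signature as Writable.write.
    public void write(DataOutput out) throws IOException {
        out.writeLong(valueField1);
        out.writeUTF(valueField2);
        out.writeLong(valueField3);
    }

    // Same signature as Writable.readFields.
    public void readFields(DataInput in) throws IOException {
        valueField1 = in.readLong();
        valueField2 = in.readUTF();
        valueField3 = in.readLong();
    }

    // Minimal JSON string escaping, enough for quotes, backslashes and
    // control characters.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Parsable representation of the fields, as suggested in step 1.
    @Override
    public String toString() {
        return "{\"valueField1\":" + valueField1
            + ",\"valueField2\":\"" + escape(valueField2) + "\""
            + ",\"valueField3\":" + valueField3 + "}";
    }
}
```

With a toString() like this, the select in step 3 could pull out a field with an expression such as get_json_object(json_col, '$.valueField1'), where json_col is a placeholder name for the single string column from step 2.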

Arvind
On Thu, Apr 15, 2010 at 6:01 PM, Sagar Naik <[EMAIL PROTECTED]> wrote:

> Hi Arvind,
>
> You guessed correctly.
>
> We have custom writables.
> I saw the TextRecordReader implementation to get an idea on RecordReader.
>
> It looks like createRow creates an instance and next(...) populates this
> instance.
> The createRow returns an instance of Writable.
>
> Is the Writable instance the same as the "struct" from your reply?
>
> How is this Writable instance mapped to column names?
> Is there something in the command-line syntax that binds the Writable
> instance to column names and values?
> Or will the ObjectInspector do it magically?
>
> -Sagar
> On Apr 15, 2010, at 12:00 PM, Arvind Prabhakar wrote:
>
> Hi Sagar,
>
> Looks like your source file has custom writable types in it. If that is the
> case, implementing a SerDe that works with that type may not be that
> straightforward, although it is doable.
>
> An alternative would be to implement a custom RecordReader that converts
> the value of your custom writable to a Struct type, which can then be
> queried directly.
>
> Arvind
>
> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> My data is in the value field of a sequence file.
>> The value field has subfields in it. I am trying to create table using
>> these subfields.
>> Example:
>> <KEY> <VALUE>
>> <KEY_FIELD1, KEY_FIELD2> forms the key
>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
>> So I am trying to create a table from VALUE_FIELD*
>>
>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>
>> I am planning to write a custom SerDe implementation and a custom
>> SequenceFileReader.
>> Please let me know if I am on the right track.
>>
>>
>> -Sagar
>
>
>
>