Re: table from sequence file
On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

>
>
> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:
>
>> Hi Sagar,
>>
>> Looks like your source file has custom writable types in it. If that is
>> the case, implementing a SerDe that works with that type may not be
>> straightforward, although it is doable.
>>
>> An alternative would be to implement a custom RecordReader that converts
>> the value of your custom writable to a Struct type, which can then be
>> queried directly.
>>
>> Arvind
>>
>>
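As a rough illustration of the RecordReader alternative described above, here is a
minimal sketch only. It assumes a hypothetical MyCustomWritable value class with
getField1()/getField2()/getField3() accessors; the real class name, its accessors,
and the custom InputFormat that would expose this reader to Hive (via STORED AS
INPUTFORMAT) will differ. The reader wraps the standard sequence-file reader and
re-emits each value as a tab-delimited Text record that Hive's default SerDe could
then split into columns.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

public class MyCustomSeqRecordReader implements RecordReader<LongWritable, Text> {

  private final SequenceFileRecordReader<Writable, Writable> reader;
  private final Writable innerKey;
  private final Writable innerValue;

  public MyCustomSeqRecordReader(JobConf conf, FileSplit split) throws IOException {
    // Delegate the actual sequence-file reading to the stock record reader.
    reader = new SequenceFileRecordReader<Writable, Writable>(conf, split);
    innerKey = reader.createKey();
    innerValue = reader.createValue();
  }

  public boolean next(LongWritable key, Text value) throws IOException {
    if (!reader.next(innerKey, innerValue)) {
      return false;
    }
    key.set(reader.getPos());
    // Hypothetical value class: re-serialize its subfields with the delimiter
    // the table's SerDe expects (tab here).
    MyCustomWritable v = (MyCustomWritable) innerValue;
    value.set(v.getField1() + "\t" + v.getField2() + "\t" + v.getField3());
    return true;
  }

  public LongWritable createKey() { return new LongWritable(); }

  public Text createValue() { return new Text(); }

  public long getPos() throws IOException { return reader.getPos(); }

  public float getProgress() throws IOException { return reader.getProgress(); }

  public void close() throws IOException { reader.close(); }
}
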
>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[EMAIL PROTECTED]> wrote:
>>
>>> Hi
>>>
>>> My data is in the value field of a sequence file.
>>> The value field has subfields in it. I am trying to create a table using
>>> these subfields.
>>> Example:
>>> <KEY> <VALUE>
>>> <KEY_FIELD1, KEY_FIELD2> forms the key
>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
>>> So I am trying to create a table from VALUE_FIELD*:
>>>
>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>>
>>> I am planning to write a custom SerDe implementation and a custom
>>> SequenceFile reader.
>>> Please let me know if I am on the right track.
>>>
>>>
>>> -Sagar
>>
>>
>>
> I am actually having lots of trouble with this.
> I have a sequence file that opens fine with
> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text
> /home/edward/Downloads/seq/seq
>
> create external table keyonly( ver string , theid int, thedate string )
> row format delimited fields terminated by ','
> STORED AS
> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
> outputformat
> 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
>
> location '/home/edward/Downloads/seq';
>
>
>
> Also tried
> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
> or stored as SEQUENCEFILE
>
> I always get this...
>
> 2010-04-15 13:10:43,849 ERROR CliDriver (SessionState.java:printError(255)) - Failed with exception java.io.IOException:java.io.EOFException
> java.io.IOException: java.io.EOFException
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:154)
>     at junit.framework.TestCase.runBare(TestCase.java:127)
>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>     at junit.framework.TestResult.run(TestResult.java:109)
>     at junit.framework.TestCase.run(TestCase.java:118)
>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>     at junit.framework.TestSuite.run(TestSuite.java:203)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
> Caused by: java.io.EOFException
>     at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
>     at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
The SequenceFileAsTextInputFormat converts the sequence file record values to
strings using the toString() invocation. Assuming that your data has a custom
writable with multiple fields in it, I don't think it is possible for you to
map the individual fields to different columns that way.
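As a rough sketch (not the Hadoop source) of what that as-text conversion amounts
to: each record's key and value Writables are rendered with toString(), so a
multi-field custom value collapses into a single string from Hive's point of view.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class AsTextConversionSketch {
  // Roughly what happens to each record read from the sequence file:
  // the custom key/value objects are flattened to their string forms.
  static void convert(Writable customKey, Writable customValue, Text key, Text value) {
    key.set(customKey.toString());
    value.set(customValue.toString());
  }
}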

Can you try doing the following:

create external table dummy( fullvalue string)
stored as inputformat
'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/home/edward/Downloads/seq';

and then doing a select * from dummy.

Arvind