Hive >> mail # user >> table from sequence file


Re: table from sequence file
On Thu, Apr 15, 2010 at 7:00 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

>
>
> On Thu, Apr 15, 2010 at 7:23 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:
>
>> On Thu, Apr 15, 2010 at 1:23 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> On Thu, Apr 15, 2010 at 3:00 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Sagar,
>>>>
>>>> Looks like your source file has custom writable types in it. If that is
>>>> the case, implementing a SerDe that works with that type may not be
>>>> straightforward, although it is doable.
>>>>
>>>> An alternative would be to implement a custom RecordReader that converts
>>>> the value of your custom writable into a Struct type, which can then be
>>>> queried directly.
>>>>
>>>> Arvind
>>>>
>>>>
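The RecordReader suggestion above can be sketched in shape only: wrap the underlying value source and turn each custom writable into an ordered field list that Hive can expose as a struct. The types below (`CustomValueWritable`, `StructConvertingReader`) are hypothetical stand-ins, not Hadoop's `mapred.RecordReader` interface, which a real implementation would have to implement against the actual SequenceFile reader:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the custom writable stored in the
// sequence file's value field.
class CustomValueWritable {
    long field1; String field2; long field3;
    CustomValueWritable(long f1, String f2, long f3) {
        field1 = f1; field2 = f2; field3 = f3;
    }
}

// Sketch of the conversion a custom RecordReader would perform: each
// custom writable becomes an ordered list of primitive fields, which
// Hive can then treat as a struct (or as top-level columns).
public class StructConvertingReader implements Iterator<List<Object>> {
    private final Iterator<CustomValueWritable> source;  // underlying values

    StructConvertingReader(Iterator<CustomValueWritable> source) {
        this.source = source;
    }

    @Override public boolean hasNext() { return source.hasNext(); }

    @Override public List<Object> next() {
        CustomValueWritable v = source.next();
        return Arrays.asList(v.field1, v.field2, v.field3);
    }

    public static void main(String[] args) {
        Iterator<CustomValueWritable> values = Arrays.asList(
            new CustomValueWritable(1L, "a", 2L),
            new CustomValueWritable(3L, "b", 4L)).iterator();
        StructConvertingReader reader = new StructConvertingReader(values);
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}
```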
>>>> On Thu, Apr 15, 2010 at 1:06 AM, Sagar Naik <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> My data is in the value field of a sequence file.
>>>>> The value field has subfields in it. I am trying to create table using
>>>>> these subfields.
>>>>> Example:
>>>>> <KEY> <VALUE>
>>>>> <KEY_FIELD1, KEY_FIELD2> forms the key and
>>>>> <VALUE_FIELD1, VALUE_FIELD2, VALUE_FIELD3> forms the value.
>>>>> So I am trying to create a table from VALUE_FIELD*.
>>>>>
>>>>> CREATE EXTERNAL TABLE table_name (VALUE_FIELD1 BIGINT, VALUE_FIELD2
>>>>> STRING, VALUE_FIELD3 BIGINT) STORED AS SEQUENCEFILE;
>>>>>
>>>>> I am planning to write a custom SerDe implementation and a custom
>>>>> SequenceFile reader.
>>>>> Please let me know if I am on the right track.
>>>>>
>>>>>
>>>>> -Sagar
>>>>
>>>>
>>>>
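The custom-SerDe route described above boils down to mapping each value record onto typed columns. As a rough, self-contained illustration (plain Java on a delimited string; a real Hive SerDe would implement the SerDe interface and work through ObjectInspectors, and the class and method names here are made up):

```java
import java.util.Arrays;
import java.util.List;

public class ValueFieldParser {

    // Hypothetical helper, not part of any Hive API: split one delimited
    // value record into the column types from the table above
    // (BIGINT, STRING, BIGINT).
    static List<Object> parse(String record) {
        String[] parts = record.split(",", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 fields, got " + parts.length + " in: " + record);
        }
        return Arrays.asList(
            Long.parseLong(parts[0].trim()),  // VALUE_FIELD1 -> BIGINT
            parts[1].trim(),                  // VALUE_FIELD2 -> STRING
            Long.parseLong(parts[2].trim())); // VALUE_FIELD3 -> BIGINT
    }

    public static void main(String[] args) {
        System.out.println(parse("42, hello, 99"));  // prints [42, hello, 99]
    }
}
```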
>>> I am actually having lots of trouble with this.
>>> I have a sequence file that opens fine with
>>> /home/edward/hadoop/hadoop-0.20.2/bin/hadoop dfs -text
>>> /home/edward/Downloads/seq/seq
>>>
>>> create external table keyonly( ver string , theid int, thedate string )
>>> row format delimited fields terminated by ','
>>> STORED AS
>>> inputformat 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
>>> outputformat
>>> 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
>>>
>>> location '/home/edward/Downloads/seq';
>>>
>>>
>>>
>>> Also tried
>>> inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
>>> or stored as SEQUENCEFILE
>>>
>>> I always get this...
>>>
>>> 2010-04-15 13:10:43,849 ERROR CliDriver
>>> (SessionState.java:printError(255)) - Failed with exception
>>> java.io.IOException:java.io.EOFException
>>> java.io.IOException: java.io.EOFException
>>>     at
>>> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:332)
>>>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:120)
>>>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:681)
>>>     at
>>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
>>>     at
>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>>     at
>>> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:510)
>>>     at
>>> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_key_only(TestCliDriver.java:79)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at junit.framework.TestCase.runTest(TestCase.java:154)
>>>     at junit.framework.TestCase.runBare(TestCase.java:127)
>>>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>>>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>>>     at junit.framework.TestResult.run(TestResult.java:109)
>>>     at junit.framework.TestCase.run(TestCase.java:118)
>>>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>>>     at junit.framework.TestSuite.run(TestSuite.java:203)
>>>     at
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
>>> ...
The compression being used here (gzip) is not splittable, so the input files
cannot be divided into splits. That could be the reason why you are seeing
this exception. Can you try using a different compression scheme such as
bzip2, or perhaps not compressing the files at all?

Arvind
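The point about gzip can be illustrated without Hadoop at all: a gzip stream can only be decoded from its start, so a task handed a split that begins mid-file finds no valid gzip header to resume from. A minimal sketch using only java.util.zip (the file name and payload are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Gzip-compress a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    // Try to decompress starting at the given byte offset, the way a task
    // reading a file split would have to. Returns true only if decoding
    // succeeds end to end.
    static boolean readableFrom(byte[] compressed, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(
                compressed, offset, compressed.length - offset))) {
            byte[] sink = new byte[4096];
            while (in.read(sink) != -1) {
                // drain the stream
            }
            return true;
        } catch (IOException e) {  // ZipException ("Not in GZIP format") lands here
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed =
            gzip("some sequence file payload".getBytes(StandardCharsets.UTF_8));
        System.out.println("decode from offset 0:  " + readableFrom(compressed, 0));   // true
        System.out.println("decode from offset 10: " + readableFrom(compressed, 10));  // false
    }
}
```

Splittable codecs like bzip2 (or no compression, as suggested above) avoid this because readers can synchronize at block boundaries inside the file.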