Re: Bucketing external tables
Thanks, Mark.

I found the problem. For some reason, Hive is not able to write an Avro output
file when the schema has a complex field with a null option. It can read such a
schema without any problem, but it cannot write with that structure. For
example, the insert was failing on this array-of-structs field.

{ "name": "Passenger", "type":
                       [{"type":"array","items":
                           {"type":"record",
                             "name": "PAXStruct",
                             "fields": [
                                       { "name":"PAXCode",
"type":["string", "null"] },
                                       {
"name":"PAXQuantity","type":["int", "null"] }
                                       ]
                           }
                        }, "null"]
     }

I removed the last "null" clause and it's working okay now.
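
For reference, here is the field as it works now, assuming the bare array type
simply replaces the two-branch union (the nested ["string", "null"] and
["int", "null"] unions are unchanged):

{ "name": "Passenger",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "PAXStruct",
      "fields": [
        { "name": "PAXCode", "type": ["string", "null"] },
        { "name": "PAXQuantity", "type": ["int", "null"] }
      ]
    }
  }
}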

Regards,
Sadu
On Thu, Apr 4, 2013 at 12:36 AM, Mark Grover <[EMAIL PROTECTED]> wrote:

> Can you please check your JobTracker logs? This is a generic error related
> to grabbing the Task Attempt Log URL; the real error is in the JT logs.
>
>
> On Wed, Apr 3, 2013 at 7:17 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>
>> Hi Dean,
>>
>> I tried inserting into a bucketed Hive table from a non-bucketed table
>> using an INSERT OVERWRITE ... SELECT statement, but I get the following error.
>>
>> ----------------------------------------------------------------------------------
>> Exception in thread "Thread-225" java.lang.NullPointerException
>>         at
>> org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:44)
>>         at
>> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
>>         at
>> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
>>         at java.lang.Thread.run(Thread.java:662)
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.MapRedTask
>>
>> --------------------------------------------------------------------------------------------------------------------------
>>
>> Both tables have the same structure, except that one has a CLUSTERED BY
>> clause and the other does not.
>>
>> Some columns are defined as Array of Structs. The Insert statement works
>> fine if I take out those complex columns. Are there any known issues
>> loading STRUCT or ARRAY OF STRUCT fields?
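>>
>> For reference, a minimal sketch of the statements involved (table and
>> column names here are simplified stand-ins for the real ones):
>>
>> -- non-bucketed staging table with an array-of-structs column
>> CREATE TABLE flights_staging (
>>   flight_num STRING,
>>   passenger  ARRAY<STRUCT<pax_code:STRING, pax_quantity:INT>>
>> );
>>
>> -- bucketed target table with the same layout
>> CREATE TABLE flights_bucketed (
>>   flight_num STRING,
>>   passenger  ARRAY<STRUCT<pax_code:STRING, pax_quantity:INT>>
>> )
>> CLUSTERED BY (flight_num) INTO 16 BUCKETS;
>>
>> -- have Hive enforce the bucketing during the insert
>> SET hive.enforce.bucketing = true;
>>
>> INSERT OVERWRITE TABLE flights_bucketed
>> SELECT flight_num, passenger FROM flights_staging;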
>>
>>
>> Thanks for your time and help.
>>
>> Sadu
>>
>>
>>
>>
>> On Sat, Mar 30, 2013 at 7:00 PM, Dean Wampler <
>> [EMAIL PROTECTED]> wrote:
>>
>>> The table can be external. You should be able to use this data with
>>> other tools, because all bucketing does is ensure that all records with
>>> a given key are written into the same block. This is why
>>> clustered/blocked data can be joined on those keys using map-side joins;
>>> Hive knows it can cache an individual block in memory, and the block will
>>> hold all records across the table for the keys in that block.
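>>>
>>> As an illustration of that point, a sketch of a bucketed map-side join
>>> (table names are hypothetical; both tables are assumed to be bucketed on
>>> the join key with compatible bucket counts):
>>>
>>> SET hive.optimize.bucketmapjoin = true;
>>>
>>> -- the MAPJOIN hint lets each mapper load only the matching bucket of d
>>> SELECT /*+ MAPJOIN(d) */ f.flight_num, d.carrier
>>> FROM flights_bucketed f
>>> JOIN flight_details d ON f.flight_num = d.flight_num;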
>>>
>>> So, Java MR apps and Pig can still read the records, but they won't
>>> necessarily understand how the data is organized. I.e., it might appear
>>> unsorted. Perhaps HCatalog will allow other tools to exploit the structure,
>>> but I'm not sure.
>>>
>>> dean
>>>
>>>
>>> On Sat, Mar 30, 2013 at 5:44 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks, Dean.
>>>>
>>>> Does that mean this bucketing is exclusively a Hive feature and not
>>>> available to other tools like Java MapReduce, Pig, etc.?
>>>>
>>>> And also, my final tables have to be managed tables, not external
>>>> tables, right?
>>>>
>>>> Thanks again for your time and help.
>>>>
>>>> Sadu
>>>>
>>>>
>>>>
>>>> On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> I don't know of any way to avoid creating new tables and moving the
>>>>> data. In fact, that's the official way to do it, from a temp table to the
>>>>> final table, so Hive can ensure the bucketing is done correctly:
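>>>>>
>>>>> A minimal sketch of that temp-to-final pattern (table names are
>>>>> hypothetical):
>>>>>
>>>>> -- makes Hive choose the right number of reducers for the bucket count
>>>>> SET hive.enforce.bucketing = true;
>>>>>
>>>>> -- land the raw data in a plain temp table first
>>>>> CREATE TABLE flights_tmp (flight_num STRING, carrier STRING);
>>>>> LOAD DATA INPATH '/data/flights' INTO TABLE flights_tmp;
>>>>>
>>>>> -- the bucketed final table; Hive does the bucketing on insert
>>>>> CREATE TABLE flights (flight_num STRING, carrier STRING)
>>>>> CLUSTERED BY (flight_num) INTO 16 BUCKETS;
>>>>>
>>>>> INSERT OVERWRITE TABLE flights
>>>>> SELECT flight_num, carrier FROM flights_tmp;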