Hive, mail # user - Bucketing external tables


Sadananda Hegde 2013-03-29, 21:58
Dean Wampler 2013-03-29, 22:57
Sadananda Hegde 2013-03-30, 22:44
Dean Wampler 2013-03-31, 00:00
Sadananda Hegde 2013-04-04, 02:17
Mark Grover 2013-04-04, 05:36
Sadananda Hegde 2013-04-05, 22:02
Re: Bucketing external tables
Mark Grover 2013-04-06, 15:07
Glad to hear!

On Fri, Apr 5, 2013 at 3:02 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:

> Thanks, Mark.
>
> I found the problem. For some reason, Hive is not able to write an Avro
> output file when the schema has a complex field with a null union option. It
> reads the data without any problem, but it cannot write with that structure.
> For example, the insert was failing on this array-of-struct field:
>
> { "name": "Passenger", "type":
>                        [{"type":"array","items":
>                            {"type":"record",
>                              "name": "PAXStruct",
>                              "fields": [
>                                        { "name":"PAXCode",
> "type":["string", "null"] },
>                                        {
> "name":"PAXQuantity","type":["int", "null"] }
>                                        ]
>                            }
>                         }, "null"]
>      }
>
> I removed the last "null" clause and it's working okay now.
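Reading back from the fix described above ("I removed the last 'null' clause"), the working version of that field is presumably the array type on its own, without the outer union; roughly:

{ "name": "Passenger",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "PAXStruct",
      "fields": [
        { "name": "PAXCode",     "type": ["string", "null"] },
        { "name": "PAXQuantity", "type": ["int", "null"] }
      ]
    }
  }
}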
>
> Regards,
> Sadu
>
>
> On Thu, Apr 4, 2013 at 12:36 AM, Mark Grover <[EMAIL PROTECTED]> wrote:
>
>> Can you please check your JobTracker logs? This is a generic error related
>> to grabbing the Task Attempt Log URL; the real error is in the JT logs.
>>
>>
>> On Wed, Apr 3, 2013 at 7:17 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Dean,
>>>
>>> I tried inserting into a bucketed Hive table from a non-bucketed table using
>>> an INSERT OVERWRITE ... SELECT FROM clause, but I get the following error.
>>>
>>> ----------------------------------------------------------------------------------
>>> Exception in thread "Thread-225" java.lang.NullPointerException
>>>         at
>>> org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:44)
>>>         at
>>> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
>>>         at
>>> org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> FAILED: Execution Error, return code 2 from
>>> org.apache.hadoop.hive.ql.exec.MapRedTask
>>>
>>> --------------------------------------------------------------------------------------------------------------------------
>>>
>>> Both tables have the same structure, except that one has a CLUSTERED BY
>>> clause and the other does not.
>>>
>>> Some columns are defined as arrays of structs. The INSERT statement works
>>> fine if I take out those complex columns. Are there any known issues with
>>> loading STRUCT or ARRAY OF STRUCT fields?
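For context, the usual pattern for populating a bucketed table from a non-bucketed staging table is along the following lines; the table names here are made up for illustration, and this is only a sketch of the standard approach, not the poster's actual statements.

-- Ask Hive to plan one reducer per bucket so rows land in the right bucket files
SET hive.enforce.bucketing = true;

-- Populate the bucketed (CLUSTERED BY) table from the plain staging table
INSERT OVERWRITE TABLE flights_bucketed
SELECT * FROM flights_staging;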
>>>
>>>
>>> Thanks for your time and help.
>>>
>>> Sadu
>>>
>>>
>>>
>>>
>>> On Sat, Mar 30, 2013 at 7:00 PM, Dean Wampler <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> The table can be external. You should be able to use this data with
>>>> other tools, because all bucketing does is ensure that all occurrences for
>>>> records with a given key are written into the same block. This is why
>>>> clustered/blocked data can be joined on those keys using map-side joins;
>>>> Hive knows it can cache an individual block in memory and the block will
>>>> hold all records across the table for the keys in that block.
>>>>
>>>> So, Java MR apps and Pig can still read the records, but they won't
>>>> necessarily understand how the data is organized. I.e., it might appear
>>>> unsorted. Perhaps HCatalog will allow other tools to exploit the structure,
>>>> but I'm not sure.
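As a concrete sketch of what Dean describes, an external table declared with CLUSTERED BY plus a bucket map join that exploits it might look roughly like this; the table names, columns, bucket count, and path are arbitrary examples, not anything from the thread.

CREATE EXTERNAL TABLE pax_bucketed (
  pax_code     STRING,
  pax_quantity INT
)
CLUSTERED BY (pax_code) INTO 32 BUCKETS
LOCATION '/data/pax_bucketed';

-- If the other side of the join (bookings, also hypothetical) is bucketed on the
-- same key into a compatible number of buckets, a bucket map join can cache one
-- bucket of pax_bucketed at a time instead of the whole table:
SET hive.optimize.bucketmapjoin = true;
SELECT /*+ MAPJOIN(p) */ b.booking_id, p.pax_quantity
FROM bookings b
JOIN pax_bucketed p ON b.pax_code = p.pax_code;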
>>>>
>>>> dean
>>>>
>>>>
>>>> On Sat, Mar 30, 2013 at 5:44 PM, Sadananda Hegde <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Thanks, Dean.
>>>>>
>>>>> Does that mean this bucketing is exclusively a Hive feature and not
>>>>> available to other tools like Java MapReduce, Pig, etc.?
>>>>>
>>>>> And also, my final tables have to be managed tables, not external
>>>>> tables, right?
>>>>>
>>>>> Thanks again for your time and help.
>>>>>
>>>>> Sadu
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>
Sadananda Hegde 2013-04-11, 17:46
Bejoy KS 2013-04-16, 15:13