Hive, mail # user - Optimizing hive queries


Re: Optimizing hive queries
Owen O'Malley 2013-03-29, 06:54
On Thu, Mar 28, 2013 at 11:08 PM, Jagat Singh <[EMAIL PROTECTED]> wrote:

> Hello Owen,
>
> Thanks for your reply.
>
> I see it provides the same advantage Avro provides: adding
> and removing fields.
>

ORC files, like Avro files, are self-describing. They include the type
structure of the records in the file's metadata. It will take more
integration work with Hive to make the schemas very flexible with ORC.
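
To illustrate what "self-describing" means, here is a toy Python sketch. It is not the ORC format (ORC keeps its type description in a binary footer, not a text header), and write_self_describing and read_self_describing are hypothetical names. The point is only that the schema travels inside the file, so a reader can recover column names and types without an external catalog.

```python
import io
import json

def write_self_describing(out, schema, rows):
    # First line: the schema (column names and types) is stored inside the file itself.
    out.write(json.dumps(schema) + "\n")
    # Remaining lines: one JSON array of values per record, in schema order.
    for row in rows:
        out.write(json.dumps(row) + "\n")

def read_self_describing(src):
    # No external metadata is consulted: the reader recovers the schema from the file.
    src.seek(0)
    schema = json.loads(src.readline())
    names = [col["name"] for col in schema]
    rows = [dict(zip(names, json.loads(line))) for line in src]
    return schema, rows

buf = io.StringIO()
people_schema = [{"name": "first_name", "type": "string"},
                 {"name": "last_name", "type": "string"}]
write_self_describing(buf, people_schema, [["Jane", "Doe"]])
schema, rows = read_self_describing(buf)
print(rows)  # [{'first_name': 'Jane', 'last_name': 'Doe'}]
```
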
> Can you please write some sample code for a partitioned Hive table in
> which each partition has a different schema?
>

As with all tables:

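-- Partitioned table; each partition keeps the schema in effect when it was created.
create table people (first_name string, last_name string)
  partitioned by (state string);
-- The 'ca' partition is created with two columns.
load data local inpath 'part-0' overwrite into table people partition (state='ca');
alter table people add columns (address string);
-- The 'nv' partition is created after the ALTER, so it has three columns.
load data local inpath 'part-1' overwrite into table people partition (state='nv');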

You'll end up with the first partition having 2 columns (so the third
column, address, is implicitly null) and the second partition having 3 columns.
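
The behavior described above can be modeled in a few lines of Python. This is a toy sketch, not Hive's implementation: TABLE_COLUMNS, PARTITION_COLUMNS and read_row are hypothetical names, and the sketch assumes stored values are matched to a partition's own column list by position. Columns the partition does not have come back as None, standing in for NULL.

```python
# Hypothetical model of the metadata after the statements above:
# each partition remembers the column list that was current when it was created.
TABLE_COLUMNS = ["first_name", "last_name", "address"]
PARTITION_COLUMNS = {
    "ca": ["first_name", "last_name"],             # loaded before ALTER TABLE
    "nv": ["first_name", "last_name", "address"],  # loaded after ALTER TABLE
}

def read_row(state, raw_values):
    """Map one stored row onto the table schema; missing columns become None (NULL)."""
    stored = dict(zip(PARTITION_COLUMNS[state], raw_values))
    row = {col: stored.get(col) for col in TABLE_COLUMNS}
    row["state"] = state  # partition column comes from the partition spec, not the data
    return row

print(read_row("ca", ["Jane", "Doe"]))
# {'first_name': 'Jane', 'last_name': 'Doe', 'address': None, 'state': 'ca'}
```
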

-- Owen

>
> I tried searching but could not find any examples.
>
> Thanks in advance for your help.
>
> Regards,
>
> Jagat Singh
>
>
> On Fri, Mar 29, 2013 at 4:48 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
>> Actually, Hive already has the ability to have different schemas for
>> different partitions. (Although of course it would be nice to have the
>> alter table be more flexible!)
>>
>> The "versioned metadata" means that the ORC file's metadata is stored in
>> ProtoBufs so that we can add (or remove) fields to the metadata. That means
>> that for some changes to ORC file format we can provide both forward and
>> backward compatibility.
>>
>> -- Owen
>>
>>
>> On Thu, Mar 28, 2013 at 10:25 PM, Jagat Singh <[EMAIL PROTECTED]> wrote:
>>
>>> Hello Nitin,
>>>
>>> Thanks for sharing.
>>>
>>> Do we have more details on the versioned metadata feature of ORC? Is it
>>> like handling varying schemas in Hive?
>>>
>>> Regards,
>>>
>>> Jagat Singh
>>>
>>>
>>>
>>> On Fri, Mar 29, 2013 at 4:16 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Here is a nice presentation by Owen from Hortonworks on
>>>> "Optimizing Hive Queries":
>>>>
>>>> http://www.slideshare.net/oom65/optimize-hivequeriespptx
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>