Hive >> mail # user >> Optimizing hive queries


Re: Optimizing hive queries
On Thu, Mar 28, 2013 at 11:08 PM, Jagat Singh <[EMAIL PROTECTED]> wrote:

> Hello Owen,
>
> Thanks for your reply.
>
> I see that it provides the advantage that Avro provided, of adding and
> removing fields.
>

ORC files, like Avro files, are self-describing. They include the type
structure of the records in the metadata of the file. It will take more
integration work with Hive to make the schemas very flexible with ORC.

> Can you please write some sample code for a Hive table which is
> partitioned, where each partition has a different schema?
>

As with all tables:

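-- Create a table partitioned by state, initially with two columns.
create table people (first_name string, last_name string)
  partitioned by (state string);
-- Load the first partition while the table has two columns.
load data local inpath 'part-0' overwrite into table people
  partition (state='ca');
-- Add a third column, then load a second partition.
alter table people add columns (address string);
load data local inpath 'part-1' overwrite into table people
  partition (state='nv');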

You'll end up with the first partition having 2 columns (so the third
one, address, is implicitly null) and the second partition having 3
columns.
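
To check the result, something like the following could be run
afterwards. This is only a sketch against the people table from the
statements above. The contents of part-0 and part-1 aren't shown here,
and the exact output may differ between Hive versions.

```sql
-- Compare the column lists Hive has recorded for each partition.
describe people partition (state='ca');
describe people partition (state='nv');

-- Rows from the older partition should show NULL for the column
-- that was added after it was loaded.
select first_name, last_name, address from people where state = 'ca';
select first_name, last_name, address from people where state = 'nv';
```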

-- Owen

>
> I tried searching but could not find any example.
>
> Thanks in advance for your help.
>
> Regards,
>
> Jagat Singh
>
>
> On Fri, Mar 29, 2013 at 4:48 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
>> Actually, Hive already has the ability to have different schemas for
>> different partitions. (Although of course it would be nice to have the
>> alter table be more flexible!)
>>
>> The "versioned metadata" means that the ORC file's metadata is stored in
>> ProtoBufs so that we can add (or remove) fields to the metadata. That means
>> that for some changes to ORC file format we can provide both forward and
>> backward compatibility.
>>
>> -- Owen
>>
>>
>> On Thu, Mar 28, 2013 at 10:25 PM, Jagat Singh <[EMAIL PROTECTED]> wrote:
>>
>>> Hello Nitin,
>>>
>>> Thanks for sharing.
>>>
>>> Do we have more details on the versioned metadata feature of ORC?
>>> Is it like handling varying schemas in Hive?
>>>
>>> Regards,
>>>
>>> Jagat Singh
>>>
>>>
>>>
>>> On Fri, Mar 29, 2013 at 4:16 PM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Here is a nice presentation from Owen from Hortonworks on
>>>> "Optimizing hive queries"
>>>>
>>>> http://www.slideshare.net/oom65/optimize-hivequeriespptx
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>