Hive user mailing list


Re: How to handle for new columns?
I did a quick test with Hive 0.7. Querying the old files should return NULL
for the new column. You don't need to traverse or rewrite the data. Hive
doesn't read the files until they are queried, and at query time the missing
column is returned as NULL.
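A minimal HiveQL sketch of this approach, assuming the Employee table from the original question and assuming salary is a BIGINT (its type is not stated in the thread). ADD COLUMNS changes only the table's metadata, so no files under the table's location are rewritten. When Hive later reads an old file that has only three tab-separated fields, the fourth column (salary) comes back as NULL for those rows.

```sql
-- Append salary after the existing columns. This is a metadata-only change:
-- the old log files under hdfs://namenode1/employee/ are left untouched.
-- (BIGINT is an assumed type for salary.)
ALTER TABLE Employee ADD COLUMNS (salary BIGINT);

-- Rows from the older three-field files return NULL for salary.
-- Note: this also matches rows from newer files whose salary field is
-- empty or cannot be parsed as a BIGINT.
SELECT empid, empname, deptno, salary
FROM Employee
WHERE salary IS NULL;

-- Optionally substitute a default when reporting; 0 is an arbitrary placeholder.
SELECT empid, empname, deptno, COALESCE(salary, 0) AS salary
FROM Employee;
```

Appending the column at the end matters for delimited text files: fields are mapped to columns by position, so the new column has to come after empid, empname and deptno to line up with the new files' layout.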

Thanks,
Aniket

On Thu, Mar 1, 2012 at 2:14 PM, Anson Abraham <[EMAIL PROTECTED]> wrote:

> I am trying to avoid traversing the old files and adding null values.
> But if you're saying that I can simply add a new field to the Hive table: no,
> it does not work. I get errors as a result. I know this can be done in Pig,
> where the old records get NULL for that field. Sorry, I should have
> mentioned that I'm on Hive 0.7.1.
> Does 0.8.0 support this, i.e. if the old files don't have the column, will it
> be returned as NULL? Again, this is an external table.
>
>
> On Thu, Mar 1, 2012 at 5:02 PM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:
>
>> If you add a column to the end of the table, the new field will be NULL
>> for the old files. Is that not what you observe?
>>
>> Thanks,
>> Aniket
>>
>>
>> On Thu, Mar 1, 2012 at 12:06 PM, Anson Abraham <[EMAIL PROTECTED]> wrote:
>>
>>> I have an external Hive table that my log files are read into. If a new
>>> file is imported into HDFS and that file has a new column, how can I get
>>> Hive to handle the old files that lack the new column once I alter the
>>> table to add it?
>>> For example, I have a few files with these fields:
>>>
>>> empid, empname, deptno
>>>
>>> and so my Hive table is:
>>> CREATE EXTERNAL TABLE IF NOT EXISTS Employee (
>>> empid BIGINT
>>> ,empname string
>>> ,deptno BIGINT
>>> )
>>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>>> STORED AS TEXTFILE LOCATION 'hdfs://namenode1/employee/';
>>>
>>>
>>>
>>> But if a new file with an additional column is imported into the HDFS directory:
>>> empid, empname, deptno, salary
>>>
>>> I can't alter the Employee table to add salary because of the
>>> historical files. I used an external table because I wanted it to pick up
>>> all the log files dynamically whenever a new file is generated.
>>>
>>> I know the long way is to add the field to all of the old files, but I
>>> would prefer a more scalable way to do this. Does anyone know of one?
>>> Thanks
>>>
>>
>>
>>
>> --
>> "...:::Aniket:::... Quetzalco@tl"
>>
>
>
--
"...:::Aniket:::... Quetzalco@tl"