Hive, mail # user - FROM INSERT after ADD COLUMN


Re: FROM INSERT after ADD COLUMN
Shreepadma Venugopalan 2012-12-10, 18:32
On Sun, Dec 9, 2012 at 10:32 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:

> I will reopen the subject a bit.
>
> I don't know the details of the RCFile implementation in Hive, but if the
> data were stored that way, it would theoretically be possible to add the
> column data without append and without rewriting the whole file. Does
> anyone have more information on that matter?
>
> Regards
>
> Bertrand
>
>
> On Mon, Dec 10, 2012 at 2:02 AM, <[EMAIL PROTECTED]> wrote:
>
>> Hello Shreepadma,
>>
>> That's definitely very helpful. I suspected that this would be the case,
>> but I was thinking that maybe there's a way to do it using a merge task. I
>> will change my data structure to make it a bit like HBase, and I hope Hive
>> will still be the right choice for me. It can be backed by HBase anyway
>> :). Thank you very much, your quick reply saved me a lot of time!
>>
>> Sincerely,
>> Younos
>>
>>
>> Quoting Shreepadma Venugopalan <[EMAIL PROTECTED]>:
>>
>>  Hi Younos,
>>>
>>> Since HiveQL doesn't support an insert..values statement, you can't insert
>>> values into a specific column. Let's assume your table had the following
>>> structure before the alter table..add columns statement was executed:
>>>
>>> tab (a string, b bigint, c double)
>>>
>>> Furthermore, let's assume that it had 100 rows. Now, let's assume you ran
>>> alter table tab add columns (d binary). The new table structure will
>>> look like this:
>>>
>>> tab (a string, b bigint, c double, d binary)
>>>
>>> You can't insert binary data into the 100 rows that were present prior to
>>> the alter table statement by executing a HiveQL statement, because, unlike
>>> most RDBMSs, HiveQL doesn't support an insert..values statement. However,
>>> you can delete the existing files and add new files that contain records
>>> corresponding to the new table structure. Alternatively, you can skip the
>>> deletion step and just add new files that correspond to the new table
>>> structure. When you execute a HiveQL query, null will be returned for
>>> those columns for which the data doesn't exist.
>>>
>>> Hope this helps.
>>>
>>> Thanks.
>>> Shreepadma
>>>
>>>
>>> On Sun, Dec 9, 2012 at 4:35 PM, <[EMAIL PROTECTED]> wrote:
>>>
>>>  Hello,
>>>>
>>>> I couldn't find any example of how to populate columns that were added
>>>> to a table. How would Hive tell which row each value of the newly added
>>>> columns belongs to? Does it match on column names?
>>>>
>>>> Sincerely,
>>>> Younos
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> Best regards,
>> Younos Aboulnaga
>>
>> Masters candidate
>> David Cheriton school of computer science
>> University of Waterloo
>> http://cs.uwaterloo.ca
>>
>> E-Mail: [EMAIL PROTECTED]
>> Mobile: +1 (519) 497-5669
>>
>>
>>
>>
>
>
> --
> Bertrand Dechoux
>
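
To illustrate the last point in the quoted reply (null is returned for columns whose data doesn't exist), here is a minimal Python sketch. It is not Hive code and does not use any Hive API; it only simulates reading tab-delimited text files against the current table schema. The names read_row, old_file and new_file, the tab delimiter, and the sample values are assumptions made for illustration; the column names a, b, c, d come from the example in the thread.

```python
# Hypothetical simulation of schema-on-read; this is NOT Hive code.
# Column names follow the example table in the thread: tab (a, b, c, d).
SCHEMA = ["a", "b", "c", "d"]  # schema after "add columns (d binary)"


def read_row(line, schema=SCHEMA, delimiter="\t"):
    """Map one delimited text line onto the current schema.

    Lines from files written before the new column existed have fewer
    fields; the missing trailing columns come back as None, which stands
    in for NULL here. Values are left as strings for simplicity.
    """
    fields = line.rstrip("\n").split(delimiter)
    padded = fields + [None] * (len(schema) - len(fields))
    return dict(zip(schema, padded))


# Assumed sample data: one file written before the alter table statement,
# one written afterwards with a value for the new column d.
old_file = ["x\t1\t1.5\n"]
new_file = ["y\t2\t2.5\tZm9v\n"]

rows = [read_row(line) for line in old_file + new_file]
for row in rows:
    print(row)
```

Rows from the pre-existing file come back with d set to None, while rows from the file added after the alter table statement carry their own value for d. How a real table behaves depends on its file format and SerDe, so treat this only as a picture of the idea.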