Hive, mail # user - FROM INSERT after ADD COLUMN


Re: FROM INSERT after ADD COLUMN
yaboulna@... 2012-12-10, 20:01
Is there an index in the RCFile that avoids a complete pass over the
record "keys" when matching old and new records? Also, wouldn't the
RCFile need to be rebuilt anyway, since the file actually stores
blocks of n rows by m columns, sized to reach a certain block size? I
haven't read the RCFile paper carefully, but that's what I understood
from skimming it.
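
To make the layout concern concrete, here is a minimal Python sketch of a row-group layout as I understand it from the RCFile description: rows are first split into groups, and each group then stores its values column by column. This is a toy model, not Hive's reader or writer. The names `to_row_groups` and `add_column` are hypothetical, and groups are cut by row count here purely for simplicity, whereas the real format targets a block size in bytes.

```python
from typing import Any, Dict, List, Sequence

RowGroup = Dict[str, List[Any]]


def to_row_groups(rows: Sequence[Sequence[Any]],
                  columns: Sequence[str],
                  rows_per_group: int) -> List[RowGroup]:
    """Split rows into groups; inside each group, store values column by column."""
    groups: List[RowGroup] = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        groups.append({name: [row[i] for row in chunk]
                       for i, name in enumerate(columns)})
    return groups


def add_column(groups: Sequence[RowGroup],
               name: str,
               values: Sequence[Any]) -> List[RowGroup]:
    """Return new row groups with one more column.

    Every group is rebuilt, because each group holds its own slice of
    every column. Values are matched to rows by position only; there is
    no key lookup in this toy model.
    """
    widened: List[RowGroup] = []
    offset = 0
    for group in groups:
        n_rows = len(next(iter(group.values())))
        new_group = dict(group)
        new_group[name] = list(values[offset:offset + n_rows])
        widened.append(new_group)
        offset += n_rows
    return widened


# Hypothetical sample data shaped like tab (a string, b bigint, c double).
rows = [("k1", 1, 1.0), ("k2", 2, 2.0), ("k3", 3, 3.0),
        ("k4", 4, 4.0), ("k5", 5, 5.0)]
groups = to_row_groups(rows, ["a", "b", "c"], rows_per_group=2)
widened = add_column(groups, "d", [b"v1", b"v2", b"v3", b"v4", b"v5"])
```

In this model the new column cannot simply be appended at the end of the file: its values have to be sliced up and placed into every row group, which is why a rewrite looks unavoidable to me.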

-- Younos

Quoting Shreepadma Venugopalan <[EMAIL PROTECTED]>:

> Sorry, hit send too soon :)
>
> While storing data in a column major format such as RCFile would help with
> adding new column data after executing an alter table...add columns
> statement, Hive doesn't provide a way to do it today. It is possible to do
> so outside of Hive today, but we would need to enhance Hive to add new
> column data when the data is stored in a column major format.
>
> Thanks.
> Shreepadma
>
>
> On Mon, Dec 10, 2012 at 10:32 AM, Shreepadma Venugopalan <
> [EMAIL PROTECTED]> wrote:
>
>>
>>
>>
>> On Sun, Dec 9, 2012 at 10:32 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>>
>>> I will reopen the subject a bit.
>>>
>>> I don't know the details of the RCFile implementation in Hive, but if the
>>> data were stored that way, it would theoretically be possible to add the
>>> column data even without append and without rewriting the whole file. Does
>>> anyone have more information on that matter?
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>> On Mon, Dec 10, 2012 at 2:02 AM, <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello Shreepadma,
>>>>
>>>> That's definitely very helpful. I suspected that this would be the case,
>>>> but I was thinking that maybe there's a way to do it using a merge task. I
>>>> will change my data structure to make it a bit like HBase, and I hope Hive
>>>> will still be the right choice for me. It can be backed by HBase anyway
>>>> :). Thank you very much, your quick reply saved me a lot of time!
>>>>
>>>> Sincerely,
>>>> Younos
>>>>
>>>>
>>>> Quoting Shreepadma Venugopalan <[EMAIL PROTECTED]>:
>>>>
>>>>  Hi Younos,
>>>>>
>>>>> Since HiveQL doesn't support an insert..values statement, you can't insert
>>>>> values into a specific column. Let's assume your table had the following
>>>>> structure before the alter table..add columns statement was executed:
>>>>>
>>>>> tab (a string, b bigint, c double)
>>>>>
>>>>> Furthermore, let's assume that it had 100 rows. Now, let's assume you ran
>>>>> alter table tab add columns (d binary). The new table structure will
>>>>> look like this:
>>>>>
>>>>> tab (a string, b bigint, c double, d binary)
>>>>>
>>>>> You can't insert binary data into the 100 rows that were present prior to
>>>>> the alter table statement by executing a HiveQL statement, because HiveQL,
>>>>> unlike most RDBMSs, doesn't support an insert..values statement. However,
>>>>> you can delete the existing files and add new files that contain records
>>>>> corresponding to the new table structure. Alternatively, you can skip the
>>>>> deletion step and just add new files that correspond to the new table
>>>>> structure. When you execute a HiveQL query, null will be returned for those
>>>>> columns for which the data doesn't exist.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Thanks.
>>>>> Shreepadma
>>>>>
>>>>>
>>>>> On Sun, Dec 9, 2012 at 4:35 PM, <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>  Hello,
>>>>>>
>>>>>> I couldn't find any example of how to populate columns that were added to
>>>>>> a table. How would Hive tell which row each value of the newly added
>>>>>> columns belongs to? Does it do column name matching?
>>>>>>
>>>>>> Sincerely,
>>>>>> Younos
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> Best regards,
>>>> Younos Aboulnaga
>>>>
>>>> Masters candidate
>>>> David Cheriton school of computer science
>>>> University of Waterloo
>>>> http://cs.uwaterloo.ca
>>>>
>>>> E-Mail: [EMAIL PROTECTED]
>>>> Mobile: +1 (519) 497-5669
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
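
The behaviour Shreepadma describes in the quoted reply, where files written before the alter table keep their old shape and the added column reads back as null, can be modelled with a short Python sketch. This is only an illustration of the idea under stated assumptions, not Hive's SerDe code: `read_with_schema` is a hypothetical helper, records are plain tuples, and columns are matched by position rather than by name.

```python
from typing import Any, Dict, List, Sequence


def read_with_schema(stored_rows: Sequence[Sequence[Any]],
                     schema: Sequence[str]) -> List[Dict[str, Any]]:
    """Project each stored record onto the current table schema.

    Records that are shorter than the schema (written before the column
    was added) get None for the missing trailing columns.
    """
    width = len(schema)
    result: List[Dict[str, Any]] = []
    for record in stored_rows:
        padded = list(record[:width]) + [None] * (width - len(record))
        result.append(dict(zip(schema, padded)))
    return result


# Hypothetical data: one file written for tab (a, b, c) and one written
# after "alter table tab add columns (d binary)".
old_file = [("x", 1, 0.5), ("y", 2, 1.5)]
new_file = [("z", 3, 2.5, b"\x01")]

schema = ["a", "b", "c", "d"]
table_rows = read_with_schema(old_file + new_file, schema)
```

Here the two old records come back with `d` set to `None`, while the record from the new file carries its binary value, which mirrors the "skip the deletion step and just add new files" option.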
