Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> updating RegexSerde on existing partitions


Copy link to this message
-
Re: updating RegexSerde on existing partitions
Thanks for the update, Viral.

One way I can think of is to drop the metadata for the entire table and
recreate the table with the updated regex and recover partitions from HDFS.
This, of course, works if your table is external.

Mark

On Fri, Jan 25, 2013 at 12:21 PM, Viral Bajaria <[EMAIL PROTECTED]>wrote:

> Well the NPE was a mistake on my part. I was over-escaping the regex and
> that was causing the NPE. The test that I ran was using the CLI, while the
> update of regex was in code.
>
> Regarding updating the regex for old partitions, I have not come across
> any better way besides running it over all partitions. I was going to setup
> a test instance and see if I could just update the metastore tables
> directly since that will be faster than running ALTER TABLE one by one over
> 1000's of partitions. Still debating though. I will let you know how it
> goes.
>
> Thanks,
> Viral
>
>
> On Fri, Jan 25, 2013 at 11:38 AM, Mark Grover <[EMAIL PROTECTED]
> > wrote:
>
>> Viral,
>> To me, I think the only option seems to be here to alter all the existing
>> partitions.
>>
>> I would be interested in knowing what you ended up finding/doing.
>>
>> Thanks,
>> Mark
>>
>>
>> On Sun, Jan 20, 2013 at 3:58 PM, Viral Bajaria <[EMAIL PROTECTED]>wrote:
>>
>>> Hi,
>>>
>>> I have a hive table which has pre-defined schema and I use RegexSerde to
>>> read data from the underlying files. I wanted to add a new column to this
>>> table and so after running the ALTER TABLE command, I updated the
>>> 'input.regex' property for SERDEPROPERTIES. This did not help in any way.
>>> The newly added column always returned null data. I figured the issue must
>>> be because the past partitions are each tagged with their own SerDe
>>> definition in PARTITIONS table. So I went ahead and loaded a new file and
>>> saw that the new file did pick up the updated regex definition but when I
>>> run a query against this new partition I always get a NullPointerException
>>> with no additional information.
>>>
>>> I was hoping I didn't have to run a ALTER TABLE on each partition to
>>> update the regex property but maybe that's the only possible solution. But
>>> before I go ahead and do that, I want to make sure my whole table will not
>>> break with NPE's given that newly added partitions cannot be read with the
>>> updated regex definition.
>>>
>>> I have already tested the regex is fine by creating a temp table with
>>> that regex definition and loading the same file into it and I was able to
>>> query with no issues.
>>>
>>> Anyone faced this issue before ? Any suggestions ? Or once defined it's
>>> impossible to change RegexSerde tables ?
>>>
>>> Thanks,
>>> Viral
>>>
>>
>>
>