Suggestion for Metastore Operations around ORC Files
I was testing the conversion of a table to ORC. Following previous posts,
I ran alter table tablename set fileformat ORC; and this worked great. All new
partitions created were ORC, and the RC and ORC files played nicely next to
each other.
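
For reference, the conversion step described above looks roughly like this in HiveQL. The table name is the placeholder used in this message, and the DESCRIBE statement is only one way to see which format the table now reports; it is a sketch, not a record of the exact commands that were run.

```sql
-- Switch the table-level file format. Per the description above, existing
-- RC partitions stayed readable and newly created partitions were ORC.
ALTER TABLE tablename SET FILEFORMAT ORC;

-- Show the InputFormat / OutputFormat / SerDe now recorded for the table.
DESCRIBE FORMATTED tablename;
```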

Then I had a hypothesis. I have tables that almost always have Hive jobs
running and inserting data, and ideally I don't want to stop those. I saw
a potential problem: if I converted the table in the middle of an INSERT
job, what would happen?

Ideally, the RC format in effect when the job started would be honored,
the files would be written as RC files, and all would be well. What I
think actually happened is that the setting was not honored. Either the
writers switched to ORC mid-file, corrupting the output, or (and this is
what I suspect) the writers kept using the RC file format, but the
partition metadata was recorded as ORC when it was updated. I am not an
expert and can't say which of these occurred, but I was able to make all
subsequent queries fail by doing that.
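
One way to check the second suspicion (RC files on disk, ORC in the partition metadata) might be to look at what the metastore recorded for the affected partition. This is an untested sketch: the partition column dt and its value are hypothetical, and whether resetting the format actually repairs the failing queries is an assumption.

```sql
-- Inspect the InputFormat / OutputFormat / SerDe recorded for the partition
-- that was being written while the table format was changed.
-- (dt and its value are hypothetical placeholders.)
DESCRIBE FORMATTED tablename PARTITION (dt='2013-01-01');

-- If the partition reports ORC but the files are really RC, point the
-- partition metadata back at RCFile.
ALTER TABLE tablename PARTITION (dt='2013-01-01') SET FILEFORMAT RCFILE;
```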

As I said, almost everything about the conversion to ORC is going well,
but I'd recommend a change: when the file format setting is altered, jobs
that are already running should honor the old setting for the partitions
they write, and any new jobs should use the new setting.

Also, a question for the group: how do things behave when you are doing
insert append operations, and the first jobs wrote RC files while later
files in the same partition are ORC?

Thanks!