Re: Hive Loading Zip CSV Files
Mark Grover 2012-11-19, 17:29
My apologies for the delayed response.
Sorry, I don't have any other specific resources to point you to. Your
best bet is to search online for examples.
I did a quick search. This looks like a good one:
However, again, I haven't personally used it, so I can't really vouch
for it.
Here is an example from the Hive source code:
Hope that helps.
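For a sense of what a ZipFileInputFormat would have to do: its RecordReader would walk the entries of each zip file and emit one record per CSV line. The sketch below shows only that inner loop, using the JDK's `java.util.zip`; it is not wired into Hadoop's InputFormat/RecordReader APIs, and the class name `ZipCsvRecords`, the `Row` type, the `buildZip` demo helper, and the naive comma split are all hypothetical, not taken from Hive or Hadoop.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch of the loop a custom ZipFileInputFormat's RecordReader
 * might wrap. JDK only; not connected to Hadoop's InputFormat API.
 */
public class ZipCsvRecords {

    /** One CSV line together with the zip entry it came from. */
    public static final class Row {
        public final String entryName;
        public final String[] fields;

        Row(String entryName, String[] fields) {
            this.entryName = entryName;
            this.fields = fields;
        }
    }

    /** Reads every line of every file entry in the zip stream. */
    public static List<Row> readRows(InputStream in) {
        List<Row> rows = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // ZipInputStream reports end-of-stream at the end of the current entry.
                // The reader is deliberately not closed: that would close the zip stream.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    // Naive split: does not handle quoted fields containing commas.
                    rows.add(new Row(entry.getName(), line.split(",", -1)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Demo helper: builds an in-memory zip from alternating (name, content) arguments. */
    public static byte[] buildZip(String... nameContentPairs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < nameContentPairs.length; i += 2) {
                zip.putNextEntry(new ZipEntry(nameContentPairs[i]));
                zip.write(nameContentPairs[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The zip stream is closed (and fully flushed) before the bytes are read.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zipBytes = buildZip("a.csv", "1,foo\n2,bar\n", "b.csv", "3,baz\n");
        for (Row row : readRows(new ByteArrayInputStream(zipBytes))) {
            System.out.println(row.entryName + " -> " + String.join("|", row.fields));
        }
    }
}
```

Running `main` prints `a.csv -> 1|foo`, `a.csv -> 2|bar` and `b.csv -> 3|baz`. Because a zip archive cannot be read starting from an arbitrary byte offset, a real InputFormat built around this loop would presumably have to treat each zip file as non-splittable (the `isSplitable` hook on Hadoop's `FileInputFormat`), so one mapper would read each whole archive.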
On Tue, Nov 13, 2012 at 1:47 PM, ben <[EMAIL PROTECTED]> wrote:
> Hi Mark,
> Can you point me to where I could learn how to create my own InputFormat
> for zip files, i.e. a ZipFileInputFormat for Hive?
> On Tuesday, November 13, 2012 10:54:25 AM UTC-8, Mark Grover wrote:
>> bcc: cdh-user
>> This question might be more appropriate for the Apache Hive user list, so
>> redirecting it there.
>> However to answer your question:
>> From the little I've read about PKZip, it follows the standard zip
>> format. So the question you are really asking is whether Hive supports
>> reading from zip files. As far as I know, the answer is no. This is because Hadoop
>> doesn't have an InputFormat for reading zip files: https://issues.apache.
>> There is also a Hive user email thread that tackles the same question:
>> <http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%[EMAIL PROTECTED]%3E>
>> Having said that, a possible workaround would be to unzip the files
>> and store them on HDFS as SequenceFiles compressed with a different
>> codec (e.g. Snappy).
>> Good luck!
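The unzip step of the workaround suggested above can be sketched in plain Java with `java.util.zip`. This is a minimal, hypothetical helper (`UnzipForHdfs` is not part of Hadoop or Hive): it extracts every file entry under a target directory and rejects entries whose paths would escape that directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: extracts a zip archive so its CSVs can be staged for HDFS. */
public class UnzipForHdfs {

    /** Extracts every file entry under targetDir and returns the paths written. */
    public static List<Path> unzip(InputStream in, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        List<Path> extracted = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x.csv" that would escape targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Zip entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                // Copies only the current entry: ZipInputStream signals
                // end-of-stream at the entry boundary. Files.copy does not close it.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                extracted.add(out);
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UnzipForHdfs archive.zip outputDir
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (Path p : unzip(in, Paths.get(args[1]))) {
                System.out.println(p);
            }
        }
    }
}
```

After extraction, the CSVs could be copied to HDFS (for example with `hadoop fs -put`), exposed through a plain-text staging table, and then written with `INSERT ... SELECT` into a table `STORED AS SEQUENCEFILE` with output compression enabled (e.g. `hive.exec.compress.output=true` and `mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec`). Those property names are given as an assumption to check against your Hive and Hadoop versions.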
>> On Tue, Nov 13, 2012 at 9:17 AM, ben <[EMAIL PROTECTED]> wrote:
>>> Has anybody ever tried to load CSV files compressed with PKZip into a Hive
>>> table stored as SequenceFiles? Is there a SerDe out there for this?