Mark Grover 2012-11-29, 18:03
Re: Hive Loading Zip CSV Files
bcc: cdh-user

Hi Ben,
My apologies for the delayed response.

Sorry, I don't have any other specific resources to point you to. Your
best bet is to search online for examples.

I did a quick search. This looks like a good one:
https://github.com/kevinweil/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive
However, I haven't used it personally, so I can't vouch for it.

Here is an example from the Hive source code:
http://svn.apache.org/viewvc/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextInputFormat.java?view=markup
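
For what it's worth, the core of a zip-reading InputFormat is just walking the
archive's entries and handing back one line at a time. Below is a minimal
sketch of that step using only java.util.zip from the JDK. It is not a Hadoop
or Hive InputFormat: the names ZipCsvLines, readLines and zipOf are made up for
illustration, and a real implementation would presumably wrap this loop in a
RecordReader and treat each archive as a single, unsplittable input.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of Hadoop or Hive.
public class ZipCsvLines {

    /** Returns every text line from every file entry in a zip stream. */
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data
                }
                // The reader is deliberately not closed here: closing it
                // would close the underlying zip stream and end the loop.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    /** Builds a one-entry zip archive in memory, for demonstration only. */
    public static byte[] zipOf(String name, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zipOf("sample.csv", "id,name\n1,ben\n2,mark\n");
        for (String line : readLines(new ByteArrayInputStream(archive))) {
            System.out.println(line);
        }
    }
}
```

The sketch reads a whole archive sequentially, which is also why zip is
awkward for Hadoop: a single archive generally cannot be divided among several
mappers.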

Hope that helps.
Mark
On Tue, Nov 13, 2012 at 1:47 PM, ben <[EMAIL PROTECTED]> wrote:

> Hi Mark,
>
> Can you direct me to where I could create my own InputFormat for Zip
> Files? To create a ZipFileInputFormat for Hive?
>
> Thanks,
> Ben
>
>
> On Tuesday, November 13, 2012 10:54:25 AM UTC-8, Mark Grover wrote:
>
>> bcc: cdh-user
>>
>> This question might be more appropriate for the Apache Hive user list, so
>> redirecting it there.
>>
>> However, to answer your question:
>> From the little I've read about PKZip, it follows the standard zip
>> format. So the question you are really asking is whether Hive supports reading
>> from zip files. As far as I know, the answer is no. This is because Hadoop
>> doesn't have an InputFormat for reading zip files:
>> https://issues.apache.org/jira/browse/MAPREDUCE-210
>> There is also a Hive user email thread that tackles the same question:
>> http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>>
>> Having said that, a possible workaround would be to unzip the zip files
>> and use a different compression codec (e.g. Snappy) on SequenceFiles for
>> storing your files on HDFS.
>>
>> Good luck!
>> Mark
>>
>>
>>
>> On Tue, Nov 13, 2012 at 9:17 AM, ben <[EMAIL PROTECTED]> wrote:
>>
>>> Anybody ever try to load CSV files compressed using PKZip into a Hive
>>> table stored as Sequence Files? Is there a SerDe out there for this?
>>>
>>> Thanks,
>>> Ben