Re: Hive Loading Zip CSV Files
bcc: cdh-user

Hi Ben,
My apologies for the delayed response.

I don't have any other specific resources to point you to, sorry. Your
best bet is to search online for examples.

I did a quick search. This looks like a good one:
https://github.com/kevinweil/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive
However, I haven't used it myself, so I can't vouch for it.

Here is an example of a custom InputFormat from the Hive source code:
http://svn.apache.org/viewvc/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextInputFormat.java?view=markup
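
For the unzip workaround quoted further down this thread, a pre-processing
step could look roughly like the sketch below. It is only an illustration,
not something from Hive or Hadoop: the function name extract_csv_members and
the choice to keep only members ending in ".csv" are assumptions, and it uses
nothing beyond Python's standard zipfile module. It runs on a local machine
before the files are copied to HDFS.

```python
import os
import zipfile


def extract_csv_members(zip_path, out_dir):
    """Extract every .csv member of a zip archive into out_dir.

    Hypothetical helper for illustration. Returns the list of paths
    that were written, in archive order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # extract() returns the path of the file it created.
            written.append(archive.extract(info, out_dir))
    return written
```

The extracted CSV files can then be copied to HDFS and loaded into a
plain-text table, and from there rewritten into a SequenceFile-backed table
with whichever codec you prefer.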

Hope that helps.
Mark
On Tue, Nov 13, 2012 at 1:47 PM, ben <[EMAIL PROTECTED]> wrote:

> Hi Mark,
>
> Can you direct me to where I could create my own InputFormat for Zip
> Files? To create a ZipFileInputFormat for Hive?
>
> Thanks,
> Ben
>
>
> On Tuesday, November 13, 2012 10:54:25 AM UTC-8, Mark Grover wrote:
>
>> bcc: cdh-user
>>
>> This question might be more appropriate for the Apache Hive user list, so
>> redirecting it there.
>>
>> However, to answer your question:
>> From the little I've read about PKZip, it follows the standard zip
>> format. So the question you are really asking is if Hive supports reading
>> from zip files. As far as I know, the answer is no. This is because Hadoop
>> doesn't have an InputFormat for reading zip files:
>> https://issues.apache.org/jira/browse/MAPREDUCE-210
>> There is also a Hive user email thread that tackles the same question:
>> http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>>
>> Having said that, a possible workaround would be to unzip the zip files
>> and store them on HDFS as SequenceFiles with a different compression
>> codec (e.g. Snappy).
>>
>> Good luck!
>> Mark
>>
>>
>>
>> On Tue, Nov 13, 2012 at 9:17 AM, ben <[EMAIL PROTECTED]> wrote:
>>
>>> Anybody ever try to load CSV files compressed using PKZip into a Hive
>>> table stored as Sequence Files? Is there a SerDe out there for this?
>>>
>>> Thanks,
>>> Ben
>>>