Hive, mail # user - How to load csv data into HIVE


RE: How to load csv data into HIVE
Connell, Chuck 2012-09-07, 14:57
I cannot promise which is faster. A lot depends on how clever your scripts are.

From: Sandeep Reddy P [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 07, 2012 10:42 AM
To: [EMAIL PROTECTED]
Subject: Re: How to load csv data into HIVE

Hi,
I wrote a shell script to clean up the CSV data, but when I run it on a 12 GB CSV file it takes a long time. Would a Python script be faster?
On Fri, Sep 7, 2012 at 10:39 AM, Connell, Chuck <[EMAIL PROTECTED]> wrote:
How about a Python script that changes it into plain tab-separated text? So it would look like this...

174969274<tab>14-mar-2006<tab>3522876<tab> <tab>14-mar-2006<tab>500000308<tab>65<tab>1<newline>
etc...

Tab-separated with newlines is easy to read and works perfectly on import.
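
A minimal sketch of such a conversion script, for illustration only. It assumes every record sits on a single line, ends with a trailing '|', and that no field value contains a tab; the function names strip_record_terminator and csv_to_tsv are made up for this example. It streams line by line, so a 12 GB file never has to fit in memory.

```python
import csv


def strip_record_terminator(lines):
    """Yield each input line without its line ending and trailing '|'."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("|"):
            line = line[:-1]
        yield line


def csv_to_tsv(src, dst):
    """Stream quoted CSV records from src and write them to dst as tab-separated text.

    src is any iterable of text lines (for example an open file);
    dst is any object with a write() method.
    """
    # csv.reader handles the double quotes, including empty "" fields.
    reader = csv.reader(strip_record_terminator(src))
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Assumption: no field contains a tab, so a plain join is safe.
        dst.write("\t".join(row) + "\n")
```

To use it as a filter, call csv_to_tsv(sys.stdin, sys.stdout) from a small wrapper script and redirect the input and output files. Empty fields come out as empty strings between two tabs.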

Chuck Connell
Nuance R&D Data Team
Burlington, MA
781-565-4611

From: Sandeep Reddy P [mailto:[EMAIL PROTECTED]]
Subject: How to load csv data into HIVE

Hi,
Here is the sample data
"174969274","14-mar-2006","3522876","","14-mar-2006","500000308","65","1"|
"174969275","19-jul-2006","3523154","","19-jul-2006","500000308","65","1"|
"174969276","31-dec-2005","3530333","","31-dec-2005","500000308","65","1"|
"174969277","14-apr-2005","3531470","","14-apr-2005","500000308","65","1"|

How do I load this kind of data into Hive?
I'm using a shell script to strip the double quotes and the trailing '|', but it takes a very long time on each CSV file, and the files are 12 GB each. What is the best way to do this?
--
Thanks,
sandeep