

RE: How to load csv data into HIVE
How about a Python script that changes it into plain tab-separated text? So it would look like this...

174969274<tab>14-mar-2006<tab>3522876<tab> <tab>14-mar-2006<tab>500000308<tab>65<tab>1<newline>
etc...

Tab-separated with newlines is easy to read and works perfectly on import.
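
Something along these lines might do it. This is only a rough sketch, not tested on your real files: it assumes every record sits on one line, ends with a '|', and has no tab characters inside any field. The names strip_terminator and convert are made up for illustration. It streams one line at a time, so a 12GB file should not need to fit in memory.

```python
import csv
import sys


def strip_terminator(lines):
    """Yield each line without its line ending and trailing '|' terminator."""
    for line in lines:
        yield line.rstrip("\r\n").rstrip("|")


def convert(src, dst):
    """Read quoted, comma-separated rows from src; write tab-separated rows to dst.

    Assumption: no field contains a tab, so the output is not quoted.
    """
    for row in csv.reader(strip_terminator(src)):
        if not row:
            continue  # skip blank lines
        dst.write("\t".join(row) + "\n")


# Runs only when called as: python csv_to_tsv.py input.csv output.tsv
# (the script name and file names are placeholders)
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], newline="") as src, open(sys.argv[2], "w") as dst:
        convert(src, dst)
```

The csv module takes care of the double quotes, so the empty "" field comes out as an empty column between two tabs.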

Chuck Connell
Nuance R&D Data Team
Burlington, MA
781-565-4611

From: Sandeep Reddy P [mailto:[EMAIL PROTECTED]]
Subject: How to load csv data into HIVE

Hi,
Here is the sample data
"174969274","14-mar-2006","3522876","","14-mar-2006","500000308","65","1"|
"174969275","19-jul-2006","3523154","","19-jul-2006","500000308","65","1"|
"174969276","31-dec-2005","3530333","","31-dec-2005","500000308","65","1"|
"174969277","14-apr-2005","3531470","","14-apr-2005","500000308","65","1"|

How do I load this kind of data into Hive?
I'm using a shell script to strip the double quotes and the '|', but it takes a very long time on each CSV file, and the files are 12GB each. What is the best way to do this?
