HBase >> mail # user >> Bulk loading a CSV file into HBase


Re: Bulk loading a CSV file into HBase
Hi Stack,

I decompiled the ImportTsv class and added some System.out.println statements
to main() to track down the problem. The modified class is here:
http://pastebin.com/sKQcMXe4

With Keshav's help, I found that the CSV import works fine when I
provide "-Dimporttsv.separator=," as the first command-line parameter after
the class name.
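
A likely explanation, assuming ImportTsv hands its arguments to Hadoop's generic option parsing, is that -D options are only recognized before the first positional argument; anything after that point is passed through untouched. The sketch below is a simplified stand-in for that behaviour, not Hadoop's actual code: the class name LeadingOptionsSketch and its parse() helper are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: mimics "stop at the first non-option" parsing.
public class LeadingOptionsSketch {

    /** Parsed -D properties plus the arguments that were left untouched. */
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /**
     * Consumes -Dkey=value tokens only until the first token that is not one.
     * Everything from that token onward is kept as a plain argument,
     * even if it looks like a -D option.
     */
    static Parsed parse(String[] args) {
        Parsed result = new Parsed();
        int i = 0;
        for (; i < args.length; i++) {
            String arg = args[i];
            int eq = arg.indexOf('=');
            if (!arg.startsWith("-D") || eq < 3) {
                break; // first non-option token: stop interpreting options
            }
            result.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        for (; i < args.length; i++) {
            result.remaining.add(args[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same argument order as the command in this message.
        String[] demo = {
            "-Dimporttsv.separator=,",
            "-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city",
            "testload",
            "/temp/csv",
            "-Dimporttsv.skip.bad.lines=true"
        };
        Parsed p = parse(demo);
        System.out.println("properties = " + p.properties);
        System.out.println("remaining  = " + p.remaining);
    }
}
```

With this argument order the separator and columns settings are picked up as properties, while the trailing -Dimporttsv.skip.bad.lines=true stays among the remaining arguments. If that option is meant to take effect, it presumably needs to move in front of the table name as well.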

Here is the command and console log of the successful CSV import:
sudo -u hdfs hadoop jar /usr/lib/hadoop/importdata.jar
com.intuit.ihub.hbase.poc.ImportData -Dimporttsv.separator=,
-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city  testload  /temp/csv
-Dimporttsv.skip.bad.lines=true
Command line Arguments::-Dimporttsv.separator=,
Command line Arguments::-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city
Command line Arguments::testload
Command line Arguments::/temp/csv
Command line Arguments::-Dimporttsv.skip.bad.lines=true
OtherArguments==>testload
OtherArguments==>/temp/csv
OtherArguments==>-D
OtherArguments==>importtsv.skip.bad.lines=true
SEPARATOR as per jobconf:,
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:host.name=ihub-namenode1
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_20
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_20/jre
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_20/jre//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r06.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hbase.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/zookeeper.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop/lib:/usr/lib/hbase/lib:/usr/lib/sqoop/lib:/etc/hbase/conf
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.32-71.el6.x86_64
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:user.home=/usr/lib/hadoop-0.20
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/root
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=ihub-jobtracker1:2181 sessionTimeout=180000
watcher=hconnection
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Opening socket connection to
server ihub-jobtracker1/192.168.1.98:2181
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Socket connection established
to ihub-jobtracker1/192.168.1.98:2181, initiating session
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Session establishment complete
on server ihub-jobtracker1/192.168.1.98:2181, sessionid 0x135d53c669a00ab, negotiated timeout = 40000
12/03/07 10:01:33 INFO mapreduce.TableOutputFormat: Created table instance
for testload
12/03/07 10:01:33 INFO input.FileInputFormat: Total input paths to process
12/03/07 10:01:33 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/03/07 10:01:33 WARN snappy.LoadSnappy: Snappy native library not loaded
12/03/07 10:01:34 INFO mapred.JobClient: Running job: job_201203021306_0028
12/03/07 10:01:35 INFO mapred.JobClient:  map 0% reduce 0%
12/03/07 10:01:40 INFO mapred.JobClient:  map 100% reduce 0%
12/03/07 10:01:41 INFO mapred.JobClient: Job complete: job_201203021306_0028
12/03/07 10:01:41 INFO mapred.JobClient: Counters: 13
12/03/07 10:01:41 INFO mapred.JobClient:   Job Counters
12/03/07 10:01:41 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5177
12/03/07 10:01:41 INF