HBase >> mail # user >> Bulk loading a CSV file into HBase


Re: Bulk loading a CSV file into HBase
Hi Stack,

I decompiled the ImportTsv class and added some sysout statements in main()
to figure out the problem. Please find the modified class here:
http://pastebin.com/sKQcMXe4

With Keshav's help, I learned that the CSV import works fine when I
provide "-Dimporttsv.separator=," as the first command-line parameter after
the class name.
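
To illustrate why the position of the -D options matters, here is a minimal sketch in plain Java. It is a simplified stand-in written for this message, not Hadoop's actual option parser: the class name LeadingOptionsSketch and its fields are hypothetical, and it models only one rule, namely that -Dkey=value arguments are picked up while they lead the argument list and that everything from the first positional argument onward is passed through untouched. That rule is an assumption, but it is consistent with the console log in this message, where the trailing -Dimporttsv.skip.bad.lines=true shows up under OtherArguments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified illustration only. This is NOT Hadoop's option parser; it
 * models just "-Dkey=value" options and the assumed rule that option
 * parsing stops at the first argument that is not an option.
 */
public class LeadingOptionsSketch {

    /** Properties collected from leading -Dkey=value arguments. */
    public final Map<String, String> properties = new LinkedHashMap<>();

    /** Everything from the first non-option argument onward. */
    public final List<String> remaining = new ArrayList<>();

    public LeadingOptionsSketch(String[] args) {
        int i = 0;
        // Consume -Dkey=value arguments only while they lead the list.
        while (i < args.length && args[i].startsWith("-D") && args[i].contains("=")) {
            String body = args[i].substring(2);
            int eq = body.indexOf('=');
            properties.put(body.substring(0, eq), body.substring(eq + 1));
            i++;
        }
        // After the first positional argument, nothing is treated as an option.
        for (; i < args.length; i++) {
            remaining.add(args[i]);
        }
    }

    public static void main(String[] args) {
        // Same argument order as the command in this message.
        String[] sample = {
            "-Dimporttsv.separator=,",
            "-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city",
            "testload",
            "/temp/csv",
            "-Dimporttsv.skip.bad.lines=true"
        };
        LeadingOptionsSketch parsed = new LeadingOptionsSketch(sample);
        System.out.println("properties = " + parsed.properties);
        System.out.println("remaining  = " + parsed.remaining);
    }
}
```

In this sketch the separator and columns settings end up in properties, while the trailing -Dimporttsv.skip.bad.lines=true stays in remaining. If the real parser behaves the same way, that setting would need to be moved ahead of the table name to take effect.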

Here are the command and the console log of the successful CSV import:
sudo -u hdfs hadoop jar /usr/lib/hadoop/importdata.jar \
  com.intuit.ihub.hbase.poc.ImportData -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city testload /temp/csv \
  -Dimporttsv.skip.bad.lines=true
Command line Arguments::-Dimporttsv.separator=,
Command line Arguments::-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city
Command line Arguments::testload
Command line Arguments::/temp/csv
Command line Arguments::-Dimporttsv.skip.bad.lines=true
OtherArguments==>testload
OtherArguments==>/temp/csv
OtherArguments==>-D
OtherArguments==>importtsv.skip.bad.lines=true
SEPARATOR as per jobconf:,
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:host.name=ihub-namenode1
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_20
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_20/jre
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_20/jre//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r06.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hbase.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/zookeeper.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop/lib:/usr/lib/hbase/lib:/usr/lib/sqoop/lib:/etc/hbase/conf
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-71.el6.x86_64
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.home=/usr/lib/hadoop-0.20
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
12/03/07 10:01:33 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ihub-jobtracker1:2181 sessionTimeout=180000 watcher=hconnection
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Opening socket connection to server ihub-jobtracker1/192.168.1.98:2181
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Socket connection established to ihub-jobtracker1/192.168.1.98:2181, initiating session
12/03/07 10:01:33 INFO zookeeper.ClientCnxn: Session establishment complete on server ihub-jobtracker1/192.168.1.98:2181, sessionid 0x135d53c669a00ab, negotiated timeout = 40000
12/03/07 10:01:33 INFO mapreduce.TableOutputFormat: Created table instance for testload
12/03/07 10:01:33 INFO input.FileInputFormat: Total input paths to process
12/03/07 10:01:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/07 10:01:33 WARN snappy.LoadSnappy: Snappy native library not loaded
12/03/07 10:01:34 INFO mapred.JobClient: Running job: job_201203021306_0028
12/03/07 10:01:35 INFO mapred.JobClient:  map 0% reduce 0%
12/03/07 10:01:40 INFO mapred.JobClient:  map 100% reduce 0%
12/03/07 10:01:41 INFO mapred.JobClient: Job complete: job_201203021306_0028
12/03/07 10:01:41 INFO mapred.JobClient: Counters: 13
12/03/07 10:01:41 INFO mapred.JobClient:   Job Counters
12/03/07 10:01:41 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5177
12/03/07 10:01:41 INF