Bulk loading a CSV file into HBase
Hi All,

I am getting a "Bad line at offset" error in the stderr log of the tasks while
testing a bulk load of a CSV file into HBase. I am using CDH3u2. Importing a
TSV file works fine.

Here is the command I ran:
sudo -u hdfs hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u2.jar importtsv
-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city  testload  /temp/csv
-Dimporttsv.skip.bad.lines=true '-Dimporttsv.separator=,'
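
For illustration only: a minimal sketch of how a comma-delimited line could end up
flagged as a bad line if the job were still splitting on the default tab separator,
for example if the -D options placed after the table name and input path were not
picked up (an assumption worth checking, not a confirmed diagnosis). This is not
ImportTsv's actual code; split_fields, is_bad_line and the sample row are
hypothetical.

```python
# Hypothetical sketch, not ImportTsv's real parser: a line is treated as
# "bad" when splitting it on the separator does not yield one field per
# configured column.

def split_fields(line, separator="\t"):
    """Split one input line on the given separator (tab by default)."""
    return line.rstrip("\n").split(separator)


def is_bad_line(line, expected_columns, separator="\t"):
    """Return True when the field count differs from the column count."""
    return len(split_fields(line, separator)) != expected_columns


# Column spec copied from the command above; the data row is made up.
columns = ["HBASE_ROW_KEY", "data:name", "data:city"]
sample = "row1,Alice,Boston"

# Split on tab, the whole row is one field, so it does not match 3 columns.
print(is_bad_line(sample, len(columns)))
# Split on comma, the row yields three fields and matches the column spec.
print(is_bad_line(sample, len(columns), ","))
```

If the separator setting never reaches the job, every CSV row would fail this kind
of check, while a TSV file would pass, which would be consistent with the behaviour
described above.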

Job stdout logs:
[root@ihub-namenode1 ihub]# sudo -u hdfs hadoop jar
/usr/lib/hbase/hbase-0.90.4-cdh3u2.jar importtsv
-Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city  testload  /temp/csv
-Dimporttsv.skip.bad.lines=true '-Dimporttsv.separator=,'
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:host.name=ihub-namenode1
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_20
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_20/jre
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_20/jre//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r06.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/zookeeper.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop/lib:/usr/lib/hbase/lib:/usr/lib/sqoop/lib:/etc/hbase/conf
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.32-71.el6.x86_64
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:user.home=/usr/lib/hadoop-0.20
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/ihub
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=ihub-jobtracker1:2181 sessionTimeout=180000
watcher=hconnection
12/03/05 11:42:42 INFO zookeeper.ClientCnxn: Opening socket connection to
server ihub-jobtracker1/192.168.1.98:2181
12/03/05 11:42:42 INFO zookeeper.ClientCnxn: Socket connection established
to ihub-jobtracker1/192.168.1.98:2181, initiating session
12/03/05 11:42:42 INFO zookeeper.ClientCnxn: Session establishment complete
on server ihub-jobtracker1/192.168.1.98:2181, sessionid 0x135d53c669a007a, negotiated timeout = 40000
12/03/05 11:42:42 INFO mapreduce.TableOutputFormat: Created table instance
for testload
12/03/05 11:42:42 INFO input.FileInputFormat: Total input paths to process
12/03/05 11:42:42 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
12/03/05 11:42:42 WARN snappy.LoadSnappy: Snappy native library not loaded
12/03/05 11:42:42 INFO mapred.JobClient: Running job: job_201203021306_0017
12/03/05 11:42:43 INFO mapred.JobClient:  map 0% reduce 0%
12/03/05 11:42:48 INFO mapred.JobClient:  map 100% reduce 0%
12/03/05 11:42:48 INFO mapred.JobClient: Job complete: job_201203021306_0017
12/03/05 11:42:48 INFO mapred.JobClient: Counters: 13
12/03/05 11:42:48 INFO mapred.JobClient:   Job Counters
12/03/05 11:42:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5063
12/03/05 11:42:48 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
12/03/05 11:42:48 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
12/03/05 11:42:48 INFO mapred.JobClient:     Launched map tasks=1
12/03/05 11:42:48 INFO mapred.JobClient:     Data-local map tasks=1
12/03/05 11:42:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/05 1