NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Bulk loading a CSV file into HBase


Bulk loading a CSV file into HBase
Hi All,

I am getting a "Bad line at offset" error in the stderr log of the tasks while
testing bulk loading of a CSV file into HBase. I am using CDH3u2. Importing a
TSV file works fine.
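For context, importtsv splits every input line on a single-byte separator, which defaults to a tab unless importtsv.separator is set. The snippet below only illustrates what that means for a comma-separated row: it uses awk, not importtsv, and the sample row row1,Alice,Pune is made up to fit the HBASE_ROW_KEY,data:name,data:city column list. The helper name count_fields is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: awk stands in for importtsv's line splitting.
# Hypothetical sample row matching HBASE_ROW_KEY,data:name,data:city
line='row1,Alice,Pune'

# count_fields LINE SEP: print how many fields LINE splits into on SEP.
count_fields() {
  printf '%s\n' "$1" | awk -F "$2" '{ print NF }'
}

# Default importtsv separator (tab) versus the intended comma separator.
echo "tab separator:   $(count_fields "$line" $'\t') field(s)"
echo "comma separator: $(count_fields "$line" ',') field(s)"
```

If the comma separator is not actually applied, each CSV row would arrive as a single field instead of the three that the column list expects.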

Here is the command I ran:
sudo -u hdfs hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u2.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city testload /temp/csv \
  -Dimporttsv.skip.bad.lines=true '-Dimporttsv.separator=,'
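The -D options in this command are handled by Hadoop's GenericOptionsParser, whose documented syntax is "bin/hadoop command [genericOptions] [commandOptions]"; in Hadoop 0.20 it stops treating arguments as options at the first non-option argument. The bash function below is a simplified model of that ordering rule, not Hadoop's code: it only understands -Dkey=value (not -D key=value or other generic options such as -conf or -libjars), and split_generic_opts is a made-up name.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not Hadoop's implementation) of
# "stop at the first non-option" parsing: leading -Dkey=value arguments
# become configuration; everything from the first non-option onwards is
# left untouched as plain arguments.
split_generic_opts() {
  local conf=() rest=()
  while [ "$#" -gt 0 ] && [[ "$1" == -D* ]]; do
    conf+=("${1#-D}")   # strip the -D prefix, keep key=value
    shift
  done
  rest=("$@")           # parsing stopped here
  echo "conf: ${conf[*]}"
  echo "args: ${rest[*]}"
}

# Same argument order as the command above.
split_generic_opts -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city \
  testload /temp/csv -Dimporttsv.skip.bad.lines=true '-Dimporttsv.separator=,'
```

Under this model, the two -D options placed after "testload /temp/csv" stay plain arguments instead of becoming configuration, so it may be worth checking whether the job really ran with importtsv.separator set to a comma.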

Job Stdout logs:
[root@ihub-namenode1 ihub]# sudo -u hdfs hadoop jar /usr/lib/hbase/hbase-0.90.4-cdh3u2.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,data:name,data:city  testload  /temp/csv -Dimporttsv.skip.bad.lines=true '-Dimporttsv.separator=,'
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:host.name=ihub-namenode1
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_20
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_20/jre
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/java/jdk1.6.0_20/jre//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r06.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/zookeeper.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hadoop/lib:/usr/lib/hbase/lib:/usr/lib/sqoop/lib:/etc/hbase/conf
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_20/jre/lib/amd64/server:/usr/java/jdk1.6.0_20/jre/lib/amd64:/usr/java/jdk1.6.0_20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-71.el6.x86_64
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:user.home=/usr/lib/hadoop-0.20
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ihub
12/03/05 11:42:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ihub-jobtracker1:2181 sessionTimeout=180000 watcher=hconnection
12/03/05 11:42:42 INFO zookeeper.ClientCnxn: Opening socket connection to server ihub-jobtracker1/192.168.1.98:2181
12/03/05 11:42:42 INFO zookeeper.ClientCnxn: Socket connection established to ihub-jobtracker1/192.168.1.98:2181, initiating session
12/03/05 11:42:42 INFO zookeeper.ClientCnxn: Session establishment complete on server ihub-jobtracker1/192.168.1.98:2181, sessionid 0x135d53c669a007a, negotiated timeout = 40000
12/03/05 11:42:42 INFO mapreduce.TableOutputFormat: Created table instance for testload
12/03/05 11:42:42 INFO input.FileInputFormat: Total input paths to process
12/03/05 11:42:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/05 11:42:42 WARN snappy.LoadSnappy: Snappy native library not loaded
12/03/05 11:42:42 INFO mapred.JobClient: Running job: job_201203021306_0017
12/03/05 11:42:43 INFO mapred.JobClient:  map 0% reduce 0%
12/03/05 11:42:48 INFO mapred.JobClient:  map 100% reduce 0%
12/03/05 11:42:48 INFO mapred.JobClient: Job complete: job_201203021306_0017
12/03/05 11:42:48 INFO mapred.JobClient: Counters: 13
12/03/05 11:42:48 INFO mapred.JobClient:   Job Counters
12/03/05 11:42:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5063
12/03/05 11:42:48 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/05 11:42:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/05 11:42:48 INFO mapred.JobClient:     Launched map tasks=1
12/03/05 11:42:48 INFO mapred.JobClient:     Data-local map tasks=1
12/03/05 11:42:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/05 1