Sqoop >> mail # user >> Using Pig to load data imported with Sqoop


Re: Using Pig to load data imported with Sqoop
I believe that Pig's SequenceFileLoader (from piggybank) is not compatible with custom Writables at the moment. Per the docs, the loader can only work with the following types:

  Text, IntWritable, LongWritable, FloatWritable, DoubleWritable, BooleanWritable, ByteWritable
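
To make the limitation concrete, here is a minimal sketch in plain Java with no Hadoop or Pig on the classpath. It is not piggybank's actual code: the class name SupportedWritables and the method pigTypeFor are hypothetical, and the Pig type names on the right-hand side are my assumption of the natural mapping. It only shows the idea that a loader with a fixed list of known Writables has nothing to map a Sqoop-generated class such as com.acme.Dual to.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; this is NOT the piggybank implementation.
public class SupportedWritables {

    // Writable class names from the list above, mapped to assumed Pig types.
    private static final Map<String, String> PIG_TYPES = new LinkedHashMap<>();
    static {
        PIG_TYPES.put("org.apache.hadoop.io.Text", "chararray");
        PIG_TYPES.put("org.apache.hadoop.io.IntWritable", "int");
        PIG_TYPES.put("org.apache.hadoop.io.LongWritable", "long");
        PIG_TYPES.put("org.apache.hadoop.io.FloatWritable", "float");
        PIG_TYPES.put("org.apache.hadoop.io.DoubleWritable", "double");
        PIG_TYPES.put("org.apache.hadoop.io.BooleanWritable", "boolean");
        PIG_TYPES.put("org.apache.hadoop.io.ByteWritable", "byte");
    }

    // Returns the assumed Pig type, or empty when the class is not in the list.
    public static Optional<String> pigTypeFor(String writableClassName) {
        return Optional.ofNullable(PIG_TYPES.get(writableClassName));
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.io.LongWritable",
            "com.acme.Dual" // the record class generated by the Sqoop import
        };
        for (String cls : candidates) {
            System.out.println(cls + " -> " + pigTypeFor(cls).orElse("unsupported"));
        }
    }
}
```

In this sketch the generated record class falls through to "unsupported", which is the situation I think you are hitting with the value side of the SequenceFile.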

Jarcec

On Mon, Nov 04, 2013 at 07:18:43PM +1100, Andre Araujo wrote:
> Hi, all,
>
> I've loaded some data with Sqoop from Oracle onto HDFS, storing it as
> SequenceFiles and I'm having problems loading it with Pig.
> I'm using Sqoop 1.4.3 and used the following steps (simplified example
> using the DUAL table).
>
> Any ideas of why it loads incorrectly? Am I missing any steps?
>
> Thanks,
> Andre
>
>
>
> *1. Imported data from the table onto HDFS (the DUAL table has only 1 row
> with 1 field containing the string "X") *
>
> sqoop import -D mapred.child.java.opts="$JDBC_JAVA_OPTS" --connect $CONNSTR
>  -m 1 --query "select DUMMY from dual where \$CONDITIONS" --target-dir test
> --as-sequencefile --class-name com.acme.Dual
>
> The Dual.java file is attached.
>
> *2. Generated the Dual.jar file:*
>
> javac -cp
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop/sqoop-1.4.3-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common.jar:.
> com/acme/Dual.java
> jar cf /tmp/Dual.jar com/acme/Dual.class
>
> *3. Tried to load the data with Pig; however, the field value is read as 0
> (zero) instead of the string "X":*
>
> REGISTER
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/pig/piggybank.jar;
> REGISTER
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/sqoop/sqoop-1.4.3-cdh4.3.0.jar
> REGISTER
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.3.0.jar
> REGISTER
> /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common.jar
> REGISTER /tmp/Dual.jar
> DEFINE SequenceFileLoader
> org.apache.pig.piggybank.storage.SequenceFileLoader();
> log = LOAD 'test' USING SequenceFileLoader AS (DUMMY:chararray);
> DUMP log;
>
>
> ...
> 2013-11-04 03:21:32,325 [main] INFO
>  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>
> HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt
>  Features
> 2.0.0-cdh4.3.0  0.11.0-cdh4.3.0 araujo  2013-11-04 03:21:12     2013-11-04
> 03:21:32     UNKNOWN
>
> Success!
>
> Job Stats (time in seconds):
> JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime
>  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime
> MedianReducetime    Alias    Feature Outputs
> job_201310230912_0065   1       0       6       6       6       6       0
>     0       0       0       log     MAP_ONLY        hdfs://
> n1.hadoop.cto.pythian.com:8020/tmp/temp-805635901/tmp-702886222,
>
> Input(s):
> Successfully read 1 records (479 bytes) from: "hdfs://
> n1.hadoop.cto.pythian.com:8020/user/araujo/test"
>
> Output(s):
> Successfully stored 1 records (8 bytes) in: "hdfs://
> n1.hadoop.cto.pythian.com:8020/tmp/temp-805635901/tmp-702886222"
>
> Counters:
> Total records written : 1
> Total bytes written : 8
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201310230912_0065
>
>
> 2013-11-04 03:21:32,338 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Success!
> 2013-11-04 03:21:32,342 [main] INFO  org.apache.pig.data.SchemaTupleBackend
> - Key [pig.schematuple] was not set... will not generate code.
> 2013-11-04 03:21:32,350 [main] INFO
>  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
> 2013-11-04 03:21:32,350 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
> input paths to process : 1
> *(0)  <--- THIS SHOULD SHOW "X"*
>
>
> --
> André Araújo
> Database Administrator / SDM