Pig >> mail # user >> Problem loading sequence files with Elephant Bird


Chris Diehl 2012-05-16, 18:47
Andy Schlaikjer 2012-05-17, 20:20
Chris Diehl 2012-05-18, 00:07
Re: Problem loading sequence files with Elephant Bird
Chris, the console output mentions the file
"/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log".
Does this contain any kind of stack trace? Were you running the script in
local mode or on a cluster? If the latter, there should be at least map task
log output someplace that may also have some clues.

Does path '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
contain SequenceFile<Text, Text> data? If not, you'll have to configure
SequenceFileLoader further to properly deserialize the key-value pairs.
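One way to answer that question without running a job is to look at the file header, which records the key and value class names. The following is only a rough sketch, not part of Elephant Bird or Hadoop: the function names are made up for illustration, and it assumes a SequenceFile of version 4 or later, where the two class names follow the "SEQ" magic and version byte as length-prefixed UTF-8 strings short enough for a single length byte.

```python
import io

TEXT_CLASS = "org.apache.hadoop.io.Text"


def read_seqfile_classes(stream):
    """Return (version, key_class, value_class) from a SequenceFile header.

    Hypothetical helper: assumes format version >= 4 and class names
    shorter than 128 bytes (single-byte length prefix).
    """
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a SequenceFile: missing 'SEQ' magic bytes")
    version_byte = stream.read(1)
    if not version_byte:
        raise ValueError("truncated header: no version byte")
    version = version_byte[0]
    if version < 4:
        raise ValueError("this sketch does not handle versions before 4")

    def read_short_string():
        length_byte = stream.read(1)
        if not length_byte:
            raise ValueError("truncated header: missing string length")
        length = length_byte[0]
        if length > 127:
            raise ValueError("multi-byte length prefix not handled here")
        data = stream.read(length)
        if len(data) != length:
            raise ValueError("truncated header: string cut short")
        return data.decode("utf-8")

    key_class = read_short_string()
    value_class = read_short_string()
    return version, key_class, value_class


def is_text_text(stream):
    """True if the header declares Text keys and Text values."""
    _, key_class, value_class = read_seqfile_classes(stream)
    return key_class == TEXT_CLASS and value_class == TEXT_CLASS


if __name__ == "__main__":
    # Synthetic header for demonstration only; not data from the real file.
    name = TEXT_CLASS.encode("utf-8")
    header = b"SEQ" + bytes([6]) + bytes([len(name)]) + name + bytes([len(name)]) + name
    print(read_seqfile_classes(io.BytesIO(header)))
    print(is_text_text(io.BytesIO(header)))
```

To try it on the real data, the first couple of hundred bytes of the file could be copied locally (for example by piping `hadoop fs -cat` through `head -c 200`) and opened in binary mode. If the reported classes are not both org.apache.hadoop.io.Text, the loader needs matching converters for the actual key and value types.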

Andy
On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <[EMAIL PROTECTED]> wrote:

> Andy,
>
> Here's what I'm seeing when I run the following script. There's no
> information beyond what is here in the log file.
>
> Chris
>
> REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> rmf /data/SearchLogJSON;
>
> -- Load raw log data
> raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING $SEQFILE_LOADER ();
>
> -- Store the JSON
> STORE raw_logs INTO '/data/SearchLogJSON/';
>
> -------------------
>
> -sh-3.2$ pig dump_log_json.pig
> 2012-05-17 23:57:41,304 [main] INFO  org.apache.pig.Main - Logging error messages to: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> 2012-05-17 23:57:41,586 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: XXX
> 2012-05-17 23:57:41,932 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: XXX
> 2012-05-17 23:57:42,204 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
> 2012-05-17 23:57:42,204 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2012-05-17 23:57:42,301 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) - scope-1 Operator Key: scope-1)
> 2012-05-17 23:57:42,317 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> 2012-05-17 23:57:42,349 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2012-05-17 23:57:42,349 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 2012-05-17 23:57:42,529 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
> 2012-05-17 23:57:42,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-05-17 23:57:44,706 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2012-05-17 23:57:44,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2012-05-17 23:57:45,053 [Thread-4] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
> 2012-05-17 23:57:45,057 [Thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
> 2012-05-17 23:57:45,236 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2012-05-17 23:57:45,849 [main] INFO
Chris Diehl 2012-05-18, 21:27
Chris Diehl 2012-05-18, 23:07
Raghu Angadi 2012-05-21, 20:39