Pig, mail # user - Problem loading sequence files with Elephant Bird


Re: Problem loading sequence files with Elephant Bird
Andy Schlaikjer 2012-05-18, 18:24
Chris, the console output mentions the file
"/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log".
Does this contain any kind of stack trace? Were you running the script in
local mode or on a cluster? If the latter, there should be at least map task
log output someplace that may also have some clues.

Does path '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
contain SequenceFile<Text, Text> data? If not, you'll have to configure
SequenceFileLoader further to properly deserialize the key-value pairs.
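
One rough way to answer the question above is to look at the file header,
which records the key and value class names. The sketch below is only an
illustration: it assumes a SequenceFile header of version 4 or later (where,
as far as I know, class names are written as Text strings), and it assumes
the first few hundred bytes of the file have been copied to a local file,
here hypothetically called header.bin (for example via "hadoop fs -cat").
The function names are made up for this example.

```python
import io
import struct


def read_vint(stream):
    """Decode one Hadoop variable-length integer from a byte stream."""
    first = struct.unpack("b", stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    # Number of payload bytes that follow the marker byte.
    extra = (-120 - first) if negative else (-112 - first)
    value = 0
    for _ in range(extra):
        value = (value << 8) | stream.read(1)[0]
    return ~value if negative else value


def read_text_string(stream):
    """Read a string stored as a vint byte length followed by UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def sequence_file_classes(stream):
    """Return (version, key class, value class) from a SequenceFile header."""
    if stream.read(3) != b"SEQ":
        raise ValueError("missing SEQ magic bytes; not a SequenceFile")
    version = stream.read(1)[0]
    if version < 4:
        # Assumption: older headers encode class names differently.
        raise ValueError("unsupported header version %d" % version)
    key_class = read_text_string(stream)
    value_class = read_text_string(stream)
    return version, key_class, value_class


def classes_from_local_file(path):
    """Convenience wrapper for a local copy of the file's first bytes."""
    with open(path, "rb") as handle:
        return sequence_file_classes(io.BytesIO(handle.read(1024)))
```

Calling classes_from_local_file("header.bin") should then show whether both
class names are org.apache.hadoop.io.Text. If they are not, that would be
the case where the loader needs further configuration.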

Andy
On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <[EMAIL PROTECTED]> wrote:

> Andy,
>
> Here's what I'm seeing when I run the following script. There's no
> information beyond what is here in the log file.
>
> Chris
>
> REGISTER
> '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER
> 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> rmf /data/SearchLogJSON;
>
> -- Load raw log data
> raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ();
>
> -- Store the JSON
> STORE raw_logs INTO '/data/SearchLogJSON/';
>
> -------------------
>
> -sh-3.2$ pig dump_log_json.pig
> 2012-05-17 23:57:41,304 [main] INFO  org.apache.pig.Main - Logging error
> messages to:
> /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> 2012-05-17 23:57:41,586 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: XXX
> 2012-05-17 23:57:41,932 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to map-reduce job tracker at: XXX
> 2012-05-17 23:57:42,204 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: UNKNOWN
> 2012-05-17 23:57:42,204 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2012-05-17 23:57:42,301 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) -
> scope-1 Operator Key: scope-1)
> 2012-05-17 23:57:42,317 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-05-17 23:57:42,349 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-05-17 23:57:42,349 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2012-05-17 23:57:42,529 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> to the job
> 2012-05-17 23:57:42,545 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-05-17 23:57:44,706 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2012-05-17 23:57:44,734 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2012-05-17 23:57:45,053 [Thread-4] INFO
>  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
> 2012-05-17 23:57:45,057 [Thread-4] INFO
>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
> input paths (combined) to process : 1
> 2012-05-17 23:57:45,236 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2012-05-17 23:57:45,849 [main] INFO