Re: Job setup for a pig run takes ages
What loader are you using? The JobTracker is not the place to look; try running jstack against your Pig process instead. Most likely it is talking to the NameNode most of the time because the loader is doing some per-file lookups.
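
To make the jstack suggestion concrete, here is a rough Python sketch that takes several dumps of the Pig client JVM and counts which frames keep showing up near the top of the main thread. It assumes the JDK's jstack is on the PATH and that the interesting work happens on the thread named "main" (the log lines are tagged [main]); the function names and default values are made up for illustration and are not part of Pig.

```python
import subprocess
import time
from collections import Counter
from typing import List


def main_thread_frames(dump: str, thread_name: str = "main") -> List[str]:
    """Return the 'at ...' frames of one thread from a jstack text dump."""
    frames = []
    in_thread = False
    for line in dump.splitlines():
        if line.startswith('"'):
            # A quoted name at column 0 starts a new thread block.
            in_thread = line.startswith('"' + thread_name + '"')
        elif in_thread and line.strip().startswith("at "):
            # Keep only stack frames; skip state and lock lines.
            frames.append(line.strip()[3:])
    return frames


def sample_hot_frames(pid: int, samples: int = 10,
                      interval: float = 5.0, depth: int = 5) -> Counter:
    """Run jstack several times and count frames seen in the top
    `depth` frames of the main thread (hypothetical helper)."""
    counts = Counter()
    for i in range(samples):
        out = subprocess.run(["jstack", str(pid)], capture_output=True,
                             text=True, check=True).stdout
        counts.update(main_thread_frames(out)[:depth])
        if i + 1 < samples:
            time.sleep(interval)
    return counts
```

Calling sample_hot_frames(pid) with the pid of the Pig JVM and printing the result of .most_common(10) gives a quick picture of where the client is stuck. If the top frames are mostly HDFS client or RPC calls, that would point at per-file NameNode lookups; if they are inside Pig's planner classes, the time is going into plan compilation instead.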

On Jun 13, 2012, at 11:24 AM, Danfeng Li <[EMAIL PROTECTED]> wrote:

> We also ran into the long setup time issue, but our problem is different:
>
> 1. The setup takes about 20 minutes, and we can't see anything on the JobTracker during that time.
> 2. Our data is stored in flat files, uncompressed.
> 3. Our code consists of many small Pig files, which are used in the main Pig file as follows:
> data_1 = load ...
> data_2 = load ...
> ...
> data_n = load ...
>
> run -param ... pigfile1.pig
> run -param ... pigfile2.pig
> ...
>
> store out1 ..
> store out2 ..
> ...
> 4. Here's part of the log file from the setup period. Notice the gap between "13:46:42" and "14:05:23"; during that time, we can't see anything on the JobTracker.
> ...
> 2012-06-13 13:46:30,488 [main] INFO  org.apache.pig.Main - Logging error messages to: pig_1339609590477.log
> 2012-06-13 13:46:30,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
> 2012-06-13 13:46:30,950 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:9001
> 2012-06-13 13:46:32,766 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for rationale_fir. Using value : Account position (\\$
> 2012-06-13 13:46:32,766 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for rationale_sec. Using value K,
> 2012-06-13 13:46:32,766 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for rationale_thi. Using value %)
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for detail_statment_pre. Using value  - matures on
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for detail_statment_post. Using value .
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for rationale_fir. Using value : Maturity date
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for rationale_sec. Using value  Account position (\\$
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for rationale_thi. Using value K,
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for catalyst_pre. Using value  matures on
> 2012-06-13 13:46:32,767 [main] WARN  org.apache.pig.tools.parameters.PreprocessorContext - Warning : Multiple values found for catalyst_post. Using value .
> 2012-06-13 13:46:42,749 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: REPLICATED_JOIN,HASH_JOIN,COGROUP,GROUP_BY,ORDER_BY,DISTINCT,STREAMING,FILTER,CROSS,UNION
> 2012-06-13 13:46:42,749 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2012-06-13 14:05:23,460 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for var_raw: $0, $1, $2, $6, $7, $8, $9, $10
> 2012-06-13 14:05:23,474 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for var_mf: $5, $6, $7, $8, $9, $11, $12, $14, $15, $16, $17, $18, $19, $21, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45
> 2012-06-13 14:05:23,475 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for starmine: $0, $3, $4, $5, $6, $9, $10, $11