Pig >> mail # user >> Job setup for a pig run takes ages


+
Markus Resch 2012-05-31, 09:38
+
Prashant Kommireddi 2012-05-31, 09:57
+
Markus Resch 2012-06-01, 13:34
Re: Job setup for a pig run takes ages
Hey Markus,

I would also like to take a look at your Pig script. I think there is some
insight to be gained here.

Thanks,
Ashutosh
On Fri, Jun 1, 2012 at 6:34 AM, Markus Resch <[EMAIL PROTECTED]> wrote:

> Hi Prashant, Hi Thejas,
>
> thanks for your very quick answer.
> No, this is not a typo. Those timestamps are correct and, as I said, the
> machines are not very busy during this time.
>
> As this is our test cluster, I am sure I am the only one running jobs
> on it. Another issue is that we are currently able to run only one job
> at a time, but that is outside the scope of this request.
> We do not even have a continuous input stream to that cluster; we just
> copied a bunch of data to it some time ago.
> From my perspective, the 464 GB of input data you mentioned is the
> uncompressed size of the 160 GB of compressed files, which is what I
> get when I run hadoop fs -dus on that folder.
>
> It may also be relevant that we are running Cloudera CDH3 Update 3 on
> our systems.
>
> I suspect this could be due to some fancy Avro schema validation
> implicitly executed by the Avro storage. If so, can this be avoided?
>
> Sadly, I am not able to provide the actual script at the moment, as it
> contains confidential information, but I will try to provide a version
> as soon as possible. In any case, I rather suspect a configuration
> problem in Hadoop or Pig, as the script works fine with a smaller
> amount of input data.
>
> I would ask the Hadoop mailing list if this issue occurred during the
> actual MapReduce run, but as it occurs even before a single MapReduce
> job is launched, I suspect Pig has a problem.
>
> Thanks
> Markus
>
> This is the full log until the main work job starts:
> mapred@ournamenode$ pig OurScript.pig
> 2012-05-30 15:27:21,052 [main] INFO  org.apache.pig.Main - Logging error
> messages to: /tmp/pig_1338384441037.log
> 2012-05-30 15:27:21,368 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: hdfs://OurNamenode:9000
> 2012-05-30 15:27:21,609 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to map-reduce job tracker at:
> dev-jobtracker001.eu-fra.adtech.com:54311
> 2012-05-30 15:57:27,814 [main] WARN  org.apache.pig.PigServer -
> Encountered Warning IMPLICIT_CAST_TO_LONG 1 time(s).
> 2012-05-30 15:57:27,816 [main] INFO
> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: REPLICATED_JOIN,COGROUP,GROUP_BY,FILTER
> 2012-05-30 15:57:27,816 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2012-05-30 16:06:55,304 [main] INFO
> org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned
> for CampaignInfo: $0, $1, $2, $4, $5, $6, $8, $9
> 2012-05-30 16:06:55,308 [main] INFO
> org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned
> for dataImport: $2, $3, $4
> 2012-05-30 16:06:55,441 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> OutputData1:
> Store(SomeOutputFile1.csv:org.apache.pig.builtin.PigStorage) - scope-521
> Operator Key: scope-521)
> 2012-05-30 16:06:55,441 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> OutputData2:
> Store(/SomeOutputFile2.csv:org.apache.pig.builtin.PigStorage) -
> scope-524 Operator Key: scope-524)
> 2012-05-30 16:06:55,441 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> OutputData2:
> Store(/SomeOutputFile3.csv:org.apache.pig.builtin.PigStorage) -
> scope-483 Operator Key: scope-483)
> 2012-05-30 16:06:55,453 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
> - File concatenation threshold: 100 optimistic? false
> 2012-05-30 16:06:55,467 [main] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
> paths to process : 1
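
To make the delay in the log above easier to see, here is a minimal Python sketch that finds the large gaps between consecutive log timestamps. The timestamps are copied from the quoted log; the names `parse_stamps` and `find_gaps` and the 60-second threshold are illustrative assumptions, not part of Pig or Hadoop.

```python
import re
from datetime import datetime

# Timestamps copied from the quoted log; message text is cut to the first line.
LOG = """\
> 2012-05-30 15:27:21,052 [main] INFO  org.apache.pig.Main - Logging error
> 2012-05-30 15:27:21,368 [main] INFO
> 2012-05-30 15:27:21,609 [main] INFO
> 2012-05-30 15:57:27,814 [main] WARN  org.apache.pig.PigServer -
> 2012-05-30 15:57:27,816 [main] INFO
> 2012-05-30 16:06:55,304 [main] INFO
> 2012-05-30 16:06:55,467 [main] INFO
"""

# A log4j-style timestamp at the start of a line, with an optional "> " quote prefix.
STAMP = re.compile(r"^(?:> )?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_stamps(text):
    """Return the timestamps of all lines that start with one (hypothetical helper)."""
    stamps = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    return stamps


def find_gaps(stamps, threshold_seconds=60.0):
    """Return (previous, current, seconds) for consecutive stamps at least threshold apart."""
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta >= threshold_seconds:
            gaps.append((prev, cur, delta))
    return gaps


for prev, cur, delta in find_gaps(parse_stamps(LOG)):
    print(f"{prev.time()} -> {cur.time()}: {delta / 60:.1f} min")
```

On these timestamps the sketch reports two gaps: roughly 30 minutes after the job tracker connection and roughly 9.5 minutes before the column pruning messages. Both fall before any MapReduce job is launched.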
+
Thejas Nair 2012-06-01, 16:45
+
Markus Resch 2012-06-04, 17:28
+
Thejas Nair 2012-06-05, 22:35
+
Alex Rovner 2012-06-12, 23:16
+
Markus Resch 2012-06-13, 09:23
+
Danfeng Li 2012-06-13, 18:24
+
Dmitriy Ryaboy 2012-06-16, 15:24
+
Danfeng Li 2012-06-18, 21:51
+
Dmitriy Ryaboy 2012-06-18, 23:26
+
Danfeng Li 2012-06-19, 14:42
+
Thejas Nair 2012-06-19, 16:35
+
Julien Le Dem 2012-06-20, 23:48
+
Thejas Nair 2012-06-01, 02:39