Pig >> mail # user >> Reducers slowing down? (UNCLASSIFIED)


Reducers slowing down? (UNCLASSIFIED)
Classification: UNCLASSIFIED
Caveats: NONE

Hello, I'm using Pig 0.6.0 to run the following script on a 27-datanode
cluster running Red Hat Enterprise Linux 5.4:

-- Holds the Pig UDF wrapper around the SecondString SoftTFIDF function
REGISTER /home/CandidateIdentification.jar;

-- SecondString itself
REGISTER /home/secondstring-20060615.jar;

-- |People| ~ 62,500,000 from the English GigaWord 4th Edition
People = LOAD '/data/UniquePeoplePerStory' USING PigStorage(',') AS
(file:chararray, name:chararray);

-- |Actors| ~ 8,000 from the Stanford Movie Database
Actors = LOAD '/data/Actors' USING PigStorage(',') AS (actor:chararray);

-- |ToCompare| ~ 500,000,000,000
ToCompare = CROSS Actors, People PARALLEL 30;
 
-- Score 'em and store 'em
Results = FOREACH ToCompare GENERATE $0, $1, $2,
ARL.CandidateIdentificationUDF.Similarity($2, $0);

STORE Results INTO '/data/ScoredPeople' USING PigStorage(',');

The first 100,000,000,000 reduce output records were produced in about 25
hours. But after 75 hours the job has produced only 140,000,000,000 in total
(instead of the 300,000,000,000 I was extrapolating), and it seems to be
producing them at a slower and slower rate. What is going on? Did I screw
something up?
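
For reference, the figures above can be checked with a quick
back-of-the-envelope calculation. This is only a rough sketch in Python
(not part of the Pig job): the variable and function names are made up for
illustration, and it assumes the output rate was roughly constant within
each of the two time windows, which the job counters may not bear out.

```python
# Back-of-the-envelope check of the figures quoted in this message.
# All names here are hypothetical; only the numbers come from the message.

ACTORS = 8_000            # approximate |Actors|
PEOPLE = 62_500_000       # approximate |People|

# CROSS emits one record per (actor, person) pair.
TOTAL_PAIRS = ACTORS * PEOPLE

# (elapsed hours, cumulative reduce output records) as reported.
CHECKPOINTS = [(25, 100_000_000_000), (75, 140_000_000_000)]


def records_per_hour(start, end):
    """Average output rate between two (hours, records) checkpoints."""
    (h0, r0), (h1, r1) = start, end
    return (r1 - r0) / (h1 - h0)


early_rate = records_per_hour((0, 0), CHECKPOINTS[0])
recent_rate = records_per_hour(CHECKPOINTS[0], CHECKPOINTS[1])

remaining = TOTAL_PAIRS - CHECKPOINTS[1][1]
hours_left = remaining / recent_rate  # assumes the rate stops falling

print(f"total pairs:       {TOTAL_PAIRS:,}")
print(f"early rate / hour: {early_rate:,.0f}")
print(f"recent rate / hour: {recent_rate:,.0f}")
print(f"hours left at recent rate: {hours_left:,.0f}")
```

Under those assumptions the average rate fell from about 4 billion records
per hour in the first 25 hours to about 0.8 billion per hour over the next
50, so the remaining 360 billion records would need roughly another 450
hours even if the slowdown stopped now.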

Thanks,
Robert

Classification: UNCLASSIFIED
Caveats: NONE
Other messages in this thread:

Thejas Nair 2010-03-05, 23:17
Mridul Muralidharan 2010-03-06, 02:38
Winkler, Robert 2010-03-08, 19:58
Winkler, Robert 2010-03-11, 19:28
Dmitriy Ryaboy 2010-03-11, 19:49
Winkler, Robert 2010-03-11, 21:13
Dmitriy Ryaboy 2010-03-11, 21:39
Richard Ding 2010-03-12, 00:43
Dmitriy Ryaboy 2010-03-12, 00:58
Richard Ding 2010-03-12, 01:09