Pig >> mail # user >> None. wtf is None?

None. wtf is None?
Can someone explain this script to me? It is freaking me out. When did Pig
start spitting out 'None' in place of null?

register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/pig/contrib/piggybank/java/piggybank.jar

define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

rmf /tmp/sent_mails
rmf /tmp/replies

/* Get rid of emails with reply_to, as they confuse everything in mailing
lists. */
avro_emails = load '/me/tmp/thu_emails' using AvroStorage();
clean_emails = filter avro_emails by froms is not null and reply_tos is

/* Treat emails without in_reply_to as sent emails */
combined_emails = foreach clean_emails generate froms, tos, message_id;
sent_mails = foreach combined_emails generate flatten(froms.address) as from,
                                              flatten(tos.address) as to,
                                              message_id;
store sent_mails into '/tmp/sent_mails';

/* Treat in_reply_tos separately, as our FLATTEN() will filter out the
nulls */
replies = filter clean_emails by in_reply_to is not null;
replies = foreach replies generate flatten(froms.address) as from,
                                   flatten(tos.address) as to,
                                   in_reply_to;
store replies into '/tmp/replies';
Despite filtering replies down to emails that have the 'in_reply_to'
field set, I get the same number of records in both relations I store:

russell-jurneys-macbook-pro:pig rjurney$ cat /tmp/sent_mails/p*|wc -l
russell-jurneys-macbook-pro:pig rjurney$ cat /tmp/replies/p*|wc -l
Investigating shows me:

cat /tmp/replies/part-00001

Where did *None* come from? I thought FLATTEN would prune records with
empty columns, and I'm OK with it not doing that, but... what operators
does None respond to? It is not null. How do I prune these?
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
Alan Gates 2012-07-24, 20:43
Robert Yerex 2012-07-24, 13:50
Russell Jurney 2012-07-24, 14:30
Robert Yerex 2012-07-24, 16:51