Pig, mail # user - Issue loading emails using Avro storage. - 2014-03-03, 10:18
Issue loading emails using Avro storage.

I've been stuck on this issue for a while and haven't been able to get any help.

I was wondering if anyone could help.

I'm trying to load email messages into a messages relation but haven't been able to. Does anyone have a sample email dataset that would let me experiment with this script?
The following code is from the Agile Data Science book:

/* Load the emails in avro format (edit the path to match where you saved them) using the AvroStorage UDF from Piggybank */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();
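For reference, here is a minimal sketch of the setup the LOAD line above relies on when AvroStorage comes from Piggybank: the jars are registered, and the short name AvroStorage is bound to the Piggybank class with DEFINE. The jar paths and versions are hypothetical and depend on the installation. Pig 0.12 and later also ship a builtin AvroStorage that needs no DEFINE. This is not offered as a diagnosis of the parse error.

```pig
/* Sketch only: these jar paths and versions are hypothetical; point them at the
   piggybank, avro and json-simple jars that come with your Pig installation. */
REGISTER /usr/lib/pig/piggybank.jar;
REGISTER /usr/lib/pig/lib/avro-1.7.4.jar;
REGISTER /usr/lib/pig/lib/json-simple-1.1.jar;

/* Bind the short name AvroStorage to the Piggybank loader class */
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

/* The input path must contain Avro container files, not raw mbox text */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();
DESCRIBE messages;
```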

I manually downloaded my Gmail (about 350 MB) and tried loading that file into messages, but got this error message:

2014-03-03 01:52:26,294 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "(" "( "" at line 1, column 84.
Was expecting one of:
"as" ...
"parallel" ...
";" ...
"." ...
"$" ...
Details at logfile: /home/cloudera/pig_1393839871002.log
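AvroStorage reads Avro container files, which embed their own schema, so a raw mbox export such as a Gmail download cannot be read by it directly. Below is a minimal sketch for checking that the export itself is readable, using Pig's builtin TextLoader. It assumes the mbox file sits at the same path used in the script above. The count is only approximate, because mbox variants differ in how they escape body lines that begin with "From ".

```pig
/* A Gmail download is plain-text mbox, so read it line by line with the
   builtin TextLoader instead of AvroStorage */
raw = LOAD '/me/Data/test_mbox' USING TextLoader() AS (line:chararray);

/* In mbox format each message starts with a line beginning with "From " */
separators = FILTER raw BY line MATCHES 'From .*';

/* Count the separator lines, i.e. roughly the number of messages */
all_separators = GROUP separators ALL;
message_count = FOREACH all_separators GENERATE COUNT(separators) AS n;
DUMP message_count;
```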

I then downloaded a sample email dataset and tried to load it into the messages relation above, but got the same error.

Then I tried saving the following content from the book to a file and loading it into the relation, and I got the same error message.

Here is the content:


Will keep the weeds from taking over.

Russell Jurney datasyndrome.com

I have also tried emailing Russell, but got no response.

I am wondering if anyone has a sample email dataset that would load with AvroStorage so I can try out my next steps.
Any help would be really appreciated.
Please let me know.
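If no ready-made dataset turns up, one option is to produce a small Avro dataset with Pig itself. Below is a minimal sketch. It assumes a hypothetical tab-separated file sample_emails.tsv with one email per line, and it assumes AvroStorage is already registered and defined in the same script. The field names are illustrative and do not match the schema of the book's email data.

```pig
/* Sketch: sample_emails.tsv is a hypothetical input file, one email per line,
   tab-separated fields (no tabs or newlines inside a field).
   Assumes AvroStorage has already been registered and defined in this script. */
emails = LOAD '/me/Data/sample_emails.tsv' USING PigStorage('\t')
         AS (sender:chararray, subject:chararray, body:chararray);

/* Write the records as Avro; with no arguments, AvroStorage derives the
   Avro schema from the Pig schema declared above */
STORE emails INTO '/me/Data/sample_emails_avro' USING AvroStorage();
```

The output directory (here the hypothetical /me/Data/sample_emails_avro) can then be used as the path in a LOAD ... USING AvroStorage() statement in a later run.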
