Issue loading emails using Avro storage.
Hi,

I've been stuck on this issue for a while and have been unable to get any help.

I was wondering if anyone can help.

I'm trying to load email messages into a messages relation and have been unable to. I was wondering if anyone has a sample email dataset that would allow me to play around with this script.
The following is the code from the Agile Data Science book:

/* Load the emails in avro format (edit the path to match where you saved them) using the AvroStorage UDF from Piggybank */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();
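
For reference, a minimal sketch of the setup that a LOAD ... USING AvroStorage() statement generally relies on: the Piggybank jar is registered, and the short name AvroStorage is bound to the Piggybank class with DEFINE. The jar paths are hypothetical placeholders, not taken from the book excerpt, and would need to match the local installation. The extra REGISTER lines for the Avro and json-simple jars are an assumption that depends on the Pig version and classpath.

```pig
-- Hypothetical jar locations; adjust to the local Pig installation.
REGISTER /path/to/piggybank.jar;
-- Assumption: these may be needed if they are not already on the classpath.
REGISTER /path/to/avro.jar;
REGISTER /path/to/json-simple.jar;

-- Bind the short name AvroStorage to the Piggybank Avro loader class.
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Same statement as above; AvroStorage reads Avro data files.
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();
```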

I manually downloaded my Gmail data, which came to 350 MB, then tried loading that file into messages and got this error message:
*************************************

2014-03-03 01:52:26,294 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "(" "( "" at line 1, column 84.
Was expecting one of:
"as" ...
"parallel" ...
";" ...
"." ...
"$" ...
*************************************
Details at logfile: /home/cloudera/pig_1393839871002.log

I then downloaded a sample email dataset and tried to load it into the messages relation above, but I get the same error.

Then I tried saving the following content from the book to a file and loading it into the relation, and I get the same error message.

Here is the content:
*************************************

*************************************

Will keep the weeds from taking over.

Russell Jurney datasyndrome.com

I have also tried emailing Russell but got no response.

I am wondering if anyone has a sample email dataset that loads with AvroStorage so I can try out my next steps.
Any help will be really appreciated.
Please let me know.
Thanks
Sai