Pig >> mail # user >> Issue loading emails using Avro storage.


Issue loading emails using Avro storage.
Hi

I've been stuck on this issue for a while and have been unable to get any help.

I was wondering if anyone can help.

I'm trying to load email messages into a messages relation but have been unable to, and I was wondering if anyone has a sample email dataset that would let me experiment with this script.
The following code is from the Agile Data Science book:

/* Load the emails in avro format (edit the path to match where you saved them) using the AvroStorage UDF from Piggybank */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();

I manually downloaded my Gmail data, which came to 350MB, then tried loading this file into messages and got this error message:
*************************************

2014-03-03 01:52:26,294 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "(" "( "" at line 1, column 84.
Was expecting one of:
"as" ...
"parallel" ...
";" ...
"." ...
"$" ...
*************************************
Details at logfile: /home/cloudera/pig_1393839871002.log

I then downloaded a sample email dataset and tried to load that into the messages relation above, and I get the same error.

Then I tried saving the following content from the book to a file and loading it into the relation, and I get the same error message.

Here is the content:
*************************************

*************************************

Will keep the weeds from taking over.

Russell Jurney datasyndrome.com

I have also tried sending an email to Russell but got no response.

I am wondering if anyone has a sample email dataset that loads with AvroStorage so I can try out my next steps.
Any help will be really appreciated.
Please let me know.
Thanks
Sai
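
One thing that may be worth ruling out for the three files tried above: AvroStorage reads Avro container files, and a raw Gmail download or a text file saved from the book is probably not in that format. The sketch below is a minimal Python check, assuming only that an Avro object container file starts with the four bytes `Obj` followed by `0x01`. The function name `looks_like_avro` and the demo file are hypothetical and not from the book. The parse error reported above is raised while Pig is reading the script, before any data is loaded, so this check would not explain that error by itself.

```python
import os
import tempfile

# Avro object container files begin with ASCII "Obj" followed by the byte 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(path):
    """Return True if the file at `path` starts with the Avro container magic bytes.

    Hypothetical helper for illustration; it inspects only the first four bytes
    and does not validate the rest of the file.
    """
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


if __name__ == "__main__":
    # Demo with a made-up mbox-style text file (not real data).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"From someone@example.com Mon Mar  3 01:52:26 2014\n")
        demo_path = tmp.name
    try:
        print(demo_path, "looks like Avro:", looks_like_avro(demo_path))
    finally:
        os.remove(demo_path)
```

If the check returns False for a file, that file would need to be converted to Avro before AvroStorage could read it.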

 