This example uses a small file. If you want to read a larger file, you'll need to either handle the boto/S3 issues with downloading large files or have Python read directly from HDFS. I've found S3 actually works pretty well for small files like this one. Reading larger files in Python doesn't work very well, though, because you have to worry about running out of memory when passing everything back from Python to Java.
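To make the memory concern concrete, here is a minimal sketch of reading a file in fixed-size chunks and reducing it to a small result on the Python side, instead of loading it whole and handing everything back. It assumes the source is any binary file-like object with a read() method; io.BytesIO stands in for a real S3 or HDFS stream here. The names iter_lines and count_matching are hypothetical, made up for illustration, and are not part of boto or Pig.

```python
import io


def iter_lines(stream, chunk_size=1024 * 1024):
    """Yield decoded lines from a binary file-like object.

    Only one chunk plus one partial line is buffered at a time.
    """
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        # The last piece may be an incomplete line; keep it for the next chunk.
        pending = lines.pop()
        for line in lines:
            yield line.decode("utf-8")
    if pending:
        # Final line without a trailing newline.
        yield pending.decode("utf-8")


def count_matching(stream, needle, chunk_size=1024 * 1024):
    """Return how many lines contain needle, without materializing the file."""
    return sum(1 for line in iter_lines(stream, chunk_size) if needle in line)


if __name__ == "__main__":
    # Stand-in for a remote file; a real stream would come from S3 or HDFS.
    sample = io.BytesIO(b"a,1\nb,2\nc,3")
    print(count_matching(sample, ","))
```

The point of the design is that only a small aggregate (here a count) crosses back to the caller, so the amount passed from Python to Java no longer grows with the file size.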
Jeremy Karn / Lead Developer
MORTAR DATA / 519 277 4391 / www.mortardata.com

On Sun, Jul 20, 2014 at 5:14 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
On Thursday, July 24, 2014, Jeremy Karn <[EMAIL PROTECTED]> wrote:
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com