Hadoop >> mail # user >> Hadoop scripting when to use dfs -put


Hadoop scripting when to use dfs -put
Hi, I originally posted this on the dumbo forum, but it's really a
general Hadoop scripting issue.

I am testing a simple script that creates some local files and then
copies them to HDFS with

    os.system("hadoop dfs -put /home/havard/bio_sci/file.json /tmp/bio_sci/file.json")

The tasks fail with an out-of-heap-memory error. The files are tiny,
and I have tried increasing the heap size. When I skip the
hadoop dfs -put step, the tasks do not fail.
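For reference, the os.system call above amounts to something like the
following subprocess-based sketch. The dry_run flag is my own addition
(not in the original script), so the command can be inspected without a
Hadoop installation; note that each invocation launches a fresh JVM for
the hadoop CLI, which needs its own heap on top of the task's memory.

```python
import subprocess

def put_to_hdfs(local_path, hdfs_path, dry_run=False):
    """Copy a local file into HDFS by shelling out to the hadoop CLI.

    Every call spawns a new JVM for the `hadoop` client, which allocates
    its own heap in addition to the task's memory.
    """
    cmd = ["hadoop", "dfs", "-put", local_path, hdfs_path]
    if dry_run:
        # Return the command for inspection instead of running it.
        return cmd
    # check=True raises CalledProcessError if the copy fails.
    subprocess.run(cmd, check=True)
    return cmd

# Paths from the post above:
# put_to_hdfs("/home/havard/bio_sci/file.json", "/tmp/bio_sci/file.json")
```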

Is it wrong to call hadoop dfs -put from inside a script that is itself
running under Hadoop? Should I instead transfer the files at the end
with a combiner, or simply mount HDFS locally and write to it directly?
Any general suggestions?
--
Håvard Wahl Kongsgård
NTNU

http://havard.security-review.net/