Hadoop scripting when to use dfs -put
Hi, I originally posted this on the dumbo forum, but it is really a more
general Hadoop scripting issue.

While testing a simple script that creates some local files and then copies
them to HDFS with

    os.system("hadoop dfs -put /home/havard/bio_sci/file.json /tmp/bio_sci/file.json")

the tasks fail with an out-of-heap-memory error. The files are tiny, and I
have already tried increasing the heap size. When I skip the hadoop dfs -put
step, the tasks do not fail.
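
For reference, here is a minimal sketch of the failing step. The dumbo-style
mapper wrapper and the JSON record contents are simplifications on my part;
the paths and the os.system call are what the real script does.

    import os
    import json

    def mapper(key, value):
        # Write a small JSON record to the local filesystem
        # (the record contents are just a placeholder here).
        local_path = "/home/havard/bio_sci/file.json"
        with open(local_path, "w") as f:
            json.dump({"key": key, "value": value}, f)

        # Copy the local file into HDFS from inside the running task;
        # this is the call that coincides with the heap-space failures.
        os.system("hadoop dfs -put %s /tmp/bio_sci/file.json" % local_path)

        yield key, value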

Is it wrong to call hadoop dfs -put from inside a script that is itself
running under Hadoop? Should I instead transfer the files at the end with a
combiner, or simply mount HDFS locally and write to it directly (a rough
sketch of what I mean by the mount option is below)? Any general suggestions?
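
For what it's worth, the mount-based alternative I have in mind would look
something like this, assuming HDFS is mounted locally with fuse-dfs; the
/mnt/hdfs mount point is only a placeholder.

    import shutil

    HDFS_MOUNT = "/mnt/hdfs"   # placeholder fuse-dfs mount point
    LOCAL_FILE = "/home/havard/bio_sci/file.json"

    def copy_to_hdfs_via_mount(local_path, hdfs_relative_path):
        # With a fuse mount, HDFS appears as an ordinary directory tree,
        # so a plain file copy replaces the "hadoop dfs -put" subprocess.
        dest = "%s/%s" % (HDFS_MOUNT, hdfs_relative_path)
        shutil.copy(local_path, dest)

    copy_to_hdfs_via_mount(LOCAL_FILE, "tmp/bio_sci/file.json")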
--
Håvard Wahl Kongsgård
NTNU

http://havard.security-review.net/