Hadoop, mail # user - Hadoop scripting when to use dfs -put

Håvard Wahl Kongsgård 2012-02-13, 18:39
Hi, I originally posted this on the dumbo forum, but it's more of a
general Hadoop scripting issue.

When testing a simple script that creates some local files
and then copies them to HDFS
with os.system("hadoop dfs -put /home/havard/bio_sci/file.json

the tasks fail with an out-of-heap-memory error. The files are tiny,
and I have tried increasing the heap size. When I skip the
hadoop dfs -put, the tasks do not fail.

Is it wrong to use hadoop dfs -put inside a script that is itself
running under Hadoop? Should I instead
transfer the files at the end with a combiner, or simply mount HDFS
locally and write directly to it? Any general suggestions?
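
For reference, here is a minimal sketch of how a call like this could be
made through subprocess instead of os.system, so that the exit status and
stderr of the hadoop client show up in the task logs. It does not explain
or fix the heap error. The function names, the injectable run argument,
and the paths in the usage comment are hypothetical and only for
illustration.

```python
import subprocess


def build_put_command(local_path, hdfs_dest):
    """Build the argument list for `hadoop dfs -put <local> <dest>`."""
    return ["hadoop", "dfs", "-put", local_path, hdfs_dest]


def put_to_hdfs(local_path, hdfs_dest, run=subprocess.run):
    """Run the put command and raise if the hadoop client reports failure.

    `run` is injectable (an assumption of this sketch) so the function can
    be exercised without a Hadoop installation.
    """
    cmd = build_put_command(local_path, hdfs_dest)
    # Capture output so a client-side error message is not lost.
    result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        message = result.stderr.decode("utf-8", errors="replace")
        raise RuntimeError("hadoop dfs -put failed: " + message)
    return result.returncode


# Hypothetical usage (both paths are placeholders):
# put_to_hdfs("/tmp/example.json", "/user/example/output/")
```

Passing the command as a list avoids shell quoting problems, and raising
on a non-zero exit status makes a failed copy visible instead of being
silently ignored the way an unchecked os.system call would be.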
Håvard Wahl Kongsgård