I've read that increasing the above number (default 4 KB) to, say, 128 KB might speed things up.

My input is 40 million serialised records coming from an RDBMS, and I noticed that with the increased IO value my job actually runs a tiny bit slower. Is that possible?

P.S. I've got two more questions:
1. During Sqoop import I see that two additional files are generated in the HDFS folder, namely
Is there a way to redirect these files to a different directory? I cannot find an answer.

2. I run multiple reducers and each generates its own output. If I were to merge all the output, would either of the approaches below be recommended?

hadoop dfs -getmerge <output/*> <localdst>
hadoop dfs -cat output/* > output_All
hadoop dfs -get output_All <localdst>

Brad Sarsfield 2012-11-21, 17:00