Hadoop >> mail # user >> Is it always called part-00000?


Re: Is it always called part-00000?
Hi Mark,

1. No. If you use the old API, the output file is named part-00000, and if
you use the new API, it is named part-r-00000. There will usually be more
than one output file: each reducer writes its own file (part-00000,
part-00001, and so on), so the number of output files is determined by the
number of reducers in your map-reduce job.

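2. If you'd like to consume the output of the first job, you just need to
set the output folder of the first job as the input of the second job. The
second job will then read all the part files in that folder, so you do not
have to name them individually.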
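
If you ever want to read the part files yourself rather than feed the folder to a second job, here is a minimal sketch in plain Java. It assumes the output folder has been copied to the local filesystem (for data still on HDFS you would go through Hadoop's FileSystem API instead), and PartFileReader and readAllParts are hypothetical names of mine, not Hadoop classes.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reads every part file in a
// job output directory that has been copied to the local filesystem.
class PartFileReader {

    // Returns the lines of all files whose names start with "part-",
    // visiting the files in sorted name order (part-00000, part-00001, ...).
    static List<String> readAllParts(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob matches both part-00000 (old API) and part-r-00000 (new API)
        // and skips anything else that happens to be in the directory.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts);

        List<String> lines = new ArrayList<>();
        for (Path p : parts) {
            lines.addAll(Files.readAllLines(p)); // decodes as UTF-8
        }
        return lines;
    }

    // Usage: java PartFileReader /path/to/local/copy/of/output
    public static void main(String[] args) throws IOException {
        for (String line : readAllParts(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Matching on the "part-" prefix instead of a fixed name means the same code works whether the job ran with one reducer or many, and with either API.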

On Mon, Jan 18, 2010 at 9:11 AM, Mark Kerzner <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am writing a second step to run after my first Hadoop job step finished.
> It is to pick up the results of the previous step and to do further
> processing on it. Therefore, I have two questions please.
>
>   1. Is the output file always called  part-00000?
>   2. Am I perhaps better off reading all files in the output directory and
>   how do I do it?
>
> Thank you,
> Mark
>
> PS. Thank you guys for answering my questions - that's a tremendous help
> and a great resource.
>
> Mark
>

--
Best Regards

Jeff Zhang