Re: Prepare input data for Hadoop
Use an external database (e.g., MySQL) or some other transactional
bookkeeping system to record the state of all your datasets (STAGING,
UPLOADED, PROCESSED).

- Aaron
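
For concreteness, a minimal sketch of that kind of bookkeeping, using plain
JDBC (here against MySQL, which is an assumption); the table name, columns,
and the DatasetTracker class are hypothetical, not part of Hadoop:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of dataset bookkeeping: track each input file's state
 * (STAGING -> UPLOADED -> PROCESSED) in a relational table so a
 * periodic MapReduce job only picks up files it has not seen yet.
 *
 * Assumes a table like:
 *   CREATE TABLE datasets (path VARCHAR(512) PRIMARY KEY, state VARCHAR(16));
 */
public class DatasetTracker {

    private final Connection conn;

    public DatasetTracker(String jdbcUrl, String user, String pass) throws SQLException {
        this.conn = DriverManager.getConnection(jdbcUrl, user, pass);
    }

    /** Record a file the moment it has been copied into HDFS. */
    public void markUploaded(String hdfsPath) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO datasets (path, state) VALUES (?, 'UPLOADED')")) {
            ps.setString(1, hdfsPath);
            ps.executeUpdate();
        }
    }

    /** Files the next job run should take as input. */
    public List<String> unprocessedPaths() throws SQLException {
        List<String> paths = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT path FROM datasets WHERE state = 'UPLOADED'");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                paths.add(rs.getString(1));
            }
        }
        return paths;
    }

    /** Flip the state once the MapReduce job has finished successfully. */
    public void markProcessed(String hdfsPath) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE datasets SET state = 'PROCESSED' WHERE path = ?")) {
            ps.setString(1, hdfsPath);
            ps.executeUpdate();
        }
    }
}

The point of using a transactional store is that each run gets an atomic,
durable record of what has already been consumed, so a re-run or crashed job
does not pick the same files up twice.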
On Thu, Sep 17, 2009 at 7:17 PM, Huy Phan <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I have a question about the strategy for preparing input data for a Hadoop
> MapReduce job. We have to (somehow) copy input files from our local
> filesystem to HDFS; how can we make sure that an input file is not
> processed twice in different executions of the same MapReduce job (let's say
> my MapReduce job runs once every 30 minutes)?
> I don't want to delete my input files after the MR job finishes because I
> may want to re-use them later.
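
Tying that bookkeeping to the workflow in the question, a periodic driver
could look roughly like the sketch below. DatasetTracker is the hypothetical
helper from above, the JDBC URL and file paths are made up, and the
mapper/reducer/output wiring is omitted:

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PeriodicIngest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical tracker from the earlier sketch; URL and credentials are assumptions.
        DatasetTracker tracker =
                new DatasetTracker("jdbc:mysql://dbhost/bookkeeping", "user", "secret");

        // 1. Stage: copy a new local file into HDFS and record it as UPLOADED.
        Path local = new Path("/data/incoming/events.log");
        Path inHdfs = new Path("/user/ingest/events.log");
        fs.copyFromLocalFile(local, inHdfs);
        tracker.markUploaded(inHdfs.toString());

        // 2. Run: feed only files this pipeline has not processed yet to the
        //    30-minute job, so earlier inputs are never re-read.
        List<String> batch = tracker.unprocessedPaths();
        if (batch.isEmpty()) {
            return; // nothing new since the last run
        }
        Job job = Job.getInstance(conf, "periodic-ingest");
        for (String path : batch) {
            FileInputFormat.addInputPath(job, new Path(path));
        }
        // ... set mapper, reducer, and output path here as usual ...
        boolean ok = job.waitForCompletion(true);

        // 3. Record: mark this batch PROCESSED only on success, so a failed
        //    run is retried next time without deleting anything from HDFS.
        if (ok) {
            for (String path : batch) {
                tracker.markProcessed(path);
            }
        }
    }
}

The input files themselves stay in HDFS; only the state row changes, which
covers the requirement of keeping the files around for later re-use.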