MapReduce user mailing list: Reading part of file using Map Reduce


Pankaj Gupta 2012-10-31, 23:20
Re: Reading part of file using Map Reduce
IIRC you can do this, but MR had some issues if you passed it a
non-closed (but sync'd upon) file for splitting.

However, if you run into similar issues, try generating your own
splits over the big file by overriding FileInputFormat#getSplits(…),
and that will then work.
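A minimal sketch of the custom-split idea in plain Java. It assumes the job is handed a cut-off offset (`limit`) recorded at submission time, for example the length the writer had sync'd at a record boundary. `PrefixSplitter`, `Range` and `splitPrefix` are hypothetical names, and the sketch only models the byte-range arithmetic. In a real job you would subclass an input format such as `TextInputFormat`, override `getSplits(JobContext)`, and wrap each range in a `FileSplit(path, start, length, hosts)` instead of returning `Range` objects. Note that line-oriented record readers may read past the end of a split to complete a record, so the cut-off may also need enforcing on the reader side.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Hadoop class): models the byte ranges that an
 * overridden FileInputFormat#getSplits(...) could return so that a job reads
 * only the first `limit` bytes of a file that is still being appended to.
 */
final class PrefixSplitter {

    /** A [start, start + length) byte range, playing the role of a FileSplit. */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + (start + length) + ")";
        }
    }

    private PrefixSplitter() {}

    /**
     * Cuts [0, min(limit, fileLength)) into consecutive ranges of at most
     * splitSize bytes. Bytes at or past `limit` (appended after the cut-off
     * was recorded) are never assigned to any range.
     */
    static List<Range> splitPrefix(long fileLength, long limit, long splitSize) {
        if (splitSize <= 0 || limit < 0 || fileLength < 0) {
            throw new IllegalArgumentException(
                    "sizes must be non-negative and splitSize positive");
        }
        long end = Math.min(limit, fileLength);
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < end; start += splitSize) {
            // The last range is shortened so it stops exactly at `end`.
            ranges.add(new Range(start, Math.min(splitSize, end - start)));
        }
        return ranges;
    }
}
```

For instance, `splitPrefix(1000, 250, 100)` yields `[0, 100)`, `[100, 200)` and `[200, 250)`; bytes from offset 250 on are left for a later run.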

On Thu, Nov 1, 2012 at 4:50 AM, Pankaj Gupta <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Is it possible to run a MapReduce job on part of a file on HDFS? The use case is using a single file on HDFS as a stream that stores all log events of a particular kind. New data can keep being appended while MapReduce processes the older data. Of course, one option would be to copy part of the data into a separate file and give that to MapReduce, but I was wondering if that extra copy can be avoided.
>
> Thanks,
> Pankaj

--
Harsh J