
Mohit Anchlia 2012-02-22, 20:15
Re: Splitting files on new line using hadoop fs
Hi Mohit
AFAIK there is no built-in mechanism for this in Hadoop. During an HDFS copy, a file is split into blocks based purely on the configured block size. When the file is processed with MapReduce, the record reader takes care of line boundaries, even if a line spans multiple blocks.
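
To make the record reader point concrete, here is a rough Python sketch of the convention a line-oriented reader can follow when splits are cut at arbitrary byte offsets. It is a simplified simulation, not Hadoop's actual code: the names `split_ranges`, `read_split` and `read_all` are made up for illustration, and the block size of 8 bytes is only there to force lines across boundaries. The assumed rule is that every split except the first discards its leading partial line, and every split keeps reading past its end until the line it started is complete.

```python
def split_ranges(total_len: int, block_size: int) -> list[tuple[int, int]]:
    """Cut a file of total_len bytes into (start, length) ranges by size alone."""
    return [(start, min(block_size, total_len - start))
            for start in range(0, total_len, block_size)]


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by one split (hypothetical helper).

    Assumed convention:
      * a split that does not start at offset 0 skips everything up to and
        including the first newline, because the previous split reads that line;
      * a split keeps reading lines as long as the line *starts* at or before
        the split's end offset, even if the line finishes in a later block.
    """
    end = start + length
    pos = start
    if start != 0:
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []  # no line begins inside this split
        pos = newline + 1

    records = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            records.append(data[pos:])  # last line without trailing newline
            pos = len(data)
        else:
            records.append(data[pos:newline])
            pos = newline + 1
    return records


def read_all(data: bytes, block_size: int) -> list[bytes]:
    """Read every split independently and concatenate the results."""
    lines = []
    for start, length in split_ranges(len(data), block_size):
        lines.extend(read_split(data, start, length))
    return lines


if __name__ == "__main__":
    sample = b"alpha\nbravo charlie\ndelta\necho\n"
    for start, length in split_ranges(len(sample), 8):
        print(start, length, read_split(sample, start, length))
```

With this convention each line is returned by exactly one split, so the blocks themselves never need to end on a newline.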

Could you explain more about the use case that requires this at HDFS copy time?

------Original Message------
From: Mohit Anchlia
Subject: Splitting files on new line using hadoop fs
Sent: Feb 23, 2012 01:45

How can I copy large text files using "hadoop fs" such that split occurs
based on blocks + new lines instead of blocks alone? Is there a way to do

Bejoy K S

From handheld, please excuse typos.
Mohit Anchlia 2012-02-22, 20:29
bejoy.hadoop@... 2012-02-22, 20:44
Mohit Anchlia 2012-02-22, 22:57