Re: Splitting files on new line using hadoop fs
bejoy.hadoop@... 2012-02-22, 20:23
AFAIK, Hadoop has no built-in mechanism for this. During an HDFS copy, a file is split into blocks based purely on the configured block size. When the file is processed with MapReduce, the record reader takes care of newlines, so a line that spans multiple blocks is still read as a single record.
Could you explain the use case that requires newline-aware splitting at HDFS copy time?
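To illustrate why block boundaries that fall mid-line are not a problem at processing time, here is a small simulation of the idea behind a line-oriented record reader. It is a sketch in plain Python, not Hadoop's actual code; the function names make_splits and read_split are hypothetical, and the rule it follows (a reader on any split but the first discards everything up to the first newline, and every reader finishes the line it has started even if that line runs past the end of its split) is my understanding of how the default text input handling behaves.

```python
def make_splits(length: int, block_size: int) -> list[tuple[int, int]]:
    """Cut [0, length) into byte ranges by size alone, ignoring newlines."""
    return [(s, min(s + block_size, length)) for s in range(0, length, block_size)]


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines owned by the split [start, end)."""
    pos = start
    if start != 0:
        # The previous reader already consumed the line that crosses (or
        # begins exactly at) this boundary, so skip past the first newline.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    # Keep reading while the line *starts* at or before the split end;
    # the final line may extend into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


if __name__ == "__main__":
    text = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, end in make_splits(len(text), block_size=8):
        print((start, end), read_split(text, start, end))
```

With a block size of 8 bytes, "bravo" and "charlie" both straddle a boundary, yet each line is returned exactly once and in full. That is why the copy step does not need to be newline-aware.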
From: Mohit Anchlia
To: [EMAIL PROTECTED]
ReplyTo: [EMAIL PROTECTED]
Subject: Splitting files on new line using hadoop fs
Sent: Feb 23, 2012 01:45
How can I copy large text files using "hadoop fs" such that split occurs
based on blocks + new lines instead of blocks alone? Is there a way to do
Bejoy K S
From handheld, please excuse typos.