Hadoop, mail # user - Splitting files on new line using hadoop fs


Re: Splitting files on new line using hadoop fs
bejoy.hadoop@... 2012-02-22, 20:23
Hi Mohit
AFAIK, Hadoop has no built-in mechanism for this. During an HDFS copy, a file is split into blocks based solely on the configured block size. When the file is processed with MapReduce, the record reader takes care of line boundaries, even if a line spans multiple blocks.
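
To illustrate how a record reader can hand out whole lines even though the blocks are cut at arbitrary byte offsets, here is a minimal Python sketch. It is not Hadoop's code: the names `read_split` and `read_all` are made up for this example, and it assumes newline-terminated text and one split per block. Each reader skips the partial first line of its range (unless it starts at byte 0) and is allowed to read past the end of its range to finish its last line.

```python
import io


def read_split(data: bytes, start: int, end: int) -> list:
    """Return the lines owned by the byte range [start, end).

    Hypothetical helper for illustration only.
    """
    stream = io.BytesIO(data)
    stream.seek(start)
    pos = start
    if start != 0:
        # The first line here may be partial; the reader of the previous
        # range is responsible for it, so discard it.
        pos += len(stream.readline())
    lines = []
    # Keep reading while the next line starts at or before `end`; the last
    # line may therefore extend beyond this range.
    while pos <= end:
        line = stream.readline()
        if not line:  # end of file
            break
        pos += len(line)
        lines.append(line.rstrip(b"\n"))
    return lines


def read_all(data: bytes, block_size: int) -> list:
    """Read every fixed-size range in turn and collect the lines."""
    records = []
    for start in range(0, len(data), block_size):
        end = min(start + block_size, len(data))
        records.extend(read_split(data, start, end))
    return records


if __name__ == "__main__":
    text = b"alpha\nbravo charlie\ndelta\necho\n"
    # An 8-byte "block" cuts through the middle of several lines.
    print(read_all(text, 8))
```

With the sample text and 8-byte ranges, the second range yields no lines at all because it lies entirely inside a line owned by the first range; every line is still returned exactly once.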

Could you explain more about the use case that requires this at HDFS copy time?

------Original Message------
From: Mohit Anchlia
To: [EMAIL PROTECTED]
ReplyTo: [EMAIL PROTECTED]
Subject: Splitting files on new line using hadoop fs
Sent: Feb 23, 2012 01:45

How can I copy large text files using "hadoop fs" such that split occurs
based on blocks + new lines instead of blocks alone? Is there a way to do
this?

Regards
Bejoy K S

From handheld, please excuse typos.