
Re: AW: How to split a big file in HDFS by size
Evert Lammerts at Sara.nl did something similar to your problem, splitting
a big 2.7 TB file into chunks of 10 GB.
This work was presented at the BioAssist Programmers' Day in January of
this year, under the title
"Large-Scale Data Storage and Processing for Scientist in The Netherlands".


P.S.: I sent this message with a copy to him.

On 6/20/2011 10:38 AM, Niels Basjes wrote:
> Hi,
> On Mon, Jun 20, 2011 at 16:13, Mapred Learn<[EMAIL PROTECTED]>  wrote:
>> But this file is a gzipped text file. In that case it will only go to 1 mapper, whereas if it were
>> split into 60 1 GB files, the map-reduce job would finish earlier than with one 60 GB file, since it
>> would have 60 mappers running in parallel. Isn't that so?
> Yes, that is very true.

Marcos Luís Ortíz Valmaseda
  Software Engineer (UCI)