MapReduce, mail # user - AW: How to split a big file in HDFS by size


Re: AW: How to split a big file in HDFS by size
Marcos Ortiz 2011-06-20, 15:39
Evert Lammerts at Sara.nl did something similar to your problem, splitting
a big 2.7 TB file into chunks of 10 GB.
This work was presented at the BioAssist Programmers' Day in January of
this year under the title
"Large-Scale Data Storage and Processing for Scientist in The Netherlands":

http://www.slideshare.net/evertlammerts
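The approach described above (carving one huge file into fixed-size chunks before loading it into HDFS) can be sketched with standard Unix tools. The sizes and the `hadoop fs` destination path below are illustrative, not taken from the talk:

```shell
# Stand-in for the big file (the talk used 2.7 TB; 1 MB here for the demo).
head -c 1048576 /dev/zero > bigfile.dat

# Split into fixed-size chunks (10 GB in the talk; 256 KB for the demo).
# -d gives numeric suffixes: chunk_00, chunk_01, chunk_02, chunk_03.
split -b 262144 -d bigfile.dat chunk_

ls chunk_*

# Each chunk can then be loaded into HDFS on its own, e.g.:
#   hadoop fs -put chunk_* /data/chunks/
```

Concatenating the chunks back together (`cat chunk_* > bigfile.dat`) reproduces the original file byte for byte, so nothing is lost by splitting.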

P.S.: I sent the message with a copy to him.

On 6/20/2011 10:38 AM, Niels Basjes wrote:
> Hi,
>
> On Mon, Jun 20, 2011 at 16:13, Mapred Learn<[EMAIL PROTECTED]>  wrote:
>> But this file is a gzipped text file. In this case, it will only go to one mapper, whereas if it
>> were split into 60 1 GB files, the map-reduce job would finish earlier than with one 60 GB file,
>> since it would have 60 mappers running in parallel. Isn't that so?
>
> Yes, that is very true.
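The point in the quoted exchange (gzip is not splittable, so one big .gz file gets a single mapper, while N separately gzipped chunks get N input splits) can be demonstrated locally. A minimal sketch; file names and chunk sizes are made up for the demo:

```shell
# One big gzipped file: Hadoop cannot split a plain .gz stream, so a
# single mapper would have to decompress and scan the whole thing.
seq 1 100000 > data.txt
gzip -c data.txt > data.txt.gz

# Alternative: split the plain text first, then gzip each part.
# Each .gz file becomes its own input split, so 4 files -> 4 mappers.
split -l 25000 -d data.txt part_
gzip part_*

ls part_*.gz
```

Since gzip files are concatenable, `zcat part_*.gz` still reproduces the original data, so splitting costs nothing in correctness, only in a little bookkeeping.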

--
Marcos Luís Ortíz Valmaseda
  Software Engineer (UCI)
  http://marcosluis2186.posterous.com
  http://twitter.com/marcosluis2186