Re: AW: How to split a big file in HDFS by size
Evert Lammerts at Sara.nl did something similar to your problem, splitting
a big 2.7 TB file into chunks of 10 GB.
That work was presented at the BioAssist Programmers' Day in January of
this year under the title
"Large-Scale Data Storage and Processing for Scientist in The Netherlands":

http://www.slideshare.net/evertlammerts
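
In case it helps, the splitting itself can be done with the plain FileSystem
API. Below is a rough, untested sketch of that idea; the input path, the
output naming, and the 10 GB chunk size are only placeholders for
illustration and are not taken from Evert's slides.

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitBySize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path input = new Path("/data/big-file");     // placeholder input path
    long chunkSize = 10L * 1024 * 1024 * 1024;   // 10 GB per output chunk

    byte[] buffer = new byte[64 * 1024];
    InputStream in = fs.open(input);
    int part = 0;
    int n = in.read(buffer);
    while (n != -1) {
      // Start a new chunk file and fill it with up to chunkSize bytes.
      Path chunk = new Path(String.format("/data/chunks/part-%05d", part++));
      OutputStream out = fs.create(chunk);
      long written = 0;
      while (n != -1 && written < chunkSize) {
        out.write(buffer, 0, n);
        written += n;
        n = in.read(buffer);
      }
      out.close();
    }
    in.close();
  }
}

Note that this cuts on raw byte offsets, not on line boundaries, so it is
only a starting point. Also, an uncompressed text file in HDFS already gets
many input splits from its blocks without any physical splitting; the
physical split mainly matters for the gzipped case discussed below.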

P.S.: I sent the message with a copy to him.

On 6/20/2011 10:38 AM, Niels Basjes wrote:
> Hi,
>
> On Mon, Jun 20, 2011 at 16:13, Mapred Learn<[EMAIL PROTECTED]>  wrote:
>    
>> But this file is a gzipped text file. In this case it will only go to 1 mapper, whereas if it were
>> split into 60 1 GB files the map-reduce job would finish earlier than with one 60 GB file, since it
>> would have 60 mappers running in parallel. Isn't that so?
>>      
> Yes, that is very true.
>
>    
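
On the gzip point quoted above: the reason a single .gz file ends up on one
mapper is that the gzip codec is not splittable, which is essentially what
the input format checks when it computes input splits. A small sketch of
that check (the class names are the standard Hadoop ones, but whether
SplittableCompressionCodec is available depends on the Hadoop version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class SplittableCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);

    // The codec is chosen by file extension; .gz maps to GzipCodec.
    CompressionCodec codec = factory.getCodec(new Path("/data/big-file.gz"));

    // Plain files (no codec) or a splittable codec can be cut into many
    // input splits; gzip fails this test, so the whole file becomes a
    // single split and therefore a single mapper.
    boolean splittable = codec == null || codec instanceof SplittableCompressionCodec;
    System.out.println("splittable = " + splittable);
  }
}

So splitting the data into 60 separately gzipped 1 GB files, as you
describe, would indeed give the job 60 splits to work on in parallel.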

--
Marcos Luís Ortíz Valmaseda
  Software Engineer (UCI)
  http://marcosluis2186.posterous.com
  http://twitter.com/marcosluis2186