Hadoop >> mail # user >> Writing small files to one big file in hdfs


Re: Writing small files to one big file in hdfs
I'd recommend making a SequenceFile [1] to store each XML file as a value.

-Joey

[1]
http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/io/SequenceFile.html
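As a minimal sketch (not from the original message), the SequenceFile approach could look like this: each small XML file becomes one entry, keyed by file name, with the raw bytes as the value. The local directory "xmlDir" and the HDFS output path are hypothetical.

// Minimal sketch, assuming a local directory of small XML files and an HDFS
// output path; names are hypothetical, not from the thread.
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class XmlToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/data/xml-packed.seq");  // hypothetical HDFS output

    // Key = original file name, value = raw XML bytes.
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (File xml : new File("xmlDir").listFiles()) {  // hypothetical local dir
        byte[] bytes = Files.readAllBytes(xml.toPath());
        writer.append(new Text(xml.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}

Because SequenceFiles are splittable (and can be block-compressed), the packed file still benefits from input splits and sequential IO, which is the goal described in the question below.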

On Tue, Feb 21, 2012 at 12:15 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> We have small XML files. Currently I am planning to append these small
> files to one file in HDFS so that I can take advantage of splits, larger
> blocks, and sequential IO. What I am unsure about is whether it's ok to
> append one file at a time to this HDFS file.
>
> Could someone suggest if this is ok? I would like to know how others do it.
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
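For context, here is a hedged sketch of the raw-append approach the quoted question describes: writing each small local XML file, one at a time, onto a single HDFS file. It assumes an HDFS version and configuration where append is supported and enabled; the reply above recommends a SequenceFile instead. The directory and path names are hypothetical.

// Sketch of the approach asked about (not a recommendation from the thread):
// append each small local XML file onto one HDFS file. Assumes append is
// enabled in this HDFS setup; paths and "xmlDir" are hypothetical.
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendXmlToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path target = new Path("/data/all-xml.dat");  // hypothetical combined file

    for (File xml : new File("xmlDir").listFiles()) {  // hypothetical local dir
      // Re-open the target for append for each small file; create it on first use.
      FSDataOutputStream out =
          fs.exists(target) ? fs.append(target) : fs.create(target);
      try {
        out.write(Files.readAllBytes(xml.toPath()));
      } finally {
        out.close();
      }
    }
  }
}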