Flume >> mail # user >> hdfs.fileType = CompressedStream


Re: hdfs.fileType = CompressedStream
You are using gzip, so the files won't be splittable.
You may be better off using Snappy and sequence files.
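A minimal sketch of that alternative, assuming a generic agent and sink named a1/k1 (placeholder names, not from this thread). SequenceFile is a splittable container, so MapReduce can process the output in parallel even when it is compressed:

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = SequenceFile
a1.sinks.k1.hdfs.codeC = snappy
a1.sinks.k1.hdfs.writeFormat = Writable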
On Thu, Jan 30, 2014 at 10:51 AM, Jimmy <[EMAIL PROTECTED]> wrote:

> I am running a few tests and would like to confirm whether this is true...
>
> hdfs.codeC = gzip
> hdfs.fileType = CompressedStream
> hdfs.writeFormat = Text
> hdfs.batchSize = 100
>
>
> now let's assume I have a large number of transactions and I roll the file
> every 10 minutes
>
> it seems the tmp file stays at 0 bytes and then flushes all at once after 10
> minutes, whereas if I don't use compression the file grows as data is
> written to HDFS
>
> is this correct?
>
> Do you see any drawback in using CompressedStream with very large
> files? In my case a 120MB compressed file (roughly one HDFS block) is
> 10x smaller than the uncompressed data
>
>
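For reference, a minimal sketch of the time-based rolling described above, again with placeholder agent/sink names: rollSize and rollCount are set to 0 so that only the 10-minute timer (600 seconds) closes the file.

a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0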

 