Flume user mailing list: HDFS Sink keeps .tmp files and closes with exception


Earlier messages in this thread:
Nishant Neeraj 2012-10-18, 20:18
Bhaskar V. Karambelkar 2012-10-18, 22:39
Hari Shreedharan 2012-10-18, 23:00
Nishant Neeraj 2012-10-19, 20:29

Re: HDFS Sink keeps .tmp files and closes with exception
Nishant,

a: If CDH4 was working for you, you could use it with hadoop-2.x, or CDH3u5 with hadoop-1.x.
b: Looks like your rollSize/rollCount/rollInterval are all 0. Can you increase rollCount to, say, 1000 or so? As documented here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink, setting all the roll* configuration parameters to 0 means the files are never rolled. If files are not rolled, they are not closed, and HDFS will show them as 0-sized files. Once a roll happens, the HDFS GUI will show the real file size. You can use any one of the three roll* parameters to roll the files.
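
For example, reusing the agent and sink names from your config below (1000 is just an illustrative value; any one non-zero roll* setting will do):

# roll (and therefore close) the file after every 1000 events,
# leaving size- and time-based rolling disabled
agent1.sinks.fileSink1.hdfs.rollCount = 1000
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollInterval = 0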

Thanks,
Hari
--
Hari Shreedharan
On Friday, October 19, 2012 at 1:29 PM, Nishant Neeraj wrote:

> Thanks for the responses.
>
> a: Got rid of all the CDH stuff (basically, started on a fresh AWS instance).
> b: Installed from binary files.
>
> It DID NOT work. Here is what I observed:
> flume-ng version: Flume 1.2.0
> Hadoop: 1.0.4
>
> This is my configuration:
>
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = hdfs://localhost:54310/flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg2
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
> #agent1.sinks.fileSink1.hdfs.batchSize = 10
>
> #1: startup error
> -----------------------------------
> With the new installation, I see this exception when Flume starts (it does not stop me from adding data to HDFS):
>
> 2012-10-19 19:48:32,191 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:70)] Creating instance of sink: fileSink1, type: hdfs
> 2012-10-19 19:48:32,296 (conf-file-poller-0) [DEBUG - org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)] java.io.IOException: config()
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)
> at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:214)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:184)
> at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:236)
> at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:516)
> at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:238)
> at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
> at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadSinks(PropertiesFileConfigurationProvider.java:373)
> at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:223)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:123)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
> at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:202)
> -- snip --
>
> #2: the old issue continues
> ------------------------------------
> When I start loading the source, the console shows that events get generated, but the HDFS GUI shows a 0 KB file with a .tmp extension. Adding hdfs.batchSize has no effect; I would assume it should flush the content to the temp file, but no. I tried smaller and bigger values of hdfs.batchSize, with no effect.
>
> When I shut down Flume, I see the data gets flushed to the temp file, BUT the temp file still has the .tmp extension. So, basically, there is NO WAY TO HAVE ONE SINGLE AGGREGATED FILE of all the logs. If I set rollSize to a positive value, things start to work, but that defeats the purpose.
>
> Even with a non-zero roll value, the last file stays as .tmp when I close Flume.
>
> #3: Shutdown throws exception
Later replies in this thread:
Bhaskar V. Karambelkar 2012-10-19, 23:42
Hari Shreedharan 2012-10-20, 00:13
Nishant Neeraj 2012-10-20, 03:31
Nishant Neeraj 2012-10-20, 11:54