Yanzhi.liu 2012-09-29, 04:04
Hello,
I am using a MongoDB database, and my Flume source is a custom directory source. I configured a JDBC channel, but flume.log shows:
2012-09-28 20:56:49,468 INFO lifecycle.LifecycleSupervisor: Stopping component: org.apache.flume.channel.jdbc.JdbcChannel@1690ab
2012-09-28 20:56:49,612 INFO impl.JdbcChannelProviderImpl: Embedded Derby shutdown raised SQL STATE 45000 as expected.
2012-09-28 20:56:49,613 INFO properties.PropertiesFileConfigurationProvider: Creating channels
2012-09-28 20:56:49,613 WARN impl.JdbcChannelProviderImpl: No connection URL specified. Using embedded derby database instance.
2012-09-28 20:56:49,613 WARN impl.JdbcChannelProviderImpl: Overriding values for - driver: org.apache.derby.jdbc.EmbeddedDriver, user: saconnectUrl: jdbc:derby:/home/flume/.flume/jdbc-channel/db;create=true, jdbc properties file: null, dbtype: DERBY
How do I configure the JDBC channel so that it runs properly? Thanks very much!
Yanzhi Liu
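The two WARN lines show the JDBC channel falling back to its built-in defaults because no connection URL (the driver.url key) was configured. As a sketch, assuming the key names listed for the JDBC channel in the Flume 1.x user guide and the Derby defaults printed in the log, the same settings can be spelled out explicitly. The channel name jdbcChannel and the capacity value are illustrative only, not recommendations.

```properties
# Sketch only: "jdbcChannel" is a hypothetical channel name.
agent_foo.channels = jdbcChannel
agent_foo.channels.jdbcChannel.type = jdbc
# Derby is the database the channel fell back to in the log above
agent_foo.channels.jdbcChannel.db.type = DERBY
agent_foo.channels.jdbcChannel.driver.class = org.apache.derby.jdbc.EmbeddedDriver
# Setting driver.url explicitly avoids the "No connection URL specified" fallback;
# this is the URL the log reports, adjust the path as needed
agent_foo.channels.jdbcChannel.driver.url = jdbc:derby:/home/flume/.flume/jdbc-channel/db;create=true
agent_foo.channels.jdbcChannel.db.username = sa
# Illustrative upper bound on events stored in the channel
agent_foo.channels.jdbcChannel.maximum.capacity = 1000000
```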
Hari Shreedharan 2012-09-29, 05:32
Is there any specific reason you are using the JDBC channel? I would recommend the File Channel instead; it is what we currently recommend as a durable channel. We have improved it a lot in recent weeks. To take advantage of the latest features, you can build it from source and drop in the new jars, or wait for the next release, which should happen soon.
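A minimal File Channel definition might look like the following sketch. It assumes the checkpointDir, dataDirs, capacity and transactionCapacity keys from the Flume 1.x File Channel documentation; the directory paths and sizing numbers are placeholders, not tuned values.

```properties
agent_foo.channels = fileChannel
agent_foo.channels.fileChannel.type = file
# Hypothetical local paths; keep checkpoint and data in separate directories
agent_foo.channels.fileChannel.checkpointDir = /home/flume/file-channel/checkpoint
agent_foo.channels.fileChannel.dataDirs = /home/flume/file-channel/data
# Placeholder sizing: max events held, max events per transaction
agent_foo.channels.fileChannel.capacity = 1000000
agent_foo.channels.fileChannel.transactionCapacity = 1000
```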
Thanks, Hari
-- Hari Shreedharan
Yanzhi.liu 2012-09-29, 06:49
Hello Hari,
Thanks for your reply. I am using the File Channel as well as the JDBC channel. The File Channel has a problem when more than one source writes to it: a large number of files accumulate in its data directory, and the HDFS sink cannot process them quickly enough. This causes a file lock to be lost, so processing cannot continue, and eventually the entire Flume cluster stops. For better monitoring I therefore added the JDBC channel, so that event counts could be tracked in MongoDB to prevent data loss. So I would like a good configuration for the JDBC channel.
Yanzhi Liu
Hari Shreedharan 2012-09-29, 07:04
Hi Yanzhi,
I am not sure what file lock you are talking about. File Channel by itself does not do any time-based locking. The only locking it does is to ensure that multiple channels do not use the same data directory, so there is no way for a lock to be lost: the lock file is simply deleted when the channel stops. Also, the File Channel deletes data files once the data in them has been delivered. The File Channel's maxFileSize is configurable and supports a maximum of around 1.52GB. Adding multiple HDFS sinks can improve performance too.
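As a sketch of the two settings mentioned here, assuming the File Channel's maxFileSize key and the standard HDFS sink keys (type, channel, hdfs.path, hdfs.filePrefix), two sinks can drain the same channel in parallel. The namenode address, the path and the 1 GB file size are illustrative assumptions. Each sink that is not part of a sink group is driven by its own thread, which is where the extra throughput comes from; giving each sink a different file prefix keeps the two sinks from producing identically named files.

```properties
# Cap each data file at roughly 1 GB (value in bytes; illustrative)
agent_foo.channels.fileChannel.maxFileSize = 1073741824

# Two HDFS sinks reading from the same channel
agent_foo.sinks = hdfsSink1 hdfsSink2

agent_foo.sinks.hdfsSink1.type = hdfs
agent_foo.sinks.hdfsSink1.channel = fileChannel
# Hypothetical namenode address and path
agent_foo.sinks.hdfsSink1.hdfs.path = hdfs://namenode:8020/flume/events
# Distinct prefixes so the sinks do not write files with the same name
agent_foo.sinks.hdfsSink1.hdfs.filePrefix = sink1

agent_foo.sinks.hdfsSink2.type = hdfs
agent_foo.sinks.hdfsSink2.channel = fileChannel
agent_foo.sinks.hdfsSink2.hdfs.path = hdfs://namenode:8020/flume/events
agent_foo.sinks.hdfsSink2.hdfs.filePrefix = sink2
```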
What version of Flume are you using? I'd suggest trying out File Channel from trunk (or the upcoming v1.3.0). JDBC channel is generally a lot slower. I have tested Flume in various configurations and never encountered issues with the file channel. Can you give me details on the file channel problems you faced? It might be a simple config issue, and easily fixable.
As for the JDBC channel, I am not sure what a good configuration would be, as I have not really used it much. Please wait for someone else to reply if you still feel the JDBC channel is better.
Thanks,
Hari
-- Hari Shreedharan
Yanzhi.liu 2012-09-30, 23:32
Hello Hari,
I am using Flume 1.2.0. You mentioned Flume 1.3.0, but I am concerned that 1.3.0 isn't stable yet. Still, I will think about your suggestions. Thank you very much for your ideas!
Yanzhi Liu
Hari Shreedharan 2012-10-01, 00:39
Hi Yanzhi,
Flume 1.3.0 did have a bug, which we fixed recently. It caused the File Channel not to delete some older files on time, so a huge number of files could accumulate in certain cases. The bug is https://issues.apache.org/jira/browse/FLUME-1606. It has been fixed, and the fix will be in the final version of Flume 1.3.0, so it should resolve the issue you are facing. You can either wait for the release or check out trunk and build it. Let me know if you still see massive backlogs.
File Channel is likely to perform an order of magnitude or more better than the JDBC channel. Also, we do not recommend running File Channel on anything other than a local disk. It is better not to use a network-mounted disk, given the guarantees most network file systems provide.
Thanks,
Hari
-- Hari Shreedharan
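The local-disk advice applies to both directories the File Channel writes to. As a sketch, assuming dataDirs accepts a comma-separated list of directories as described in the Flume File Channel documentation, data files can be spread over several local disks. The mount points /disk1 and /disk2 are hypothetical.

```properties
# Keep everything on local disks, not NFS or other network mounts
agent_foo.channels.fileChannel.checkpointDir = /disk1/flume/checkpoint
# Comma-separated list: data files are written across both local disks
agent_foo.channels.fileChannel.dataDirs = /disk1/flume/data,/disk2/flume/data
```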
Yanzhi.liu 2012-10-09, 03:56
Hello everybody,
This is my configuration for Flume 1.2.0. I would like to know whether the JDBC channel can use MongoDB.
agent_foo.sources = cpisDirSource
agent_foo.channels = JdbcChannel
agent_foo.sources.cpisDirSource.type = com.chinacache.cpis.sources.CpisDirSource
agent_foo.sources.cpisDirSource.dir = /home/flume/dir
agent_foo.sources.cpisDirSource.channels = fileChannel
agent_foo.sources.cpisDirSource.batchSize = 1
agent_foo.sources.cpisDirSource.max-line = 10000
agent_foo.channels.JdbcChannel.type = jdbc
agent_foo.channels.JdbcChannel.db.type = mongodb
agent_foo.channels.JdbcChannel.driver.class = com.mongodb.jdbc.MongoDriver
agent_foo.channels.JdbcChannel.dirver.url = http://192.168.42.135:28017/
agent_foo.channels.JdbcChannel.db.username = "test";
agent_foo.channels.JdbcChannel.connection.properties.file = /home/flume/software/mongodb/bin/mongod -dbpath=/home/flume/mongodb -logpath=/home/flume/mongodb/mongodb.log -directoryperdb > /dev/null 2>&1 &
#agent_foo.channels.JdbcChannel.db.password =
agent_foo.channels.JdbcChannel.maximum.capacity = 1000000
The log shows:
2012-10-08 20:53:15,625 WARN impl.JdbcChannelProviderImpl: No connection URL specified. Using embedded derby database instance.
2012-10-08 20:53:15,626 WARN impl.JdbcChannelProviderImpl: Overriding values for - driver: org.apache.derby.jdbc.EmbeddedDriver, user: saconnectUrl: jdbc:derby:/home/flume/.flume/jdbc-channel/db;create=true, jdbc properties file: null, dbtype: DERBY
Thank you very much!
Yanzhi Liu
Brock Noland 2012-10-09, 14:33
Hi,
The JDBC channel probably could use MongoDB, but that is not implemented; supporting MongoDB would require new code in Flume. I would suggest using either CDH4.1 or a build of the flume-1.3.0 branch (which we hope to release soon), both of which give you the updated FileChannel.
Brock
-- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
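Following this advice, the agent from the configuration posted earlier could be rebuilt around a File Channel. This is a sketch under assumptions: the source settings are copied unchanged from that post, the channel paths are hypothetical, and the MongoDB-related JDBC keys are dropped because the JDBC channel does not implement MongoDB. Three problems in the posted configuration are worth fixing regardless of channel type: the source points at fileChannel while the only declared channel is JdbcChannel; the key dirver.url is misspelled (the JDBC channel reads driver.url), which is consistent with the "No connection URL specified" warning; and connection.properties.file expects the path of a JDBC properties file, not a shell command.

```properties
agent_foo.sources = cpisDirSource
agent_foo.channels = fileChannel

# Custom source, settings unchanged from the original post
agent_foo.sources.cpisDirSource.type = com.chinacache.cpis.sources.CpisDirSource
agent_foo.sources.cpisDirSource.dir = /home/flume/dir
agent_foo.sources.cpisDirSource.batchSize = 1
agent_foo.sources.cpisDirSource.max-line = 10000
# Must name a channel declared in agent_foo.channels
agent_foo.sources.cpisDirSource.channels = fileChannel

# File Channel on local disk (hypothetical paths)
agent_foo.channels.fileChannel.type = file
agent_foo.channels.fileChannel.checkpointDir = /home/flume/file-channel/checkpoint
agent_foo.channels.fileChannel.dataDirs = /home/flume/file-channel/data
agent_foo.channels.fileChannel.capacity = 1000000

# Sinks omitted here, as in the original post
```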