Flume, mail # user - About jdbc channel


Re: About jdbc channel
Hari Shreedharan 2012-10-01, 00:39
Hi Yanzhi,

Flume-1.3.0 did have a bug which we fixed recently. This bug caused the File Channel to not delete some older files on time, leaving a huge number of files still waiting to be deleted in certain cases. The bug is https://issues.apache.org/jira/browse/FLUME-1606. It has been fixed, and the fix will be in the final version of Flume-1.3.0, so it should resolve the issue you are facing. You can either wait for the release or check out trunk and build it. Let me know if you still see massive backlogs. The performance of File Channel is likely to be an order of magnitude or more better than that of the JDBC channel.

 Also it is not recommended to use File Channel on anything other than a local disk. Better not to use a network mounted disk (considering the guarantees that most network file systems give).
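
As a rough illustration of the advice in this thread, here is a minimal configuration sketch for a File Channel on local disk drained by two HDFS sinks. The agent name (agent1), the component names, the local paths, and the HDFS URL are all hypothetical placeholders, the maxFileSize value is only an example, and the source definitions are omitted. Please check the property names against the user guide for the version you run.

```properties
# Hypothetical names and paths throughout; sources are omitted for brevity.
agent1.channels = fc1
agent1.sinks = hdfs1 hdfs2

# File Channel: keep the checkpoint and data directories on a local disk,
# not on a network mount. Each channel needs its own directories.
agent1.channels.fc1.type = file
agent1.channels.fc1.checkpointDir = /var/lib/flume/fc1/checkpoint
agent1.channels.fc1.dataDirs = /var/lib/flume/fc1/data
# Optional: maximum size of a single data file, in bytes (example value).
agent1.channels.fc1.maxFileSize = 1073741824

# Two HDFS sinks draining the same channel to improve throughput.
# Distinct file prefixes keep the sinks from writing to the same file.
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = fc1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode.example.com/flume/events
agent1.sinks.hdfs1.hdfs.filePrefix = events-1

agent1.sinks.hdfs2.type = hdfs
agent1.sinks.hdfs2.channel = fc1
agent1.sinks.hdfs2.hdfs.path = hdfs://namenode.example.com/flume/events
agent1.sinks.hdfs2.hdfs.filePrefix = events-2
```
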
Thanks,
Hari
--  
Hari Shreedharan
On Sunday, September 30, 2012 at 4:32 PM, Yanzhi.liu wrote:

> Hello Hari:
>     I am using Flume 1.2.0. As for Flume 1.3.0, I am concerned that this version is not stable yet, but I will think about your suggestions.
> Thank you very much for your ideas!
> My Name:
> Yanzhi Liu
>  
> ------------------ Original Message ------------------  
> From: "Hari Shreedharan"<[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])>;
> Sent: Saturday, September 29, 2012, 3:04 PM
> To: "user"<[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])>;  
> Subject: Re: About jdbc channel
>  
>  
> Hi Yanzhi,  
>  
> I am not sure what file lock you are talking about. File Channel by itself does not do any time based locking. The only locking it does is to ensure that multiple channels do not use the same data directory (so there is no issue of lock being lost - the lock file is simply deleted at channel stop).  Also, the file channel deletes files as the data gets transmitted. The file channel maxFileSize is configurable, and supports a maximum of around 1.52GB. Adding multiple HDFS sinks can improve performance too.  
>  
> What version of Flume are you using? I'd suggest trying out File Channel from trunk (or the upcoming v1.3.0). JDBC channel is generally a lot slower. I have tested Flume in various configurations and never encountered issues with the file channel. Can you give me details on the file channel problems you faced? It might be a simple config issue, and easily fixable.  
>  
> As for the JDBC channel, I am not sure of a good configuration - as I have not really used it much. Please wait for someone else to reply if you still feel the JDBC channel is better.  
>  
>  
> Thanks  
> Hari
>  
> --  
> Hari Shreedharan
>  
>  
> On Friday, September 28, 2012 at 11:49 PM, Yanzhi.liu wrote:
>  
> > Hello Hari:
> >     Thanks for your question. I am using the JDBC channel, and I also use the File Channel. The File Channel has a problem: when more than one source writes to it, a large number of files accumulate in the File Channel's dataDir. The HDFS sink cannot process these files quickly enough, which causes a file lock to be lost so that processing cannot continue, eventually bringing the entire Flume cluster to a complete stop. For better monitoring, I therefore added the JDBC channel, so that statistics on the number of events (kept in MongoDB) can help prevent data loss.
> >     So I would like to get a good configuration for the JDBC channel.
> > My Name:
> > Yanzhi Liu
> >  
> > ------------------ Original Message ------------------  
> > From: "Hari Shreedharan"<[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])>;
> > Sent: Saturday, September 29, 2012, 1:32 PM
> > To: "user"<[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])>;  
> > Subject: Re: About jdbc channel
> >  
> >  
Is there any specific reason that you are using the JDBC channel? I would recommend using the File Channel, which is what we currently recommend as a durable channel. We have improved the channel a lot in recent weeks. To take advantage of the latest features added to the channel, you can build it and drop in the new jars, or wait for the next release, which should happen soon.