-
Flume and HDFS integration (Emile Kao, 2012-11-29, 17:14)
Dear support,
I would like to ask you some questions about issues I am facing while trying to implement Flume in a customer environment.

I am using the following release of Flume: apache-flume-1.4.0-SNAPSHOT-bin

Here are my questions:

Question no. 1
I have defined the following command in flume.conf:

agent1.sources.tail.command = tail -F /opt/apache2/logs/access_log

The resulting files (FlumeData.xxxxxxxxxxxxx) are not readable, at least not for human beings. I guess they are in a binary format. My question is: is there a way to make those files, or convert them into, an ASCII / human-readable format?

Question no. 2
I am trying to use the tailDir command without success. Here is the setting in flume.conf:

agent1.sources.tail.command = tailDir("/opt/apache2/logs/")

This is the result I am getting. Can you help?

2012-11-29 16:48:17,548 (pool-6-thread-1) [ERROR - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:284)] Failed while running command: tailDir("/opt/apache2/logs/")
java.io.IOException: Cannot run program "tailDir("/opt/apache2/logs/")": java.io.IOException: error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:259)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 7 more
2012-11-29 16:48:17,549 (pool-6-thread-1) [INFO - org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:307)] Command [tailDir("/opt/apache2/logs/")] exited with -1073741824

Many thanks!
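The stack trace shows ExecSource handing the command to java.lang.ProcessBuilder.start, so the configured string is treated as an operating-system program plus arguments rather than as a Flume expression. The sketch below assumes (it is not taken from Flume's source) that the command is split on whitespace before launch; under that assumption `tail -F ...` becomes a real program with two arguments, while `tailDir("/opt/apache2/logs/")` becomes one program name that does not exist, which is exactly the error=2 (No such file or directory) in the log. The class and method names (`ExecCommandDemo`, `tokenize`, `tryStart`) are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of an exec-style source launching a command.
// Assumption: the command is split on whitespace with no shell involved;
// this mimics, but is not, Flume's ExecSource implementation.
public class ExecCommandDemo {

    // Split a command line on runs of whitespace (no quoting rules, no globbing).
    static List<String> tokenize(String command) {
        return Arrays.asList(command.trim().split("\\s+"));
    }

    // Try to start the command; returns null if it launched, else the error message.
    static String tryStart(String command) {
        try {
            Process p = new ProcessBuilder(tokenize(command)).start();
            p.destroy(); // only checking whether the program could be launched
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Flume 1.x style: "tail" is a real program, the rest are its arguments.
        System.out.println(tokenize("tail -F /opt/apache2/logs/access_log"));
        // Flume 0.9 style: the whole string is a single, nonexistent program name.
        System.out.println(tryStart("tailDir(\"/opt/apache2/logs/\")"));
    }
}
```

No shell is involved in this sketch, so shell syntax such as globs or pipes in the command would not be interpreted either.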
-
Re: Flume and HDFS integration (Brock Noland, 2012-11-29, 17:18)
Hi,

1) It's a sequence file; you can change it to a text file if you want. See fileType here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

2) The "tailDir(" syntax is from Flume 0.9 and is no longer used. The first example has the correct syntax.

Brock
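To confirm that a FlumeData file really is a sequence file before changing the configuration, one can look at its first bytes: a Hadoop SequenceFile begins with the three bytes `S`, `E`, `Q` followed by a version byte. The sketch below checks only that magic prefix on a local copy of the file; the class name `SeqFileSniffer` is hypothetical, and it assumes Java 9 or later for `readNBytes`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: tell a Hadoop SequenceFile apart from plain text by its header.
// SequenceFiles start with 'S','E','Q' followed by a version byte.
public class SeqFileSniffer {

    static boolean looksLikeSequenceFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[3];
            int n = in.readNBytes(header, 0, 3); // reads up to 3 bytes (Java 9+)
            return n == 3 && header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SeqFileSniffer <local copy of a FlumeData file> ...
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(arg + ": " + (looksLikeSequenceFile(p)
                    ? "SequenceFile (binary container)"
                    : "not a SequenceFile"));
        }
    }
}
```

A file written with a text-oriented fileType should fail this check and start directly with the log line bytes.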
-
Re: Flume and HDFS integration (Roman Shaposhnik, 2012-11-30, 01:17)
On Thu, Nov 29, 2012 at 9:18 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
> HI,
>
> 1) It's a sequence file, you can change it a text file if you want. See
> FileType here http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

Don't you also have to change a serialization format to get rid of the binary structure completely? IOW, you'd have to add something like:

agent.sinks.hdfsSink.hdfs.serializer org.apache.flume.serialization.BodyTextEventSerializer

?

Thanks,
Roman.
-
Re: Flume and HDFS integration (Brock Noland, 2012-11-30, 01:26)
Hi,

On Thu, Nov 29, 2012 at 7:17 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 29, 2012 at 9:18 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>> 1) It's a sequence file, you can change it a text file if you want. See
>> FileType here http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>
> Don't you also have to change a serialization format to get rid of the binary
> structure completely? IOW, you'd have to add something like:
> agent.sinks.hdfsSink.hdfs.serializer
> org.apache.flume.serialization.BodyTextEventSerializer

BodyTextEventSerializer is the default serializer. Serializers decide how to turn Events into records, while fileType decides what type of file the events are written to.

Brock
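The split Brock describes can be pictured with a small stand-in: a "body text" serializer only decides the record layout (here, each event body followed by a newline, assumed to mirror the default text behavior), and the file type decides the container those bytes end up in. The sketch below is not Flume's BodyTextEventSerializer; `BodyTextSketch` and its `Event` class are hypothetical stand-ins that model only the event body. It assumes Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of a "body text" serializer: write each event body,
// then a newline; headers are dropped. Not Flume's actual class.
public class BodyTextSketch {

    // Minimal stand-in for a Flume event: only the body is modeled.
    static final class Event {
        final byte[] body;
        Event(byte[] body) { this.body = body; }
    }

    // Serialize events as newline-terminated text records.
    static void write(List<Event> events, OutputStream out) throws IOException {
        for (Event e : events) {
            out.write(e.body);
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(List.of(
                new Event("GET /index.html 200".getBytes(StandardCharsets.UTF_8)),
                new Event("GET /favicon.ico 404".getBytes(StandardCharsets.UTF_8))), buf);
        // With a plain stream file type, bytes like these are what a reader sees.
        // A SequenceFile instead stores each event inside a binary key/value container.
        System.out.print(buf.toString(StandardCharsets.UTF_8));
    }
}
```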
-
Re: Flume and HDFS integration (Emile Kao, 2012-11-30, 08:51)
Hello Brock,
first of all, thank you for answering my questions. I appreciate it, since I am a real newbie with Flume / Hadoop, etc.

But now I am confused. According to your statement, the file type is the key here. Take a look at my flume.conf below: the file type was already set to "DataStream". So which one is right: SequenceFile, DataStream or CompressedStream?

agent1.channels = MemoryChannel-2
agent1.channels.MemoryChannel-2.type = memory
agent1.sources = tail
agent1.sources.tail.channels = MemoryChannel-2
agent1.sources.tail.type = exec
agent1.sources.tail.command = tail -F /opt/apache2/logs/access_log
agent1.sinks = HDFS
agent1.sinks.HDFS.channel = MemoryChannel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.file.Type = DataStream
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:9000
#agent1.sinks.HDFS.hdfs.path = /mnt/hdfs/data
agent1.sinks.HDFS.hdfs.writeFormat = Text

Many thanks,
Emile
-
Re: Flume and HDFS integration (Brock Noland, 2012-11-30, 12:40)
Hi,
On Fri, Nov 30, 2012 at 2:51 AM, Emile Kao <[EMAIL PROTECTED]> wrote:
> agent1.sinks.HDFS.hdfs.file.Type = DataStream

It's fileType, not file.Type :) The exact text is located here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

Cheers!
Brock
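A misspelled key such as `hdfs.file.Type` is simply not picked up, so the sink kept its default file type. A small check over flume.conf can catch this class of typo before the agent starts. The sketch below loads the file with java.util.Properties and reports sink keys that are not in a known list. `SinkKeyLinter` and `KNOWN_SINK_KEYS` are hypothetical, and the list is a partial, hand-picked subset of the HDFS sink options in the Flume User Guide, not the complete set; extend it before relying on the output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Sketch: flag HDFS sink properties whose names are not in a known list.
// KNOWN_SINK_KEYS is a partial list (an assumption), not Flume's full option set.
public class SinkKeyLinter {

    static final Set<String> KNOWN_SINK_KEYS = Set.of(
            "type", "channel", "serializer",
            "hdfs.path", "hdfs.fileType", "hdfs.writeFormat", "hdfs.filePrefix",
            "hdfs.round", "hdfs.roundValue", "hdfs.roundUnit",
            "hdfs.rollInterval", "hdfs.rollSize", "hdfs.rollCount", "hdfs.batchSize");

    // Returns full keys under "<agent>.sinks.<sink>." whose suffix is not recognized.
    static List<String> unknownKeys(Properties conf, String agent, String sink) {
        String prefix = agent + ".sinks." + sink + ".";
        List<String> unknown = new ArrayList<>();
        for (String key : conf.stringPropertyNames()) {
            if (key.startsWith(prefix)
                    && !KNOWN_SINK_KEYS.contains(key.substring(prefix.length()))) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.load(new StringReader(String.join("\n",
                "agent1.sinks.HDFS.type = hdfs",
                "agent1.sinks.HDFS.hdfs.file.Type = DataStream",
                "agent1.sinks.HDFS.hdfs.writeFormat = Text")));
        // Expected to report the misspelled hdfs.file.Type key.
        System.out.println(unknownKeys(conf, "agent1", "HDFS"));
    }
}
```

In practice one would call `conf.load` on a `FileReader` for the real flume.conf instead of the inline string.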
-
Re: Flume and HDFS integration (Roman Shaposhnik, 2012-12-01, 00:40)
On Fri, Nov 30, 2012 at 12:51 AM, Emile Kao <[EMAIL PROTECTED]> wrote:
> Hello Brock,
> first of all thank you for answering my questions. I appreciate it since I am a real newbie in Flume / Hadoop , etc...
>
> But now I am confused. According to you statement, the filetype is the key here. Now just take a look on my flume.conf below:
> The filetype was from set to "DataStream".
> Now which is the right one now: SequenceFile, DataStream or CompressedStream?

Here's what works for me in a situation very similar to yours:

# Sink configuration
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = /flume/cluster-logs
agent.sinks.sink1.hdfs.writeFormat = Text
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.filePrefix = events-
agent.sinks.sink1.hdfs.round = true
agent.sinks.sink1.hdfs.roundValue = 10
agent.sinks.sink1.hdfs.roundUnit = minute
# agent.sinks.sink1.hdfs.serializer org.apache.flume.serialization.BodyTextEventSerializer

Thanks,
Roman.
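The `hdfs.round`, `hdfs.roundValue` and `hdfs.roundUnit` settings in the configuration above round the event timestamp down before it is used in time-based escape sequences of `hdfs.path` (for example `%H` or `%M`), according to the HDFS sink documentation; with a path that has no escape sequences they should have no visible effect. The sketch below models only the documented minute case, written from that description rather than from Flume's source; `RoundDownSketch` is a hypothetical name and it assumes `roundValue` divides 60.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Sketch of hdfs.round = true, hdfs.roundUnit = minute: round an event time
// down to the nearest multiple of roundValue minutes within its hour.
// Modeled on the documented behavior, not on Flume's implementation.
public class RoundDownSketch {

    static LocalDateTime roundDownMinutes(LocalDateTime t, int roundValue) {
        if (roundValue <= 0 || roundValue > 60) {
            throw new IllegalArgumentException("roundValue must be in 1..60");
        }
        int minute = (t.getMinute() / roundValue) * roundValue;
        return t.truncatedTo(ChronoUnit.HOURS).withMinute(minute);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2012, 11, 30, 8, 51, 37);
        // roundValue = 10 as in the configuration above
        System.out.println(roundDownMinutes(t, 10)); // 2012-11-30T08:50
    }
}
```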
-
Re: Flume and HDFS integration (Emile Kao, 2012-12-03, 09:51)
Hi Brock,
that was the mistake. Thank you!

Cheers,
Emile