Flume >> mail # user >> Can HDFSSink write headers as well?


Re: Can HDFSSink write headers as well?
On Tue, Aug 21, 2012 at 8:16 PM, ashutosh (Open Platform Development Team)
<[EMAIL PROTECTED]> wrote:

>  Hi All,
>
>
>
> I am using the “avro_event” serializer with the writable format and the
> DataStream file type to store events into HDFS.
>
> I would like to read the files back for further analysis. I am new to Avro
> and don’t know how to develop a deserializer to read the Flume events
> written to the HDFS files.
>
>
>
> If anyone could share a sample or example, it would be much appreciated.
> Please help.
>
>
>

Take a look at this test to see how to read the data back. In general,
though, you would want to create your own serializer specific to your
schema; otherwise it makes sense to just use sequence files.

http://svn.apache.org/repos/asf/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/serialization/TestFlumeEventAvroEventSerializer.java
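The linked test exercises Flume's FlumeEventAvroEventSerializer. As a rough sketch of the same round-trip idea, here is a write-then-read example using the Avro generic API. Note the schema below is a simplified stand-in for Flume's actual event schema, and the class name is made up; only the DataFileReader/DataFileWriter usage is the point.

```java
import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class FlumeAvroRoundTrip {

    // Simplified stand-in for the schema that FlumeEventAvroEventSerializer
    // embeds in each file: a map of string headers plus a bytes body.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"headers\",\"type\":{\"type\":\"map\",\"values\":\"string\"}},"
      + "{\"name\":\"body\",\"type\":\"bytes\"}]}");

    static String roundTrip() throws Exception {
        File f = File.createTempFile("events", ".avro");
        f.deleteOnExit();

        // Write one event, roughly the way the HDFS sink would.
        GenericRecord rec = new GenericData.Record(SCHEMA);
        rec.put("headers", Collections.singletonMap("host", "node1"));
        rec.put("body", ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
        try (DataFileWriter<GenericRecord> w =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            w.create(SCHEMA, f);
            w.append(rec);
        }

        // Read it back. The schema travels inside the Avro container file,
        // so the reader needs no schema up front.
        try (DataFileReader<GenericRecord> r =
                 new DataFileReader<>(f, new GenericDatumReader<GenericRecord>())) {
            GenericRecord read = r.next();
            return read.get("headers").toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```

Reading a file written by the real serializer works the same way: open it with a DataFileReader and a GenericDatumReader, and pull headers and body out of each record.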
>  Thanks & Regards,
>
> Ashutosh Sharma
>
>
>
> *From:* Bhaskar V. Karambelkar [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, August 22, 2012 12:22 AM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Can HDFSSink write headers as well?
>
>
>
>
>
> On Tue, Aug 21, 2012 at 2:25 AM, Christopher Birchall <[EMAIL PROTECTED]>
> wrote:
>
> Hi David,
>
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.
>
>
>
> This is not entirely true; the following combination will write headers to
> HDFS, in the avro_data (binary) file format.
>
>
>
> agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
>
> agent.sinks.hdfsBinarySink.serializer =  avro_client
>
> agent.sinks.hdfsBinarySink.hdfs.writeFormat =  writable
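For orientation, a fuller (hypothetical) sink definition along these lines might look like the fragment below. The serializer alias registered for FlumeEventAvroEventSerializer in Flume NG is `avro_event` (the one the original post uses), so that is what this sketch assumes; the agent, channel, and path names are made up.

```properties
# Hypothetical names throughout; the last three properties are the ones
# that control whether headers end up in the Avro output.
agent.channels = c1
agent.sinks = hdfsBinarySink
agent.sinks.hdfsBinarySink.type = hdfs
agent.sinks.hdfsBinarySink.channel = c1
agent.sinks.hdfsBinarySink.hdfs.path = hdfs://namenode/flume/events
agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable
agent.sinks.hdfsBinarySink.serializer = avro_event
```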
>
>
>
> The serializer used is part of the Flume distribution, namely:
>
>
> flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java
>
>
>
> A file written this way can be processed with the Avro MapReduce API found
> in the Avro distribution.
>
>
>
> Also note that simply using DataStream doesn't mean the output is a text
> file; the serializer and hdfs.writeFormat together determine whether the
> file is text or binary.
>
>
>
> I've read the entire HDFS sink code and experimented with it a lot, so if
> you want more details, let me know.
>
>
>
>
>
>
> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then you
> can supply your own custom serializer, which will allow you to write
> headers to HDFS. You will need to write a serializer that implements
> org.apache.flume.serialization.EventSerializer.
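A minimal sketch of such a custom serializer, assuming Flume NG's EventSerializer and EventSerializer.Builder interfaces. The class name and the tab-separated output format below are made up for illustration, not part of Flume:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

// Hypothetical serializer that writes each event's headers, a tab,
// then the raw body, one event per line.
public class HeaderAndBodyTextSerializer implements EventSerializer {

    private final OutputStream out;

    private HeaderAndBodyTextSerializer(OutputStream out) {
        this.out = out;
    }

    @Override public void afterCreate() { /* no file header needed */ }
    @Override public void afterReopen() { /* nothing to re-sync */ }

    @Override
    public void write(Event event) throws IOException {
        // Headers first, then the body bytes, terminated by a newline.
        out.write(event.getHeaders().toString()
                       .getBytes(StandardCharsets.UTF_8));
        out.write('\t');
        out.write(event.getBody());
        out.write('\n');
    }

    @Override public void flush() throws IOException { out.flush(); }
    @Override public void beforeClose() { /* nothing to finalize */ }
    @Override public boolean supportsReopen() { return true; }

    // Flume instantiates serializers through a Builder.
    public static class Builder implements EventSerializer.Builder {
        @Override
        public EventSerializer build(Context context, OutputStream out) {
            return new HeaderAndBodyTextSerializer(out);
        }
    }
}
```

It would then be wired in via the sink's serializer property, using the builder's fully qualified class name (e.g. `agent.sinks.k1.serializer = com.example.HeaderAndBodyTextSerializer$Builder`, package name hypothetical).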
>
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
>
> Chris.
>
>
>
>
> On 2012/08/21 11:36, David Capwell wrote:
>
> I was wondering if I pass random data to an event's header, can the
> HDFSSink write it to HDFS?  I know it can use the headers to split the data
> into different paths, but what about writing the data to HDFS itself?
>
> thanks for your time reading this email.
>
>
>
>
>
>
> This E-mail may contain confidential information and/or copyright
> material. This email is intended for the use of the addressee only. If you
> receive this email by mistake, please either delete it without reproducing,
> distributing or retaining copies thereof or notify the sender immediately.
>