Re: Data Lineage
Tzur Turkenitz 2013-02-05, 15:10
Thank you, Connor.
From what I understand, I can use a serializer to write the data in my own format. The language in the documentation is a bit vague, so, Connor, could you help me with the following question? In a scenario where I know my log files are tab-delimited (\t), I would like to add a column at the start of every event row indicating the timestamp and file name. Can this be done with a serializer? If it's possible, I'll pass it on to our Java devs :)

On Mon, Feb 4, 2013 at 8:51 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
> You will want to look at the Serializer component
> <http://flume.apache.org/FlumeUserGuide.html#event-serializers>.
> The default serializer is TEXT, which writes out only the body of your
> event, discarding all headers. You can switch to one of the other
> serializers, or, if none of them suits your purpose, you can create
> your own that, for instance, writes the event in JSON format, thus
> preserving the headers.
>
> (Only two serializers are currently documented. You can see all of the
> ones currently in Flume here:
> <https://github.com/apache/flume/tree/trunk/flume-ng-core/src/main/java/org/apache/flume/serialization>
> It looks like there's only one additional one there, and it might be
> exactly what you're looking for.)
>
> If you want more detail on creating a custom serializer, or on how to use
> one of the existing ones, please ask.
>
> - Connor
>
>
> On Mon, Feb 4, 2013 at 7:38 AM, Tzur Turkenitz <[EMAIL PROTECTED]> wrote:
>
>> Hello All,
>>
>> In my company we are worried about data lineage. Big files can be split
>> into smaller files (block size) inside HDFS, and smaller files can be
>> aggregated into larger files. We want to have some kind of control
>> over data lineage and the ability to map source files to files in
>> HDFS.
>> Using interceptors we can add various keys, like timestamp, static,
>> file header, etc.
>>
>> After a file has been processed and inserted into HDFS, do those keys
>> still exist, and are they viewable if I cat the file in Hadoop? (I did
>> cat the files and didn't see any of the keys.) Or do the keys exist only
>> during processing, without being saved into the file?
>>
>> Alternatively, is it possible to append those keys to the file using
>> Flume's built-in components?
>>
>> I appreciate the help,
>>
>> Tzur
>>
>
>

--
Regards,
Tzur Turkenitz
Vision.BI
http://www.vision.bi/

"Facts are stubborn things, but statistics are more pliable" -Mark Twain
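The row format asked about in this thread (timestamp and file name prepended to each tab-delimited event body) comes down to a small byte-formatting step that a custom serializer would perform per event. Below is a minimal, self-contained Java sketch of just that step, not a complete Flume serializer. Assumptions, all hypothetical: the class name `LineageRowFormatter`, the header keys `timestamp` (assumed to be set by a timestamp interceptor) and `file` (assumed to be set by the source), and UTF-8 for the prefix. In an actual Flume serializer, this logic would presumably be called from the `write(Event)` method of a class implementing `org.apache.flume.serialization.EventSerializer`, passing `event.getHeaders()` and `event.getBody()` and writing the result to the serializer's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LineageRowFormatter {

    // ASSUMPTION: header names. Adjust these to whatever keys your
    // interceptors or source actually put on the event.
    static final String TIMESTAMP_KEY = "timestamp";
    static final String FILE_KEY = "file";

    /**
     * Returns one output row as bytes: timestamp TAB file name TAB original body, then '\n'.
     * A missing header becomes an empty field, so every row keeps the same column count.
     */
    static byte[] toRow(Map<String, String> headers, byte[] body) {
        String prefix = headers.getOrDefault(TIMESTAMP_KEY, "") + "\t"
                + headers.getOrDefault(FILE_KEY, "") + "\t";
        byte[] prefixBytes = prefix.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream row =
                new ByteArrayOutputStream(prefixBytes.length + body.length + 1);
        row.write(prefixBytes, 0, prefixBytes.length);
        row.write(body, 0, body.length); // body bytes are copied through unchanged
        row.write('\n');
        return row.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical event: headers as an interceptor/source might set them.
        Map<String, String> headers = new HashMap<>();
        headers.put(TIMESTAMP_KEY, "1360077000000");
        headers.put(FILE_KEY, "access.log");
        byte[] body = "GET\t/index.html\t200".getBytes(StandardCharsets.UTF_8);
        System.out.print(new String(toRow(headers, body), StandardCharsets.UTF_8));
    }
}
```

Two design choices are worth noting. The body is passed through as raw bytes rather than decoded and re-encoded, so log lines in other encodings are not altered. A missing header yields an empty field instead of an error, so downstream tools that split on `\t` always see the same number of columns.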