Flume, mail # user - Data Lineage


Data Lineage
Tzur Turkenitz 2013-02-04, 15:38
Hello All,

In my company we are worried about data lineage. Big files can be split into
smaller pieces (blocks of the configured block size) inside HDFS, and smaller
files can be aggregated into larger files. We want some control over data
lineage and the ability to map source files to files in HDFS. Using
interceptors we can add various header keys such as timestamp, static, and
file header.
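
For reference, the kind of source setup described above might look roughly like the sketch below. It assumes a spooling directory source; the agent name (a1), the component names (r1, c1, ts, origin), the directory path, and the static key and value are all placeholders made up for illustration, not our actual configuration.

```properties
# Hypothetical agent "a1"; every name, path, and value here is a placeholder.
a1.sources = r1
a1.channels = c1

# Spooling directory source; fileHeader = true stores the source file's
# path in an event header.
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/flume/incoming
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

# Interceptor chain: a timestamp header plus one fixed (static) key/value.
a1.sources.r1.interceptors = ts origin
a1.sources.r1.interceptors.ts.type = timestamp
a1.sources.r1.interceptors.origin.type = static
a1.sources.r1.interceptors.origin.key = origin
a1.sources.r1.interceptors.origin.value = billing-system

a1.channels.c1.type = memory
```

With a setup like this, each event would carry the timestamp, the static key, and the file path as headers alongside its body.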

 

After a file has been processed and inserted into HDFS, do those keys still
exist, and are they viewable if I cat the file in Hadoop? (I did cat the
files and didn't see any of the keys.) Or do the keys exist only during
processing, without being saved into the file?

Alternatively, is it possible to append those keys to the file using one of
Flume's built-in components?

I appreciate the help,

Tzur

Connor Woodson 2013-02-05, 01:51
Tzur Turkenitz 2013-02-05, 15:10
Roshan Naik 2013-02-05, 19:49