Flume >> mail # dev >> Transforming 1 event to n events


Re: Transforming 1 event to n events
On Mon, Aug 13, 2012 at 3:34 PM, Jeremy Custenborder <[EMAIL PROTECTED]> wrote:

> I need to have the multiple objects available to
> hive. The upstream object is actually a protobuf with hierarchy. I was
> planning on flattening the object for hive. Here is an example of what
> I'm collecting. The actual protobuf has many more fields, but this
> gives you an idea.
>
> requestid
> page
> timestamp
> useragent
> impressions = [12345, 43212, 12344, 12345, 43122, etc]
>
> transforming for each impression.
>
> requestid
> page
> timestamp
> useragent
> index
> objectid
>
> This gives me one row in hive per impression. This might be a little
> more contextual. I picked the earlier example because I didn't want to
> get caught up in my use case. I could move this code to serializers,
> but I need to do similar logic twice since I'm incrementing a counter
> in hbase per impression and adding a row per impression in hdfs (hive).
> If I transformed the event to multiple events earlier in the pipe, I
> would only have to write code to generate keys per event. At this
> point I'm going to implement two serializers. One to handle hdfs and
> one for hbase.
>
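
For readers following the thread, the 1-to-n flattening described above can be sketched roughly as follows. This is only an illustration in Python, not Flume code and not Jeremy's implementation: the function name flatten_impressions is hypothetical, the event is modeled as a plain dict rather than a protobuf, and starting the index at 0 is an assumption.

```python
from typing import Any, Dict, Iterator

# Fields copied unchanged onto every flattened row (names taken from the example above).
SHARED_FIELDS = ("requestid", "page", "timestamp", "useragent")


def flatten_impressions(event: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per impression in a hierarchical event.

    Hypothetical helper: the event is assumed to be a dict with the
    shared fields plus an "impressions" list of object ids.
    """
    shared = {name: event[name] for name in SHARED_FIELDS}
    # Assumption: index is the 0-based position of the impression in the list.
    for index, objectid in enumerate(event["impressions"]):
        yield {**shared, "index": index, "objectid": objectid}


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "requestid": "r-1",
        "page": "/home",
        "timestamp": 1344886440,
        "useragent": "example-agent",
        "impressions": [12345, 43212, 12344],
    }
    for row in flatten_impressions(sample):
        print(row)
```

Doing this once, early in the pipeline, is what would let both the HBase counter and the HDFS (Hive) row be derived from the same per-impression record instead of repeating the logic in two serializers.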

Hi Jeremy,

Thanks for the extra color. It's an interesting flow. As more people
continue to adopt Flume, I think we'll start to see patterns where the
design or implementation of Flume is lacking and we can work towards
bridging those gaps, and your use case provides valuable data on that. As
for where we are now, I'm happy to hear that you have found a way forward.

If you can keep us apprised as things progress with your Flume deployment I
would love to hear about it!

Regards,
Mike