Flume >> mail # user >> Seeking advice over choice of language and implementation


Re: Seeking advice over choice of language and implementation
Sunita,

Depending on your level of comfort, you can do one of the following:

1. Use Python to fetch your data and then send the events via HTTP to the
Flume HTTP Source [1]
2. Use Java to create a custom source [6] in Flume that handles the data
fetching and then puts it in a channel [3] so that it can be funneled into
the sinks [4] and [5]

Option 1 would be easier for you since you can get the data in Python and
just stream it down via HTTP to Flume.
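
As a rough illustration of Option 1, here is a minimal sketch of posting events from Python to the HTTP Source. It assumes the source uses its default JSON handler, which accepts a JSON array of events, each with a "headers" map of strings and a string "body". The URL, port, and the function names build_payload and send_events are placeholders made up for this example, not anything from your setup.

```python
import json
import urllib.request

# Hypothetical endpoint: host and port depend on how the HTTP Source is configured.
FLUME_URL = "http://localhost:5140"


def build_payload(records, headers=None):
    """Wrap each record as a Flume event: a 'headers' map and a string 'body'."""
    headers = headers or {}
    events = [{"headers": dict(headers), "body": json.dumps(r)} for r in records]
    return json.dumps(events).encode("utf-8")


def send_events(records, url=FLUME_URL, headers=None, timeout=10):
    """POST a batch of records to the HTTP Source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(records, headers),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Your existing fetch script could then call send_events(list_of_json_records) once per batch. Serializing each record into the body as a JSON string keeps the original response intact for whatever sink reads it later.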

Option 2 will be more involved since you need to write code that
communicates with external endpoints.

References
[1] http://goo.gl/5lHlg
[2] http://goo.gl/GnVbE
[3] http://goo.gl/t31Xh
[4] http://goo.gl/G9xS8
[5] http://goo.gl/Wn4W5
[6] http://goo.gl/Q0yyn
*Author and Instructor for the Upcoming Book and Lecture Series*
*Massive Log Data Aggregation, Processing, Searching and Visualization with
Open Source Software*
*http://massivelogdata.com*
On 18 July 2013 13:38, Sunita Arvind <[EMAIL PROTECTED]> wrote:

> Hello friends,
>
> I am new to Flume and have written a Python script to fetch some data from
> social media. The response is JSON. I am seeking help on the following issues:
> 1. I am finding it hard to make Python and Flume talk. Is it just my
> ignorance, or is it indeed a long route? AFAIK, I need to understand the Thrift
> API, Avro, etc. to achieve this. I also read about pipes. Would this be a
> simple implementation?
>
> 2. I am equally comfortable (uncomfortable) in Java. Hence I am wondering if
> it's better to rewrite my application in Java so that I can easily
> integrate it with Flume. Are there any advantages to having a Java
> application, as all of Hadoop is Java?
>
> 3. I need to schedule the agent to run on a daily basis. Which of the
> above approaches would help me achieve this easily?
>
> 4. Going by this -
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[EMAIL PROTECTED]%3E
> it looks like we need to manually clean up disk space even with Flume. I am
> not clear on the advantages I would have with flume over using a simple
> cron job to do the task. I can manually write statements like "hadoop fs
> -put <location of output file on local> <location on hdfs>" in the cron job
> instead.
>
> Appreciate your help and guidance
>
> regards,
> Sunita
>