
Flume user mailing list


Re: Seeking advice over choice of language and implementation
On Thu, Jul 18, 2013 at 11:08 PM, Sunita Arvind <[EMAIL PROTECTED]> wrote:

> Hello friends,
>
> I am new to Flume and have written a Python script to fetch some data from
> social media. The response is JSON. I am seeking help on the following
> issues:
> 1. I am finding it hard to make Python and Flume talk. Is it just my
> ignorance, or is it indeed a long route? AFAIK, I need to understand the
> Thrift API, Avro, etc. to achieve this. I also read about pipes. Would this
> be a simple implementation?
>

Python would work fine. As mentioned, you can use the HTTP Source.
Alternatively, you can use the Avro Source with Avro's Python client.
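A minimal sketch of the HTTP Source route, assuming the agent's HTTP Source uses its default JSON handler, which accepts a JSON array of events, each with a `headers` map and a string `body`. The helper names, the example URL/port and the `source` header are hypothetical; the port must match whatever `port` is configured on the agent's HTTP Source.

```python
import json
import urllib.request
from typing import Dict, Iterable, List, Optional


def to_flume_events(records: Iterable[dict],
                    headers: Optional[Dict[str, str]] = None) -> List[dict]:
    """Wrap each JSON record as one Flume event: string headers plus a string body."""
    headers = headers or {}
    # The body is the record re-serialized as a JSON string; headers are copied per event.
    return [{"headers": dict(headers), "body": json.dumps(r)} for r in records]


def post_to_flume(url: str, events: List[dict], timeout: float = 10.0) -> int:
    """POST one batch of events to a Flume HTTP Source and return the HTTP status.

    urlopen raises urllib.error.HTTPError for error responses, so a normal
    return means the server answered with a success status.
    """
    payload = json.dumps(events).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, the existing script could call `post_to_flume("http://localhost:5140", to_flume_events(records, {"source": "twitter"}))` once per fetched batch (hypothetical host, port and header), which keeps the Python side free of any Thrift or Avro dependencies.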
>
> 2. I am equally comfortable (uncomfortable) in Java. Hence I am wondering
> if it's better to rewrite my application in Java so that I can easily
> integrate it with Flume. Are there any advantages to having a Java
> application, given that all of Hadoop is Java?
>

The advantage would be that you can use Flume's Client SDK, which saves a
lot of work. IMHO, it doesn't matter to Flume who is pushing the data.
>
> 3. I need to schedule the agent to run on a daily basis. Which of the
> above approaches would help me achieve this easily?
>

It looks like you have a batch job that runs once, at a fixed time during
the day. If that's the case, please reconsider whether you need Flume at
all. Flume can certainly be used, but you could also load the data directly
into HDFS. Again, I can't say for sure based on the information provided.
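For the once-a-day case, a minimal sketch of the direct-load alternative: the fetch script writes one newline-delimited JSON file per day and then runs `hadoop fs -put`, the same command the question proposes for a cron job. The helper names, directories and crontab entry are hypothetical, and the sketch assumes the `hadoop` CLI is on the PATH of the user that cron runs the job as.

```python
import datetime
import json
import os
import subprocess
from typing import Iterable, List

# Hypothetical crontab entry running a fetch-and-load script daily at 02:00:
#   0 2 * * * /usr/bin/python3 /opt/jobs/fetch_and_load.py


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(r) + "\n" for r in records)


def write_daily_file(records: Iterable[dict], out_dir: str,
                     day: datetime.date) -> str:
    """Write the day's records to <out_dir>/<YYYY-MM-DD>.json and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{day.isoformat()}.json")
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_ndjson(records))
    return path


def hdfs_put_command(local_path: str, hdfs_dir: str) -> List[str]:
    """Build the `hadoop fs -put <local> <hdfs>` command as an argument list."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def load_to_hdfs(local_path: str, hdfs_dir: str) -> None:
    """Copy the local file into HDFS; raises CalledProcessError if the put fails."""
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)
```

Naming each file by date gives every daily run its own HDFS file, and `check=True` makes a failed copy surface as a non-zero exit of the cron job instead of passing silently.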
>
> 4. Going by this:
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[EMAIL PROTECTED]%3E
> it looks like we need to manually clean up disk space even with Flume. I am
> not clear on the advantages I would have with Flume over using a simple
> cron job to do the task. I could instead write statements like
> "hadoop fs -put <location of output file on local> <location on hdfs>"
> in the cron job.
>

The mailing-list thread you pointed to is about the RollingFileSink, not the
HDFS sink, so it doesn't apply to the HDFS sink.

HTH!
>
> Appreciate your help and guidance
>
> regards,
> Sunita
>

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal