Flume, mail # user - Flume for multi KB or MB docs?


Otis Gospodnetic 2012-10-15, 00:49
Mike Percy 2012-10-15, 22:15
Otis Gospodnetic 2012-10-16, 03:14
Re: Flume for multi KB or MB docs?
Mike Percy 2012-10-16, 05:50
Otis,
Yes, those are my concerns, but 10MB might be OK. You will have to tune
your batch sizes to a lower range and watch out for GC, but if you give the
process enough RAM it should work.
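
Something along these lines, as a rough sketch (agent/channel/sink names
like a1/c1/k1 are placeholders; the exact numbers depend on your event
sizes):

    # keep only a handful of large events buffered in the memory channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 100
    a1.channels.c1.transactionCapacity = 10

    # flush to the sink after a few events rather than hundreds
    a1.sinks.k1.hdfs.batchSize = 10

    # and in flume-env.sh, give the agent JVM plenty of headroom, e.g.
    JAVA_OPTS="-Xmx2048m"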

If you go that route, please let us know how it goes!

Regards
Mike

On Mon, Oct 15, 2012 at 8:14 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Hi Mike,
>
> Thanks for the info!  Our docs, however, are not quite 100MB - more like
> 5MB max and most of the time under 10KB.  Would you still say Flume is not
> the right tool for the job?  If so, what is the main concern?  Is it that
> the number of documents Flume keeps in memory at any one time would
> require a potentially large heap and still risk OOMing?  Or is the main
> concern that writing such "large" documents to disk will be slow?
>
> My documents need to end up in Solr or ElasticSearch and maybe also in
> HDFS, so I was hoping I could get ES and HDFS sinks from Flume for free.
>
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>   ------------------------------
> *From:* Mike Percy <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]; Otis Gospodnetic <[EMAIL PROTECTED]>
>
> *Sent:* Monday, October 15, 2012 6:15 PM
> *Subject:* Re: Flume for multi KB or MB docs?
>
> Hi Otis,
> Flume was designed as a streaming event transport system, not as a
> general-purpose file transfer system. The two have quite different
> characteristics, so while binary files could be transported by Flume, if
> you tried to transport a 100MB PDF as a single event you may have issues
> around memory allocation, GC, transfer speed, etc., since we hold at least
> one event at a time in memory. However, if you want to transfer a large
> log file where each line is an event, then it's a perfect use case,
> because you care about the individual events more than the file itself.
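>
> That line-per-event case is what something like the exec source covers,
> for example (just a sketch; agent/source/channel names and the log path
> are placeholders):
>
>     # turn each new line of a log file into one Flume event
>     a1.sources.r1.type = exec
>     a1.sources.r1.command = tail -F /var/log/app.log
>     a1.sources.r1.channels = c1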
>
> For transferring very large binary files that are not events or records,
> you may want to look for something that is good at being a single-hop
> system with resume capability, like rsync, to transfer the files. Then I
> suppose you could use the hadoop fs shell and a small script to store the
> data in HDFS. You probably wouldn't need all the fancy tagging, routing,
> and serialization features that Flume has.
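>
> Something like the following would do it, as a rough sketch (the hosts and
> paths here are made up):
>
>     # single-hop transfer with resume capability
>     rsync -avz --partial /data/docs/ transfer-host:/staging/docs/
>     # then, on transfer-host, push the files into HDFS
>     hadoop fs -put /staging/docs/* /user/flume/docs/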
>
> Hope this helps.
>
> Regards
> Mike
>
> On Sun, Oct 14, 2012 at 5:49 PM, Otis Gospodnetic <
> [EMAIL PROTECTED]> wrote:
>
> Hi,
>
> We're considering using Flume for transport of potentially large
> "documents" (think documents that can be as small as tweets or as large as
> PDF files).
>
> I'm wondering if Flume is suitable for transporting potentially large
> documents (in the most reliable mode, too), or if there is something
> inherent in Flume that makes it a poor choice for this use case.
>
> Thanks,
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>