Re: streaming data ingest into HDFS
Russell Jurney 2011-12-16, 01:05
Just curious - what is the situation you're in where no collectors are
possible? Sounds interesting.
On Dec 15, 2011, at 5:01 PM, "Periya.Data" <[EMAIL PROTECTED]> wrote:
> Hi all,
> I would like to know what options I have for ingesting terabytes of data
> that are generated very quickly by a small set of sources. So far I have
> thought about:
> 1. Flume
> 2. Using one or more intermediate staging servers to offload the data, and
> from there loading it into HDFS with dfs -put.
> 3. Anything else?
> Suppose I am unable to use Flume (because Flume agents cannot be installed
> on the sources), and suppose that I do not have the luxury of an
> intermediate staging area. What options do I have then? In that case I
> would have to ingest the data directly into HDFS, preferably in parallel.
> I have read about a MapReduce technique in which each map task reads
> data and uses the Java API to write it to HDFS. Running multiple map
> tasks would give parallel ingestion. It would be nice to know about ways
> to ingest data "directly" into HDFS under these assumptions.
> Suggestions are appreciated,
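A minimal sketch of the parallel, direct-write idea from the quoted question, under stated assumptions: one worker per source, each worker writing its own output file (HDFS allows only a single writer per file, so parallel writers need separate files). To stay runnable without a cluster, it writes to a local temp directory. The class name ParallelIngestSketch and the SinkOpener hook are hypothetical. Against a real cluster the hook would presumably return the stream from Hadoop's FileSystem.create(Path), which is an FSDataOutputStream and therefore an OutputStream. This uses a thread pool instead of map tasks, and InputStream.transferTo requires Java 9 or later.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelIngestSketch {

    /** Hypothetical hook: opens the destination stream for one named output file. */
    @FunctionalInterface
    public interface SinkOpener {
        OutputStream open(String name) throws IOException;
    }

    /**
     * Copies every source into its own destination, one task per source,
     * running at most {@code threads} copies at a time. Returns total bytes copied.
     */
    public static long ingest(Map<String, InputStream> sources, SinkOpener opener, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Map.Entry<String, InputStream> e : sources.entrySet()) {
                results.add(pool.submit(() -> {
                    // Each task owns exactly one output file: a single writer per file.
                    try (InputStream in = e.getValue();
                         OutputStream out = opener.open(e.getKey())) {
                        return in.transferTo(out);
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // rethrows any I/O failure from a worker
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Local stand-in for the cluster; with Hadoop the opener would wrap FileSystem.create.
        Path dir = Files.createTempDirectory("ingest-demo");
        Map<String, InputStream> sources = new LinkedHashMap<>();
        for (int i = 0; i < 4; i++) {
            byte[] data = ("record batch " + i + "\n").getBytes(StandardCharsets.UTF_8);
            sources.put("part-" + i, new ByteArrayInputStream(data));
        }
        long total = ingest(sources, name -> Files.newOutputStream(dir.resolve(name)), 2);
        System.out.println("bytes written: " + total);
    }
}
```

Writing one file per source keeps the workers independent. The files can be merged or processed later with an ordinary MapReduce job.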