Re: streaming data ingest into HDFS
Just curious - what is the situation you're in where no collectors are
possible?  Sounds interesting.

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Dec 15, 2011, at 5:01 PM, "Periya.Data" <[EMAIL PROTECTED]> wrote:

> Hi all,
>     I would like to know what options I have to ingest terabytes of data
> that are being generated very fast from a small set of sources. I have
> thought about:
>
>   1. Flume
>   2. Use one or more intermediate staging servers to offload the data,
>   and from there use dfs -put to load it into HDFS.
>   3. Anything else??
>
> Suppose I am unable to use Flume (since the sources do not support their
> installation) and suppose that I do not have the luxury of having an
> intermediate staging place, what options do I have? In this case, I might
> have to directly (preferably in parallel) ingest data into HDFS.
>
> I have read about a technique that uses Map-Reduce, where each map task
> reads data and uses the Java API to store it in HDFS. We could run
> multiple map tasks to get parallel ingestion. It would be nice to know
> about ways to ingest data "directly" into HDFS given my assumptions.
>
> Suggestions are appreciated,
>
> /PD.
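
For illustration, here is a minimal sketch of the "parallel, no staging
area" idea discussed above. It assumes a machine that has a Hadoop client
installed and can open the sources as byte streams; that assumption is not
stated in the thread. Each stream is piped into "hadoop fs -put - <path>",
which reads from stdin, so nothing is written to local disk first. The
function names, the CHUNK_SIZE value, and the build_command hook are
hypothetical and exist only for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # copy 1 MiB at a time (arbitrary choice for this sketch)


def build_put_command(hdfs_path):
    """Return a command that writes its stdin to hdfs_path ('-' means stdin)."""
    return ["hadoop", "fs", "-put", "-", hdfs_path]


def stream_to_hdfs(source, hdfs_path, build_command=build_put_command):
    """Copy a binary file-like source into one HDFS file; return bytes sent."""
    proc = subprocess.Popen(build_command(hdfs_path), stdin=subprocess.PIPE)
    sent = 0
    try:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            proc.stdin.write(chunk)
            sent += len(chunk)
    finally:
        # Closing stdin signals end of input to the put process.
        proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("put failed for " + hdfs_path)
    return sent


def ingest_parallel(sources, max_workers=4, build_command=build_put_command):
    """sources maps an HDFS path to a binary file-like object to upload."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            path: pool.submit(stream_to_hdfs, src, path, build_command)
            for path, src in sources.items()
        }
        # result() re-raises any upload error from the worker thread.
        return {path: fut.result() for path, fut in futures.items()}
```

A caller would pass something like {"/ingest/source1.log": stream1,
"/ingest/source2.log": stream2}, where each stream is whatever binary
reader the source offers (a socket file, an HTTP response body, and so
on). One put process per stream keeps the sketch simple; whether that
keeps up with terabyte-scale feeds depends on the cluster and network, and
the Map-Reduce approach mentioned above may scale better.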