Hadoop, mail # user - incremental loads into hadoop


Re: incremental loads into hadoop
Bejoy KS 2011-10-01, 19:19
Sam,
      Try looking into Flume if you need to load incremental data into HDFS.
If the source data resides in a JDBC-compliant database, you can use Sqoop
to import it directly into HDFS or Hive incrementally.
For big data aggregation and analytics, Hadoop is definitely a good choice,
as you can use MapReduce, or higher-level tools built on top of MapReduce
such as Hive or Pig, which serve the purpose very well. So, in short, for
the two steps you can go with the following:
1. Load into Hadoop/HDFS - Use Flume or Sqoop, depending on your source.
2. Process within Hadoop/HDFS - Use Hive or Pig. These tools are well
optimised, so write custom MapReduce only if they don't fit some complex
processing you need.
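
To make "incrementally" in step 1 concrete, here is a minimal sketch of the
watermark idea: remember the highest key already loaded and fetch only rows
above it on the next run. Sqoop's incremental import mode is configured along
the same lines, with a check column and a last imported value. Everything in
the sketch is an assumption for illustration: the in-memory SQLite table
`events` stands in for the OLTP store, and `fetch_new_rows` is a hypothetical
helper, not part of Sqoop or Flume.

```python
import sqlite3


def fetch_new_rows(conn, last_value):
    """Return rows with id > last_value, plus the updated watermark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_value,),
    ).fetchall()
    # If nothing new arrived, the watermark stays where it was.
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last


# Hypothetical source table standing in for the OLTP store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)", [("a",), ("b",), ("c",)]
)

# First run: nothing loaded yet, so start from watermark 0.
first_batch, watermark = fetch_new_rows(conn, 0)

# More data arrives; the next run picks up only the new row.
conn.execute("INSERT INTO events (payload) VALUES ('d')")
second_batch, watermark = fetch_new_rows(conn, watermark)
```

In a real setup the watermark would have to be stored between runs, and each
batch would be written out to HDFS rather than kept in memory.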

There may be other tools as well for getting the source data into HDFS. Let
us leave that open for others to comment on.

Hope it helps.

Thanks and Regards
Bejoy.K.S
On Sat, Oct 1, 2011 at 4:32 AM, Sam Seigal <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am relatively new to Hadoop and was wondering how to do incremental
> loads into HDFS.
>
> I have a continuous stream of data flowing into a service which is
> writing to an OLTP store. Due to the high volume of data, we cannot do
> aggregations on the OLTP store, since this starts affecting the write
> performance.
>
> We would like to offload this processing into a Hadoop cluster, mainly
> for doing aggregations/analytics.
>
> The question is how can this continuous stream of data be
> incrementally loaded and processed into Hadoop ?
>
> Thank you,
>
> Sam
>