MapReduce, mail # user - How to solve one Scenario in hadoop ?


How to solve one Scenario in hadoop ?
samir das mohapatra 2013-03-05, 05:27
Hi All,
   I have a scenario in which our organization is trying to implement
Hadoop.

Scenario Statement:

---------------------------------------

    Suppose we have various data sources, for example RDBMS, HDFS, and
streaming.
*Source Dataset Types:*

 1. Single source

 2. Joined sources

 3. Filtered data set

 4. Specific columns
We need to pull data from one source into another. It could be from HDFS to
RDBMS or vice versa, depending on a condition: out of the whole data in the
source, the destination may need only specific data, the whole data, or
joined data. Which approach should we take to pull the data for each of the
dataset types above?
This is what I am thinking:

CASE-1    Data from HDFS to HDFS (different cluster), whole data:
*distcp*

CASE-2    Data from HDFS to HDFS (different cluster), conditional data
(filtered data): *custom MapReduce program that does the filter operation
and then loads the result*

CASE-3    Data from HDFS to RDBMS (whole data): *Sqoop*

CASE-4    Data from HDFS to RDBMS (conditional data): *Sqoop*

CASE-5    Some data from RDBMS and some data from HDFS, then filter and
load into HDFS: *JDBC with a MapReduce program*
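
To make the five cases above easier to check, here is a minimal sketch of the same decision as plain Java. It does not use any Hadoop API; the class name TransferPlanner, the Store enum, and the chooseTool method are hypothetical names made up only for illustration, and the returned strings are simply the tools I proposed in each case.

```java
public class TransferPlanner {
    // Hypothetical labels for the two storage systems named in the cases.
    public enum Store { HDFS, RDBMS }

    // Returns the tool proposed above for one transfer.
    // filtered:     true when only conditional (filtered) data is moved
    // mixedSources: true when the input combines RDBMS and HDFS data (CASE-5)
    public static String chooseTool(Store source, Store target,
                                    boolean filtered, boolean mixedSources) {
        if (mixedSources) {
            return "JDBC with MapReduce";                     // CASE-5
        }
        if (source == Store.HDFS && target == Store.HDFS) {
            return filtered ? "custom MapReduce" : "distcp";  // CASE-2, CASE-1
        }
        if (source == Store.HDFS && target == Store.RDBMS) {
            return "Sqoop";                                   // CASE-3, CASE-4
        }
        // Directions not covered by the five cases are left undecided here.
        throw new IllegalArgumentException(
            "no case listed for " + source + " -> " + target);
    }

    public static void main(String[] args) {
        System.out.println(chooseTool(Store.HDFS, Store.HDFS, false, false));
        System.out.println(chooseTool(Store.HDFS, Store.RDBMS, true, false));
    }
}
```

The RDBMS to HDFS direction without mixed sources is not one of my cases, so the sketch rejects it instead of guessing a tool.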
Note: Can anyone tell me if I am wrong, and whether we should do something
other than this that would be easier to do?
Regards,

samir.