How to solve one Scenario in hadoop?
samir das mohapatra 2013-03-05, 05:27
Hi All,
I have a scenario where our organization is trying to implement Hadoop.

Scenario Statement:
---------------------------------------
Suppose we have various data sources, for example RDBMS, HDFS, and streaming.

*Source Dataset Types:*
1. Single source
2. Joined sources
3. Filtered data set
4. Specific columns

We need to pull data from one source to another. It could be from HDFS to RDBMS or vice versa, based on a condition. That is, out of the whole data at the source, we may need only specific data, the whole data, or joined data at the destination.

So which approach should we take to pull the data for each of the dataset types above?

This is what I am thinking:

CASE-1: Data from HDFS to HDFS (different cluster), whole data: we will use *distcp*.
CASE-2: Data from HDFS to HDFS (different cluster), conditional (filtered) data: we will use a *custom MapReduce program that does the filter operation and then loads the result*.
CASE-3: Data from HDFS to RDBMS (whole data): *Sqoop*.
CASE-4: Data from HDFS to RDBMS (conditional data): *Sqoop*.
CASE-5: Some data from RDBMS and some data from HDFS, then filter and load into HDFS: *JDBC with a MapReduce program*.

Note: Can anyone tell me if I am wrong and whether we need to do something other than this that would be easier to do?

Regards,
samir.
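To make CASE-2 (and the "specific columns" dataset type) a bit more concrete, here is a rough sketch of what the filter step might look like. It is written in Python in the style of a Hadoop Streaming mapper, not as the custom Java MapReduce program mentioned above, so treat it as an illustration of the logic only. The tab delimiter, the column positions, and the value `ACTIVE` are all made-up assumptions, not part of the actual data.

```python
import io
import sys

# All of these settings are hypothetical and only serve the illustration.
DELIMITER = "\t"        # assumed field separator of the source records
FILTER_COLUMN = 2       # assumed index of the column the condition applies to
FILTER_VALUE = "ACTIVE" # assumed value a record must have to be kept
KEEP_COLUMNS = (0, 1)   # assumed indexes of the columns copied to the destination


def filter_record(line):
    """Return the projected record, or None if it is malformed or filtered out."""
    fields = line.rstrip("\n").split(DELIMITER)
    highest_index = max(FILTER_COLUMN, *KEEP_COLUMNS)
    if len(fields) <= highest_index:
        return None  # too few columns: skip the record instead of failing the task
    if fields[FILTER_COLUMN] != FILTER_VALUE:
        return None  # condition not met
    return DELIMITER.join(fields[i] for i in KEEP_COLUMNS)


def run(stream_in, stream_out):
    """Copy only the matching, projected records from stream_in to stream_out."""
    for line in stream_in:
        projected = filter_record(line)
        if projected is not None:
            stream_out.write(projected + "\n")


if __name__ == "__main__":
    # Small in-memory sample so the sketch can be run without a cluster.
    sample = io.StringIO("1\talice\tACTIVE\n2\tbob\tINACTIVE\n3\tcarol\n")
    run(sample, sys.stdout)
```

In a Streaming job the last call would read `run(sys.stdin, sys.stdout)` instead of using the sample, and the job would run map-only (zero reduce tasks), since a plain filter needs no shuffle. The filtered output could then be copied to the other cluster, for example with distcp as in CASE-1.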