HDFS, mail # user - Re: How to solve one Scenario in hadoop ?


Re: How to solve one Scenario in hadoop ?
Vikas Jadhav 2013-03-06, 19:46
I would go with the first case, because if the data size is large it will
distribute the data across multiple nodes.
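
Assuming "first case" means CASE-1 (distcp) from the quoted message below, here is a minimal sketch of how that command could be built. distcp runs as a MapReduce job, so the copy work is split across map tasks on multiple nodes. The namenode hosts, port, and paths are hypothetical placeholders, and the Python wrapper is only for illustration; `-m` caps the number of map tasks.

```python
def distcp_command(src_uri, dst_uri, max_maps=None):
    """Build the argument list for a `hadoop distcp` copy between clusters."""
    cmd = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m sets the maximum number of simultaneous map tasks used for copying
        cmd += ["-m", str(max_maps)]
    cmd += [src_uri, dst_uri]
    return cmd


if __name__ == "__main__":
    # Hypothetical source and destination clusters
    cmd = distcp_command(
        "hdfs://nn1.example.com:8020/data/source",
        "hdfs://nn2.example.com:8020/data/target",
        max_maps=20,
    )
    print(" ".join(cmd))
```

On a machine with the Hadoop client installed, the returned list could be passed to `subprocess.run(cmd, check=True)`.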
On Tue, Mar 5, 2013 at 10:57 AM, samir das mohapatra <
[EMAIL PROTECTED]> wrote:

> Hi All,
>    I have a scenario where our organization is trying to implement
> Hadoop.
>
> Scenario Statement:
>
> ---------------------------------------
>
>     Suppose we have various data sources, for example RDBMS, HDFS, and
> streaming.
>
>
>  *Source Dataset Types:*
>
>  1. Single source
>
>  2. Joined sources
>
>  3. Filtered data set
>
>  4. Specific columns
>
>
> We need to pull data from one source to another, for example from HDFS
> to an RDBMS or vice versa, based on a condition. That is, out of the whole
> source data we may need only specific data, the whole data, or joined data
> in the destination. Which approach should we take to pull the data for each
> of the dataset types above?
>
>
> I am thinking:
>
> CASE-1   Data from HDFS to HDFS (different cluster), whole data:
>          we will use *distcp*
>
> CASE-2   Data from HDFS to HDFS (different cluster), conditional
>          (filtered) data: we will use a *custom MapReduce program* that
>          does the filter operation and then loads the result
>
> CASE-3   Data from HDFS to RDBMS (whole data): *Sqoop*
>
> CASE-4   Data from HDFS to RDBMS (conditional data): *Sqoop*
>
> CASE-5   Some data from RDBMS and some data from HDFS, then filter and
>          load into HDFS: *JDBC with a MapReduce program*
>
>
> Note: Can anyone tell me if I am wrong and we need to do something
> other than this that would be easier to do?
>
>
> Regards,
>
> samir.
>
>
>
>
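
For CASE-2 in the quoted message (and dataset types 3 and 4), here is a minimal sketch of the filter step. It assumes Hadoop Streaming with a Python mapper rather than a custom Java MapReduce program, tab-separated input records, and a made-up condition (third column equals "ACTIVE"). The column positions and the `is_active` helper are hypothetical.

```python
import sys


def filter_and_project(lines, keep_columns, predicate, sep="\t"):
    """Yield only the wanted columns of each record that passes the predicate."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if predicate(fields):
            yield sep.join(fields[i] for i in keep_columns)


def is_active(fields):
    # Hypothetical filter condition: the third column must equal "ACTIVE".
    # Records with fewer than three columns are dropped.
    return len(fields) > 2 and fields[2] == "ACTIVE"


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and collects stdout.
    for record in filter_and_project(sys.stdin, keep_columns=[0, 1], predicate=is_active):
        print(record)
```

Run as the `-mapper` of a Streaming job with zero reducers, this would write only the filtered, projected records to the job's output directory. That output could then be copied to the other cluster with distcp.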
--
Thanks and Regards,
Vikas Jadhav
Dino Kečo 2013-03-06, 19:53