HDFS >> mail # user >> Re: How to solve one Scenario in hadoop ?


Re: How to solve one Scenario in hadoop ?
I would go with the first case, because if the data size is large it will
distribute the data across multiple nodes.
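
Assuming "first case" means CASE-1 in the quoted message below (a whole-data
copy between clusters with distcp), here is a minimal sketch of how that copy
could be launched. DistCp runs as a MapReduce job, so the copy work is spread
across map tasks on the cluster. The NameNode addresses, the paths, and the
helper name build_distcp_command are hypothetical; -m (maximum number of
simultaneous copies) and -update are standard DistCp options.

```python
import shlex


def build_distcp_command(src_uri, dst_uri, max_maps=20, update=False):
    """Return the argv list for an inter-cluster DistCp copy.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["hadoop", "distcp", "-m", str(max_maps)]
    if update:
        # -update copies only files that are missing or differ at the target.
        cmd.append("-update")
    cmd += [src_uri, dst_uri]
    return cmd


if __name__ == "__main__":
    # Example URIs are placeholders, not real clusters.
    cmd = build_distcp_command(
        "hdfs://nn1.example.com:8020/data/events",
        "hdfs://nn2.example.com:8020/data/events",
    )
    # Print a shell-safe version of the command for review before running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can be reviewed and then run from a node that can reach
both clusters.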
On Tue, Mar 5, 2013 at 10:57 AM, samir das mohapatra <
[EMAIL PROTECTED]> wrote:

> Hi All,
>    I have a scenario in which our organization is trying to implement
> Hadoop.
>
> Scenario Statement:
>
> ---------------------------------------
>
>     Suppose we have various data sources, for example RDBMS, HDFS, and
> streaming.
>
>
> *Source Dataset Types:*
>
> 1. Single source
>
> 2. Joining sources
>
> 3. Filtered data set
>
> 4. Specific columns
>
>
> We need to pull data from one source into another, for example from HDFS
> to an RDBMS or vice versa, based on a condition. That is, out of the whole
> source data we may need only specific data, the whole data, or joined data
> in the destination. Which approach should we take to pull the data for each
> of the dataset types above?
>
>
> This is what I am thinking:
>
> CASE-1   Data from HDFS to HDFS (different cluster), whole data:
> we will use *distcp*
>
> CASE-2   Data from HDFS to HDFS (different cluster), conditional
> (filtered) data: we will use a *custom MapReduce program that does
> the filter operation and then loads the result*
>
> CASE-3   Data from HDFS to RDBMS (whole data): *Sqoop*
>
> CASE-4   Data from HDFS to RDBMS (conditional data): *Sqoop*
>
> CASE-5   Some data from RDBMS and some data from HDFS, then filter and
> load into HDFS: *JDBC with a MapReduce program*
>
>
> Note: Can anyone tell me if I am wrong and we need to do something
> other than this that would be easier to do?
>
>
> Regards,
>
> samir.
>
>
>
>
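
For CASE-2 above, the filter step does not need a reducer: a map-only job can
drop the rows it does not want and write out the rest. The sketch below is a
Hadoop Streaming style mapper in Python, not the custom Java MapReduce program
proposed in the quoted message. The tab-separated layout, the column index,
and the value ACTIVE are made-up assumptions for illustration only.

```python
import sys

FIELD_SEP = "\t"         # assumed field delimiter of the input records
FILTER_COLUMN = 2        # assumed zero-based index of the column to test
FILTER_VALUE = "ACTIVE"  # assumed value a record must have to be kept


def keep(line):
    """Return True if the record passes the (hypothetical) filter condition."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    # Records with too few columns are dropped instead of raising an error.
    return len(fields) > FILTER_COLUMN and fields[FILTER_COLUMN] == FILTER_VALUE


def filter_lines(lines):
    """Yield only the records that pass the filter, without the newline."""
    for line in lines:
        if keep(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # As a streaming mapper: read records on stdin, write kept ones to stdout.
    for row in filter_lines(sys.stdin):
        print(row)
```

Run as a streaming mapper with zero reduce tasks, this writes the filtered
records to HDFS, and the result can then be copied to the other cluster with
distcp as in CASE-1.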
--
Thanx and Regards
Vikas Jadhav
Dino Kečo 2013-03-06, 19:53