MapReduce >> mail # user >> Saving data in db instead of hdfs


Chengi Liu 2013-05-02, 21:03
Ahmed Radwan 2013-05-02, 21:17
Re: Saving data in db instead of hdfs
Hi,

You can let the job write to HDFS as usual and then use Sqoop to export
the result from HDFS to a database via JDBC.

Intro to Sqoop:
http://blog.cloudera.com/blog/2009/06/introducing-sqoop/
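
A rough sketch of what the Sqoop step could look like when driven from
Python. The JDBC URL, table name, user, and HDFS path are made-up
placeholders, and the target table is assumed to exist already. The tab
separator matches the default key/value separator of streaming output.

```python
import shlex


def build_sqoop_export(jdbc_url, table, export_dir, username, field_sep="\t"):
    """Build the argv list for a `sqoop export` run.

    All argument values are supplied by the caller; the ones used in
    the example below are hypothetical.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,          # JDBC connection string
        "--username", username,
        "-P",                           # prompt for the password on the console
        "--table", table,               # target table, must already exist
        "--export-dir", export_dir,     # HDFS directory with the job output
        "--input-fields-terminated-by", field_sep,
    ]


if __name__ == "__main__":
    cmd = build_sqoop_export(
        jdbc_url="jdbc:mysql://dbhost/analytics",   # hypothetical
        table="results",                            # hypothetical
        export_dir="/user/chengi/job-output",       # hypothetical
        username="etl",                             # hypothetical
    )
    # Print the command for inspection; to actually run it, pass `cmd`
    # to subprocess.run(cmd, check=True) on a machine with Sqoop installed.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the export as a separate step after the job means the database
only sees one controlled bulk load instead of many concurrent writers.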

Alternatively, leave the result data in the cluster and use Hive-JDBC to
query it from outside the Hadoop cluster.

You can also write your own OutputFormat (with the Java API) that writes
data directly to the database, but be careful with large result sets or
a large number of reducers: every reducer task writes to the database at
the same time, so this can become a scalability issue. A small dataset
coming out of a single reducer can be handled that way.

OutputFormat and Streaming API:
http://blog.aggregateknowledge.com/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
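
Since you are on the streaming API, a simpler variant of the same idea
(not a custom OutputFormat) is to let the Python reducer itself write to
the database. Below is a minimal sketch under several assumptions: the
mapper emits "key<TAB>integer" lines, the reducer sums per key, and
sqlite3 stands in for the real database driver only so that the example
is self-contained. The table name `results`, the file `results.db`, and
`BATCH_SIZE` are made up for illustration.

```python
import sqlite3
import sys
from itertools import groupby

BATCH_SIZE = 1000  # hypothetical; tune for your database


def parse(lines):
    """Yield (key, int_value) pairs from tab-separated streaming input."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        yield key, int(value)


def reduce_to_db(lines, conn, batch_size=BATCH_SIZE):
    """Sum values per key and write the totals to the database in batches.

    Relies on the input being sorted by key, which Hadoop guarantees for
    reducer input. Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, total INTEGER)"
    )
    insert = "INSERT OR REPLACE INTO results VALUES (?, ?)"  # SQLite syntax
    batch, written = [], 0
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        batch.append((key, sum(value for _, value in group)))
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            conn.commit()
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany(insert, batch)
        conn.commit()
        written += len(batch)
    return written


if __name__ == "__main__":
    connection = sqlite3.connect("results.db")  # hypothetical target
    try:
        reduce_to_db(sys.stdin, connection)
    finally:
        connection.close()
```

Batching keeps the number of round trips down, and the replace-style
insert means a re-executed reducer task overwrites its earlier rows
instead of duplicating them. The caveat from above still applies: with
many reducers you get that many concurrent connections.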

Best wishes
Mirko
2013/5/2 Chengi Liu <[EMAIL PROTECTED]>

> Hi,
> I am using the Hadoop streaming API (Python) for some processing.
> I want the data to be processed via Hadoop, but I want to pipe the
> output to a db instead of HDFS.
> How do I do this?
> Thanks
>