HBase user mailing list: Writing MR-Job: Something like OracleReducer, JDBCReducer ...


Steinmaurer Thomas 2011-09-16, 05:25
Sonal Goyal 2011-09-16, 07:22
Michel Segel 2011-09-16, 09:05
Sonal Goyal 2011-09-16, 16:06
Steinmaurer Thomas 2011-09-19, 05:35
Sonal Goyal 2011-09-16, 16:11
Michael Segel 2011-09-16, 17:05
Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...
Hi Michael,

Yes, thanks, I understand that reducers can be expensive with all the
shuffling and the sorting, and that you may not always need them. At the
same time, there are many cases where reducers are useful, like secondary
sorting. In many cases, one can have multiple map phases and not have a
reduce phase at all. Again, there will be many cases where one may want a
reducer, say when counting the occurrence of words in a particular column.
With this line of thought, I do not feel ready to say that when dealing
with HBase, I really don't want to use a reducer. Please correct me if I
am wrong.
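
For instance, a rough sketch of that word-count-over-a-column case (the
column family "cf", qualifier "text", and class names below are made-up
illustrations, not anything from this thread) could look like:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ColumnWordCount {

  // Tokenize one column of each row and emit (word, 1).
  public static class WordMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private static final byte[] FAMILY = Bytes.toBytes("cf");      // made-up family
    private static final byte[] QUALIFIER = Bytes.toBytes("text"); // made-up column
    private final Text word = new Text();

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      byte[] cell = value.getValue(FAMILY, QUALIFIER);
      if (cell == null) return;
      for (String token : Bytes.toString(cell).split("\\s+")) {
        word.set(token);
        context.write(word, ONE);   // shuffled and sorted by word before the reducer
      }
    }
  }

  // Sum the counts per word; this is where the reducer earns its keep.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }
}

A driver would wire these up with TableMapReduceUtil.initTableMapperJob
and job.setReducerClass.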

Thanks again.

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Sep 16, 2011 at 10:35 PM, Michael Segel
<[EMAIL PROTECTED]> wrote:

>
> Sonal,
>
> Just because you have an m/r job doesn't mean that you need to reduce
> anything. You can have a job that contains only a mapper.
> Or your job runner can run a series of map jobs in sequence.
>
> Most, if not all, of the map/reduce jobs where we pull data from HBase
> don't require a reducer.
>
> To give you a simple example... if I want to determine the table schema
> where I am storing some sort of structured data...
> I just write an m/r job which opens a table and scans it, counting the
> occurrence of each column name via dynamic counters.
>
> There is no need for a reducer.
>
> Does that help?
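
A rough sketch of that kind of map-only counting job (the table name
"mytable" and the class names here are illustrative, not from the thread)
might look like this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Map-only job: scan a table and count column-name occurrences via dynamic counters.
public class ColumnCounter extends TableMapper<NullWritable, NullWritable> {

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    for (KeyValue kv : value.raw()) {
      String column = Bytes.toString(kv.getFamily()) + ":" + Bytes.toString(kv.getQualifier());
      context.getCounter("columns", column).increment(1);  // one dynamic counter per column seen
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "column-counter");
    job.setJarByClass(ColumnCounter.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for full-table MR scans
    scan.setCacheBlocks(false);  // don't churn the block cache
    TableMapReduceUtil.initTableMapperJob("mytable", scan, ColumnCounter.class,
        NullWritable.class, NullWritable.class, job);
    job.setNumReduceTasks(0);                          // no reducer at all
    job.setOutputFormatClass(NullOutputFormat.class);  // the counters are the only output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The per-column counts then show up in the job's counters in the client
output and the JobTracker UI.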
>
>
> > Date: Fri, 16 Sep 2011 21:41:01 +0530
> > Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> >
> > Michel,
> >
> > Sorry, can you please help me understand what you mean when you say that
> > when dealing with HBase, you really don't want to use a reducer? Here,
> > HBase is being used as the input to the MR job.
> >
> > Thanks
> > Sonal
> >
> >
> > On Fri, Sep 16, 2011 at 2:35 PM, Michel Segel <[EMAIL PROTECTED]
> >wrote:
> >
> > > I think you need to get a little bit more information.
> > > Reducers are expensive.
> > > When Thomas says that he is aggregating data, what exactly does he mean?
> > > When dealing with HBase, you really don't want to use a reducer.
> > >
> > > You may want to run two map jobs, and it could be that just dumping the
> > > output via JDBC makes the most sense.
> > >
> > > We are starting to see a lot of questions where the OP isn't providing
> > > enough information, so the recommendation could be wrong...
> > >
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Sep 16, 2011, at 2:22 AM, Sonal Goyal <[EMAIL PROTECTED]> wrote:
> > >
> > > > There is a DBOutputFormat class in the org.apache.hadoop.mapreduce.lib.db
> > > > package, you could use that. Or you could write to HDFS and then use
> > > > something like HIHO[1] to export to the db. I have been working extensively
> > > > in this area, you can write to me directly if you need any help.
> > > >
> > > > 1. https://github.com/sonalgoyal/hiho
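
For reference, a rough sketch of wiring DBOutputFormat up for an Oracle
table (the table, column names, JDBC URL and credentials below are
placeholders, not from the thread) could be:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// One output row; DBOutputFormat writes the reducer's output *key* to the database.
public class WordCountRecord implements Writable, DBWritable {
  private String word;
  private int count;

  public WordCountRecord() {}
  public WordCountRecord(String word, int count) { this.word = word; this.count = count; }

  // DBWritable: bind fields to the generated INSERT statement.
  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setString(1, word);
    stmt.setInt(2, count);
  }
  public void readFields(ResultSet rs) throws SQLException {
    word = rs.getString(1);
    count = rs.getInt(2);
  }

  // Writable: conventional, in case the record ever travels through the shuffle.
  public void write(DataOutput out) throws IOException { out.writeUTF(word); out.writeInt(count); }
  public void readFields(DataInput in) throws IOException { word = in.readUTF(); count = in.readInt(); }

  // Driver wiring, assuming a WORD_COUNTS(word, cnt) table already exists in Oracle.
  static void configureOutput(Job job) throws IOException {
    Configuration conf = job.getConfiguration();
    DBConfiguration.configureDB(conf, "oracle.jdbc.driver.OracleDriver",
        "jdbc:oracle:thin:@dbhost:1521:ORCL", "scott", "tiger");
    job.setOutputFormatClass(DBOutputFormat.class);
    // Generates: INSERT INTO WORD_COUNTS (word, cnt) VALUES (?, ?)
    DBOutputFormat.setOutput(job, "WORD_COUNTS", "word", "cnt");
    job.setOutputKeyClass(WordCountRecord.class);   // the DBWritable goes in the key
    job.setOutputValueClass(NullWritable.class);
  }
}

The reducer would then emit a WordCountRecord as its output key with a
NullWritable value.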
> > > >
> > > > Best Regards,
> > > > Sonal
> > > > Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> > > > Nube Technologies <http://www.nubetech.co>
> > > >
> > > > <http://in.linkedin.com/in/sonalgoyal>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Sep 16, 2011 at 10:55 AM, Steinmaurer Thomas <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >>
> > > >>
> > > >> We are writing an MR job to process HBase data and store aggregated
> > > >> data in Oracle. How would you do that in an MR job?
> > > >>
> > > >>
> > > >>
> > > >> Currently, for test purposes, we write the result into an HBase table
> > > >> again by using a TableReducer. Is there something like an OracleReducer,
> > > >> RelationalReducer, JDBCReducer or whatever? Or should one simply use
> > > >> plain JDBC code in the reduce step?
> > > >>
> > > >>
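
To illustrate the "plain JDBC in the reduce step" option Thomas asked
about, a sketch of a reducer that batches inserts over one connection per
reduce task (the JDBC URL, credentials, and AGGREGATES table below are
placeholders) might look like:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reducer that aggregates values and writes each result row straight to Oracle via JDBC.
public class JdbcWritingReducer extends Reducer<Text, IntWritable, NullWritable, NullWritable> {

  private Connection conn;
  private PreparedStatement insert;

  @Override
  protected void setup(Context context) throws IOException {
    try {
      // One connection per reduce task; credentials would normally come from the job conf.
      conn = DriverManager.getConnection("jdbc:oracle:thin:@dbhost:1521:ORCL", "scott", "tiger");
      conn.setAutoCommit(false);
      insert = conn.prepareStatement("INSERT INTO AGGREGATES (agg_key, total) VALUES (?, ?)");
    } catch (SQLException e) {
      throw new IOException("Could not open JDBC connection", e);
    }
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) sum += v.get();
    try {
      insert.setString(1, key.toString());
      insert.setInt(2, sum);
      insert.addBatch();   // batch the inserts instead of one round trip per row
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    try {
      insert.executeBatch();
      conn.commit();
      conn.close();
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }
}

Keeping the number of reduce tasks small also keeps the number of
concurrent Oracle connections down.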
Michael Segel 2011-09-16, 18:43
Chris Tarnas 2011-09-16, 18:58
Doug Meil 2011-09-16, 19:41
Michael Segel 2011-09-16, 20:24
Steinmaurer Thomas 2011-09-19, 05:41
Doug Meil 2011-09-19, 13:35
Steinmaurer Thomas 2011-09-19, 13:44
Michael Segel 2011-09-16, 20:11
Chris Tarnas 2011-09-16, 21:54
Chris Tarnas 2011-09-16, 22:34
Sam Seigal 2011-09-17, 00:16
Doug Meil 2011-09-17, 00:22
Doug Meil 2011-09-17, 00:24
Sam Seigal 2011-09-17, 01:00
Doug Meil 2011-09-17, 01:14
Sam Seigal 2011-09-17, 01:39
Sam Seigal 2011-09-17, 01:44
Doug Meil 2011-09-17, 01:47
Michel Segel 2011-09-17, 13:12