Hadoop user mailing list: Import data from mysql


Earlier messages in this thread:
Brian McSweeney 2011-01-08, 23:33
Sonal Goyal 2011-01-09, 02:57
Konstantin Boudnik 2011-01-09, 03:18
Brian McSweeney 2011-01-09, 13:21
arvind@... 2011-01-09, 21:37

Re: Import data from mysql
Hey Brian,

One final point about Sqoop: it's a part of Cloudera's Distribution for
Hadoop, so it's Apache 2.0 licensed and tightly integrated with the other
platform components. This means, for example, that we have added a Sqoop
action to Oozie, which makes integrating data import and export into your
workflows trivial; see
http://archive.cloudera.com/cdh/3/oozie-2.2.1+82/WorkflowActionExtensionsSpec.html#AE.2_Sqoop_Action
for more details.

For further discussion of Sqoop, I'd recommend using the Sqoop user list at
https://groups.google.com/a/cloudera.org/group/sqoop-user. For questions
about CDH in general, see
https://groups.google.com/a/cloudera.org/group/cdh-user.

Regards,
Jeff

On Sun, Jan 9, 2011 at 1:37 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Hi Brian,
>
> Sqoop supports incremental imports that can be run against a live database
> system on a daily basis for importing the new data. Unless your data is
> large and cannot be split into comparable slices for parallel imports, I do
> not see any concerns regarding performance.
>
> Regarding the database library you have pointed out, it is fundamentally
> very close to what Sqoop does. However, Sqoop goes well beyond these
> libraries to ensure that you can address your use case out of the box
> without having to modify anything. If, on the other hand, you are more
> inclined to code your own solution, then perhaps the other tools or these
> low-level APIs may come in handy.
>
> Arvind
>
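Arvind's point above about incremental imports corresponds to Sqoop's --incremental, --check-column and --last-value options (option names as in Sqoop 1.x; older releases may differ). Below is a minimal sketch, not from the thread, of a nightly incremental import driven from Java: the MySQL URL, credentials, table and id column are hypothetical, and it assumes the Sqoop build in use exposes the Sqoop.runTool entry point (package com.cloudera.sqoop in CDH-era releases, org.apache.sqoop in later Apache releases). The same arguments can equally be passed to the sqoop command-line client.

import com.cloudera.sqoop.Sqoop;

public class NightlyImport {
    public static void main(String[] args) {
        // Highest id already imported; assumed to be persisted between runs
        // by the caller (e.g. in a small state file or table).
        String lastValue = args.length > 0 ? args[0] : "0";
        int ret = Sqoop.runTool(new String[] {
            "import",
            "--connect", "jdbc:mysql://dbhost/appdb",   // hypothetical database
            "--username", "app", "--password", "secret",
            "--table", "records",                       // hypothetical table
            "--incremental", "append",                  // only rows added since last run
            "--check-column", "id",                     // monotonically increasing key
            "--last-value", lastValue,
            "--target-dir", "/data/records/incoming"
        });
        System.exit(ret);
    }
}

Each run needs to record the new high-water mark for the next one; later Sqoop releases can track this automatically with saved jobs.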
> On Sun, Jan 9, 2011 at 5:21 AM, Brian McSweeney <[EMAIL PROTECTED]> wrote:
>
> > Thanks Konstantin,
> >
> > I had seen Sqoop. I wonder whether it is normally used as a one-off
> > process, or whether it can also be used effectively against a live
> > database system on a daily basis for batch exports. Are there
> > performance issues with this approach? And how would it compare to
> > some of the other classes I have seen, such as those in the database
> > library: http://hadoop.apache.org/mapreduce/docs/current/api/
> >
> > I have also seen a few alternatives out there, such as Cascading and
> > cascading-dbmigrate:
> >
> > http://architects.dzone.com/articles/tools-moving-sql-database
> >
> > But from the Hadoop API above it also seems that some of this
> > functionality is perhaps now in the main API. I suppose any experience
> > people have is welcome. I would want to run a batch job to export every
> > day, perform my map reduce, and then import the results back into MySQL
> > afterwards.
> >
> > cheers,
> > Brian
> >
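The "database library" Brian points to is presumably org.apache.hadoop.mapreduce.lib.db (DBInputFormat, DBOutputFormat, DBConfiguration, DBWritable). A rough sketch of coding against it directly follows; it is not from the thread, the table and column names (records, id, payload, record_results) are invented, and the actual per-row work is left out.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class MySqlRoundTripSketch {

    // One row of the hypothetical "records" table.
    public static class RecordRow implements Writable, DBWritable {
        long id;
        String payload;

        public void readFields(ResultSet rs) throws SQLException {    // from JDBC
            id = rs.getLong(1);
            payload = rs.getString(2);
        }
        public void write(PreparedStatement ps) throws SQLException { // to JDBC
            ps.setLong(1, id);
            ps.setString(2, payload);
        }
        public void readFields(DataInput in) throws IOException {     // from Hadoop
            id = in.readLong();
            payload = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {        // to Hadoop
            out.writeLong(id);
            out.writeUTF(payload);
        }
    }

    // Map-only pass-through; the real per-row (or pairwise) work would go here.
    public static class PassThroughMapper
            extends Mapper<LongWritable, RecordRow, RecordRow, NullWritable> {
        protected void map(LongWritable key, RecordRow row, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(row, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost/appdb", "app", "secret");

        Job job = new Job(conf, "mysql-round-trip-sketch");
        job.setJarByClass(MySqlRoundTripSketch.class);

        // Pull (id, payload) rows from "records", ordered by id for consistent splits.
        DBInputFormat.setInput(job, RecordRow.class,
                "records", null /* conditions */, "id" /* orderBy */,
                "id", "payload");
        job.setInputFormatClass(DBInputFormat.class);

        // Push results back into a table with matching columns.
        DBOutputFormat.setOutput(job, "record_results", "id", "payload");
        job.setOutputFormatClass(DBOutputFormat.class);

        job.setMapperClass(PassThroughMapper.class);
        job.setMapOutputKeyClass(RecordRow.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(RecordRow.class);
        job.setOutputValueClass(NullWritable.class);
        job.setNumReduceTasks(0);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Sqoop's generated record classes implement the same DBWritable contract; the difference is that Sqoop writes and manages this wiring for you, which is the point Arvind makes above.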
> > On Sun, Jan 9, 2011 at 3:18 AM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:
> >
> > > There's a supported tool with all bells and whistles:
> > >  http://www.cloudera.com/downloads/sqoop/
> > >
> > > --
> > >   Take care,
> > > Konstantin (Cos) Boudnik
> > >
> > > On Sat, Jan 8, 2011 at 18:57, Sonal Goyal <[EMAIL PROTECTED]> wrote:
> > > > Hi Brian,
> > > >
> > > > You can check HIHO at https://github.com/sonalgoyal/hiho which can
> > > > help you load data from any JDBC database to the Hadoop file system.
> > > > If your table has a date or id field, or any indicator for
> > > > modified/newly added rows, you can import only the altered rows
> > > > every day. Please let me know if you need help.
> > > >
> > > > Thanks and Regards,
> > > > Sonal
> > > > Connect Hadoop with databases, Salesforce, FTP servers and others
> > > > <https://github.com/sonalgoyal/hiho>
> > > > Nube Technologies <http://www.nubetech.co>
> > > > <http://in.linkedin.com/in/sonalgoyal>
> > > >
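The indicator-column approach Sonal describes (and Sqoop's --check-column) comes down to remembering a high-water mark and selecting only rows past it. A plain-JDBC illustration follows; it is not HIHO code, and the table, column and connection details are invented.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class DeltaQuerySketch {
    public static void main(String[] args) throws Exception {
        // High-water mark from the previous daily run, e.g. "2011-01-08 00:00:00".
        Timestamp lastRun = Timestamp.valueOf(args[0]);
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://dbhost/appdb", "app", "secret");
        PreparedStatement ps = c.prepareStatement(
                "SELECT id, payload, modified_at FROM records WHERE modified_at > ?");
        ps.setTimestamp(1, lastRun);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // Hand each new or changed row to whatever loads it into HDFS.
            System.out.println(rs.getLong("id") + "\t" + rs.getString("payload"));
        }
        rs.close();
        ps.close();
        c.close();
    }
}

Tools like HIHO and Sqoop wrap this pattern and add parallel imports, so the hand-rolled version is mainly useful if, as Arvind says, you prefer to code your own solution.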
> > > > On Sun, Jan 9, 2011 at 5:03 AM, Brian McSweeney <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> I'm a TOTAL newbie on hadoop. I have an existing webapp that has a
> > > >> growing number of rows in a mysql database that I have to compare
> > > >> against one another once a day from a batch job. This is an
> > > >> exponential problem
Other messages in this thread:
Brian McSweeney 2011-01-10, 01:27
Brian McSweeney 2011-01-09, 13:04
Ted Dunning 2011-01-09, 08:55
Brian McSweeney 2011-01-09, 13:26
Ted Dunning 2011-01-09, 21:18
Brian McSweeney 2011-01-10, 01:23
Mark Kerzner 2011-01-14, 06:02
Brian McSweeney 2011-01-14, 20:24
Black, Michael 2011-01-09, 12:20
Brian McSweeney 2011-01-09, 13:30
Black, Michael 2011-01-09, 13:51
Brian McSweeney 2011-01-10, 01:19
Black, Michael 2011-01-10, 13:21
Brian 2011-01-10, 20:00
Ted Dunning 2011-01-10, 21:51
Brian McSweeney 2011-01-11, 00:54
Black, Michael 2011-01-10, 20:46
Brian McSweeney 2011-01-10, 23:19