MapReduce, mail # user - from relational to bigger data


Re: from relational to bigger data
Bertrand Dechoux 2013-12-20, 10:35
And to make things a bit more explicit, Sqoop is the name of an Apache
project; the name stands for "SQL to Hadoop".
http://sqoop.apache.org/
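
As a rough illustration (the connection string, user, table and directory
names below are made up, not from this thread), a nightly import of one
source table into HDFS could look something like:

    # pull the rows needed for the summaries out of Oracle into HDFS
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table SALES_FACT \
      --target-dir /data/staging/sales_fact \
      --num-mappers 8

Sqoop runs the import as a MapReduce job, so --num-mappers controls how
many parallel connections hit the source database.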

As for "crunch", I guess Chris used it as a generic term.
You could do that step with MapReduce jobs written against the Java API,
Pig, Hive, Cascading, Crunch (indeed)...
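
For instance, the "crunch the summaries" step could be a single Hive
query (again with made-up table and column names, and assuming the
imported data has been exposed to Hive, e.g. via Sqoop's --hive-import
option or an external table over the import directory):

    hive -e "
      DROP TABLE IF EXISTS sales_summary;
      CREATE TABLE sales_summary AS
      SELECT product_id,
             SUM(amount) AS total_amount,
             COUNT(*)    AS order_count
      FROM   sales_fact
      GROUP  BY product_id;
    "

Hive compiles this into MapReduce jobs under the hood; Pig, Cascading or
Crunch would express the same aggregation through their own APIs.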

Regards

Bertrand
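
PS: for step 3 in Chris's outline below (pushing the summarized data back
out to Oracle), the export side of Sqoop is symmetric. A sketch, with the
same made-up names, assuming the target summary table already exists in
Oracle and the Hive output uses Hive's default ^A field delimiter:

    # push the summarized rows from HDFS back into Oracle
    sqoop export \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table SALES_SUMMARY \
      --export-dir /user/hive/warehouse/sales_summary \
      --input-fields-terminated-by '\001' \
      --num-mappers 4

Index creation (step 4) would then still happen on the Oracle side, after
the export finishes.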

On Thu, Dec 19, 2013 at 10:59 PM, Vinay Bagare <[EMAIL PROTECTED]> wrote:

> I would also look at the current setup.
> I agree with Chris that 500 GB is fairly insignificant.
>
>
> Best,
> Vinay Bagare
>
>
>
> On Dec 19, 2013, at 12:51 PM, Chris Embree <[EMAIL PROTECTED]> wrote:
>
> > In big data terms, 500G isn't big.  But moving that much data around
> > every night is not trivial either.  I'm going to guess at a lot here,
> > but at a very high level:
> >
> > 1. Sqoop the data required to build the summary tables into Hadoop.
> > 2. Crunch the summaries into new tables (really just files on Hadoop)
> > 3. Sqoop the summarized data back out to Oracle
> > 4. Build Indices as needed.
> >
> > Depending on the size of the data being sqoop'd, this might help.  It
> > might also take longer.  A real solution would require more details
> > and analysis.
> >
> > Chris
> >
> > On 12/19/13, Jay Vee <[EMAIL PROTECTED]> wrote:
> >> We have a large relational database ( ~ 500 GB, hundreds of tables ).
> >>
> >> We have summary tables that we rebuild from scratch each night, which
> >> takes about 10 hours.
> >> We have a web interface that accesses these summary tables to build
> >> reports.
> >>
> >> There is a business reason for doing a complete rebuild of the summary
> >> tables each night, and using views (in the sense of Oracle views) is
> >> not an option at this time.
> >>
> >> If I wanted to leverage Big Data technologies to speed up the summary
> >> table rebuild, what would be the first step in getting all the data
> >> into some big data storage technology?
> >>
> >> Ideally in the end, we want to retain the summary tables in a relational
> >> database and have reporting work the same without modifications.
> >>
> >> It's just the crunching of the data and building these relational
> >> summary tables where we need a significant performance increase.
> >>
>
>