And to make things a bit more explicit, Sqoop is the name of a project
("SQL-to-Hadoop") for moving bulk data between relational databases and Hadoop.
As for "crunch", I guess Chris used it as a generic term.
You could do the crunching with MapReduce jobs via the Java API, Pig, Hive, Cascading, etc.
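To make that concrete, here is a minimal sketch of what the summary build
could look like in Hive (all table, column, and path names here are made up
for illustration; Pig or plain MapReduce would work just as well):

    hive -e "
      -- point a Hive table at the data Sqoop landed in HDFS
      -- (Sqoop's default output is comma-delimited text)
      CREATE EXTERNAL TABLE sales_staging (
        region STRING, product STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/staging/sales';

      -- rebuild the summary from scratch, as the nightly job does today
      DROP TABLE IF EXISTS sales_summary;
      CREATE TABLE sales_summary AS
      SELECT region, product, SUM(amount) AS total_amount
      FROM sales_staging
      GROUP BY region, product;"

The resulting table could then be pushed back out to Oracle with Sqoop, along
the lines Chris outlined below.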
On Thu, Dec 19, 2013 at 10:59 PM, Vinay Bagare <[EMAIL PROTECTED]> wrote:
> I would also look at the current setup.
> I agree with Chris that 500 GB is fairly insignificant.
> Vinay Bagare
> On Dec 19, 2013, at 12:51 PM, Chris Embree <[EMAIL PROTECTED]> wrote:
> > In big data terms, 500G isn't big. But moving that much data around
> > every night is not trivial either. I'm going to guess at a lot here,
> > but at a very high level:
> > 1. Sqoop the data required to build the summary tables into Hadoop.
> > 2. Crunch the summaries into new tables (really just files on Hadoop)
> > 3. Sqoop the summarized data back out to Oracle
> > 4. Build Indices as needed.
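> > A minimal sketch of what the nightly flow for steps 1-3 might look like
> > (connection strings, credentials, tables, and paths are all hypothetical,
> > and the Hive script in step 2 would hold the actual summary SQL):
> >
> >   # 1. Sqoop the source tables needed for the summaries into HDFS
> >   sqoop import --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
> >     --username etl --password-file /user/etl/.pw \
> >     --table SALES --target-dir /staging/sales -m 8
> >
> >   # 2. Crunch the summaries (Hive here; Pig or a MapReduce jar work too)
> >   hive -f build_summaries.hql
> >
> >   # 3. Sqoop the summarized rows back out to the Oracle summary table
> >   #    (\001 is Hive's default field delimiter for managed tables)
> >   sqoop export --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
> >     --username etl --password-file /user/etl/.pw \
> >     --table SALES_SUMMARY \
> >     --export-dir /user/hive/warehouse/sales_summary \
> >     --input-fields-terminated-by '\001'
> >
> > Step 4 (rebuilding indices) would still happen on the Oracle side.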
> > Depending on the size of the data being sqoop'd, this might help. It
> > might also take longer. A real solution would require more details
> > and analysis.
> > Chris
> > On 12/19/13, Jay Vee <[EMAIL PROTECTED]> wrote:
> >> We have a large relational database (~500 GB, hundreds of tables).
> >> We have summary tables that we rebuild from scratch each night, which
> >> takes about 10 hours.
> >> A web interface accesses these summary tables to build reports.
> >> There is a business reason for doing a complete rebuild of the summary
> >> tables each night, and using views (in the Oracle sense) is not an
> >> option at this time.
> >> If I wanted to leverage Big Data technologies to speed up the summary
> >> rebuild, what would be the first step toward getting all of the data
> >> into some data storage technology?
> >> Ideally in the end, we want to retain the summary tables in a relational
> >> database and have reporting work the same without modifications.
> >> It's just the crunching of the data and the building of these
> >> relational tables where we need a significant performance increase.