Re: from relational to bigger data
Chris Embree 2013-12-19, 20:51
In big data terms, 500G isn't big.  But moving that much data around
every night is not trivial either.  I'm going to have to guess at a
lot here, but at a very high level:

1. Sqoop the data required to build the summary tables into Hadoop.
2. Crunch the summaries into new tables (really just files on Hadoop)
3. Sqoop the summarized data back out to Oracle
4. Build indices in Oracle as needed.
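
To make that concrete, here is a rough sketch of steps 1-3 using
Sqoop and Hive.  All host names, credentials, table names (ORDERS,
DAILY_SUMMARY) and HDFS paths below are placeholders, and exact flags
vary a bit across Sqoop versions, so treat it as an outline rather
than a working job:

  # 1. Sqoop the source table(s) for the summaries into HDFS.
  sqoop import \
    --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCL \
    --username etl_user --password-file /user/etl/.ora_pass \
    --table ORDERS \
    --target-dir /data/staging/orders \
    --num-mappers 8

  # 2. Crunch the summary in Hive; the "table" is really just
  #    files on HDFS.  Sqoop writes comma-delimited text by default.
  hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS orders (
      order_date STRING,
      amount     DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/staging/orders';

    INSERT OVERWRITE DIRECTORY '/data/summary/daily'
    SELECT order_date, SUM(amount)
    FROM   orders
    GROUP  BY order_date;"

  # 3. Sqoop the summarized data back out to Oracle.  Hive's
  #    INSERT OVERWRITE DIRECTORY emits Ctrl-A (\001) delimited text.
  sqoop export \
    --connect jdbc:oracle:thin:@//oradb.example.com:1521/ORCL \
    --username etl_user --password-file /user/etl/.ora_pass \
    --table DAILY_SUMMARY \
    --export-dir /data/summary/daily \
    --input-fields-terminated-by '\001' \
    --num-mappers 4

  # 4. Rebuild indices on DAILY_SUMMARY inside Oracle as before.

The win, if there is one, comes from step 2 running in parallel
across the cluster instead of inside Oracle.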

Depending on the size of the data being sqoop'd, this might help.  It
might also take longer.  A real solution would require more details
and analysis.

Chris

On 12/19/13, Jay Vee <[EMAIL PROTECTED]> wrote:
> We have a large relational database ( ~ 500 GB, hundreds of tables ).
>
> We have summary tables that we rebuild from scratch each night; the
> rebuild takes about 10 hours.
> From these summary tables, we have a web interface that accesses the
> summary tables to build reports.
>
> There is a business reason for doing a complete rebuild of the summary
> tables each night, and using views (in the Oracle sense) is not an
> option at this time.
>
> If I wanted to leverage Big Data technologies to speed up the summary
> table rebuild, what would be the first step in getting all the data
> into some big data storage technology?
>
> Ideally in the end, we want to retain the summary tables in a relational
> database and have reporting work the same without modifications.
>
> It's just the data crunching and the rebuilding of these relational
> summary tables where we need a significant performance increase.
>