Re: from relational to bigger data
And to make things a bit more explicit, Sqoop is the name of a project ("SQL
to Hadoop").
http://sqoop.apache.org/
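
For illustration, a minimal import of a single Oracle table into HDFS might
look like the sketch below. The JDBC URL, table name, and paths are
placeholders; in practice you would usually just run the equivalent
"sqoop import" command from the shell, and Sqoop 1 exposes the same tool
programmatically via Sqoop.runTool. The reverse direction, "sqoop export",
pushes the resulting files back into the database.

import org.apache.sqoop.Sqoop;

public class ImportOrders {
    public static void main(String[] args) {
        // Same arguments you would pass to "sqoop import" on the command line.
        String[] sqoopArgs = new String[] {
            "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",  // placeholder JDBC URL
            "--username", "reporting",
            "--password-file", "/user/etl/oracle.password",       // placeholder credentials file
            "--table", "ORDERS",                                   // hypothetical source table
            "--target-dir", "/data/staging/orders",                // placeholder HDFS directory
            "--num-mappers", "4"
        };
        int exitCode = Sqoop.runTool(sqoopArgs);                   // 0 means success
        System.exit(exitCode);
    }
}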

As for "crunch", I guess Chris used it as a generic term.
You could use MapReduce jobs written with the Java API, Pig, Hive, Cascading,
or Crunch (indeed)...
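
To make that concrete, here is a minimal Crunch sketch of a summary job; the
paths, CSV layout, and field positions are invented for illustration. It
reads the rows Sqoop imported into HDFS, sums an amount per customer id, and
writes the totals back out as text files that Sqoop could then export to
Oracle.

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.fn.Aggregators;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

public class DailySummaryJob {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(DailySummaryJob.class);

        // Rows previously imported by Sqoop, one CSV line per record (placeholder path).
        PCollection<String> rows = pipeline.readTextFile("/data/staging/orders");

        // Key each row by customer id and sum the order amounts.
        PTable<String, Double> totals = rows
            .parallelDo(new DoFn<String, Pair<String, Double>>() {
                @Override
                public void process(String line, Emitter<Pair<String, Double>> emitter) {
                    String[] fields = line.split(",");
                    // fields[1] = customer id, fields[3] = amount -- illustrative layout
                    emitter.emit(Pair.of(fields[1], Double.parseDouble(fields[3])));
                }
            }, Writables.tableOf(Writables.strings(), Writables.doubles()))
            .groupByKey()
            .combineValues(Aggregators.SUM_DOUBLES());

        // Write the summary back to HDFS; Sqoop can export this directory to Oracle.
        pipeline.writeTextFile(totals, "/data/summaries/customer_totals");
        pipeline.done();
    }
}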

Regards

Bertrand

On Thu, Dec 19, 2013 at 10:59 PM, Vinay Bagare <[EMAIL PROTECTED]> wrote:

> I would also look at the current setup.
> I agree with Chris that 500 GB is fairly insignificant.
>
>
> Best,
> Vinay Bagare
>
>
>
> On Dec 19, 2013, at 12:51 PM, Chris Embree <[EMAIL PROTECTED]> wrote:
>
> > In big data terms, 500 GB isn't big, but moving that much data around
> > every night is not trivial either.  I'm going to guess at a lot here,
> > but at a very high level:
> >
> > 1. Sqoop the data required to build the summary tables into Hadoop.
> > 2. Crunch the summaries into new tables (really just files on Hadoop)
> > 3. Sqoop the summarized data back out to Oracle
> > 4. Build Indices as needed.
> >
> > Depending on the size of the data being sqoop'd, this might help.  It
> > might also take longer.  A real solution would require more details
> > and analysis.
> >
> > Chris
> >
> > On 12/19/13, Jay Vee <[EMAIL PROTECTED]> wrote:
> >> We have a large relational database ( ~ 500 GB, hundreds of tables ).
> >>
> >> We have summary tables that we rebuild from scratch each night; the
> >> rebuild takes about 10 hours.
> >> We have a web interface that accesses these summary tables to build
> >> reports.
> >>
> >> There is a business reason for doing a complete rebuild of the summary
> >> tables each night, and using views (in the sense of Oracle views) is not
> >> an option at this time.
> >>
> >> If I wanted to leverage big data technologies to speed up the summary
> >> table rebuild, what would be the first step toward getting all the data
> >> into some big data storage technology?
> >>
> >> Ideally in the end, we want to retain the summary tables in a relational
> >> database and have reporting work the same without modifications.
> >>
> >> It's just the crunching of the data and building these relational
> >> summary tables where we need a significant performance increase.
> >>
>
>