Sqoop, mail # user - Throttling inserts to avoid replication lags


Re: Throttling inserts to avoid replication lags
Kathleen Ting 2012-09-12, 23:27
Chuck, Zoltán,

For Sqoop 2, it has been discussed that Connections will allow a resource
policy to be specified: resources would be managed by limiting the total
number of physical Connections open at one time, with an option to disable
Connections.

More info: https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop

Regards, Kathleen

On Wed, Sep 12, 2012 at 8:08 AM, Connell, Chuck
<[EMAIL PROTECTED]> wrote:
> In my opinion, this is not a Sqoop problem. It is related to the RDBMS and
> the way it handles high-volume updates. Those updates might be coming from
> Sqoop, or they might be coming from a realtime stock market price feed.
>
>
>
> I would go ahead and test the system as is. Let Sqoop do all its updates. If
> you actually have a problem with inconsistencies or poor performance, then I
> would deal with it as a purely MySQL issue.
>
>
>
> (A low-tech approach… run the sqoop jobs at night??)
>
>
>
> Chuck
>
>
>
>
>
> From: Zoltán Tóth-Czifra [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 12, 2012 10:48 AM
> To: [EMAIL PROTECTED]
> Subject: Throttling inserts to avoid replication lags
>
>
>
> Hi guys,
>
>
>
> We are using Sqoop (cdh3u3) to export Hive tables to relational databases.
> Usually these databases are only used by business intelligence to further
> analyze and filter the data. However, in certain cases we need to export to
> relational databases that are heavily accessed by our products and users.
>
>
>
> Our concern is that Sqoop exports would interfere with this random access by
> our users. Temporal inconsistency of the data can be solved with a staging
> table and an atomic swap; however, we are concerned about the replication
> lag between the master and the slaves.
>
>
>
> If we write a large amount of data quickly with Sqoop to the master (even to
> a staging table), it takes minutes to be replicated to the slaves. That
> causes an inconsistency we can't allow: other writes from our users will be
> queued up. I wonder if any of you have had similar problems. We are talking
> about a MySQL cluster, by the way.
>
>
>
> As far as I know, Sqoop doesn't have any built-in throttle functionality (for
> example, a delay between inserts). We have been thinking of solving this with
> a proxy, but the existing solutions on the market are very incomplete.
>
>
>
> Any other ideas? The more transparent, the better.
>
>
>
> Thanks!
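
The "delay between inserts" idea from the original question can be sketched outside Sqoop as a small batch writer that pauses whenever the replicas fall behind. This is only an illustration under stated assumptions, not a Sqoop feature: `throttled_export`, `write_batch`, and `get_lag_seconds` are hypothetical names, and the caller must supply both callbacks. With MySQL, the lag callback could for instance read `Seconds_Behind_Master` from `SHOW SLAVE STATUS` on a replica.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sequence


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows."""
    batch: List[Sequence] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_export(
    rows: Iterable[Sequence],
    write_batch: Callable[[List[Sequence]], None],  # hypothetical: inserts one batch on the master
    get_lag_seconds: Callable[[], float],           # hypothetical: current replication lag
    batch_size: int = 1000,
    max_lag_seconds: float = 5.0,
    poll_interval: float = 1.0,
    pause_between_batches: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write rows in batches, waiting while replication lag exceeds the limit.

    Returns the number of rows written.
    """
    written = 0
    for batch in chunked(rows, batch_size):
        # Hold back the next batch until the replicas have caught up.
        while get_lag_seconds() > max_lag_seconds:
            sleep(poll_interval)
        write_batch(batch)
        written += len(batch)
        # Optional fixed delay, the simple "delay between inserts" variant.
        if pause_between_batches > 0:
            sleep(pause_between_batches)
    return written
```

Smaller batches give finer control over the lag at the cost of a longer export; the thresholds above are placeholder values, not recommendations.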