Sqoop >> mail # user >> Throttling inserts to avoid replication lags
RE: Throttling inserts to avoid replication lags
Hi,

Thank you for your answers!

I have been reading about Sqoop 2, but since it is still under development it doesn't really help me. Besides, my problem is not limiting the number of connections, but limiting the throughput of even a single connection.

This problem might not be Sqoop-specific, but I wondered if anyone has faced this and solved it somehow.

Thank you!
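
For illustration only: capping the throughput of a single connection, as described above, could be done by pacing batches against a target row rate. This is a minimal sketch and not a Sqoop feature. It assumes a custom writer sits between the data and the database. All names here (batches, throttled_insert, write_batch, max_rows_per_sec) are hypothetical, and write_batch stands in for whatever actually issues the INSERTs.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group rows into lists of at most `size` items."""
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_insert(
    rows: Iterable[T],
    write_batch: Callable[[List[T]], None],  # hypothetical: performs the real INSERT
    batch_size: int = 1000,
    max_rows_per_sec: float = 5000.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Write rows in batches without exceeding max_rows_per_sec on average."""
    written = 0
    start = clock()
    for batch in batches(rows, batch_size):
        write_batch(batch)
        written += len(batch)
        # Earliest moment at which `written` rows are allowed under the cap.
        earliest = start + written / max_rows_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return written
```

Pacing against the elapsed time since the start, rather than sleeping a fixed interval after every batch, means a slow batch is not penalized twice: the writer only sleeps when it is ahead of the allowed rate.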
________________________________________
From: Kathleen Ting [[EMAIL PROTECTED]]
Sent: Thursday, September 13, 2012 1:27 AM
To: [EMAIL PROTECTED]
Subject: Re: Throttling inserts to avoid replication lags

Chuck, Zoltán,

In Sqoop 2, it has been discussed that connections will allow a
resource policy to be specified, so that resources are managed by
limiting the total number of physical Connections open at one time,
with an option to disable Connections.

More info: https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop

Regards, Kathleen

On Wed, Sep 12, 2012 at 8:08 AM, Connell, Chuck
<[EMAIL PROTECTED]> wrote:
> In my opinion, this is not a Sqoop problem. It is related to the RDBMS and
> the way it handles high-volume updates. Those updates might be coming from
> Sqoop, or they might be coming from a realtime stock market price feed.
>
>
>
> I would go ahead and test the system as is. Let Sqoop do all its updates. If
> you actually have a problem with inconsistencies or poor performance, then I
> would deal with it as a purely MySQL issue.
>
>
>
> (A low-tech approach… run the sqoop jobs at night??)
>
>
>
> Chuck
>
>
>
>
>
> From: Zoltán Tóth-Czifra [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 12, 2012 10:48 AM
> To: [EMAIL PROTECTED]
> Subject: Throttling inserts to avoid replication lags
>
>
>
> Hi guys,
>
>
>
> We are using Sqoop (cdh3u3) to export Hive tables to relational databases.
> Usually these databases are only used by business intelligence to further
> analyze and filter the data. However, in certain cases we need to export to
> relational databases that are heavily accessed by our products and users.
>
>
>
> Our concern is that Sqoop exports would interfere with this random access by
> our users. Temporal inconsistency of the data can be solved with a staging
> table and an atomic swap. However, we are concerned about the replication
> lag between the master and the slaves.
>
>
>
> If we write a large amount of data quickly with Sqoop to the master (even to
> a staging table), it takes minutes to be replicated to the slaves and causes
> an inconsistency we can't allow, that is, other writes from our users will
> be queued up. I wonder if any of you have had similar problems. We are
> talking about a MySQL cluster, by the way.
>
>
>
> As far as I know, Sqoop doesn't have any built-in throttle functionality (for
> example, a delay between inserts). We have been thinking of solving this with
> a proxy, but the existing solutions on the market are very incomplete.
>
>
>
> Any other ideas? The more transparent, the better.
>
>
>
> Thanks!
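
As a rough illustration of the "delay between inserts" idea in the original question, a writer could pause between batches whenever the replicas fall too far behind. This is a sketch under assumptions, not something Sqoop provides. The function names and parameters are hypothetical, and get_lag_seconds is an assumed callable that would, for example, read Seconds_Behind_Master from SHOW SLAVE STATUS on a replica and return None when that value is NULL.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def wait_for_replicas(
    get_lag_seconds: Callable[[], Optional[float]],  # hypothetical lag probe
    max_lag: float = 5.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> int:
    """Block until the reported lag is at most max_lag; return the number of waits."""
    polls = 0
    while True:
        lag = get_lag_seconds()
        # None means the lag is unknown (e.g. replication stopped): keep waiting.
        if lag is not None and lag <= max_lag:
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("replicas did not catch up in time")
        sleep(poll_interval)
        polls += 1


def lag_aware_export(
    batches: Iterable[List[T]],
    write_batch: Callable[[List[T]], None],  # hypothetical: performs the real INSERT
    get_lag_seconds: Callable[[], Optional[float]],
    **wait_options,
) -> int:
    """Write each batch only after the replicas have caught up."""
    written = 0
    for batch in batches:
        wait_for_replicas(get_lag_seconds, **wait_options)
        write_batch(batch)
        written += len(batch)
    return written
```

Checking the lag before every batch keeps the worst-case backlog bounded by roughly one batch, so the batch size and max_lag together determine how far the replicas can fall behind.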