Sqoop user mailing list

Throttling inserts to avoid replication lags
Hi guys,

We are using Sqoop (cdh3u3) to export Hive tables to relational databases. Usually these databases are only used by our business intelligence team to further analyze and filter the data. However, in certain cases we need to export to relational databases that are heavily accessed by our products and users.

Our concern is that Sqoop exports would interfere with this random access by our users. Temporal inconsistency of the data can be solved with a staging table and an atomic swap; however, we are concerned about the replication lag between the master and the slaves.
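The staging-table half of this can be sketched as follows. In MySQL, a single `RENAME TABLE live TO live_old, staging TO live` statement performs both renames as one atomic operation. Because that needs a running server, this minimal sketch uses Python's built-in `sqlite3` as a stand-in and does the two renames inside one transaction, so readers see either the old table or the fully loaded new one, never a half-loaded table. The table names `live`, `staging` and `live_old` are hypothetical.

```python
import sqlite3


def swap_in_staging(conn: sqlite3.Connection, live: str, staging: str) -> None:
    """Replace `live` with `staging` in one transaction (SQLite stand-in for MySQL's RENAME TABLE)."""
    old = f"{live}_old"  # hypothetical name for the table being retired
    cur = conn.cursor()
    # Explicit transaction: both renames become visible together or not at all.
    cur.execute("BEGIN")
    try:
        cur.execute(f"ALTER TABLE {live} RENAME TO {old}")
        cur.execute(f"ALTER TABLE {staging} RENAME TO {live}")
        cur.execute(f"DROP TABLE {old}")
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN/COMMIT above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE live (id INTEGER, v TEXT)")
conn.execute("INSERT INTO live VALUES (1, 'old')")
# The (slow) export fills the staging table while readers keep using `live`.
conn.execute("CREATE TABLE staging (id INTEGER, v TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, "new"), (2, "new")])
swap_in_staging(conn, "live", "staging")
print(conn.execute("SELECT COUNT(*) FROM live").fetchone()[0])
```

The swap only fixes what readers see. Every row written to `staging` still has to travel through replication, which is why it does not address the lag concern on its own.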

If we quickly write a large amount of data to the master with Sqoop (even into a staging table), it takes minutes to replicate to the slaves, and that causes an inconsistency we can't allow: other writes from our users get queued up behind the export. I wonder if any of you have had similar problems. We are talking about a MySQL cluster, by the way.
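One way to keep the export from outrunning the slaves is to let replica lag itself drive the throttle: before each batch, check how far the slaves are behind and pause until they catch up. On MySQL the lag could be read from the `Seconds_Behind_Master` column of `SHOW SLAVE STATUS` on a slave; Percona Toolkit's `pt-online-schema-change` pauses on a similar `--max-lag` threshold. This is a minimal sketch assuming the export is driven by your own code rather than by Sqoop; `write_batch` and `replica_lag` are hypothetical callbacks you would wire to your MySQL client.

```python
import time
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def export_with_lag_check(
    rows: Sequence[Row],
    batch_size: int,
    write_batch: Callable[[List[Row]], None],  # hypothetical: INSERTs one batch on the master
    replica_lag: Callable[[], float],          # hypothetical: current slave lag in seconds
    max_lag: float = 1.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write rows in batches, pausing while replica lag exceeds max_lag seconds.

    Returns the number of times the writer paused.
    """
    pauses = 0
    for start in range(0, len(rows), batch_size):
        # Wait for the slaves to catch up before adding more to the binlog.
        # (A NULL Seconds_Behind_Master should be mapped to a large value by
        # replica_lag, since it means replication is not running.)
        while replica_lag() > max_lag:
            pauses += 1
            sleep(poll_interval)
        write_batch(list(rows[start:start + batch_size]))
    return pauses


# Demo with simulated lag readings instead of a real slave.
written: List[List[int]] = []
lags = iter([0.0, 3.0, 2.0, 0.5, 0.0])
pauses = export_with_lag_check(
    rows=list(range(7)),
    batch_size=3,
    write_batch=written.append,
    replica_lag=lambda: next(lags),
    sleep=lambda s: None,  # don't actually sleep in the demo
)
print(written, pauses)
```

Here the second batch waits through two lagging readings before it is written. Keeping batches small bounds how much replication work any single pause has to wait out.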

As far as I know, Sqoop doesn't have any built-in throttling functionality (for example, a delay between inserts). We have been thinking about solving this with a proxy, but the existing solutions on the market are very incomplete.
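If the export were done by a custom writer instead of Sqoop, the "delay between inserts" idea could be sketched as a simple rate limiter. Rather than a constant sleep, this minimal sketch sleeps just long enough to keep the average rate at or below a target number of rows per second; `write_batch` is a hypothetical callback, and the clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time
from typing import Callable, List, Sequence


def export_at_fixed_rate(
    rows: Sequence,
    batch_size: int,
    rows_per_second: float,
    write_batch: Callable[[list], None],  # hypothetical: INSERTs one batch
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write rows in batches, sleeping so the average rate stays at or below rows_per_second."""
    start = clock()
    written = 0
    for i in range(0, len(rows), batch_size):
        batch = list(rows[i:i + batch_size])
        write_batch(batch)
        written += len(batch)
        # Earliest time at which `written` rows are allowed at the target rate.
        earliest = start + written / rows_per_second
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)


class FakeClock:
    """Simulated clock whose sleep() just advances time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
batches: List[list] = []
export_at_fixed_rate(list(range(10)), batch_size=5, rows_per_second=100.0,
                     write_batch=batches.append, clock=clock, sleep=clock.sleep)
print(len(batches), round(clock.now, 3))
```

Pacing against the start time, rather than sleeping a fixed amount per batch, means slow batches are not penalized twice. The right `rows_per_second` would have to be found empirically against what the slaves can apply.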

Any other ideas? The more transparent, the better.