Re: Loading a Hive table simultaneously from 2 different sources
Dean Wampler 2013-01-24, 16:03
You'll face all the usual concurrency and synchronization risks if you're
updating the same "place" concurrently. One thing to keep in mind: it's all
just HDFS under the hood. That pretty much tells you everything you need to
know. Yes, there's also the metadata. So, one way to update a partition
directory safely is to write to unique files. Hive doesn't care about their
names.
You can even write new directories for the partitions yourself, bypassing
Hive, and then tell Hive to "find" them afterwards. See
In this case, you're updating the metadata to reflect what just happened
to the file system.
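
To make the two ideas above concrete, here is a rough sketch in Python. It
only builds strings: a collision-free file name for each writer, and the
HiveQL that registers an already-written directory as a partition. The
helper names, the table name "events", the partition column "dt", and the
warehouse path are all hypothetical. Using ALTER TABLE ... ADD PARTITION is
my assumption about what telling Hive to "find" the directories refers to;
the link in the original message did not survive.

```python
import uuid


def unique_part_file(partition_dir: str, writer_id: str) -> str:
    """Return a path inside partition_dir that no other writer will reuse."""
    # A random UUID suffix keeps concurrent writers from clobbering each other.
    return f"{partition_dir.rstrip('/')}/part-{writer_id}-{uuid.uuid4().hex}"


def add_partition_sql(table: str, spec: dict, location: str) -> str:
    """Build HiveQL that registers an existing directory as a partition."""
    # Sketch only: values are not escaped, so do not feed it untrusted input.
    cols = ", ".join(f"{k}='{v}'" for k, v in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({cols}) LOCATION '{location}'"
    )


if __name__ == "__main__":
    part_dir = "/warehouse/events/dt=2013-01-24"  # hypothetical path
    print(unique_part_file(part_dir, "sourceA"))
    print(unique_part_file(part_dir, "sourceB"))
    print(add_partition_sql("events", {"dt": "2013-01-24"}, part_dir))
```

Each load job would write its data to its own file name, and only then run
the generated statement through the Hive CLI. IF NOT EXISTS makes the
statement safe to issue from both jobs.
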
On Thu, Jan 24, 2013 at 9:33 AM, Krishnan K <[EMAIL PROTECTED]> wrote:
> Hi Edward, All,
> Thanks for the quick reply!
> We are using dynamic partitions, so we cannot say which partition each
> record goes to. We don't have much control here.
> Are there any properties that can be set?
> I'm a bit doubtful here: is it because of the lock acquired on the table?
> On Thu, Jan 24, 2013 at 8:22 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>> Partition the table and load the data into different partitions. That, or
>> build the data outside the table and then use scripting to move the data
>> in using LOAD DATA INPATH or copying.
>> On Thu, Jan 24, 2013 at 9:44 AM, Krishnan K <[EMAIL PROTECTED]> wrote:
>>> Hi All,
>>> Could you please let me know what would happen if we try to load a table
>>> from 2 different sources at the same time?
>>> I had tried this earlier and got an error for one load job, while the
>>> other job loaded the data successfully into the table.
>>> I guess it was because of the lock acquired on the table by the first load.
>>> Is there any way to handle this?
>>> Please give your insights.
*Dean Wampler, Ph.D.*