Jayanth Muthya 2012-06-21, 08:16
Jerome Banks 2012-06-21, 17:17
Jayanth Muthya 2012-06-22, 09:14
Almost all operations in Hive can exploit MapReduce for parallelism.
(It isn't really done at the thread level.) Essentially, if you run a
Hive job and there are multiple mappers or reducers, that is parallelism.
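For what it's worth, here is a rough sketch of the session settings that
influence this kind of parallelism. hive.exec.parallel is the setting
mentioned further down the thread; the other two properties and all of the
values are my own illustrative assumptions, not something recommended in
this thread, so check them against your Hive version before relying on them.

```sql
-- Let independent stages of a single query run at the same time
-- (the setting quoted further down the thread).
set hive.exec.parallel=true;

-- Assumed addition: cap on how many stages may run concurrently.
-- The value 8 is only an illustration.
set hive.exec.parallel.thread.number=8;

-- Assumed addition: reducer count hint for each MapReduce job.
-- -1 asks Hive to estimate the number from the input size.
set mapred.reduce.tasks=-1;
```

None of these make a single map task multi-threaded; they only affect how
many stages and tasks run side by side.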
On Fri, Jun 22, 2012 at 5:14 AM, Jayanth Muthya <[EMAIL PROTECTED]> wrote:
> Thanks for clarifying, I'll look into it too and see if I can find anything.
> On Thu, Jun 21, 2012 at 10:47 PM, Jerome Banks <[EMAIL PROTECTED]> wrote:
>> set hive.exec.parallel=true;
>> This will run Hive jobs in parallel, if they are able to do so.
>> As for multi-threading in the actual job itself, I don't think so, but I'm
>> not sure. The query planner will merge steps together, in order to try to
>> minimize the number of MR jobs needed to run a query, but I think those are
>> chained together in a single thread, on both the mapper and the reducer.
>> When I was at Quantcast, we had some multi-threading in the mappers and
>> reducers, to try to increase throughput by utilizing the CPU when the job
>> would otherwise be blocked on IO. This helps if your IO is very slow,
>> but once IO is no longer the bottleneck, you spend a lot of time
>> context-switching, and it is no longer efficient.
>> Interesting question, I'll look into it some more. Let me know if you find
>> out anything.
>> -- jerome
>> On Thu, Jun 21, 2012 at 1:16 AM, Jayanth Muthya <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I was looking into some of the source code for Hive and had a few
>> > questions regarding parallelism in Hive. Can a map task in
>> > Hive exploit parallelism and run multiple threads? If it can do that,
>> > does it do so by default, or does a user have to configure the settings?
>> > This question seems really basic, I just started looking into
>> > Thanks in advance!
>> > -Jay