
Hive dev mailing list: Concurrency in hive


Jayanth Muthya 2012-06-21, 08:16
Jerome Banks 2012-06-21, 17:17
Jayanth Muthya 2012-06-22, 09:14
Re: Concurrency in hive
Almost all operations in Hive can exploit MapReduce for parallelism
(it isn't really done at the thread level). Essentially, if you run a
Hive job and there are multiple mappers or reducers, you are getting parallelism.
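A minimal sketch of the settings behind this kind of parallelism, assuming a Hive release from around the time of this thread (Hadoop 1 property names). The number of mappers follows from how the input is split; the reducer count can be left to Hive's estimate or pinned explicitly. The property names are standard Hive/Hadoop settings, but the values shown are illustrative only, not recommendations.

```sql
-- Reducer-level parallelism (illustrative values)
-- -1 lets Hive estimate the number of reducers from the input size
set mapred.reduce.tasks=-1;
-- Amount of input (in bytes) each reducer handles when Hive estimates the count
set hive.exec.reducers.bytes.per.reducer=1000000000;
-- Upper bound on the number of reducers Hive will estimate
set hive.exec.reducers.max=999;
```

On newer Hadoop versions the first property is spelled mapreduce.job.reduces.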

On Fri, Jun 22, 2012 at 5:14 AM, Jayanth Muthya <[EMAIL PROTECTED]> wrote:
> Thanks for clarifying. I'll look into it too and see if I can find anything.
>
> -Jayanth
>
> On Thu, Jun 21, 2012 at 10:47 PM, Jerome Banks <[EMAIL PROTECTED]> wrote:
>
>> set hive.exec.parallel=true;
>>
>> This will run Hive jobs in parallel, if they are able to do so.
>>
>> As for multi-threading in the actual job itself, I don't think so, but I'm
>> not sure. The query planner will merge steps together, in order to try to
>> minimize the number of MR jobs needed to run a query, but I think those are
>> chained together in a single thread, both on the mapper and reducer side.
>>
>> When I was at Quantcast, we had some multi-threading in the mappers and
>> reducers to try to increase throughput by utilizing the CPU when the job
>> would otherwise be blocked on IO. This helps if your IO is very slow,
>> but if IO is no longer the bottleneck, then you spend a lot of time
>> context-switching, and it is no longer efficient.
>>
>> Interesting question, I'll look into it some more. Let me know if you find
>> out anything.
>>
>> -- jerome
>>
>> On Thu, Jun 21, 2012 at 1:16 AM, Jayanth Muthya <[EMAIL PROTECTED]> wrote:
>>
>> > Hi,
>> > I was looking into some of the source code for Hive and had a few
>> > questions regarding parallelism in Hive. Can a map task in Hive exploit
>> > parallelism and run multiple threads? If it can, does it do so by
>> > default, or does a user have to configure the settings?
>> > This question may seem really basic; I just started looking into
>> > Hadoop/Hive.
>> > Thanks in advance!
>> >
>> > -Jay
>> >
>>
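To illustrate the hive.exec.parallel suggestion above: the setting only helps when a query compiles to stages that do not depend on each other. A sketch, assuming hypothetical tables table_a and table_b: each branch of the UNION ALL below typically becomes its own MapReduce job, and with parallel execution enabled Hive may launch those jobs concurrently. hive.exec.parallel.thread.number caps how many stages run at once; to my knowledge 8 is its default.

```sql
-- Allow independent stages of a single query to run at the same time
set hive.exec.parallel=true;
-- Maximum number of stages launched concurrently
set hive.exec.parallel.thread.number=8;

-- table_a and table_b are hypothetical; the two aggregations have no
-- dependency on each other, so their jobs are candidates for parallel execution
SELECT t.src, t.n
FROM (
  SELECT 'a' AS src, COUNT(*) AS n FROM table_a
  UNION ALL
  SELECT 'b' AS src, COUNT(*) AS n FROM table_b
) t;
```

Running EXPLAIN on a query prints its stage dependency graph, which is a quick way to check whether there are independent stages to parallelize at all.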