Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Enhancing Query Join to speed up Query


Copy link to this message
-
Re: Enhancing Query Join to speed up Query
Hi,

Here are the test results for the two query options shown in my initial
email -

Test Criteria - Both tables have 20 million rows each, 20 partitions, 16
buckets for item_id

Query Option 1 - Took 25 minutes, 5 seconds
Query Option 2 - Took 24 minutes, 42 seconds

Thanks for all the responses!

Naga
On Thu, Jun 13, 2013 at 6:42 PM, Navis류승우 <[EMAIL PROTECTED]> wrote:

> You can use "explain" for confirming differences. For inner joins, it
> would make the same plan.
>
> 2013/6/14 Igor Tatarinov <[EMAIL PROTECTED]>:
> > I would expect no difference because of predicate pushdown.
> >
> > igor
> > decide.com
> >
> >
> > On Thu, Jun 13, 2013 at 11:31 AM, Naga Vijay <[EMAIL PROTECTED]> wrote:
> >>
> >> Sure, Will do
> >>
> >>
> >> On Thu, Jun 13, 2013 at 10:42 AM, Stephen Sprague <[EMAIL PROTECTED]>
> >> wrote:
> >>>
> >>> Hi naja,
> >>> test those two versions (or three now) and report back to the group.
>  :)
> >>> even if some smarty-pants thinks he knows the answer its always good to
> >>> confirm things are as they should be.
> >>>
> >>>
> >>> On Wed, Jun 12, 2013 at 11:54 PM, Sanjay Subramanian
> >>> <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I would actually do it like this…so that the set on the left of JOIN
> >>>> becomes smaller
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM
> >>>>      ( SELECT
> >>>>                     item_id, create_dt
> >>>>       FROM
> >>>>                     A
> >>>>       WHERE
> >>>>                    item_id = 'I001'
> >>>>        AND
> >>>>                   category_name = 'C001'
> >>>>       )  a
> >>>> JOIN
> >>>>          b
> >>>> ON
> >>>>         a.item_id = b.item_id
> >>>> ;
> >>>>
> >>>>
> >>>> From: Naga Vijay <[EMAIL PROTECTED]>
> >>>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >>>> Date: Wednesday, June 12, 2013 9:17 PM
> >>>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >>>> Subject: Enhancing Query Join to speed up Query
> >>>>
> >>>> Hi,
> >>>>
> >>>> Which of the two query options is better?
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM   a JOIN b
> >>>> ON     (a.item_id = b.item_id)
> >>>> WHERE  a.item_id = 'I001'
> >>>> AND    a.category_name = 'C001';
> >>>>
> >>>> - or -
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM   a JOIN b
> >>>> ON     (a.item_id = b.item_id AND a.item_id = 'I001')
> >>>> WHERE  a.category_name = 'C001';
> >>>>
> >>>> Thanks
> >>>> Naga
> >>>>
> >>>> CONFIDENTIALITY NOTICE
> >>>> =====================> >>>> This email message and any attachments are for the exclusive use of
> the
> >>>> intended recipient(s) and may contain confidential and privileged
> >>>> information. Any unauthorized review, use, disclosure or distribution
> is
> >>>> prohibited. If you are not the intended recipient, please contact the
> sender
> >>>> by reply email and destroy all copies of the original message along
> with any
> >>>> attachments, from your computer system. If you are the intended
> recipient,
> >>>> please be advised that the content of this message is subject to
> access,
> >>>> review and disclosure by the sender's Email System Administrator.
> >>>
> >>>
> >>
> >
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB