Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Enhancing Query Join to speed up Query


Copy link to this message
-
Re: Enhancing Query Join to speed up Query
Hi,

Here are the test results for the two query options shown in my initial
email -

Test Criteria - Both tables have 20 million rows each, 20 partitions, 16
buckets for item_id

Query Option 1 - Took 25 minutes, 5 seconds
Query Option 2 - Took 24 minutes, 42 seconds

Thanks for all the responses!

Naga
On Thu, Jun 13, 2013 at 6:42 PM, Navis류승우 <[EMAIL PROTECTED]> wrote:

> You can use "explain" for confirming differences. For inner joins, it
> would make the same plan.
>
> 2013/6/14 Igor Tatarinov <[EMAIL PROTECTED]>:
> > I would expect no difference because of predicate pushdown.
> >
> > igor
> > decide.com
> >
> >
> > On Thu, Jun 13, 2013 at 11:31 AM, Naga Vijay <[EMAIL PROTECTED]> wrote:
> >>
> >> Sure, Will do
> >>
> >>
> >> On Thu, Jun 13, 2013 at 10:42 AM, Stephen Sprague <[EMAIL PROTECTED]>
> >> wrote:
> >>>
> >>> Hi naja,
> >>> test those two versions (or three now) and report back to the group.
>  :)
> >>> even if some smarty-pants thinks he knows the answer its always good to
> >>> confirm things are as they should be.
> >>>
> >>>
> >>> On Wed, Jun 12, 2013 at 11:54 PM, Sanjay Subramanian
> >>> <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I would actually do it like this…so that the set on the left of JOIN
> >>>> becomes smaller
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM
> >>>>      ( SELECT
> >>>>                     item_id, create_dt
> >>>>       FROM
> >>>>                     A
> >>>>       WHERE
> >>>>                    item_id = 'I001'
> >>>>        AND
> >>>>                   category_name = 'C001'
> >>>>       )  a
> >>>> JOIN
> >>>>          b
> >>>> ON
> >>>>         a.item_id = b.item_id
> >>>> ;
> >>>>
> >>>>
> >>>> From: Naga Vijay <[EMAIL PROTECTED]>
> >>>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >>>> Date: Wednesday, June 12, 2013 9:17 PM
> >>>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >>>> Subject: Enhancing Query Join to speed up Query
> >>>>
> >>>> Hi,
> >>>>
> >>>> Which of the two query options is better?
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM   a JOIN b
> >>>> ON     (a.item_id = b.item_id)
> >>>> WHERE  a.item_id = 'I001'
> >>>> AND    a.category_name = 'C001';
> >>>>
> >>>> - or -
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM   a JOIN b
> >>>> ON     (a.item_id = b.item_id AND a.item_id = 'I001')
> >>>> WHERE  a.category_name = 'C001';
> >>>>
> >>>> Thanks
> >>>> Naga
> >>>>
> >>>> CONFIDENTIALITY NOTICE
> >>>> =====================> >>>> This email message and any attachments are for the exclusive use of
> the
> >>>> intended recipient(s) and may contain confidential and privileged
> >>>> information. Any unauthorized review, use, disclosure or distribution
> is
> >>>> prohibited. If you are not the intended recipient, please contact the
> sender
> >>>> by reply email and destroy all copies of the original message along
> with any
> >>>> attachments, from your computer system. If you are the intended
> recipient,
> >>>> please be advised that the content of this message is subject to
> access,
> >>>> review and disclosure by the sender's Email System Administrator.
> >>>
> >>>
> >>
> >
>