Hadoop, mail # user - Small question


Re: Small question
Dmitriy Ryaboy 2012-10-04, 06:25
That's a fuzzy join: the two tables are joined not on equality, but on
one table's column value matching a pattern generated dynamically from
another column. I don't know of an efficient way of doing that in MR,
be it Pig or Hive. What is Hive's execution plan for that?

The only thing that comes to mind is a fairly elaborate UDF which
loads one table completely into memory and applies the match to all
of its entries as the other table is streamed through. But of course
that would be quite expensive if the lookup table is of any
respectable size.
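
To make the idea concrete, here is a rough sketch of the matching logic
in plain Python rather than as an actual Pig or Hive UDF. The function
names and sample rows are made up for illustration. It also assumes the
LIKE '%...%' pattern can be treated as a literal substring test, which
only holds if col2 never contains the LIKE wildcards % or _.

```python
# Sketch only: load_lookup and fuzzy_join are hypothetical names,
# not part of Pig or Hive.

def load_lookup(rows):
    """Hold the smaller table's non-null col2 values in memory."""
    return [value for value in rows if value is not None]


def fuzzy_join(stream, lookup):
    """Yield (col1, col2) for each streamed col1 that contains col2."""
    for col1 in stream:
        if col1 is None:
            continue  # a NULL col1 can never satisfy the LIKE
        for col2 in lookup:
            # Literal substring test standing in for LIKE '%col2%'.
            if col2 in col1:
                yield (col1, col2)


# Made-up sample data standing in for table3.col2 and table2.col1.
table3_col2 = ["foo", None, "bar"]
table2_col1 = ["xfoobar", "baz", "bar"]

matches = list(fuzzy_join(table2_col1, load_lookup(table3_col2)))
```

Every streamed row is compared against every in-memory entry, so the
work grows with the product of the two table sizes. That is why this
only looks workable when the lookup table is small.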

D

On Wed, Oct 3, 2012 at 11:32 AM, J. Rottinghuis <[EMAIL PROTECTED]> wrote:
> <moved [EMAIL PROTECTED] to bcc and added [EMAIL PROTECTED]>
>
> Best asked on the Pig users list.
>
> Cheers,
>
> Joep
>
> On Wed, Oct 3, 2012 at 7:04 AM, Abhishek <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> How can I write the Hive query below in Pig Latin?
>>
>> select t2.col1, t3.col2
>> from table2 t2
>> join table3 t3
>> WHERE t3.col2 IS NOT NULL
>> AND t2.col1 LIKE CONCAT(CONCAT('%',t3.col2),'%')
>>
>> Regards
>> Abhi
>>
>>
>>