Pig >> mail # user >> JOIN or COGROUP? - need to join 1 column from first file which should lie in between 2 columns from second file.


Re: JOIN or COGROUP? - need to join 1 column from first file which should lie in between 2 columns from second file.
Assuming col1 is numeric, as you've indicated, couldn't you simply
generate a new column in file 1 by rounding col1 down to the nearest
multiple of 1000? Then file 1 would look like:

File 1:
col1  col2 join_key
1234  2    1000
2222  3    2000
3333  5    3000
4444  6    4000

Then you could just join by the new key from file 1 and col1 (the lower
bound of each range) from file 2.
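
To make the idea concrete, here is a minimal Python sketch of the same logic (not Pig Latin, just an illustration). It assumes the ranges in file 2 are all exactly 1000 wide and start on multiples of 1000, and it joins on the lower bound of each file 2 range (its first column). The names `bucket_key`, `range_join` and `WIDTH` are made up for this example.

```python
# Hypothetical in-memory copies of the two files from the thread.
file1 = [(1234, 2), (2222, 3), (3333, 5), (4444, 6)]
file2 = [(1000, 1999), (2000, 2999), (3000, 3999), (4000, 4999)]

WIDTH = 1000  # assumed fixed range width


def bucket_key(value, width=WIDTH):
    """Round down to the start of the fixed-width range containing value."""
    return (value // width) * width


def range_join(left, right, width=WIDTH):
    """Equi-join left rows to right ranges on the derived bucket key."""
    # Index the ranges by their lower bound so the join is a plain key lookup.
    ranges_by_start = {lo: (lo, hi) for lo, hi in right}
    joined = []
    for col1, col2 in left:
        match = ranges_by_start.get(bucket_key(col1, width))
        if match is not None:
            joined.append((col1, col2) + match)
    return joined


joined = range_join(file1, file2)
for row in joined:
    print(row)
```

Note that the key has to be rounded down rather than to the nearest multiple: a value such as 1999 must map to 1000, not 2000, to land in the 1000-1999 range.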

This also works if your ranges are narrower; just round down to whatever
width makes sense, e.g. the nearest 10. It does not work if the width of
your ranges varies from row to row. Are your ranges variable? :)
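
One rough way to check whether the trick applies is sketched below in Python. It assumes integer, inclusive bounds; the helper name `has_fixed_width` and the second sample list are invented for illustration.

```python
def has_fixed_width(ranges, width):
    """True if every (lo, hi) range starts on a multiple of width
    and covers exactly `width` consecutive integers (inclusive bounds)."""
    return all(lo % width == 0 and hi - lo + 1 == width for lo, hi in ranges)


# Ranges from the thread: all 1000 wide, so a rounded key is safe.
uniform = [(1000, 1999), (2000, 2999), (3000, 3999), (4000, 4999)]

# Hypothetical variable-width ranges: no single rounding width fits.
variable = [(1000, 1499), (1500, 2999), (3000, 3999)]

print(has_fixed_width(uniform, 1000))
print(has_fixed_width(variable, 1000))
```

If the check fails, a single rounded key can no longer identify the right range, which is the case the question at the end of the paragraph above is asking about.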

--jacob
@thedatachef

On Fri, 2011-07-15 at 01:23 -0700, Lakshminarayana Motamarri wrote:
> Hi all
>
> I have 2 CSV files as shown below:
>
> File 1:
> col1  col2
> 1234  2
> 2222  3
> 3333  5
> 4444  6
>
> File 2:
> col1  col2  col3  col4
> 1000  1999
> 2000  2999
> 3000  3999
> 4000  4999
>
> Now I need to JOIN these 2 files in such a way that:
>
> File1-col1 should lie in between File2-col1 and File2-col2
>
> Can I use JOIN / COGROUP or any other existing operators?
>
> Or should I build a new UDF?
>
> thanks
> Narayan.