Pig >> mail # user >> Merge join


Hey Ankur,

Zebra's TableLoader only works with data that was written out using Zebra's
TableStorer. So you first need to write the data using Zebra, and then
load it with TableLoader and do the merge join.
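
Roughly, the two steps could look like the sketch below. This is untested and
rests on assumptions: the jar path, the 'other_input' and 'my_sorted_*' paths,
and the column name x are placeholders, the '[x]' storage hint assumes a single
column group, and the quoting of 'merge' follows the Pig 0.8 docs.

```pig
-- Hypothetical jar location; point this at your actual Zebra jar.
register zebra.jar;

-- Step 1: read the plain text files with the default loader.
-- 'other_input' and the column name x are placeholders.
raw1 = load 'my_input' as (x:int);
raw2 = load 'other_input' as (x:int);

-- Sort on the join key, then write through TableStorer so that Zebra
-- creates its own table layout (including the .btschema file).
sorted1 = order raw1 by x;
sorted2 = order raw2 by x;
store sorted1 into 'my_sorted_1' using org.apache.hadoop.zebra.pig.TableStorer('[x]');
store sorted2 into 'my_sorted_2' using org.apache.hadoop.zebra.pig.TableStorer('[x]');

-- Step 2 (run as a separate script, after step 1 has finished):
-- load the Zebra tables as sorted tables and do the merge join.
A = load 'my_sorted_1' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
B = load 'my_sorted_2' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
J = join A by x, B by x using 'merge';
store J into 'my_output';
```

The point is only that TableLoader never reads the raw text directory directly;
it reads what TableStorer wrote.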

Ashutosh
On Tue, Jul 19, 2011 at 14:28, Ankur Jain <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm trying to do a map-side only merge join [1] in Pig using Zebra's
> TableLoader. (My data allows a merge join.) But I'm unable to use the
> TableLoader. Even a simple script that loads a table and just stores it back
> doesn't work -
>
>  ----
>  A = load 'my_input' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
>  store A into 'my_output';
>  ----
>
>
>  'my_input' is an input directory containing a single file with just 1 column -
>  ---
>  1
>  2
>  3
>  ---
>
>  The error I get is -
>
>  "ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
> error. Failed to find deleted column groupsjava.io.IOException: BT Schema
> file doesn't exist: *file:/......./my_input/.btschema*"
>
>
>  I have tried specifying the schema using the 'AS' clause and the DESCRIBE
> statement as well, but it gives me the same error. Is the .btschema file
> required? Is there any documentation available on its format? (I tried
> comma-separated column names with/without type info.)
>
>
> I am also willing to work with any other loader that satisfies the merge
> join constraints. Thanks in anticipation.
>
>
>  Regards,
>  Ankur
>
>
>  [1] *http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Merge+Joins*
>