Pig >> mail # user >> Split tuple with multiple fields to tuples with single field in Pig


Dan Zhu 2013-07-25, 05:54
Re: Split tuple with multiple fields to tuples with single field in Pig
An update on my problem:

I have tuples of varying length, and I am trying to convert them into tuples with only one field each (every field is a map).
Original data:

    dump entryArray;
    ([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
    ([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
    ([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585],[symbol#RFG,security_type#ETF,foreign_entry_id#5586],[symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587],[symbol#VWO,security_type#ETF,foreign_entry_id#5588])

The output I am hoping for is (each field should still be a map for further use):

    ([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
    ([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
    ([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])
    ([symbol#RFG,security_type#ETF,foreign_entry_id#5586])
    ([symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587])
    ([symbol#VWO,security_type#ETF,foreign_entry_id#5588])
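
To pin down the intended reshaping independently of Pig, here is a minimal plain-Python model of it. This is only a sketch of the desired semantics, not Pig code and not a fix for the script: the function name `split_fields`, the variable names, and the use of Python dicts to stand in for Pig maps are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# A Pig map is modelled as a Python dict, a Pig tuple as a Python tuple.
PigMap = Dict[str, str]


def split_fields(rows: Iterable[Tuple[PigMap, ...]]) -> List[Tuple[PigMap]]:
    """Emit one single-field tuple for every map found in every input tuple."""
    return [(field,) for row in rows for field in row]


def entry_map(symbol: str, security_type: str, foreign_entry_id: str) -> PigMap:
    """Hypothetical helper that builds one map shaped like the sample data."""
    return {
        "symbol": symbol,
        "security_type": security_type,
        "foreign_entry_id": foreign_entry_id,
    }


# The three input tuples from the sample data (lengths 1, 1 and 4).
entry_array = [
    (entry_map("HIG", "EQUITY", "743094"),),
    (entry_map("PEW", "EQUITY", "743084"),),
    (
        entry_map("AFFY", "EQUITY", "5585"),
        entry_map("RFG", "ETF", "5586"),
        entry_map("SCHW", "EQUITY", "5587"),
        entry_map("VWO", "ETF", "5588"),
    ),
]

entry = split_fields(entry_array)

# Each output field is still a map, so key lookup keeps working.
security_types = [row[0]["security_type"] for row in entry]
```

The point of the model is the last line: after the split, every field must still behave as a map so that a lookup such as `security_type` succeeds.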

I have tried `entry = FOREACH entryArray GENERATE FLATTEN(TOBAG(*));`. The output has the same format, but the field no longer seems to be a map:

    entry = FOREACH entryArray GENERATE FLATTEN(TOBAG(*));
    dump entry;
    ([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
    ([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
    ([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])
    ([symbol#RFG,security_type#ETF,foreign_entry_id#5586])
    ([symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587])
    ([symbol#VWO,security_type#ETF,foreign_entry_id#5588])

    security_type = FOREACH entry GENERATE FLATTEN($0#'security_type');

It throws:

    ERROR 1052: Cannot cast bytearray to map with schema :map
    org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1059: <line 18, column 16> Problem while reconciling output schema of ForEach
        at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:141)
        at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:181)
        at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
        ......

Any suggestions would be much appreciated. Thanks!

From: "Yahoo! Inc." <[EMAIL PROTECTED]>
Date: Wednesday, July 24, 2013 10:54 PM
To: <[EMAIL PROTECTED]>
Subject: Split tuple with multiple fields to tuples with single field in Pig

Hi pig-users,

I have tuples of varying length, and I am trying to convert them into tuples with only one field each (every field is a map).
Original data:
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585],[symbol#RFG,security_type#ETF,foreign_entry_id#5586],[symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587],[symbol#VWO,security_type#ETF,foreign_entry_id#5588])

The output I am hoping for is:
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])
([symbol#RFG,security_type#ETF,foreign_entry_id#5586])
([symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587])
([symbol#VWO,security_type#ETF,foreign_entry_id#5588])

I have tried FOREACH entryArray GENERATE FLATTEN(TOBAG(*)); but it returns only the first field of each tuple:
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])

Any suggestions would be much appreciated. Thanks!

Regards,

Dan