Pig >> mail # user >> how to best process key-value pairs with Pig


Re: how to best process key-value pairs with Pig
What about denormalizing and just representing these as 4-tuples of (id,
type, name, value) in a text file? You could always then group by type if
you need to get back to distinct types.
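
To make the denormalized layout concrete, here is a minimal Python sketch (not Pig) of the flattening step, assuming the two tables have already been exported as rows. The sample data and the helper names (denormalize, to_tsv, group_by_type) are hypothetical and only illustrate the shape of the output. The tab-separated result is the kind of file PigStorage reads with its default delimiter, and the grouping mirrors what a GROUP ... BY type would give you in Pig.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the two relational tables.
entities = [(1, "person"), (2, "company")]          # (EntityID, EntityType)
entity_attributes = [                                # (EntityID, PropertyName, PropertyValue)
    (1, "name", "Alice"),
    (1, "city", "Pune"),
    (2, "name", "Acme"),
]


def denormalize(entities, entity_attributes):
    """Join the two tables into (id, type, name, value) 4-tuples."""
    type_by_id = dict(entities)
    return [
        (entity_id, type_by_id[entity_id], name, value)
        for entity_id, name, value in entity_attributes
    ]


def to_tsv(rows):
    """Render the 4-tuples as tab-separated lines, one tuple per line."""
    return "\n".join("\t".join(str(field) for field in row) for row in rows)


def group_by_type(rows):
    """Group the 4-tuples by entity type (the second field)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


rows = denormalize(entities, entity_attributes)
print(to_tsv(rows))
```

Each attribute becomes one line, so adding a new attribute or entity type does not change the file layout.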

Are you joining against a larger dataset? I ask just because 10x200 values
is not a lot of data and could be processed without Hadoop.
On Wed, Mar 21, 2012 at 11:49 AM, shan s <[EMAIL PROTECTED]> wrote:

> In the relational database we have a large amount of key-value data in 2
> tables. Let’s call them Entity and EntityAttribute.
>
>
>
> Table: Entity            Columns: EntityID, EntityType
>
> Table: EntityAttribute   Columns: EntityID, PropertyName, PropertyValue
>
>
>
> These entities are loosely related to each other, hence they are kept under
> a single roof.
>
> There are approx. 100 attributes among the entities and 20 different entity
> types.
>
>
>
> My questions are:
>
> - What is the best way to represent this kind of key-value pair data for
>   processing with Pig?
>
> - Do I represent it as key=value pairs in the text files? If so, how would
>   I process such data in Pig?
>
> - Any pointers to UDFs that help with key-value pairs would be great.
>
>
>
> Many Thanks,
>
> Shan
>

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*