Pig user mailing list

Re: how to best process key-value pairs with Pig
What about denormalizing and just representing these as 4-tuples of (id,
type, name, value) in a text file? You could always then group by type if
you need to get back to distinct types.
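
A minimal Python sketch of the denormalization suggested above. It runs in memory because the data set is small. The sample rows and the function names (`denormalize`, `to_tsv_lines`, `group_by_type`) are hypothetical. Each 4-tuple is written as one tab-separated line, which is the delimiter Pig's default `PigStorage` loader splits on. `group_by_type` roughly mirrors what a Pig `GROUP ... BY` on the type field gives you.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the two relational tables.
# Entity: (entity_id, entity_type)
entities = [
    (1, "customer"),
    (2, "product"),
]
# EntityAttribute: (entity_id, property_name, property_value)
attributes = [
    (1, "name", "Alice"),
    (1, "city", "Paris"),
    (2, "sku", "X-100"),
]


def denormalize(entities, attributes):
    """Join the two tables into flat (id, type, name, value) 4-tuples."""
    type_by_id = dict(entities)
    return [
        (eid, type_by_id[eid], name, value)
        for eid, name, value in attributes
        if eid in type_by_id  # skip attributes whose entity is unknown
    ]


def to_tsv_lines(tuples):
    """Render each 4-tuple as one tab-separated text line."""
    return ["\t".join(str(field) for field in t) for t in tuples]


def group_by_type(tuples):
    """Collect the 4-tuples per entity type (the second field)."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[1]].append(t)
    return dict(groups)


rows = denormalize(entities, attributes)
for line in to_tsv_lines(rows):
    print(line)
print(group_by_type(rows))
```

Keeping one attribute per line means a new attribute or entity type never changes the file layout. The cost is that the entity type is repeated on every line.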

Are you joining against a larger dataset? I ask only because 20x100 values
is not a lot and can be processed without Hadoop.
On Wed, Mar 21, 2012 at 11:49 AM, shan s <[EMAIL PROTECTED]> wrote:

> In our relational database we have a large amount of key-value data in two
> tables. Let’s call them Entity and EntityAttribute.
>
> Table: Entity            Columns: EntityID, EntityType
>
> Table: EntityAttribute   Columns: EntityID, PropertyName, PropertyValue
>
> These entities are loosely related to each other, which is why they are
> stored in the same pair of tables.
>
> There are approximately 100 attributes across all entities and 20 different
> entity types.
>
> My questions are:
>
> - What is the best way to represent this kind of key-value data for
>   processing with Pig?
>
> - Should I represent it as key=value pairs in text files? If so, how would
>   I process such data in Pig?
>
> - Any pointers to UDFs that help with key-value pairs would be great.
>
> Many Thanks,
>
> Shan
>
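
The question above also asks how to handle key=value pairs stored in text files. One option is to convert such lines into the (id, type, name, value) 4-tuples recommended in the reply before loading them. The following is a minimal Python sketch of that conversion. The line layout (`id<TAB>type<TAB>k1=v1,k2=v2`), the separators, and the function name `parse_kv_line` are all hypothetical, not a Pig convention.

```python
def parse_kv_line(line, pair_sep=",", kv_sep="="):
    """Turn 'id<TAB>type<TAB>k1=v1,k2=v2' into (id, type, name, value) tuples.

    The line layout is a hypothetical example format, not a Pig convention.
    """
    entity_id, entity_type, pairs = line.rstrip("\n").split("\t", 2)
    result = []
    for pair in pairs.split(pair_sep):
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        # partition splits on the first '=' only, so values may contain '='
        name, _, value = pair.partition(kv_sep)
        result.append((entity_id, entity_type, name, value))
    return result


for t in parse_kv_line("7\tcustomer\tname=Alice,city=Paris"):
    print(t)
```

Splitting on the first `=` only keeps values such as URLs intact. However, a value that itself contains the pair separator (`,` here) would still be split, so pick separators that cannot occur in the data.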
