Pig, mail # user - Best Practice: store depending on data content
Re: Best Practice: store depending on data content
Thejas Nair 2012-06-22, 18:23
You can use the MultiStorage store function from piggybank:
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
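
A minimal sketch of how that might look for the case in the question. The jar path, input path, relation name and field layout are placeholders I made up, not something from the original script. As far as I know MultiStorage writes delimited text rather than Avro, so treat this as an illustration of the per-key directory layout only.

```pig
-- Sketch only: jar path, input path and schema below are assumptions.
REGISTER /path/to/piggybank.jar;

-- Hypothetical aggregated relation with the key as the first field (index 0).
Aggregates = LOAD '/my/aggregates' USING PigStorage('\t')
             AS (key:chararray, total:long);

-- Constructor arguments: parent output path, index of the field to split on.
-- Each distinct value of field 0 should end up in its own subdirectory
-- under the parent path.
STORE Aggregates INTO '/my/output/path/by'
    USING org.apache.pig.piggybank.storage.MultiStorage('/my/output/path/by', '0');
```

The split field is given by position, not by name, so the key has to sit at a known index in the relation being stored.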

Or, if you want something more flexible that also keeps metadata, use
HCatalog. Declare the keys you want to partition on as the partition
keys of the table, then use HCatStorer() to store the data.
See http://incubator.apache.org/hcatalog/docs/r0.4.0/index.html
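
Roughly, the HCatalog route could look like the sketch below. It assumes a table named aggregates (a made-up name) has already been created in HCatalog with key as its partition column, that the HCatalog jars are available to Pig, and that the relation's field names match the table's columns. Please check the details against the docs for your version.

```pig
-- Sketch only: table name, input path and schema are assumptions.
-- Assumed table: aggregates (total bigint), partitioned by (key string).
Aggregates = LOAD '/my/aggregates' USING PigStorage('\t')
             AS (key:chararray, total:long);

-- With no partition spec passed to HCatStorer, the partition value is
-- taken from the 'key' field of each record (dynamic partitioning).
STORE Aggregates INTO 'aggregates'
    USING org.apache.hcatalog.pig.HCatStorer();

-- To write everything into one fixed partition instead, the partition
-- can be named explicitly, e.g. HCatStorer('key=some_value').
```

Compared with MultiStorage, the output location is decided by the table definition rather than by a path in the script.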

Thanks,
Thejas

On 6/22/12 4:54 AM, Markus Resch wrote:
> Hey everyone,
>
> We're doing some aggregation. The result contains a key where we want to
> have a single output file for each key. Is it possible to store files
> like this? Especially adjusting the path by the key's value.
>
> Example:
> Input = LOAD 'my/data.avro' USING AvroStorage;
> [.... doing stuff....]
> Output = GROUP AggregatesValues BY Key;
> FOREACH Output Store * into
> '/my/output/path/by/$Output.Key/Result.avro'
>
> I know this example does not work, but is anything similar possible?
> And if, as I assume, it is not: is there some framework in the Hadoop
> world that can do this?
>
>
> Thanks
>
> Markus
>
>