Pig >> mail # user >> Best Practice: store depending on data content

Re: Best Practice: store depending on data content
You can use the MultiStorage store func from piggybank
(org.apache.pig.piggybank.storage.MultiStorage).
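
A minimal sketch of how that might look, assuming a tab-separated input with hypothetical fields `key` and `value` (the input path, field names, and jar location are illustrative and not from the original question). MultiStorage takes the parent output path and the zero-based index of the field to split on, and writes one subdirectory per distinct value of that field. Note that it writes delimited text rather than Avro, so it may not fit if the output has to stay in Avro.

```pig
-- Sketch only: paths, field names and the jar location are assumptions.
REGISTER piggybank.jar;

records    = LOAD 'my/data.tsv' USING PigStorage('\t')
             AS (key:chararray, value:long);
grouped    = GROUP records BY key;
aggregates = FOREACH grouped GENERATE group AS key, SUM(records.value) AS total;

-- '0' is the index of the field to split on (here: key).
-- Each distinct key value gets its own directory under the parent path.
STORE aggregates INTO '/my/output/path/by'
    USING org.apache.pig.piggybank.storage.MultiStorage('/my/output/path/by', '0');
```

With this setup the results for a key value such as `foo` should end up under `/my/output/path/by/foo/`, in part files named after the key value.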

Or, if you want something more flexible and want metadata as well, use
HCatalog. Declare the keys you want to partition on as the partition
keys of the table, then use HCatStorer() to store the data.
See http://incubator.apache.org/hcatalog/docs/r0.4.0/index.html
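
As a rough sketch of the HCatalog route, assuming a table named `aggregates_by_key` has already been created with `key` as its partition column (the table name, columns, and input are hypothetical; check the linked docs for the exact setup in your version). When no partition values are passed to HCatStorer and the partition column is present in the data, the partition for each record is taken from that column's value.

```pig
-- Sketch only: table, field names and input path are assumptions.
-- Assumes a table created beforehand, along the lines of:
--   CREATE TABLE aggregates_by_key (total BIGINT)
--   PARTITIONED BY (key STRING);

records    = LOAD 'my/data.tsv' USING PigStorage('\t')
             AS (key:chararray, value:long);
grouped    = GROUP records BY key;
aggregates = FOREACH grouped GENERATE SUM(records.value) AS total, group AS key;

-- No partition spec is given, so the partition is derived from the
-- key column in each record.
STORE aggregates INTO 'aggregates_by_key'
    USING org.apache.hcatalog.pig.HCatStorer();
```

The field names in the stored relation need to match the table's column names, including the partition column.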


On 6/22/12 4:54 AM, Markus Resch wrote:
> Hey everyone,
> We're doing some aggregation. The result contains a key, and we want a
> single output file for each key value. Is it possible to store files
> like this, in particular adjusting the path by the key's value?
> Example:
> Input = LOAD 'my/data.avro' USING AvroStorage;
> [.... doing stuff....]
> Output = GROUP AggregatesValues BY Key;
> FOREACH Output Store * into
> '/my/output/path/by/$Output.Key/Result.avro'
> I know this example does not work. But is anything similar
> possible? And if not, as I assume, is there some framework in the Hadoop
> world that can do this?
> Thanks
> Markus