|
|
-
output partitioning (Stan Rosenberg, 2011-10-04, 03:09)
Hi,
I'd like to store the output relation partitioned by
-
Re: output partitioning (Stan Rosenberg, 2011-10-04, 03:14)
Sorry folks, I've got to disable keyboard shortcuts in gmail.

I'd like to store the output relation partitioned by certain columns akin to what hive does. In fact, the ultimate goal is to leverage hive's dynamic partitions to store the output from pig. Any pointers are greatly appreciated.

Thanks,

stan
-
Re: output partitioning (Thejas Nair, 2011-10-04, 17:24)
See the piggybank store func:
http://pig.apache.org/docs/r0.9.0/api/org/apache/pig/piggybank/storage/MultiStorage.html

Also, see the piggybank load func:
http://pig.apache.org/docs/r0.9.0/api/org/apache/pig/piggybank/storage/AllLoader.html

-Thejas
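As a rough illustration of what a store function like MultiStorage does (its documentation describes splitting the output into one directory per distinct value of a chosen field), here is a minimal sketch in Python rather than Pig. The function name `split_by_field`, the sample rows, and the base path are hypothetical and only model the resulting layout, not the piggybank implementation.

```python
from collections import defaultdict


def split_by_field(rows, field_index, base="/my/home/output"):
    """Group rows into one output directory per distinct value of one field.

    Hypothetical helper: it models the directory layout only and does not
    write any files.
    """
    dirs = defaultdict(list)
    for row in rows:
        # The directory name is taken from the value of the chosen field.
        dirs[f"{base}/{row[field_index]}"].append(row)
    return dict(dirs)


# Made-up sample data: field 0 is the split key.
rows = [("a1", 1), ("a2", 2), ("a1", 3)]
result = split_by_field(rows, 0)

for directory in sorted(result):
    print(directory, result[directory])
```

Note that this splits on a single field, so it models a one-level layout only.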
-
Re: output partitioning (Alan Gates, 2011-10-04, 17:27)
If you want to use Pig and Hive together, you should also consider HCatalog, which was built exactly to address that use case. http://incubator.apache.org/hcatalog/

Alan.
-
Re: output partitioning (Stan Rosenberg, 2011-10-04, 18:01)
On Tue, Oct 4, 2011 at 1:27 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> If you want to use Pig and Hive together, you should also consider
> HCatalog, which was built exactly to address that use case.
> http://incubator.apache.org/hcatalog

We'll definitely consider HCatalog but unfortunately it does not seem to be ready for prime time. Due to our data volume we need to have a secondary output partitioning; HCatalog does not yet support it.
-
Re: output partitioning (Alan Gates, 2011-10-04, 18:06)
Can you explain what you mean by secondary output partitioning? HCatalog supports the same partitioning that Hive does.

Alan.
-
Re: output partitioning (Stan Rosenberg, 2011-10-04, 18:12)
On Tue, Oct 4, 2011 at 2:06 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> Can you explain what you mean by secondary output partitioning? HCatalog
> supports the same partitioning that Hive does.

"Currently HCatStorer only supports writing to one partition."

We need to partition our data by client id, then by date, hence two-level partitioning.
-
Re: output partitioning (Alan Gates, 2011-10-04, 18:14)
That sentence means writing to one partition at a time; it does not limit the number of keys in the partition. And in 0.2 (just released), the one-at-a-time restriction is removed. So you can partition data by client id and date.

Alan.
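To make the distinction concrete: a partition key is a column, while a partition is one particular combination of key values. The following is a minimal Python sketch, assuming Hive-style `key=value` directory naming. The table path, the sample records, and the helper names are hypothetical, and nothing here is the HCatalog API.

```python
from collections import defaultdict

# Hypothetical sample records: (client_id, date, payload).
records = [
    ("acme", "2011-10-03", "a"),
    ("acme", "2011-10-04", "b"),
    ("globex", "2011-10-04", "c"),
    ("acme", "2011-10-04", "d"),
]

# Two partition keys, in order: client id first, then date.
PARTITION_KEYS = ("client_id", "date")


def partition_path(base, values):
    """Build a Hive-style partition directory: base/key1=v1/key2=v2."""
    parts = [f"{key}={value}" for key, value in zip(PARTITION_KEYS, values)]
    return "/".join([base] + parts)


def group_by_partition(rows, base="/warehouse/events"):
    """Map each partition directory to the payloads that belong in it."""
    out = defaultdict(list)
    for client_id, date, payload in rows:
        out[partition_path(base, (client_id, date))].append(payload)
    return dict(out)


layout = group_by_partition(records)
for path in sorted(layout):
    print(path, layout[path])
```

Here two keys yield three partitions for four records. Writing to one partition at a time would mean one store per directory, even though every directory is identified by both keys.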
-
Re: output partitioning (Alex Rovner, 2011-10-05, 12:21)
Alan,

We are looking into integrating with the HCatalog and I have the following questions:

1. In your opinion, how stable is the HCatalog?
2. On the install page it mentions the creation of the hive metastore db. What if we are already using Hive and have an existing metastore db in MySQL? What versions of Hive is the HCatalog compatible with?

Thanks in advance

Alex R
-
Re: output partitioning (Thejas Nair, 2011-10-05, 15:30)
-thejas.
typed on a tiny virtual keyboard
-
Re: output partitioning (Alan Gates, 2011-10-05, 17:27)
On Oct 5, 2011, at 5:21 AM, Alex Rovner wrote:
> 1. In your opinion, how stable is the HCatalog?

We have a comprehensive test suite that we run on HCatalog regularly, as does Yahoo. Also, Yahoo is running it on some of their clusters. It is a fairly young project but at its core is Hive's metastore, which is mature and well tested.

> 2. On the install page it mentions the creation of the hive metastore db.
> What if we are already using Hive and have an existing metastore db in
> MySQL? What versions of Hive is the HCatalog compatible with?

HCatalog requires Hive metastore 0.7.1 plus a few patches, none of which change the database schema. We have tested it with the 0.7.1 Hive client and Hive trunk.

Alan.
|