|
Prasanth J
2012-06-06, 21:24
Dmitriy Ryaboy
2012-06-07, 00:41
Prasanth J
2012-06-08, 02:41
Alan Gates
2012-06-08, 16:22
Prasanth J
2012-06-09, 02:00
Jonathan Coveney
2012-06-09, 03:06
Prasanth J
2012-06-12, 05:53
|
-
Handle NULL values in Cube dimensionsPrasanth J 2012-06-06, 21:24
Hello everyone
I would like to bring up this discussion about the ways for handling NULL values in dimensions specified for cubing. For example, if we have a dimension color with following values red blue null green how do we differentiate if the null value represent rollup of all colors values or actual null value? SQL way: There are 2 ways in which SQL server analysis services handles null values in dimensions 1) Throw error when it encounters null values in dimension values 2) Ignore error by adding the null values to UnknownMembers. By default UnknownMembers will be named as "Unknown". The name for UnknownMembers can also be specified by the user. Do we need to handle both ways in Pig? I think the first way (throwing error) is pretty straightforward. For the second way (ignoring error), what is the best way to provide support for user specified name for UnknownMembers? Please share your thoughts about how we can handle this scenario for different datatypes in Pig. Thanks -- Prasanth
-
Re: Handle NULL values in Cube dimensionsDmitriy Ryaboy 2012-06-07, 00:41
Note that the current CubeDimensions UDF does a third thing -- instead
of rebranding "nulls" as "unknown" and using null to mean "*" or "all values", the UDF allows you to specify a custom value to stand for "*" or "all values". That way null can be an individual valid cell value. This is (imho) much nicer than the other options, but it does complicate working with 3rd party tools. Does anyone have opinions on this? D On Wed, Jun 6, 2012 at 2:24 PM, Prasanth J <[EMAIL PROTECTED]> wrote: > Hello everyone > > I would like to bring up this discussion about the ways for handling NULL values in dimensions specified for cubing. For example, if we have a dimension color with following values > > red > blue > null > green > > how do we differentiate if the null value represent rollup of all colors values or actual null value? > > SQL way: > There are 2 ways in which SQL server analysis services handles null values in dimensions > 1) Throw error when it encounters null values in dimension values > 2) Ignore error by adding the null values to UnknownMembers. By default UnknownMembers will be named as "Unknown". The name for UnknownMembers can also be specified by the user. > > Do we need to handle both ways in Pig? I think the first way (throwing error) is pretty straightforward. > For the second way (ignoring error), what is the best way to provide support for user specified name for UnknownMembers? > > Please share your thoughts about how we can handle this scenario for different datatypes in Pig. > > Thanks > -- Prasanth >
-
Re: Handle NULL values in Cube dimensionsPrasanth J 2012-06-08, 02:41
But how do the user specify custom value for "*"? In the current implementation I am passing "NULL" string to the CubeDimensions constructor. If we need to get that value from user then we need some changes in grammar like
a = CUBE b BY (x,y,z) ALL as "AllProducts"; also what should be the default value if user didn't specify anything? Thanks -- Prasanth On Jun 6, 2012, at 8:41 PM, Dmitriy Ryaboy wrote: > Note that the current CubeDimensions UDF does a third thing -- instead > of rebranding "nulls" as "unknown" and using null to mean "*" or "all > values", the UDF allows you to specify a custom value to stand for "*" > or "all values". That way null can be an individual valid cell value. > > This is (imho) much nicer than the other options, but it does > complicate working with 3rd party tools. > > Does anyone have opinions on this? > > D > > On Wed, Jun 6, 2012 at 2:24 PM, Prasanth J <[EMAIL PROTECTED]> wrote: >> Hello everyone >> >> I would like to bring up this discussion about the ways for handling NULL values in dimensions specified for cubing. For example, if we have a dimension color with following values >> >> red >> blue >> null >> green >> >> how do we differentiate if the null value represent rollup of all colors values or actual null value? >> >> SQL way: >> There are 2 ways in which SQL server analysis services handles null values in dimensions >> 1) Throw error when it encounters null values in dimension values >> 2) Ignore error by adding the null values to UnknownMembers. By default UnknownMembers will be named as "Unknown". The name for UnknownMembers can also be specified by the user. >> >> Do we need to handle both ways in Pig? I think the first way (throwing error) is pretty straightforward. >> For the second way (ignoring error), what is the best way to provide support for user specified name for UnknownMembers? >> >> Please share your thoughts about how we can handle this scenario for different datatypes in Pig. >> >> Thanks >> -- Prasanth >>
-
Re: Handle NULL values in Cube dimensionsAlan Gates 2012-06-08, 16:22
Option 1 (throwing an error) is bad. It violates "Pigs eat anything" (see http://pig.apache.org/philosophy.html).
Do we need to give users an ability to name this unknown column? Why not just label it "unknown" and be done? Alan. On Jun 6, 2012, at 2:24 PM, Prasanth J wrote: > Hello everyone > > I would like to bring up this discussion about the ways for handling NULL values in dimensions specified for cubing. For example, if we have a dimension color with following values > > red > blue > null > green > > how do we differentiate if the null value represent rollup of all colors values or actual null value? > > SQL way: > There are 2 ways in which SQL server analysis services handles null values in dimensions > 1) Throw error when it encounters null values in dimension values > 2) Ignore error by adding the null values to UnknownMembers. By default UnknownMembers will be named as "Unknown". The name for UnknownMembers can also be specified by the user. > > Do we need to handle both ways in Pig? I think the first way (throwing error) is pretty straightforward. > For the second way (ignoring error), what is the best way to provide support for user specified name for UnknownMembers? > > Please share your thoughts about how we can handle this scenario for different datatypes in Pig. > > Thanks > -- Prasanth >
-
Re: Handle NULL values in Cube dimensionsPrasanth J 2012-06-09, 02:00
Thanks Alan and Dmitriy for your thoughts.
I think we have two different approaches now. In one approach, if we encounter a null in dimension values we can just label it as "unknown" and use "NULL" string to represent rollups. Whereas, in other approach, if we encounter a null in dimension values, use the null value as such but use "*" or any other string for rollups. Both approaches looks good to me. Please let me know which one should I go ahead with. Thanks -- Prasanth On Jun 8, 2012, at 12:22 PM, Alan Gates wrote: > Option 1 (throwing an error) is bad. It violates "Pigs eat anything" (see http://pig.apache.org/philosophy.html). > > Do we need to give users an ability to name this unknown column? Why not just label it "unknown" and be done? > > Alan. > > On Jun 6, 2012, at 2:24 PM, Prasanth J wrote: > >> Hello everyone >> >> I would like to bring up this discussion about the ways for handling NULL values in dimensions specified for cubing. For example, if we have a dimension color with following values >> >> red >> blue >> null >> green >> >> how do we differentiate if the null value represent rollup of all colors values or actual null value? >> >> SQL way: >> There are 2 ways in which SQL server analysis services handles null values in dimensions >> 1) Throw error when it encounters null values in dimension values >> 2) Ignore error by adding the null values to UnknownMembers. By default UnknownMembers will be named as "Unknown". The name for UnknownMembers can also be specified by the user. >> >> Do we need to handle both ways in Pig? I think the first way (throwing error) is pretty straightforward. >> For the second way (ignoring error), what is the best way to provide support for user specified name for UnknownMembers? >> >> Please share your thoughts about how we can handle this scenario for different datatypes in Pig. >> >> Thanks >> -- Prasanth >> >
-
Re: Handle NULL values in Cube dimensionsJonathan Coveney 2012-06-09, 03:06
you could always make the value pluggable, going with Unknown for now, and
then down the line if we want, we could add an "ONNULL" value to the parser that sets it. 2012/6/8 Prasanth J <[EMAIL PROTECTED]> > Thanks Alan and Dmitriy for your thoughts. > > I think we have two different approaches now. > > In one approach, if we encounter a null in dimension values we can just > label it as "unknown" and use "NULL" string to represent rollups. Whereas, > in other approach, if we encounter a null in dimension values, use the null > value as such but use "*" or any other string for rollups. > > Both approaches looks good to me. Please let me know which one should I go > ahead with. > > Thanks > -- Prasanth > > On Jun 8, 2012, at 12:22 PM, Alan Gates wrote: > > > Option 1 (throwing an error) is bad. It violates "Pigs eat anything" > (see http://pig.apache.org/philosophy.html). > > > > Do we need to give users an ability to name this unknown column? Why > not just label it "unknown" and be done? > > > > Alan. > > > > On Jun 6, 2012, at 2:24 PM, Prasanth J wrote: > > > >> Hello everyone > >> > >> I would like to bring up this discussion about the ways for handling > NULL values in dimensions specified for cubing. For example, if we have a > dimension color with following values > >> > >> red > >> blue > >> null > >> green > >> > >> how do we differentiate if the null value represent rollup of all > colors values or actual null value? > >> > >> SQL way: > >> There are 2 ways in which SQL server analysis services handles null > values in dimensions > >> 1) Throw error when it encounters null values in dimension values > >> 2) Ignore error by adding the null values to UnknownMembers. By default > UnknownMembers will be named as "Unknown". The name for UnknownMembers can > also be specified by the user. > >> > >> Do we need to handle both ways in Pig? I think the first way (throwing > error) is pretty straightforward. > >> For the second way (ignoring error), what is the best way to provide > support for user specified name for UnknownMembers? > >> > >> Please share your thoughts about how we can handle this scenario for > different datatypes in Pig. > >> > >> Thanks > >> -- Prasanth > >> > > > >
-
Re: Handle NULL values in Cube dimensionsPrasanth J 2012-06-12, 05:53
Thanks Jon for your thoughts.
I have a patch which renames the null values in dimension values to "unknown" and use null for rollups. For a sample input tuple red, null, 12 a = cube inp by ($0, $1); the above query will emit following combinations red, unknown, 12 , unknown, 12 red, , 12 , , 12 Please let me know if anyone have different opinion on this. If the above choice looks good I can go ahead and submit the patch in JIRA. Thanks. Thanks -- Prasanth On Jun 8, 2012, at 11:06 PM, Jonathan Coveney wrote: > you could always make the value pluggable, going with Unknown for now, and > then down the line if we want, we could add an "ONNULL" value to the parser > that sets it. > > 2012/6/8 Prasanth J <[EMAIL PROTECTED]> > >> Thanks Alan and Dmitriy for your thoughts. >> >> I think we have two different approaches now. >> >> In one approach, if we encounter a null in dimension values we can just >> label it as "unknown" and use "NULL" string to represent rollups. Whereas, >> in other approach, if we encounter a null in dimension values, use the null >> value as such but use "*" or any other string for rollups. >> >> Both approaches looks good to me. Please let me know which one should I go >> ahead with. >> >> Thanks >> -- Prasanth >> >> On Jun 8, 2012, at 12:22 PM, Alan Gates wrote: >> >>> Option 1 (throwing an error) is bad. It violates "Pigs eat anything" >> (see http://pig.apache.org/philosophy.html). >>> >>> Do we need to give users an ability to name this unknown column? Why >> not just label it "unknown" and be done? >>> >>> Alan. >>> >>> On Jun 6, 2012, at 2:24 PM, Prasanth J wrote: >>> >>>> Hello everyone >>>> >>>> I would like to bring up this discussion about the ways for handling >> NULL values in dimensions specified for cubing. For example, if we have a >> dimension color with following values >>>> >>>> red >>>> blue >>>> null >>>> green >>>> >>>> how do we differentiate if the null value represent rollup of all >> colors values or actual null value? >>>> >>>> SQL way: >>>> There are 2 ways in which SQL server analysis services handles null >> values in dimensions >>>> 1) Throw error when it encounters null values in dimension values >>>> 2) Ignore error by adding the null values to UnknownMembers. By default >> UnknownMembers will be named as "Unknown". The name for UnknownMembers can >> also be specified by the user. >>>> >>>> Do we need to handle both ways in Pig? I think the first way (throwing >> error) is pretty straightforward. >>>> For the second way (ignoring error), what is the best way to provide >> support for user specified name for UnknownMembers? >>>> >>>> Please share your thoughts about how we can handle this scenario for >> different datatypes in Pig. >>>> >>>> Thanks >>>> -- Prasanth >>>> >>> >> >> |