Pig >> mail # dev >> Handle NULL values in Cube dimensions

Prasanth J 2012-06-06, 21:24
Dmitriy Ryaboy 2012-06-07, 00:41
Prasanth J 2012-06-08, 02:41
Re: Handle NULL values in Cube dimensions
Option 1 (throwing an error) is bad. It violates "Pigs eat anything" (see http://pig.apache.org/philosophy.html).

Do we need to give users the ability to name this unknown column? Why not just label it "unknown" and be done?
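To make the suggestion concrete, here is a rough sketch in plain Python (not Pig, and not the actual CUBE implementation) of what labeling nulls as "unknown" buys us. The function name cube_counts and the label_nulls flag are made up for illustration; the only convention assumed is that None in an output key marks a rolled-up dimension.

```python
from collections import Counter
from itertools import combinations

UNKNOWN = "unknown"  # fixed label, as proposed above (not a Pig constant)


def cube_counts(rows, label_nulls=True):
    """Count rows for every subset of dimensions.

    In an output key, None marks a dimension that has been rolled up.
    If label_nulls is True, nulls in the input are first replaced by
    UNKNOWN so they cannot be confused with a rollup marker.
    """
    counts = Counter()
    for row in rows:
        if label_nulls:
            row = tuple(UNKNOWN if v is None else v for v in row)
        n = len(row)
        for k in range(n + 1):
            for keep in combinations(range(n), k):
                # Keep the chosen dimensions, roll up the rest.
                key = tuple(row[i] if i in keep else None for i in range(n))
                counts[key] += 1
    return counts


colors = [("red",), ("blue",), (None,), ("green",)]
ambiguous = cube_counts(colors, label_nulls=False)
labeled = cube_counts(colors, label_nulls=True)
```

Without labeling, the real null row and the four rollup rows all land on the key (None,). With labeling, (None,) holds only the rollup and ("unknown",) holds the real null row.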


On Jun 6, 2012, at 2:24 PM, Prasanth J wrote:

> Hello everyone
> I would like to start a discussion about how to handle NULL values in the dimensions specified for cubing. For example, suppose we have a dimension color with the following values:
> red
> blue
> null
> green
> How do we tell whether a null value represents the rollup of all color values or an actual null value?
> SQL way:
> There are two ways in which SQL Server Analysis Services handles null values in dimensions:
> 1) Throw an error when it encounters a null value in a dimension.
> 2) Ignore the error by adding the null values to UnknownMembers. By default UnknownMembers is named "Unknown". The name for UnknownMembers can also be specified by the user.
> Do we need to handle both ways in Pig? I think the first way (throwing an error) is pretty straightforward.
> For the second way (ignoring the error), what is the best way to support a user-specified name for UnknownMembers?
> Please share your thoughts about how we can handle this scenario for different datatypes in Pig.
> Thanks
> -- Prasanth
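For reference, the two behaviors described in the quoted message could be sketched roughly as follows in plain Python. This is only an illustration of the two policies, not Pig or Analysis Services code; prepare_dimensions, null_policy, unknown_name and NullDimensionError are hypothetical names.

```python
class NullDimensionError(ValueError):
    """Raised under the 'error' policy when a dimension value is null."""


def prepare_dimensions(row, null_policy="unknown", unknown_name="Unknown"):
    """Apply a null-handling policy to one tuple of dimension values.

    null_policy="error"   -> way 1: reject any null dimension value.
    null_policy="unknown" -> way 2: replace nulls with unknown_name,
                             which the caller may override.
    Note: substituting a string changes the type of a non-string
    dimension; how to handle other datatypes is left open here.
    """
    if null_policy not in ("error", "unknown"):
        raise ValueError("null_policy must be 'error' or 'unknown'")
    cleaned = []
    for position, value in enumerate(row):
        if value is None:
            if null_policy == "error":
                raise NullDimensionError(
                    "null value in dimension at position %d" % position
                )
            value = unknown_name
        cleaned.append(value)
    return tuple(cleaned)


default_label = prepare_dimensions(("red", None))
custom_label = prepare_dimensions(("red", None), unknown_name="N/A")
```

Calling prepare_dimensions(("red", None), null_policy="error") raises NullDimensionError instead of returning a tuple.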
Prasanth J 2012-06-09, 02:00
Jonathan Coveney 2012-06-09, 03:06
Prasanth J 2012-06-12, 05:53