-
How are nulls represented in data?
Pradeep Kamath 2010-08-09, 16:42
Hi, What value does Hive expect in the data for a column to be treated as null? I tried some permutations on a text-based table but couldn't figure out the correct representation. I tried the empty string, the string NULL, and the string null for a string column, and in all three cases the "is null" operator returned false.
A couple of related questions:
- Does the representation of null depend on the type of the column? Is it different for string vs. non-string columns?
- Is the representation of null different for different storage formats (text vs. RCFile vs. SequenceFile)? I am particularly interested in text and RCFile.
Thanks in advance,
Pradeep
-
Re: How are nulls represented in data?
Ning Zhang 2010-08-09, 18:46
How NULL is serialized and deserialized is determined by the specific SerDe. NULL is serialized as \N by LazySimpleSerDe (the default SerDe for text). RCFile (ColumnarSerDe) uses the same default parameters as LazySimpleSerDe.
Unless I missed something, NULL serialization/deserialization is type independent (at least in LazySimpleSerDe).
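To make the answer concrete for the three values tried in the question, here is a minimal Python sketch of the reading side. It is not Hive's code; the function name `decode_text_row` is hypothetical, and it only assumes the default text layout (Ctrl-A field separator) and string columns. It shows why the empty string, NULL, and null all stay non-null while the two-character sequence \N does not.

```python
# Sketch only: mimics how a text row might be read, not Hive's actual implementation.
FIELD_DELIMITER = "\x01"  # Ctrl-A, the default field separator for Hive text tables
NULL_TOKEN = "\\N"        # two characters: a backslash followed by a capital N


def decode_text_row(line):
    """Split one text row into string fields, mapping the null token to None."""
    fields = line.rstrip("\n").split(FIELD_DELIMITER)
    return [None if field == NULL_TOKEN else field for field in fields]


# The three values from the question, plus the actual null token.
row = decode_text_row(FIELD_DELIMITER.join(["", "NULL", "null", "\\N"]) + "\n")
print(row)  # ['', 'NULL', 'null', None]
```

Only the last field comes back as None, which matches the observation that "is null" returned false for the other three.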
-
Re: How are nulls represented in data?
yongqiang he 2010-08-09, 20:07
Yes. In LazySimpleSerDe (SequenceFile and TextFile), "\N" is used to represent NULL. (It is configurable through a table property: serialization.null.format.)
In ColumnarSerDe/RCFile, no NULL marker is stored: a null column is written as zero bytes (the column's byte length is zero). But RCFile/ColumnarSerDe also uses this property when serializing, to determine whether a column is null. (This is unavoidable because the client can only pass a string to the SerDe and let the SerDe serialize it, so some special character sequence is needed to represent NULL.)
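As a rough illustration of what a configurable null format means for text data, the Python sketch below round-trips a row with a parameterized null token. The functions `encode_row` and `decode_row` and their `null_format` parameter are hypothetical stand-ins that mirror the serialization.null.format property; they are not part of Hive, and the sketch does not model the RCFile zero-length encoding.

```python
# Sketch only: a configurable null token for delimited text, assuming string fields.
def encode_row(values, null_format="\\N", delimiter="\x01"):
    """Join fields into one text row, writing null_format in place of None."""
    return delimiter.join(null_format if value is None else value for value in values)


def decode_row(line, null_format="\\N", delimiter="\x01"):
    """Split a text row, reading any field equal to null_format as None."""
    return [None if field == null_format else field for field in line.split(delimiter)]


# With the default token, None and the empty string stay distinct.
default_round_trip = decode_row(encode_row(["a", None, ""]))
print(default_round_trip)  # ['a', None, '']

# With an empty null format, empty fields are read back as None.
empty_as_null = decode_row("a\x01\x01b", null_format="")
print(empty_as_null)  # ['a', None, 'b']
```

One consequence of choosing an empty null format in this sketch is that an empty string and a null can no longer be told apart after a round trip.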