
Hive, mail # user - LZO Compression on trunk


Re: LZO Compression on trunk
Bennie Schut 2010-02-05, 22:37
Hadoop 0.20.1 and Hive trunk from this week. On Monday I'll try an
older version of Hive to see if that helps, and perhaps also "gz" to see
whether it's a compression issue in general.
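For reference, the "gz" test would mean re-running the same query with the built-in gzip codec instead of LZO (a sketch; org.apache.hadoop.io.compress.GzipCodec ships with Hadoop core, so unlike LzoCodec it needs no extra jars):

```sql
-- Same query as before, but with the stock Hadoop gzip codec.
-- If this also returns NULLs, the problem is compression in general;
-- if it returns 2283, the problem is specific to the LZO codec setup.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
select distinct login_cldr_id as cldr_id from chatsessions_load;
```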

Yongqiang He wrote:
> Hi Bennie,
> Can you post your hadoop version and hive version?
>
> Thanks
> Yongqiang
>
>
> On 2/5/10 10:05 AM, "Zheng Shao" <[EMAIL PROTECTED]> wrote:
>
>  
>> That seems to be a bug.
>> Are you using hive trunk or any release?
>>
>>
>> On 2/5/10, Bennie Schut <[EMAIL PROTECTED]> wrote:
>>    
>>> I have a tab-separated file which I loaded with "load data inpath",
>>> then I run:
>>>
>>> SET hive.exec.compress.output=true;
>>> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
>>> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
>>> select distinct login_cldr_id as cldr_id from chatsessions_load;
>>>
>>> Ended Job = job_201001151039_1641
>>> OK
>>> NULL
>>> NULL
>>> NULL
>>> Time taken: 49.06 seconds
>>>
>>> However, if I run it without the set commands I get this:
>>> Ended Job = job_201001151039_1642
>>> OK
>>> 2283
>>> Time taken: 45.308 seconds
>>>
>>> Which is the correct result.
>>>
>>> When I do an "insert overwrite" into an RCFile table it will actually
>>> compress the data correctly.
>>> When I disable compression and query this new table, the result is correct.
>>> When I enable compression, it's wrong again.
>>> I see no errors in the logs.
>>>
>>> Any ideas why this might happen?
>>>
>>>
>>>
>
>
>