Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Hive queries not retrieving any values from LZO compressed tables


+
Bejoy Ks 2011-09-29, 16:50
Copy link to this message
-
Re: Hive queries not retrieving any values from LZO compressed tables
Hello Bejoy -

Even though you defined your target table by specifying the input format,
you still have to ensure the output of your population query is compressed.
 Make sure the following are set before executing your "insert overwrite
table " query.

set mapred.output.compress=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
set hive.exec.compress.output=true;

(This is assuming lzo is installed on your cluster; see
https://github.com/toddlipcon/hadoop-lzo if it's not.)

cheers,
-James
On Thu, Sep 29, 2011 at 9:50 AM, Bejoy Ks <[EMAIL PROTECTED]> wrote:

> Hi Experts
>             I'm currently struck when using compression with hive. My
> requirement is I need to store data in hive tables in LZO compressed format
> and should be able to retrieve the same successfully. However with the hive
> wiki and mailing list archives I was able to succeed only on the first part
> 'Storing data in LZO compressed format in hive tables' but failing with the
> second part 'Retrieving data back from the table'.
>
> I have a non partitioned table that holds uncompressed text data and want
> to load this data into a partitioned table in LZO format. I used Hive
> Dynamic Partition concept to populate data into partitions for the same. On
> execution of the hive QL  data is loaded in compressed format on target
> table, the hive CLI showed the no of rows loaded which was correct. Also I
> verified the directory that corresponded to the table in hdfs, there also I
> could find sub folders and files within with .deflate extension in sub
> folders. But when I fire any query on this table I can see the map reduce
> job executing and showing an OK finally but No results are returned under
> any conditions. It is the same with a simple 'Select *' query as well, Just
> an OK message with no rows returned. On the hive CLI I enabled the following
> parameters.
>
> //for output compress
> set hive.exec.compress.output=true;
> //for dynamic partitions
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
>
> I have posted below the queries i have executed from my end. Could any one
> please guide me on how you have actually implemented LZO on your hive table
> or help me out in crossing this hurdle?
>
> Thanks in Advance
>
> Source table
> CREATE EXTERNAL TABLE bejoy_test_employee(eemployeeId Int, name
> String,codeId Int,country String )
> COMMENT 'test table for LZO compression'
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/u/bejoy/external_tables/bejoy_test_employee' ;
>
> Target Table
> CREATE EXTERNAL TABLE bejoy_test_employee_partn(employeeId Int, name
> String,codeId Int)
> COMMENT 'test table for LZO compression'
> PARTITIONED BY(country String)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION '/u/bejoy/external_tables/bejoy_test_employee_partn' ;
>
> Query used for Dynamic Partitions
> INSERT OVERWRITE TABLE bejoy_test_employee_partn PARTITION(country)
> SELECT eemployeeId,name,codeId,country FROM bejoy_test_employee;
>
>
> //Hive CLI log once the Dynamic Partitions query is executed
> Partition default.bejoy_test_employee_partn{country=Australia} stats:
> [num_files: 1, num_rows: 0, total_size: 62114]
> Partition default.bejoy_test_employee_partn{country=Canada} stats:
> [num_files: 1, num_rows: 0, total_size: 62327]
> Partition default.bejoy_test_employee_partn{country=France} stats:
> [num_files: 1, num_rows: 0, total_size: 62377]
> Partition default.bejoy_test_employee_partn{country=Germany} stats:
> [num_files: 1, num_rows: 0, total_size: 61667]
> Partition default.bejoy_test_employee_partn{country=India} stats:
> [num_files: 1, num_rows: 0, total_size: 62341]
> Partition default.bejoy_test_employee_partn{country=Italy} stats: