Hive, mail # user - Strange error in Hive - Insert INTO


Re: Strange error in Hive - Insert INTO
Sanjay Subramanian 2013-08-14, 17:44
Another reason I can think of is that some STRING column in your table contains the delimiter character. For example, I once had tab characters inside string values in production, and the table was also defined with TAB as the field delimiter.
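
To make the delimiter problem concrete, here is a minimal HiveQL sketch. The table and column names (demo_source, demo_tab_delimited, id, label, amount) are made up for illustration and do not come from the script in this thread. It assumes a tab-delimited text table, counts the rows whose string column contains a tab, and replaces embedded tabs with a space on insert.

```sql
-- Hypothetical target table stored as tab-delimited text.
CREATE TABLE demo_tab_delimited (
    id     STRING,
    label  STRING,
    amount STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Count the source rows whose string column contains a tab character.
-- demo_source is a hypothetical table with the same three columns.
SELECT COUNT(*)
FROM demo_source
WHERE instr(label, '\t') > 0;

-- Replace embedded tabs with a space so they cannot be read back
-- as field separators.
INSERT INTO TABLE demo_tab_delimited
SELECT
    id,
    regexp_replace(label, '\t', ' ') AS label,
    amount
FROM demo_source;
```

If the count comes back as zero, embedded delimiters are probably not the cause and the column order is the next thing to check.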

From: Stephen Sprague <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Date: Wednesday, August 14, 2013 8:43 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: Strange error in Hive - Insert INTO

Hi Jerome,
That's a grandiose SQL statement you've got there! :) I find that if you break those nested queries up into simple CTAS (CREATE TABLE AS SELECT) statements, each one referring to the table created in the previous step, debugging becomes *so* much easier. In other SQL dialects such as DB2 this is facilitated by the WITH keyword. Maybe the Hive gurus will implement that some day. But that's a topic for another day.
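
As a rough sketch of that approach, the first branch of the UNION ALL in the script quoted further down could be materialized on its own and inspected before anything is built on top of it. The table name tmp_ca_fam is hypothetical, and only the non-constant columns of that branch are kept.

```sql
-- Step 1: materialize one branch of the nested query as its own table.
-- tmp_ca_fam is a hypothetical name chosen for this sketch.
CREATE TABLE tmp_ca_fam AS
SELECT
    mtransf.id_mag_transfere  AS id_magasin,
    v.co_famille              AS co_rgrp_produits,
    SUM(v.mt_ca_net_ttc)      AS mt_ca_net_ttc
FROM default.VENTES_FAM v
JOIN default.kpi_magasin mtransf
  ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
  AND mtransf.id_magasin = v.id_magasin
WHERE mtransf.co_societe = 1
  AND v.dt_jour          = '2013-01-02 00:00:00.0'
GROUP BY
    mtransf.id_mag_transfere,
    v.co_famille;

-- Step 2: look at the column names, types and a few rows
-- before writing the next step against this table.
DESCRIBE tmp_ca_fam;
SELECT * FROM tmp_ca_fam LIMIT 10;
```

Each later step then reads from the table created in the step before it, so a wrong column shows up at the step that introduced it.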

So, all that said, I see that the column order in your CREATE TABLE statement doesn't match the column order in your outermost SELECT statement. In particular, DT_JOUR is the 6th column in your CREATE TABLE statement but appears as the 2nd column in your SELECT statement. So something looks fishy there.

My guess is that ultimately you're missing a comma somewhere in the select list, so Hive is eating a column as a column alias and all your data is shifted over by one column. This happens not infrequently, since it is valid SQL.
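
Hive's INSERT ... SELECT fills the target columns by position and ignores the aliases in the select list, so the select list has to follow the order of the CREATE TABLE statement. Below is a sketch of the reordered outer select. It assumes the inner query has already been materialized in a hypothetical staging table stg_ca_rgrp, with a code_entite column standing in for m.co_magasin. The explicit CAST is only there to make the intended type of DT_JOUR visible.

```sql
-- The select list follows the CREATE TABLE column order exactly:
-- CO_SOCIETE, TYPE_ENTITE, CODE_ENTITE, TYPE_RGRP_PRODUITS,
-- CO_RGRP_PRODUITS, DT_JOUR, then the six measures.
-- stg_ca_rgrp and its column names are assumptions made for this sketch.
INSERT INTO TABLE THM_CA_RGRP_PRODUITS_JOUR
SELECT
    1                                           AS CO_SOCIETE,
    'MAG'                                       AS TYPE_ENTITE,
    s.code_entite                               AS CODE_ENTITE,
    'FAM'                                       AS TYPE_RGRP_PRODUITS,
    s.co_rgrp_produits                          AS CO_RGRP_PRODUITS,
    CAST('2013-01-02 00:00:00.0' AS TIMESTAMP)  AS DT_JOUR,
    SUM(s.mt_ca_net_ttc)                        AS MT_CA_NET_TTC,
    SUM(s.mt_obj_ca_net_ttc)                    AS MT_OBJ_CA_NET_TTC,
    SUM(s.nb_clients)                           AS NB_CLIENTS,
    SUM(s.mt_ca_net_ttc_comp)                   AS MT_CA_NET_TTC_COMP,
    SUM(s.mt_obj_ca_net_ttc_comp)               AS MT_OBJ_CA_NET_TTC_COMP,
    SUM(s.nb_clients_comp)                      AS NB_CLIENTS_COMP
FROM stg_ca_rgrp s
GROUP BY
    s.code_entite,
    s.co_rgrp_produits;
```

A quick way to confirm the order is to run DESCRIBE on the target table and compare it line by line with the select list.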

Long-winded answer to a simple question. Apologies up front!
On Wed, Aug 14, 2013 at 5:35 AM, Jérôme Verdier <[EMAIL PROTECTED]> wrote:
Hi everybody,

I faced a strange error in Hive today.

I launched a Hive script that performs some calculations, joins, unions, etc., and then inserts the results into another Hive table.

Everything is working fine (the .hql runs without errors and the data is imported), but one field (CO_RGRP_PRODUITS) looks very strange.

After the insert, CO_RGRP_PRODUITS looks like a TIMESTAMP (1970-01-01 01:00:00) instead of a simple STRING.

I should point out that the source field contains simple strings such as 0101380.

What is going wrong here?

You can find my script below (create table and .hql insert/calculations)

Thanks for your help.
CREATE TABLE SCRIPT :
--THM_CA_RGRP_PRODUITS_JOUR
CREATE TABLE default.THM_CA_RGRP_PRODUITS_JOUR (
    CO_SOCIETE BIGINT,
    TYPE_ENTITE STRING,
    CODE_ENTITE STRING,
    TYPE_RGRP_PRODUITS STRING,
    CO_RGRP_PRODUITS STRING,
    DT_JOUR TIMESTAMP,
    MT_CA_NET_TTC FLOAT,
    MT_OBJ_CA_NET_TTC FLOAT,
    NB_CLIENTS FLOAT,
    MT_CA_NET_TTC_COMP FLOAT,
    MT_OBJ_CA_NET_TTC_COMP FLOAT,
    NB_CLIENTS_COMP FLOAT);

INSERT SCRIPT :

INSERT INTO TABLE THM_CA_RGRP_PRODUITS_JOUR

  SELECT
          1                                                  as CO_SOCIETE,-- To be changed => variable
          '2013-01-02 00:00:00.0'                                     as DT_JOUR, -- To be changed => variable
          'MAG'                                                       as TYPE_ENTITE,
          m.co_magasin                                                as CODE_ENTITE,
          'FAM'                                                       as TYPE_RGRP_PRODUITS,
          sourceunion.CO_RGRP_PRODUITS                                as CO_RGRP_PRODUITS,
          SUM(MT_CA_NET_TTC)                                          as MT_CA_NET_TTC,
          SUM(MT_OBJ_CA_NET_TTC)                                      as MT_OBJ_CA_NET_TTC,
          SUM(NB_CLIENTS)                                             as NB_CLIENTS,
          SUM(MT_CA_NET_TTC_COMP)                                     as MT_CA_NET_TTC_COMP,
          SUM(MT_OBJ_CA_NET_TTC_COMP)                                 as MT_OBJ_CA_NET_TTC_COMP,
          SUM(NB_CLIENTS_COMP)                                        as NB_CLIENTS_COMP

        FROM (
  SELECT
            mtransf.id_mag_transfere             as ID_MAGASIN,
            v.co_famille                         as CO_RGRP_PRODUITS,
            sum(v.mt_ca_net_ttc)                 as MT_CA_NET_TTC,
            0                                    as MT_OBJ_CA_NET_TTC,
            0                                    as NB_CLIENTS,
            sum(v.mt_ca_net_ttc * (CASE WHEN mtransf.flag_mag_comp = 'NC' THEN 0 ELSE 1 END))
                                                 as MT_CA_NET_TTC_COMP,
            0                                    as MT_OBJ_CA_NET_TTC_COMP,
            0                                    as NB_CLIENTS_COMP
          FROM default.VENTES_FAM v
          JOIN default.kpi_magasin mtransf
          ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
          AND mtransf.id_magasin = v.id_magasin
          WHERE
              mtransf.co_societe    = 1 -- Change to a variable
          AND v.dt_jour             = '2013-01-02 00:00:00.0' -- Change to a variable
          GROUP BY
            mtransf.id_mag_transfere,
            v.co_famille

  UNION ALL

  SELECT
            mtransf.id_mag_transfere             as ID_MAGASIN,
            v.co_famille                         as CO_RGRP_PRODUITS,
            0                                    as MT_CA_NET_TTC,
            0                                    as MT_OBJ_CA_NET_TTC,
            sum(nb_client)                       as NB_CLIENTS,
            0                                    as MT_CA_NET_TTC_COMP,
            0                                    as MT_OBJ_CA_NET_TTC_COMP,
            sum(nb_client * (CASE WHEN mtransf.flag_mag_comp = 'NC' THEN 0 ELSE 1 END))
                                                 as NB_CLIENTS_COMP
          FROM default.nb_clients_mag_fam_j v
          JOIN default.kpi_magasin mtransf
          ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
          AND mtransf.id_magasin = v.id_magasin
          WHERE