MapReduce user mailing list: Hive Query with UDF


Re: Hive Query with UDF
Michael Segel 2012-10-18, 02:21
You really don't want to do that.
It becomes a nightmare: you now ship a derivative of Hive and then have to maintain it and keep it in lockstep with Apache Hive.
There are other options and designs, but since this is for a commercial product, I'm not going to talk about them.

Keep in mind that Hive isn't a relational database per se and works on immutable flat files. So that's going to hurt you as well.

On Oct 17, 2012, at 9:13 PM, lohit <[EMAIL PROTECTED]> wrote:

> One idea is to write your own translation layer that sits between the query and the actual job submission.
> You would most likely end up with your own version of the Hive jar, with your translation changes on top of the Hive sources.
> The advantage is that users need not change their queries; they would write a normal Hive query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork.
>
> Otherwise, my initial guess is that you might have to modify the command-line parser so that it encrypts once instead of for every record.
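A minimal sketch of the translation-layer idea, done as a rewrite of the query text in front of Hive rather than as a patch to the Hive sources (a simpler technique swapped in for illustration, not the fork described above). It assumes the layer knows which columns are encrypted, that predicates have the simple form column = 'literal', and that the product's cipher is deterministic, so equal plaintexts give equal ciphertexts. The class name EncryptedColumnRewriter and the string-reversing stand-in cipher are hypothetical.

```java
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical translation layer: before a query is submitted, rewrite
 * equality predicates on encrypted columns (col = 'plain') so the literal is
 * replaced by its ciphertext. Users keep writing ordinary queries, and the
 * cipher runs once per literal instead of once per row.
 */
public class EncryptedColumnRewriter {

    private final Pattern predicate;
    private final UnaryOperator<String> encryptor;

    public EncryptedColumnRewriter(Set<String> encryptedColumns,
                                   UnaryOperator<String> encryptor) {
        // Builds a case-insensitive pattern: word boundary, one of the quoted
        // column names, '=', then a single-quoted literal with no embedded
        // quotes or backslashes (a simplifying assumption).
        String columns = encryptedColumns.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.predicate = Pattern.compile(
                "(?i)\\b(" + columns + ")\\s*=\\s*'([^'\\\\]*)'");
        this.encryptor = encryptor;
    }

    public String rewrite(String hiveQl) {
        Matcher m = predicate.matcher(hiveQl);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cipherText = encryptor.apply(m.group(2));
            String replacement =
                    m.group(1) + " = '" + escapeHiveLiteral(cipherText) + "'";
            // quoteReplacement keeps '$' or '\' in the ciphertext literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Hive string literals use backslash escapes. */
    static String escapeHiveLiteral(String s) {
        return s.replace("\\", "\\\\").replace("'", "\\'");
    }

    public static void main(String[] args) {
        // Stand-in "cipher" for illustration only (reverses the string);
        // a real deployment would call the product's deterministic encryption.
        EncryptedColumnRewriter rewriter = new EncryptedColumnRewriter(
                Set.of("first_name"),
                s -> new StringBuilder(s).reverse().toString());
        System.out.println(rewriter.rewrite(
                "select * from cc_details where first_name = 'Ann'"));
    }
}
```

A regex only handles that simple predicate shape (not IN lists, LIKE, or joins), which is one reason a real translation layer would work on Hive's parsed query rather than on the raw text.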
>
> 2012/10/17 Sam Mohamed <[EMAIL PROTECTED]>
> Thanks for the quick response.
>
> The idea is that we are selling the encryption product for customers who use HDFS.  Hence, encryption is a requirement.
>
> Any other suggestions?
>
> Sam
> ________________________________________
> From: Michael Segel [[EMAIL PROTECTED]]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Hive Query with UDF
>
> You don't need a UDF...
>
> You encrypt the string 'Ann' first then use that encrypted value in the Select statement.
>
> That should make things a bit simpler.
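A sketch of that suggestion, assuming the query is launched through the hive CLI: the client encrypts the value once, passes the ciphertext in with -hiveconf, and the query picks it up through ${hiveconf:...} variable substitution (assuming hive.variable.substitute is left at its default of true). The variable name enc_first_name and the class EncryptFirstLauncher are hypothetical, the reversing lambda stands in for ParamData.encrypt, and the ciphertext is assumed to contain no single quotes (true of Base64 or hex output).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class EncryptFirstLauncher {

    // The query references the ciphertext through Hive variable substitution.
    static final String QUERY =
            "select * from cc_details where first_name = '${hiveconf:enc_first_name}'";

    // Encrypts the search value once and builds the hive CLI argument list.
    static List<String> buildCommand(String plainName, UnaryOperator<String> encryptor) {
        String cipherText = encryptor.apply(plainName); // the only call to the cipher
        List<String> cmd = new ArrayList<>();
        cmd.add("hive");
        cmd.add("-hiveconf");
        cmd.add("enc_first_name=" + cipherText);
        cmd.add("-e");
        cmd.add(QUERY);
        return cmd;
    }

    public static void main(String[] args) {
        // Stand-in cipher for illustration only; use the real encryption in practice.
        List<String> cmd = buildCommand("Ann", s -> new StringBuilder(s).reverse().toString());
        System.out.println(cmd);
        // To run it for real (needs the hive CLI on the PATH). ProcessBuilder does
        // not go through a shell, so ${hiveconf:...} reaches Hive untouched:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Because the encryption happens before Hive ever sees the query, no UDF runs per row; Hive just compares first_name against a constant string.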
>
>
>
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <[EMAIL PROTECTED]> wrote:
>
> > I have some encrypted data in a CSV file on HDFS, for which I've created a Hive table, and I want to run a Hive query that first encrypts the query parameter and then does the lookup. I have a UDF that does the encryption as follows:
> >
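> > import org.apache.hadoop.hive.ql.exec.UDF;
> > import org.apache.hadoop.io.Text;
> >
> > // Used in queries as encrypt(...).
> > public class ParamEncrypt extends UDF {
> >
> >   // Called by Hive once per row; returns the encrypted form of name.
> >   public Text evaluate(String name) throws Exception {
> >     if (name == null) {
> >       return null;
> >     }
> >     // ParamData.encrypt is the product's own encryption routine (not shown).
> >     return new Text(ParamData.encrypt(name));
> >   }
> > }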
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is that it runs encrypt('Ann') against every single record in the table. I want it to do the encryption once, then do the match. I've tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or subqueries in the WHERE clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work, because the query parser throws an error saying **ann** is not a known column.
> >
> > Thanks,
> >
> > Sam
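If the UDF has to stay in the query, one way to get close to "encrypt once" (a memoization technique, not something proposed in this thread) is to cache the last argument and its ciphertext. A constant argument like 'Ann' is then encrypted once per UDF instance rather than once per row; Hive still calls evaluate for every row, but the cipher itself runs only when the input changes. LastValueCache is a hypothetical helper, it is not thread-safe (it assumes the UDF instance isn't shared across threads), and the reversing lambda stands in for ParamData.encrypt.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

/**
 * Caches the most recent (input, output) pair so a constant argument such as
 * encrypt('Ann') is encrypted once per UDF instance instead of once per row.
 * Hypothetical helper; a UDF's evaluate method could delegate to it, e.g.
 *     private final LastValueCache cache = new LastValueCache(ParamData::encrypt);
 *     ... return new Text(cache.apply(name));
 * (ParamData.encrypt would need wrapping if it throws a checked exception.)
 */
public class LastValueCache implements UnaryOperator<String> {
    private final UnaryOperator<String> delegate;
    private String lastInput;
    private String lastOutput;
    private int misses; // how many times the delegate actually ran

    public LastValueCache(UnaryOperator<String> delegate) {
        this.delegate = Objects.requireNonNull(delegate);
    }

    @Override
    public String apply(String input) {
        if (input == null) {
            return null; // mirror the UDF: null in, null out, nothing cached
        }
        if (!input.equals(lastInput)) {
            lastOutput = delegate.apply(input);
            lastInput = input;
            misses++;
        }
        return lastOutput;
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Stand-in cipher for illustration only.
        LastValueCache cache = new LastValueCache(s -> new StringBuilder(s).reverse().toString());
        for (int row = 0; row < 1_000; row++) {
            cache.apply("Ann"); // same constant argument on every row
        }
        System.out.println(cache.apply("Ann") + " computed " + cache.misses() + " time(s)");
    }
}
```

Caching only the last value keeps memory constant, which suits the constant-argument case; a query that passes a different column value on each row would gain nothing from it.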
>
>
>
>
> --
> Have a Nice Day!
> Lohit