Hive, mail # user - regex_extract in hive


Re: regex_extract in hive
Nick Dimiduk 2010-03-10, 19:37
The parse_url UDF works in general, but the common use case is querying
Apache logs, where the request field does not include the protocol or host
portions. You need to prepend them with a concat() call before parse_url can
parse the value.
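
To illustrate why the prefix matters, here is a minimal Java sketch. It
assumes parse_url behaves like java.net.URL, which I believe is what the UDF
uses internally; the names ParseUrlSketch and queryParam are made up for this
example and are not part of Hive.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ParseUrlSketch {
    // Hypothetical helper: returns the value of `key` from the query string
    // of `urlStr`, or null if the URL cannot be parsed or the key is absent.
    static String queryParam(String urlStr, String key) {
        URL url;
        try {
            url = new URL(urlStr);
        } catch (MalformedURLException e) {
            return null; // e.g. "no protocol" for a bare request path
        }
        String query = url.getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String request = "/searches/tagged_with?company=2-Opico&page=8"
                + "&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco";
        // A bare log path has no protocol, so it is rejected outright.
        System.out.println(queryParam(request, "tag"));                         // null
        // With a dummy protocol and host prepended, the query is readable.
        System.out.println(queryParam("http://www.foo.com/" + request, "tag")); // demco
    }
}
```

The doubled slash produced by joining "http://www.foo.com/" with a path that
already starts with "/" ends up in the path only, so it does not affect the
query lookup.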

Also, the docs on parse_url are wrong about the query-parameter parsing
feature. The describe output in the quoted message below shows the actual
syntax.

In your case, you'll likely want something like:

SELECT parse_url( concat("http://www.foo.com/", request), 'QUERY', 'tag')
FROM log_table;
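
If you would rather stay with regexp_extract, the likely reason the original
pattern returns nothing is that Hive uses Java regular expressions, where the
surrounding slashes are literal characters and not Perl-style delimiters. A
rough Java sketch of that behaviour follows; RegexTagSketch and extract are
hypothetical names, and returning an empty string when nothing matches is my
assumption about how regexp_extract behaves.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTagSketch {
    // Hypothetical stand-in for regexp_extract: find the first match and
    // return the requested capture group, or "" when nothing matches.
    static String extract(String subject, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        String resource = "/searches/tagged_with?company=2-Opico&page=8"
                + "&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco";

        // Original pattern: the leading and trailing "/" must match literally,
        // and the sample string contains neither "/&tag=" nor a trailing "/".
        System.out.println("[" + extract(resource, "/\\&tag=([^\\&]+)/", 1) + "]"); // []

        // Without the slashes, anchored on "?" or "&" so that a parameter
        // whose name merely ends in "tag" is not picked up.
        System.out.println(extract(resource, "[?&]tag=([^&]+)", 1));               // demco
    }
}
```

On that basis, something like
SELECT regexp_extract(resource, '[?&]tag=([^&]+)', 1) FROM a_log;
should return demco for the sample row, though I have not run it against your
table.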

Cheers,
-Nick

On Mon, Mar 8, 2010 at 8:10 PM, 김영우 <[EMAIL PROTECTED]> wrote:

> Hi Prakash,
>
> You can extract query string from url using 'parse_url' udf.
>
> hive> describe function parse_url;
> OK
> parse_url(url, partToExtract[, key]) - extracts a part from a URL
> Time taken: 0.024 seconds
> hive>
> hive> select parse_url('http://www.example.com/searches/tagged_with?company=2-Opico&page=8&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco', 'QUERY', 'tag') from r;
> .
> .
> Ended Job = job_200911251712_2046
> OK
> demco
> Time taken: 19.405 seconds
> hive>
>
>
> Regards,
> Youngwoo
>
>
> 2010/3/9 prakash sejwani <[EMAIL PROTECTED]>
>
>> Hi All,
>>         I have the query below:
>>         SELECT regexp_extract(resource,'/\&tag=([^\&]+)/') FROM a_log;
>>
>>        It gives a blank result.
>>
>>        The sample resource string looks like this:
>> "/searches/tagged_with?company=2-Opico&page=8&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco"
>> I want to extract demco out of it.
>>
>> Please help me with this.
>>
>>
>> Thanks,
>> prakash
>> Econify Infotech
>> Mumbai
>>
>>
>