Hive user mailing list: regex_extract in hive


Re: regex_extract in hive
The parse_url UDF works in general, but in the common use case of querying
Apache logs, the request field does not include the protocol or host
portions, so you need to prepend them with a concat() call.

Also, the docs on parse_url are wrong about the query-parameter parsing
feature; the describe output above shows the actual syntax.

In your case, you'll likely want something like:

SELECT parse_url( concat("http://www.foo.com/", request), 'QUERY', 'tag')
FROM log_table;
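To make the concat() point concrete, here is a small Python sketch that approximates what parse_url(..., 'QUERY', 'tag') returns for the sample request from this thread. The helper name `parse_url_query` is hypothetical, and the sketch is not Hive's implementation: it only mirrors the behavior described above, returning nothing for a bare request path with no protocol or host, and returning the raw value of the first matching key without percent-decoding.

```python
from urllib.parse import urlsplit

def parse_url_query(url, key):
    """Hypothetical approximation of parse_url(url, 'QUERY', key)."""
    parts = urlsplit(url)
    # Mirror the behavior described in the thread: without a protocol and
    # host the string is not treated as a URL, so nothing is extracted.
    if not parts.scheme or not parts.netloc:
        return None
    for pair in parts.query.split("&"):
        name, sep, value = pair.partition("=")
        if name == key and sep:
            return value  # raw value, no percent-decoding
    return None

# Sample request field from the question (no protocol or host).
request = ("/searches/tagged_with?company=2-Opico&page=8"
           "&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco")

print(parse_url_query(request, "tag"))                          # None
print(parse_url_query("http://www.foo.com/" + request, "tag"))  # demco
```

Because the request already starts with "/", the concatenated URL contains "//" after the host; that only changes the path portion and has no effect on the query string being parsed.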

Cheers,
-Nick

On Mon, Mar 8, 2010 at 8:10 PM, 김영우 <[EMAIL PROTECTED]> wrote:

> Hi Prakash,
>
> You can extract the query string from a URL using the 'parse_url' UDF.
>
> hive> describe function parse_url;
> OK
> parse_url(url, partToExtract[, key]) - extracts a part from a URL
> Time taken: 0.024 seconds
> hive>
> hive> select parse_url('http://www.example.com/searches/tagged_with?company=2-Opico&page=8&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco',
> 'QUERY', 'tag') from r;
> .
> .
> Ended Job = job_200911251712_2046
> OK
> demco
> Time taken: 19.405 seconds
> hive>
>
>
> Regards,
> Youngwoo
>
>
> 2010/3/9 prakash sejwani <[EMAIL PROTECTED]>
>
>> Hi All,
>>
>> I have the query below:
>>
>>     SELECT regexp_extract(resource,'/\&tag=([^\&]+)/') FROM a_log;
>>
>> It gives a blank result.
>>
>> The sample resource string looks like this:
>> "/searches/tagged_with?company=2-Opico&page=8&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco"
>> I want to extract demco from it.
>>
>> Please help me with this.
>>
>>
>> Thanks,
>> prakash
>> Econify Infotech
>> Mumbai
>>
>>
>
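As for the original regexp_extract attempt, the blank result most likely comes from the Perl-style /.../ delimiters. Hive's regexp_extract takes a plain Java regular expression, so the slashes are matched as literal characters: the pattern requires "/&tag=" and a trailing "/", and neither appears in the resource string. An untested, delimiter-free Hive version would be `SELECT regexp_extract(resource, '[?&]tag=([^&]+)', 1) FROM a_log;`. The Python sketch below checks both patterns against the sample string; `regexp_extract` here is a hypothetical stand-in that, like the behavior reported in the question, returns an empty string when nothing matches. Python's `re` and Java regex agree on these particular patterns, but this is an approximation, not Hive itself.

```python
import re

# Sample value of the `resource` column from the question.
resource = ("/searches/tagged_with?company=2-Opico&page=8"
            "&product=36154-7653-BACKUP-PLATE-F-STORAGE-STAND&tag=demco")

def regexp_extract(subject, pattern, index=1):
    """Hypothetical stand-in for Hive's regexp_extract: '' when there is no match."""
    match = re.search(pattern, subject)
    return match.group(index) if match else ""

# Original pattern: the /.../ delimiters are literal slashes, so the regex needs
# "/&tag=" plus a trailing "/", neither of which occurs in the string.
original = regexp_extract(resource, r"/\&tag=([^\&]+)/")

# Delimiter-free pattern: "tag=" preceded by ? or &, then capture up to the next &.
fixed = regexp_extract(resource, r"[?&]tag=([^&]+)")

print(repr(original))  # ''
print(fixed)           # demco
```

Anchoring on `[?&]` keeps the pattern working when tag is the first query parameter and stops it from matching a key that merely ends in "tag".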