Pig >> mail # user >> Error while loading UTF-8 strings into bags


RE: Error while loading UTF-8 strings into bags
This could be a bug in TextDataParser due to the presence of empty
strings in the data.

Santhosh
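
One way to check the empty-string hypothesis before digging into the parser is to scan the input for rows whose data field is empty, and separately for rows that contain non-ASCII text. The following is only a rough sketch, not part of Pig: the helper name classify_rows is made up for illustration, and it assumes the input is tab-delimited (the PigStorage default) with the string in the first field.

```python
def classify_rows(lines, delimiter="\t"):
    """Return (empty_rows, non_ascii_rows) as lists of 1-based line numbers.

    Hypothetical helper: assumes the string field is the first column
    of a tab-delimited file, as in the 'phrases' input of the script.
    """
    empty_rows, non_ascii_rows = [], []
    for line_no, line in enumerate(lines, start=1):
        # Take the first field, ignoring the trailing newline.
        data = line.rstrip("\n").split(delimiter)[0]
        if data == "":
            empty_rows.append(line_no)
        elif not data.isascii():  # str.isascii() needs Python 3.7+
            non_ascii_rows.append(line_no)
    return empty_rows, non_ascii_rows


# Small in-memory sample: one plain row, one empty string, one UTF-8 row.
sample = ["hello\t1\n", "\t1\n", "h\u00e9llo\t2\n"]
empty_rows, non_ascii_rows = classify_rows(sample)
print("empty:", empty_rows, "non-ASCII:", non_ascii_rows)
```

For a real file, the same function can be fed an open file handle, for example open(path, encoding="utf-8"), where path is wherever the input actually lives. If the failing records turn out to be the empty ones rather than the non-ASCII ones, that would point at the parser rather than at UTF-8 handling.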

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, February 16, 2009 6:05 AM
To: [EMAIL PROTECTED]
Subject: Error while loading UTF-8 strings into bags

Hi,

I'm trying to use UTF-8 strings as follows:

phrases = load 'phrases' as (data: chararray, f: int);
a = group phrases by f;
b = foreach a generate group as f, phrases.data as data;
store b into 'grouped';

b = load 'grouped' as (f: int, data: bag{t: tuple(data: chararray)});
c = foreach b generate f, data;       -- just store in this sample
store c into 'final';

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 3.
    at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:619)
    at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:568)
    at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:623)
    at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:153)
    at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:85)
    at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:345)
    at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
    at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:71)
    at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:79)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:908)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:244)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:198)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:187)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:203)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:194)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

Is there any way to use UTF-8 strings in Pig bags?
Thanks.