Pig >> mail # user >> Error while loading UTF-8 strings into bags


RE: Error while loading UTF-8 strings into bags
This could be a bug in TextDataParser due to the presence of empty
strings in the data.
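
One way to check that hypothesis is to look for empty or null values in the input before grouping. This is only a sketch and has not been run against this Pig version: it reuses the 'phrases' path and schema from the script quoted below, and the alias name empties is made up for illustration.

```pig
phrases = load 'phrases' as (data: chararray, f: int);
-- keep only rows whose string field is missing or empty
empties = filter phrases by data is null or data == '';
-- print the offending rows, if any
dump empties;
```

If this prints nothing and loading 'grouped' still fails, empty strings are probably not the cause.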

Santhosh

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, February 16, 2009 6:05 AM
To: [EMAIL PROTECTED]
Subject: Error while loading UTF-8 strings into bags

Hi,

I'm trying to use UTF-8 strings as follows:

phrases = load 'phrases' as (data: chararray, f: int);
a = group phrases by f;
b = foreach a generate group as f, phrases.data as data;
store b into 'grouped';

b = load 'grouped' as (f: int, data: bag{t: tuple(data: chararray)});
c = foreach b generate f, data;       -- just store in this sample
store c into 'final';

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2043: Unexpected error during execution.

org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 3.
at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:619)
at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:568)
at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:623)
at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:153)
at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:85)
at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:345)
at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:71)
at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:79)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:908)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:244)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:198)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:187)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:203)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:194)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)

Is there any way to use UTF-8 strings in Pig bags?
Thanks.