Pig, mail # user - Reading strings from SequenceFile


Re: Reading strings from SequenceFile
Andy Schlaikjer 2012-03-02, 00:13
Hi Luiz,

I wrote a few utils to simplify SequenceFile IO with Pig. They live in
the Elephant Bird project here:

https://github.com/kevinweil/elephant-bird/

See the section "Hadoop SequenceFiles and Pig" on that page.
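
For reference, a load statement using those utilities might look roughly like the sketch below. It assumes the Elephant Bird jar is registered from a placeholder path, that 'input.seq' is a hypothetical file holding IntWritable keys and Text values, and that the loader and converter class names match the version you build. A file of Sqoop-generated records, like the one in the original question, would need a value converter that understands that record class, which this sketch does not provide.

```pig
-- Placeholder path (assumption): point this at your Elephant Bird build.
REGISTER '/path/to/elephant-bird.jar';

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare INT_CONVERTER 'com.twitter.elephantbird.pig.util.IntWritableConverter';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';

-- The first '-c' names the key converter, the second the value converter.
-- 'input.seq' is a hypothetical path used only for illustration.
pairs = LOAD 'input.seq' USING $SEQFILE_LOADER (
  '-c $INT_CONVERTER', '-c $TEXT_CONVERTER'
) AS (key: int, value: chararray);

DUMP pairs;
```

Each converter translates one Writable type into a Pig type, so the key and value types declared in the AS clause should agree with the converters chosen.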

If you have any questions, please feel free to post back to this thread.

Andy

On Wed, Feb 29, 2012 at 6:27 AM, Luiz Celso Gomes Jr
<[EMAIL PROTECTED]> wrote:
> Hey all,
>
> I imported a table from PostgreSQL using Sqoop, generating a
> SequenceFile. I'm now trying to read this file in Pig using
> SequenceFileLoader. The load function doesn't seem to recognize the
> columns that were of type text in PostgreSQL. When I dump the
> relation, I get the key (integer) but none of the other fields.
>
> Here's the pig script:
>
> REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
> REGISTER /pesquisa.jar;
> REGISTER /usr/lib/sqoop/sqoop-1.3.0-cdh3u3.jar;
> DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
> univ = LOAD 'universities' USING SequenceFileLoader AS (a:int,
> b:chararray, c:chararray, d:chararray, e:chararray, f:chararray,
> g:chararray, h:chararray);
> DUMP univ;
>
> The SequenceFile class that Sqoop generates seems to be treating the
> text fields as String (class copied below).
>
> I'm new to Pig (love it) and Hadoop. I just want a simple way to be
> able to process my data (mostly text documents) in PostgreSQL with
> Pig. I don't want to load directly from PostgreSQL for performance
> reasons. I tried importing to HBase but had too many problems. I'm now
> trying with SequenceFiles from Sqoop. I'd be happy to hear suggestions
> of better approaches.
>
>
> Thanks!
> Luiz
>
> --
> // ORM class for universities
> // WARNING: This class is AUTO-GENERATED. Modify at your own risk.
> package pesquisa;
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.mapred.lib.db.DBWritable;
> import com.cloudera.sqoop.lib.JdbcWritableBridge;
> import com.cloudera.sqoop.lib.DelimiterSet;
> import com.cloudera.sqoop.lib.FieldFormatter;
> import com.cloudera.sqoop.lib.RecordParser;
> import com.cloudera.sqoop.lib.BooleanParser;
> import com.cloudera.sqoop.lib.BlobRef;
> import com.cloudera.sqoop.lib.ClobRef;
> import com.cloudera.sqoop.lib.LargeObjectLoader;
> import com.cloudera.sqoop.lib.SqoopRecord;
> import java.sql.PreparedStatement;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.CharBuffer;
> import java.sql.Date;
> import java.sql.Time;
> import java.sql.Timestamp;
> import java.util.Arrays;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import java.util.TreeMap;
>
> public class universities extends SqoopRecord  implements DBWritable, Writable {
>  private final int PROTOCOL_VERSION = 3;
>  public int getClassFormatVersion() { return PROTOCOL_VERSION; }
>  protected ResultSet __cur_result_set;
>  private Integer chave;
>  public Integer get_chave() {
>    return chave;
>  }
>  public void set_chave(Integer chave) {
>    this.chave = chave;
>  }
>  public universities with_chave(Integer chave) {
>    this.chave = chave;
>    return this;
>  }
>  private String nome;
>  public String get_nome() {
>    return nome;
>  }
>  public void set_nome(String nome) {
>    this.nome = nome;
>  }
>  public universities with_nome(String nome) {
>    this.nome = nome;
>    return this;
>  }
>  private String country;
>  public String get_country() {
>    return country;
>  }
>  public void set_country(String country) {
>    this.country = country;
>  }
>  public universities with_country(String country) {
>    this.country = country;
>    return this;
>  }
>  private String class_size;
>  public String get_class_size() {
>    return class_size;
>  }
>  public void set_class_size(String class_size) {
>    this.class_size = class_size;