Re: HBase Performance Improvements?
Hey Oliver,

Thanks a "billion" for the response -:)  I will take any code you can
provide even if it's a hack!  I will even send you an Amazon gift card -
not that you care or need it -:)

Can you share some performance statistics?  Thanks again.
On Wed, May 9, 2012 at 8:02 AM, Oliver Meyn (GBIF) <[EMAIL PROTECTED]> wrote:

> Heya Something,
>
> I had a similar task recently and by far the best way to go about this is
> with bulk loading after pre-splitting your target table.  As you know
> ImportTsv doesn't understand Avro files so I hacked together my own
> ImportAvro class to create the HFiles that I eventually moved into HBase
> with completebulkload.  I haven't committed my class anywhere because it's
> a pretty ugly hack, but I'm happy to share it with you as a starting point.
>  Doing billions of puts will just drive you crazy.
>
> Cheers,
> Oliver
>
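Oliver's ImportAvro class isn't posted in the thread, but the pattern he
describes (an MR job that writes HFiles through HFileOutputFormat, then
completebulkload into the pre-split table) generally looks like the sketch
below. The driver and mapper names, the tab-separated text stand-in for the
Avro decoding, and the "target_table"/"info" names are placeholders, not
his code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvroBulkLoadDriver {

    // Stand-in mapper: one KeyValue per "rowkey TAB frequency" line.
    // A real ImportAvro would decode Avro records here instead.
    static class RecordMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] f = line.toString().split("\t");
            byte[] row = Bytes.toBytes(f[0]);
            KeyValue kv = new KeyValue(row, Bytes.toBytes("info"),
                    Bytes.toBytes("frequency"), Bytes.toBytes(f[1]));
            ctx.write(new ImmutableBytesWritable(row), kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hfile-prep");
        job.setJarByClass(AvroBulkLoadDriver.class);
        job.setMapperClass(RecordMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Sets the partitioner, sort/reduce phase, and output format so
        // the HFiles line up with the regions of the (pre-split) table.
        HTable table = new HTable(conf, "target_table");
        HFileOutputFormat.configureIncrementalLoad(job, table);

        if (job.waitForCompletion(true)) {
            // Same effect as running the completebulkload tool by hand.
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
        }
    }
}

The win over per-record puts is that the region servers never see the write
load at all: the HFiles are written by the MR job and then simply adopted
by the regions.
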
> On 2012-05-09, at 4:51 PM, Something Something wrote:
>
> > I ran the following MR job that reads Avro files & puts them into HBase.
> > The files have tons of data (billions of records).  We have a fairly
> > decent-size cluster.  When I ran this MR job, it brought down HBase.
> > When I commented out the Puts to HBase, the job completed in 45 seconds
> > (yes, that's seconds).
> >
> > Obviously, my HBase configuration is not ideal.  I am using all the
> > default HBase configurations that come out of Cloudera's distribution:
> > 0.90.4+49.
> >
> > I am planning to read up on the following two:
> >
> > http://hbase.apache.org/book/important_configurations.html
> > http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
> >
> > But can someone quickly take a look and recommend a list of priorities,
> > such as "try this first..."?  That would be greatly appreciated.  As
> > always, thanks for the time.
> >
> >
> > Here's the Mapper. (There's no reducer):
> >
> > public class AvroProfileMapper extends AvroMapper<GenericData.Record,
> >         NullWritable> {
> >    private static final Logger logger =
> >            LoggerFactory.getLogger(AvroProfileMapper.class);
> >
> >    final private String SEPARATOR = "*";
> >
> >    private HTable table;
> >
> >    private String datasetDate;
> >    private String tableName;
> >
> >    @Override
> >    public void configure(JobConf jobConf) {
> >        super.configure(jobConf);
> >        datasetDate = jobConf.get("datasetDate");
> >        tableName = jobConf.get("tableName");
> >
> >        // Open table for writing
> >        try {
> >            table = new HTable(jobConf, tableName);
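> >            // Buffer puts client-side: with auto-flush off, puts are
> >            // batched into one RPC per ~12 MB buffer instead of one RPC
> >            // per Put, and sent when the buffer fills or the table is
> >            // flushed/closed.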
> >            table.setAutoFlush(false);
> >            table.setWriteBufferSize(1024 * 1024 * 12);
> >        } catch (IOException e) {
> >            throw new RuntimeException("Failed table construction", e);
> >        }
> >    }
> >
> >    @Override
> >    public void map(GenericData.Record record,
> >                    AvroCollector<NullWritable> collector,
> >                    Reporter reporter) throws IOException {
> >
> >        String u1 = record.get("u1").toString();
> >
> >        GenericData.Array<GenericData.Record> fields =
> >                (GenericData.Array<GenericData.Record>) record.get("bag");
> >        for (GenericData.Record rec : fields) {
> >            Integer s1 = (Integer) rec.get("s1");
> >            Integer n1 = (Integer) rec.get("n1");
> >            Integer c1 = (Integer) rec.get("c1");
> >            Integer freq = (Integer) rec.get("freq");
> >            if (freq == null) {
> >                freq = 0;
> >            }
> >
> >            String key = u1 + SEPARATOR + n1 + SEPARATOR + c1
> >                    + SEPARATOR + s1;
> >            Put put = new Put(Bytes.toBytes(key));
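> >            // Skipping the WAL boosts write throughput but sacrifices
> >            // durability: puts not yet flushed from the memstore are
> >            // lost if a region server crashes.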
> >            put.setWriteToWAL(false);
> >            put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"),
> >                    Bytes.toBytes(freq.toString()));
> >            try {
> >                table.put(put);
> >            } catch (IOException e) {
> >                throw new RuntimeException("Error while writing to " +
> >                        table + " table.", e);
> >            }
> >
> >        }
> >        logger.info("------------  Finished processing user: " + u1);
> >    }
> >
> >    @Override
> >    public void close() throws IOException {
> >        // Auto-flush is off, so closing the table flushes the last
> >        // buffered puts before releasing it.
> >        table.close();
> >    }
> > }
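On the pre-splitting half of Oliver's advice: creating the table with
explicit split points keeps the job from funneling every write (or every
bulk-loaded HFile) into a single region at the start. A minimal sketch
against the 0.90-era API; "target_table", the "info" family, and the split
keys are made-up examples, and real split points should come from sampling
the actual key distribution:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("target_table");
        desc.addFamily(new HColumnDescriptor("info"));

        // Hypothetical split points over the row-key space.
        byte[][] splits = new byte[][] {
                Bytes.toBytes("u2000000"),
                Bytes.toBytes("u4000000"),
                Bytes.toBytes("u6000000"),
                Bytes.toBytes("u8000000"),
        };

        // splits.length + 1 regions from the start, instead of one region
        // that takes all writes until HBase splits it under load.
        admin.createTable(desc, splits);
    }
}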