Accumulo >> mail # dev >> Setting Charset in getBytes() call.


Re: Setting Charset in getBytes() call.
Also, on the topic of byte arrays: we should do one better than HBase and
go for ByteBuffers. They are more reusable, and long-lived buffers can be
allocated outside the heap and take advantage of OS I/O optimizations.
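
As a rough illustration of the ByteBuffer point (this is not code from the thread, and the names BufferSketch and fill are made up for the example), one direct buffer allocated outside the heap can be refilled for successive payloads instead of allocating a new byte[] each time:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: reuse one off-heap buffer for several payloads.
final class BufferSketch {

    // Resets the buffer, copies the payload in, and flips it for reading.
    // Assumes the payload fits within the buffer's capacity.
    public static ByteBuffer fill(ByteBuffer reusable, byte[] payload) {
        reusable.clear();
        reusable.put(payload);
        reusable.flip();
        return reusable;
    }

    public static void main(String[] args) {
        // allocateDirect asks the JVM for memory outside the garbage-collected heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(64);

        fill(direct, "row1".getBytes(StandardCharsets.UTF_8));
        System.out.println(direct.isDirect() + " " + direct.remaining());

        // The same buffer is reused for arbitrary, non-text bytes.
        fill(direct, new byte[] {(byte) 0xff, 0x00});
        System.out.println(direct.remaining());
    }
}
```

Whether this pays off in practice depends on buffer lifetime: direct buffers are comparatively expensive to allocate, which is why the long-lived case is the one called out above.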

The current reliance on Text is, in my opinion, the greatest deficit of the
client API. I have been fiddling with creating a new API, similar to the
work Keith did with typo, but instead looking at introducing generic
superclasses to reduce the API profile.
On Oct 29, 2012 9:22 PM, "Drew Farris" <[EMAIL PROTECTED]> wrote:

> I have always wondered if there were cases in the API where users are
> forced to use Text when they would otherwise prefer byte[], e.g., stuffing a
> non-UTF-8 byte[] into a Text object to facilitate storage or sorting. Not
> entirely sure whether Text would complain if this were the case. I suspect
> we should seek to eliminate these if they currently exist.
>
> Speaking strictly of user data, I agree that fundamentally, every operation
> should be based upon byte[]. API methods providing Text and String based
> calls should be convenience methods where the conversion of text to/from
> bytes is handled explicitly (not relying on platform default encoding or
> properties) and transparently (doing something sensible when the user
> doesn't care or is unaware of the issues surrounding character encoding).
>
> Regarding UTF-8, is there a need to support arbitrary character encodings
> when persisting bytes to Accumulo? Think byte order for lexical sorting,
> fixed vs. variable length, etc. Perhaps it would not be unreasonable to
> support explicitly stating a character encoding on table creation?
>
> Drew
>  On Oct 29, 2012 8:47 PM, "Josh Elser" <[EMAIL PROTECTED]> wrote:
>
> > +1 Mike.
> >
> > 1. It would be hard for me to believe Key/Value are ever handled
> > internally in terms of Strings, but, if such a case does exist, it would
> > be extremely prudent to fix.
> >
> > 2. FWIW, the Shell does use ISO-8859-1 as its charset which is referenced
> > by other commands [1,2]. It would be good to double check all of the
> > other commands.
> >
> > [1] https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
> > [2] https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java
> >
> > On 10/29/2012 8:27 PM, Michael Flester wrote:
> >
> >> I agree with Benson entirely with one caveat. It seems to me that there
> >> might be two categories of things being discussed:
> >>
> >>    1. User data (keys and values)
> >>    2. Ancillary things needed for operation of Accumulo (passwords).
> >>
> >> These could well be considered separately. Trying to do anything with
> >> keys and values other than treating them as bytes all of the time
> >> I find quite scary.
> >>
> >> And if this is only being done to satisfy pmd or findbugs, those tools
> >> can be convinced to modify their reporting about this issue.
> >>
> >>
>
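
To make the encoding concern in the thread concrete, here is a minimal sketch (not taken from the Accumulo code base; CharsetSketch and its method names are hypothetical) of what happens when arbitrary bytes are pushed through a String with an explicitly named charset. It says nothing about how Text itself behaves. ISO-8859-1, the charset the Shell is reported to use above, maps every byte value to exactly one character, so a round trip is lossless; UTF-8 decoding replaces malformed sequences, so the original bytes cannot be recovered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: round-tripping raw bytes through String with explicit charsets.
final class CharsetSketch {

    // ISO-8859-1 maps each of the 256 byte values to one char, so this is lossless.
    public static byte[] roundTripLatin1(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8 decoding substitutes U+FFFD for malformed input, so bytes can change.
    public static byte[] roundTripUtf8(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xff and 0xfe never occur in well-formed UTF-8.
        byte[] raw = {(byte) 0xff, (byte) 0xfe, 0x41};

        System.out.println(Arrays.equals(raw, roundTripLatin1(raw))); // true
        System.out.println(Arrays.equals(raw, roundTripUtf8(raw)));   // false
    }
}
```

Both helpers pass the charset explicitly rather than calling the no-argument getBytes(), which would fall back to the platform default encoding, the behavior the thread argues against relying on.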