A slightly less "hackish" way to do this without joins is to write a custom
UDF that takes data.BLOCK__OFFSET__INSIDE__FILE as its input parameter and
returns the corresponding data from the small file. If you mark it as
deterministic using @UDFType(deterministic = true), the performance
should be quite good.
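
Roughly what I have in mind is below. This is only a sketch of the lookup
logic, not a tested UDF: the class name and the small-file layout (one
"offset<TAB>value" pair per line) are my assumptions, and the Hive-specific
parts are left as comments so that it compiles without Hive on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch. As a real Hive UDF this class would be public, extend
// org.apache.hadoop.hive.ql.exec.UDF and carry @UDFType(deterministic = true).
class OffsetLookupSketch {
    // Offset -> value, loaded once from the small file and kept in memory.
    private final Map<Long, String> byOffset = new HashMap<>();

    // Assumed small-file layout: one "offset<TAB>value" pair per line.
    OffsetLookupSketch(Reader smallFile) {
        try (BufferedReader in = new BufferedReader(smallFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // skip blank lines and lines without a key
                }
                long offset = Long.parseLong(line.substring(0, tab).trim());
                byOffset.put(offset, line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hive would call evaluate() once per row; NULL in gives NULL out,
    // and an offset missing from the small file also gives NULL.
    String evaluate(Long blockOffset) {
        return blockOffset == null ? null : byOffset.get(blockOffset);
    }
}
```

After registering the real class with CREATE TEMPORARY FUNCTION, the query
would call it on data.BLOCK__OFFSET__INSIDE__FILE instead of joining against
the small file. How the small file reaches the tasks (for example via ADD
FILE) is something you would have to decide for your setup.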

To avoid the full table scan, partitioning is IMHO the best way to speed
things up.

Best regards,
J. Dolinar
On Thu, Jun 27, 2013 at 11:18 AM, Peter Marron <[EMAIL PROTECTED]> wrote: