Hive user mailing list
bucketing on a column with millions of unique IDs
Hi guys,

I plan to bucket a table by "userid" because I'm going to run intensive
calculations using "group by userid". There are about 110 million rows with
about 7 million unique userids, so my question is: what is a good number of
buckets for this scenario, and how should I determine the number of buckets?
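One rough way to reason about it is sketched below. It is a minimal sketch under stated assumptions, not an official Hive formula. It picks enough buckets that each bucket file stays near a target size, rounded up to a power of two, which is a common heuristic. The 200-byte average row size and the 256 MB target are hypothetical placeholders, so measure the real table. It also models which bucket a userid lands in, assuming "userid" is an INT column and the table uses legacy (pre-Hive 3) bucketing, where an int hashes to itself and the bucket is (hash & Integer.MAX_VALUE) % numBuckets. Hive 3's bucketing_version 2 uses a Murmur3 hash instead, so that helper would not apply there. All function names are hypothetical.

```python
import math
from collections import Counter


def suggest_num_buckets(total_bytes: int, target_bucket_bytes: int) -> int:
    """Buckets needed to keep each bucket near the target size, rounded up to a power of two."""
    raw = max(1, math.ceil(total_bytes / target_bucket_bytes))
    return 1 << (raw - 1).bit_length()


def legacy_hive_bucket(userid: int, num_buckets: int) -> int:
    """Bucket for an INT key, ASSUMING legacy Hive bucketing (an int hashes to itself)."""
    return (userid & 0x7FFFFFFF) % num_buckets


def bucket_skew(row_userids: list, num_buckets: int) -> float:
    """Largest bucket's row count divided by the average (1.0 means perfectly even)."""
    counts = Counter(legacy_hive_bucket(u, num_buckets) for u in row_userids)
    average = len(row_userids) / num_buckets
    return max(counts.values()) / average


if __name__ == "__main__":
    AVG_ROW_BYTES = 200  # hypothetical placeholder; measure the real table
    TARGET_BUCKET_BYTES = 256 * 1024 * 1024  # hypothetical target file size
    n = suggest_num_buckets(110_000_000 * AVG_ROW_BYTES, TARGET_BUCKET_BYTES)
    # Every row for a given userid lands in the same bucket, so the
    # "group by userid" work splits cleanly across the buckets.
    print("buckets:", n)
    print("rows per bucket (approx):", 110_000_000 // n)
    print("userids per bucket (approx):", 7_000_000 // n)
```

With these placeholder numbers, the sketch suggests 128 buckets, or roughly 860,000 rows and 55,000 userids per bucket. Because rows (not unique userids) determine bucket size, a few very heavy users can still skew one bucket. Running bucket_skew over a sample of the real per-row userids is a cheap way to check that before committing to a number.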

Any input is appreciated :)

Echo