So working out what people like is a big deal, if nothing else so that you can show them more of the things they like best.
Many repositories of cultural data (films, books, video games, board games, etc.) collect user ratings for this purpose, often on a five-point scale (typically rendered as "one to five stars") or as a like/neutral/dislike system.
This is all well and good; it can benefit both the user and the supplier.
However, there are various issues, and interpreting such sparse datasets is an active area of research.
But the issue I'd like to focus on is that these systems tend to rank things that most people like above things that some people love but a larger number dislike.
If you're trying to select things individual people will massively enjoy, this is sub-optimal.
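To make the problem concrete, here is a toy comparison using invented ratings for two hypothetical items. Ranking by the mean puts the item everyone mildly likes first. Ranking by the share of top scores, used here only as a crude stand-in for "how many people love this" and not as the scheme proposed below, puts the polarising item first.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale, invented purely for illustration.
ratings = {
    "crowd_pleaser": [4] * 10,       # everyone mildly likes it
    "cult_hit": [5] * 4 + [2] * 6,   # a minority loves it, the majority dislikes it
}


def mean_rating(scores):
    """Conventional ranking signal: the arithmetic mean of all ratings."""
    return mean(scores)


def love_share(scores, top=5):
    """Fraction of raters who gave the maximum score."""
    return sum(1 for s in scores if s == top) / len(scores)


by_mean = sorted(ratings, key=lambda k: mean_rating(ratings[k]), reverse=True)
by_love = sorted(ratings, key=lambda k: love_share(ratings[k]), reverse=True)

print(by_mean)  # ['crowd_pleaser', 'cult_hit']  (means 4.0 vs 3.2)
print(by_love)  # ['cult_hit', 'crowd_pleaser']  (top-score shares 0.4 vs 0.0)
```

The cult hit is exactly the item a fan would most want recommended, yet the mean buries it.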
I propose an additional system, one that collects very little extra data, which might help solve this problem.
Many repositories already allow users to add items to a list of favourites.
Suppose that we collect and process user favourites in a very particular, and perhaps counter-intuitive, way:
Basically, we severely throttle the user's ability to add favourites. Perhaps they can "favourite" only one item per week or month, or one for every 10 items rated, or according to some other measure of activity.
Excess favourites are added to a queue (whose order power-users could perhaps rearrange, if desired), so a typical user has little need to worry about sudden bursts of favourites or to remember to come back and rate things later; everything resolves itself in time, provided they don't favourite everything they come across. (Even if they do, no harm is done; they'll just have to click through to their "pending favourites" list if they want to see them.)
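A minimal sketch of the throttle-and-queue idea, assuming the "one favourite for every 10 items rated" variant. The class and method names (`FavouriteThrottle`, `favourite`, `record_rating`, `promote`) are hypothetical, and a real site would persist this state in its database rather than keep it in memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FavouriteThrottle:
    """Hypothetical sketch: one confirmed favourite per `ratings_per_slot` ratings.

    Favourites requested beyond the current allowance wait in a FIFO queue and
    are confirmed automatically as the user earns slots by rating items.
    """

    ratings_per_slot: int = 10  # assumed throttle; per-week or per-month would also work
    ratings_count: int = 0
    confirmed: list = field(default_factory=list)
    pending: deque = field(default_factory=deque)

    def _slots_available(self) -> int:
        # Slots earned so far, minus those already used by confirmed favourites.
        return self.ratings_count // self.ratings_per_slot - len(self.confirmed)

    def _drain(self) -> None:
        # Move queued favourites into the confirmed list while slots remain.
        while self.pending and self._slots_available() > 0:
            self.confirmed.append(self.pending.popleft())

    def record_rating(self) -> None:
        # Called whenever the user rates any item; the rating itself is stored elsewhere.
        self.ratings_count += 1
        self._drain()

    def favourite(self, item) -> None:
        # New favourites always join the queue; duplicates are ignored.
        if item in self.confirmed or item in self.pending:
            return
        self.pending.append(item)
        self._drain()

    def promote(self, item) -> None:
        # Optional power-user control: move a pending item to the front of the queue.
        # Raises ValueError if the item is not pending.
        self.pending.remove(item)
        self.pending.appendleft(item)


# Usage: a burst of three favourites from a brand-new user.
user = FavouriteThrottle()
for film in ["A", "B", "C"]:
    user.favourite(film)
print(user.confirmed, list(user.pending))  # [] ['A', 'B', 'C']

for _ in range(20):
    user.record_rating()
print(user.confirmed, list(user.pending))  # ['A', 'B'] ['C']
```

Counting activity rather than wall-clock time keeps the sketch deterministic; a per-week or per-month allowance would simply replace `ratings_count // ratings_per_slot` with the number of elapsed periods.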
Now we have a list of items that each user has singled out as especially valuable to them. People who rate everything highly can't pollute the data much, people who dislike something no longer count against cult hits, and, in combination with the standard rating system, patterns of favouritism should be very amenable to clustering algorithms. This should therefore improve