# Dewey Decimal System-Based Annotation Tangentiality Measurement Function

Measure the rate of off-topic veer through Dewey decimal codes
 (+3)

I want this to be more widely applied than just the HB but I've just spent twenty minutes looking for another category and couldn't find one. It does, however, apply here as well as elsewhere. I do not like this being in this category.

Anyway:

On the Halfbakery and elsewhere, particularly on social networking sites and fora, discussions frequently veer off-topic at a rate of knots. Unfortunately there seems to be no way of measuring this. Until now.

The Dewey Decimal System could be applied to smaller texts than books, down perhaps to the level of individual words. It should therefore be possible to assign Dewey numbers to ideas and annotations here, and to posts and replies, comments, retweets and so forth elsewhere. Topicality is inversely proportional to the absolute value of the difference between the Dewey numbers assigned to consecutive comments. A line graph could be plotted of these differences, measured either from the original idea/post or between consecutive annotations. This graph can then be smoothed into a curve, and a tangent line can be placed upon this curve and its slope measured.
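
As a rough sketch of the above, assuming each annotation has already been given a single Dewey number somehow: the function names, the sample thread and its numbers are all invented for illustration, and the moving average and finite difference are only simple stand-ins for "smoothing into a curve" and "placing a tangent".

```python
# Hypothetical sketch: Dewey numbers are assumed to be assigned already.

def drift_from_original(codes):
    """Absolute Dewey difference of each annotation from the original idea (codes[0])."""
    origin = codes[0]
    return [abs(code - origin) for code in codes[1:]]

def drift_between_consecutive(codes):
    """Absolute Dewey difference between each post and the one before it."""
    return [abs(b - a) for a, b in zip(codes, codes[1:])]

def moving_average(values, window=3):
    """Smooth the line graph with a trailing moving average (a stand-in for curve fitting)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def tangent_slopes(values):
    """Finite-difference slope between successive points (a crude tangent)."""
    return [b - a for a, b in zip(values, values[1:])]

# Invented thread: physics idea, then mechanics, chemistry, cookery and ethics.
thread = [530.0, 531.0, 540.0, 641.5, 170.0]
veer = drift_from_original(thread)
rate_of_veer = tangent_slopes(moving_average(veer))
```

A steeply rising `rate_of_veer` would suggest the thread is leaving its topic quickly; values near zero would suggest it is holding steady.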

Taking this slightly and perhaps implausibly further, an algorithm could be plotted to attribute certain items of vocabulary to Dewey classifications. For instance, if words such as "electron" and "momentum" are used in a particular text, it's probably about physics, and if "neodymium" and "enthalpy" occur, it's probably chemistry, and so forth.
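
A minimal sketch of that vocabulary idea might look like the following. The tiny keyword table uses only the four example words above, mapped to 530 (physics) and 540 (chemistry); the function name and the simple vote-counting rule are assumptions, and a real table would need to be vastly larger.

```python
import re
from collections import Counter

# Hypothetical keyword table built from the four example words only.
KEYWORDS = {
    "electron": 530,   # physics
    "momentum": 530,   # physics
    "neodymium": 540,  # chemistry
    "enthalpy": 540,   # chemistry
}

def guess_dewey(text):
    """Return the Dewey class whose keywords occur most often, or None if none occur."""
    words = re.findall(r"[a-z]+", text.lower())
    votes = Counter(KEYWORDS[word] for word in words if word in KEYWORDS)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Text with no known keywords returns `None` rather than a guess, so unclassifiable annotations can be left out of the graph instead of distorting it.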

This can be done! There IS a way of crudely measuring the tangent of off-topicality.

 — nineteenthly, Apr 05 2017

 Excellent.

What is the DD number for excellent?
 — calum, Apr 05 2017

Probably 170 - ethics.
 — nineteenthly, Apr 05 2017

 If you took every HB topic, paired it with its DD counterpart, multiplied by 3, and then divided by the corresponding LC (Library of Congress) number, you would have a big frigging mess.

 However, the various patent office classification numbers could be used to make matters worse.

Seriously, dance with the gal you brought to the dance. The HB classification system can tell you how far the discussion has wandered, can't it?
 — popbottle, Apr 05 2017

 The first level of HB classification is nominal so it won't work for big veers. However, I'm definitely open to other systems than Dewey.

I don't think you'd get a mess. It depends on the range of data you consider.
 — nineteenthly, Apr 06 2017

 I have some very early results on this - but have had to trim the number of topics down to the low hundreds before running out of memory.

 Interestingly, when topics are further trimmed to consist only of nouns, they seem to occupy more robust and cohesive spaces, at least compared to topics consisting of wistful, thoughtful or abstract notions.

 Grouping words together and collating the top-7 co-occurring words to form cluster names yielded some interesting and revealing topic clusters - so I've got that part (sort of) working.
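
 One way the cluster-naming step could be sketched, purely as an assumption since the actual method isn't given here: count which words share an annotation with a seed word, then join the seed and its most frequent companions. The function names and the whitespace tokenising are hypothetical.

```python
from collections import Counter

def top_cooccurring(documents, seed, k=7):
    """Return up to k words that most often share a document with the seed word."""
    counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        if seed in words:
            counts.update(words - {seed})
    # Sort by count (highest first), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

def cluster_name(documents, seed, size=7):
    """Build a hyphenated name from the seed plus its top co-occurring words."""
    return "-".join([seed] + top_cooccurring(documents, seed, size - 1))
```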

 Applying these clusters to annotations and classifying each of them is then doable - however, how do you measure "distance" between clusters? OK, so you could assign a Dewey number to each one, and hope for the best, but for example, it just isn't the case that Religion (200) is closer to Technology (600) than to Literature (800), which is what the raw numbers would imply. I'm sure there are other, better examples than that.
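
 To make the problem concrete, here is a small sketch comparing the raw numeric difference with one hypothetical alternative that treats the top-level Dewey classes as mere labels and only counts how deep two numbers agree. The second function is an illustration of the objection, not a recommended metric, and both names are made up.

```python
def numeric_distance(a, b):
    """Raw difference between two Dewey numbers."""
    return abs(a - b)

def shared_prefix_distance(a, b, digits=3):
    """Number of leading digits remaining from the first mismatch onward (0 if none)."""
    first, second = f"{int(a):03d}", f"{int(b):03d}"
    for i in range(digits):
        if first[i] != second[i]:
            return digits - i
    return 0

# Raw numbers rank Technology as nearer to Religion than Literature is...
religion, technology, literature = 200, 600, 800
nearer_by_number = numeric_distance(religion, technology) < numeric_distance(religion, literature)
# ...whereas the prefix view treats all three main classes as equally far apart.
equally_far = shared_prefix_distance(religion, technology) == shared_prefix_distance(religion, literature)
```

 Under the prefix view, 530 and 540 come out closer to each other than either does to 200, which at least stops the ordering of the ten main classes from mattering.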

 It *could* be the case that there exist super-clusters where existing clusters have a higher likelihood of association; however, given the data available (i.e. the halfbakery) I'm getting some nice clusters like {coffee-cup-mug-tea-hot-heat-drink}, {space-moon-earth-orbit-mars-gravity-parking} and {ideas-halfbakery-idea-bakers-croissant-hb-baked}, and other "outlying" clusters such as the rather dubious {sex-porn-condoms-bacon-ed-hot-sexual}.

So the other problem with tangent off-topicality is identifying the difference between a series of riffs on a theme, each of which might include a degree of off-topicality, vs. a singular, wildly tangential veer off into realms new. It's a case of comparing apples and ¶.
 — zen_tom, Apr 10 2017

I find that most exquisite!
 — nineteenthly, Apr 10 2017

Victor Basta: Request vector, over.
Captain Oveur: What?
Tower voice: Flight 2-0-9'er cleared for vector 324.
Roger Murdock: We have clearance, Clarence.
Captain Oveur: Roger, Roger. What's our vector, Victor?
Tower voice: Tower's radio clearance, over!
Captain Oveur: That's Clarence Oveur. Over.
 — Ian Tindale, Apr 10 2017

Well, gods know we could use some sort of system for keeping tangentiality in check and preventing ideas from drifting off in wild and sundry directions. Speaking of wild and sundry directions, did I ever tell you about the Intercalary's work for (and I use the word "for" quite wrongly) the US Geological Survey in southern Kadugistan? It was just after the Kurdic Revolt, so obviously most lines of communication were down, and he had to fall back on the "yodelling herdsmen" to relay his information back to base camp. Of course, since he had only a rudimentary grasp of Kadugistani Yodellese, I suppose the misunderstanding that led to the inadvertent deployment of two cruisers and a gunboat to the Altai mountains was as inevitable as it was regrettable.
 — MaxwellBuchanan, Apr 10 2017

Indeed. I think the measurement under scrutiny evaluates as a vector quantity rather than a scalar amount. Not only a straight vector but a curvy one, for example a quadratic or cubic Bézier spline.
 — Ian Tindale, Apr 11 2017

Surely some meta-data concerning the number of links between items should be available and should dictate the cloudspace to reduce the topological complexity of crossings in whatever diagram shape emerges.
 — RayfordSteele, Apr 11 2017

