halfbakery: A riddle wrapped in a mystery inside a rich, flaky crust
This program looks through a document, finds any names, and blacks them out, replacing each with a randomly generated name
(the replacement is pseudorandom but cannot be reverted, because it comes from a limited pool used solely to let people know who's talking in the article, like "John Doe", "Jackie Dan", etc.; alternatively,
you can just say "Guy1", "Guy2", etc.).
This would be useful for WikiLeaks, as it would speed up redaction and protect some of the people named in a leaked article.
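A minimal sketch of just the substitution step, in Python, assuming the names have already been found by some separate detector (the genuinely hard part). It uses the "Guy1, Guy2" option rather than a random-name pool, and the function name `pseudonymize` is hypothetical. Each distinct name gets the same placeholder every time it appears, so readers can still follow who is talking; the name-to-placeholder table lives only inside the function and is thrown away, so the output carries no key for reversing it.

```python
import re


def pseudonymize(text: str, names: list[str]) -> str:
    """Replace each known name with a stable placeholder: Guy1, Guy2, ...

    `names` is assumed to come from a separate (hypothetical) detection step;
    this sketch only does consistent, one-way substitution.
    """
    if not names:
        return text
    mapping: dict[str, str] = {}  # local only; discarded on return
    # Try longer names first so "John Doe" wins over "John".
    ordered = sorted(set(names), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b")

    def substitute(match: re.Match) -> str:
        name = match.group(0)
        if name not in mapping:
            # Placeholders are numbered in order of first appearance.
            mapping[name] = f"Guy{len(mapping) + 1}"
        return mapping[name]

    return pattern.sub(substitute, text)


print(pseudonymize("Alice met Bob. Bob thanked Alice.", ["Alice", "Bob"]))
```

This prints `Guy1 met Guy2. Guy2 thanked Guy1.`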
Random name generator
http://www.kleimo.com/random/name.cfm "...uses data from the US census..." [normzone, Aug 29 2010]
|
|
Applications I can think of would require a very, very
low error rate at distinguishing names from other text. The
most promising approach, so it seems to me, would be brute
force, relying on a huge database of names. Would work
even better in a context (e.g. medical records) where you
had a database certain to contain every name that would
need to be redacted. |
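The brute-force approach can be sketched as a plain lookup, assuming a set of known names is available. `KNOWN_NAMES` below is a tiny hypothetical stand-in for the "huge database" (in the medical-records case it might be the full list of patient names), and `redact_known_names` is an illustrative name, not an existing tool. Matching is case-insensitive.

```python
import re

# Hypothetical stand-in for a large name database (e.g. a patient registry).
KNOWN_NAMES = {"john", "doe", "jackie", "dan", "smith"}


def redact_known_names(text: str, known_names: set[str]) -> str:
    """Black out every word whose lowercase form appears in known_names."""

    def blackout(match: re.Match) -> str:
        word = match.group(0)
        return "[REDACTED]" if word.lower() in known_names else word

    # Words are runs of letters and apostrophes; everything else is kept.
    return re.sub(r"[A-Za-z']+", blackout, text)


print(redact_known_names("Dr. Smith saw patient John Doe today.", KNOWN_NAMES))
```

This prints `Dr. [REDACTED] saw patient [REDACTED] [REDACTED] today.` Its error rate depends entirely on the list: a name missing from the set slips through, and an ordinary word that happens to be in the set gets blacked out.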
|
|
[bigsleep] Sure, and s/wedding guests/terrorists/. And
s/military installation/Chinese embassy/. And
s/peasant/Viet Cong/. And s/cathedral/arms factory/ and
s/sports stadium/troop concentration/. |
|
|
We could begin by assuming that all the
non-identified words are names, then tag all the
word/name crossover words, and also black out all
capitalized words in the middle of sentences. That
seems like it should be very easy, even for an
amateur programmer. Then, if you wanted to get
tricky, you could use the sentence structure
deducing software used in MS Word to identify the
subject of every sentence and redact it unless it was
clearly not a name. |
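The three heuristics above might look roughly like this in Python. `DICTIONARY` and `NAMES` are tiny hypothetical word lists, the sentence splitter is deliberately naive, and "review" is an assumed label for the word/name crossover words, which the comment suggests tagging rather than redacting outright.

```python
import re

# Hypothetical word lists; a real system would need a full dictionary
# and a large name database.
DICTIONARY = {"a", "and", "at", "bank", "fine", "it", "met", "rose",
              "said", "the", "was", "will"}
NAMES = {"john", "priya", "rose", "will"}


def classify(text: str) -> list[tuple[str, str]]:
    """Label each word 'redact', 'review', or 'keep'."""
    labels = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        for i, word in enumerate(words):
            lower = word.lower()
            if lower not in DICTIONARY:
                label = "redact"  # non-identified word: assume it is a name
            elif word[0].isupper() and i > 0:
                label = "redact"  # capitalized in the middle of a sentence
            elif lower in NAMES:
                label = "review"  # word/name crossover, e.g. "will", "rose"
            else:
                label = "keep"
            labels.append((word, label))
    return labels


for word, label in classify("Priya met Rose at the bank. Will said it was fine."):
    print(word, label)
```

Here "Priya" is redacted as a non-identified word, "Rose" is redacted because it is capitalized mid-sentence, and "Will" is only flagged for review, since it opens a sentence and is also an ordinary word.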
|
|
[WcW] //begin by assuming that all the non-identified
words are names// Good. |
|
|
//structure deducing software used in MS word// Bad,
because complex, therefore untrustworthy (e.g., you'd
also need foreign-language detection to protect the
sentence-structure-deducing algorithm, and then what if
some joker used Igpay Atinlay, or leetspeak, ... and so
on). |
|
|
Moreover, you'd need to redact nicknames like "Red" or
"Chuck" (and on the internet, you can't rely on
capitalization) and noms de web (could it tell that
"WcW" was a name but not "W3C"?). |
|
|
I don't think this would help much. "Guy1, CEO of Pharmacia,
today said....". Redacting is more complex than just the
names, I think. |
|
|
[MaxwellBuchanan] ...which is why the best solution is to
apply the filter when the documents are created. Require
the person writing the document to use some simple
markup to flag redactable passages. |
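A minimal sketch of the author-side markup idea, assuming a hypothetical `{{redact|...}}` tag that writers put around anything sensitive; the tag syntax and the function names are invented for illustration. Because the writer has already marked the passages, one source text yields both a public copy and an internal copy with no name detection at all, and non-name passages (such as the company in a "CEO of ..." sentence) are covered too.

```python
import re

# Hypothetical markup: the writer wraps sensitive text as {{redact|...}}.
REDACT_MARK = re.compile(r"\{\{redact\|(.*?)\}\}", re.DOTALL)


def public_version(text: str) -> str:
    """Replace every writer-flagged passage with a fixed placeholder."""
    return REDACT_MARK.sub("[REDACTED]", text)


def internal_version(text: str) -> str:
    """Strip the markup but keep the flagged text, for authorized readers."""
    return REDACT_MARK.sub(lambda m: m.group(1), text)


doc = "{{redact|Jane Roe}}, CEO of {{redact|Acme}}, said the plan was approved."
print(public_version(doc))
print(internal_version(doc))
```

With this hypothetical example sentence, the first line printed is `[REDACTED], CEO of [REDACTED], said the plan was approved.` and the second is the sentence with the tags removed.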
|
|
This avoids the need for unreliable machine intelligence,
and requires the minimum of expensive human
intelligence (since no one has an easier time
understanding a passage than the person writing it). It
also solves the problem you raise of identifying redactable
passages that aren't names. |
|
|
This is more feasible than perhaps it seems, if documents
are created within the organization's electronic record-
keeping and internal-communication system. I work in
such an environment, and the system is a powerful tool for
imposing standard format on documents as they are
written. |
|
|
// unreliable machine intelligence // |
|
|
Where ? Where ? Show us ! We've always wanted to see that .... |
|
|
Mark the documents, Mark. |
|
|
Don't feel bad, [8th]. Intelligence is overrated. |
|
|
Perhaps people are unfamiliar with current government policies regarding redaction. Let me explain: a ========= ======== ==== ========= ========= ============== ====== ====== ========== ==== ======== ========== which allows for re======== ================ ==== =========== ========= ========= ===== ========of most ====== ======= |
|
|
This page inte==== === ====== |
|
|
' "cuppa Joe" isn't a name, but would get screwed up
by this system.'
"I could live with that," thought st6f as he drank his
cup of Pete. |
|
| |