halfbakery
Scanning printed text, applying accurate optical character recognition, and making sense of the resulting input data can require precision hardware mechanics, intricate electronic circuitry, and sophisticated database software. The process is also limited by the requirement of physical contact between matter and machine. Few would argue that this process couldn't use some improvement.
On the other hand, a UPC barcode can be read with a whisk of a box through the air, and that process is also called 'scanning.' The differences between the machinery in these two types of 'scanning' systems are significant, but the crux of their function is the same: look at the lines and suss out what they say. With the stroke of a pen, we can close this technology gap.
"-TTT-" is shorthand for "Typeface Target Tag," a simple symbology to optimize the transfer of printed text to digital data. A unique set of graphic characters could be established to assist an artificial intelligence in extracting and processing information collected from hard-copy sources. In a blurb, this idea is "an upgrade on the concept of 'X marks the spot.'"
+..TT-E:p..+ In the preceding character string, the "+" symbols frame and align another set of symbols containing information about the written information that follows. In this example, "TT-E:p" stands for 'type text-English:prose.' Digits scrawled on a Post-it beginning with "+..HW-#..+" would be identified as a "handwritten number." The job of a -TTT- is to show a machine where to look, how to see, and what to understand.
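As a minimal sketch, assuming the tag grammar implied by these examples (a "+.." opener, a category code, a hyphen, colon-separated descriptor codes, and a "..+" closer), a decoder could look like this. The code tables are hypothetical: only TT, HW, E, p and # appear in the idea itself, and h for handwriting is borrowed from one of the comments.

```python
import re
from dataclasses import dataclass

# Hypothetical code tables; a real -TTT- symbology would define its own.
CATEGORIES = {"TT": "type text", "HW": "handwriting"}
FIELDS = {"E": "English", "p": "prose", "#": "number", "h": "handwriting"}

# Assumed framing: "+.." opens a tag, "..+" closes it, a hyphen separates the
# category code from the colon-separated descriptor codes.
TAG_PATTERN = re.compile(r"\+\.\.([A-Z]+)-([^.+]+)\.\.\+")


@dataclass
class Tag:
    category: str
    fields: list[str]


def parse_tag(text: str) -> Tag:
    """Decode the first -TTT- tag found in `text` into readable descriptors."""
    match = TAG_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no -TTT- tag in {text!r}")
    category_code, field_codes = match.groups()
    category = CATEGORIES.get(category_code, f"unknown ({category_code})")
    # Unknown codes are kept visible rather than silently dropped.
    fields = [FIELDS.get(code, f"unknown ({code})") for code in field_codes.split(":")]
    return Tag(category, fields)
```

For example, parse_tag("+..HW-#..+") yields a Tag with category "handwriting" and fields ["number"], matching the Post-it case above.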
+..TT-E:p..+ A -TTT- in the position of the preceding character set would function universally as an alignment point for optical scanning, a typeface tag for character recognition, and a metadata marker for context categorization. Fixing the constants simplifies finding the variable values in any analytical equation; likewise, the idea is to embed enough information in a single spot that the other information the marker is attached to can be discerned accurately and more easily.
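The "metadata marker for context categorization" role can be sketched as a lookup from a tag body to a recognizer and a storage location. This is a minimal sketch under loud assumptions: the routing table, recognizer names, and storage bins are hypothetical, and only the tag bodies TT-E:p and HW-# come from the idea.

```python
# Hypothetical routing table: names are illustrative, not from any real OCR
# package. Keys are (category code, *descriptor codes).
ROUTES = {
    ("TT", "E", "p"): ("printed-text OCR, English", "documents/prose"),
    ("HW", "#"): ("handwritten-digit recognizer", "notes/numbers"),
}
FALLBACK = ("generic OCR", "unsorted")


def route(tag_body: str) -> tuple[str, str]:
    """Map a tag body such as 'TT-E:p' to a (recognizer, storage bin) pair."""
    category, _, descriptors = tag_body.partition("-")
    key = (category, *descriptors.split(":"))
    # Unrecognized tags are still read, just without the context hints.
    return ROUTES.get(key, FALLBACK)
```

Keying on the full descriptor tuple keeps each lookup exact, while the fallback lets an unknown or misprinted tag degrade to ordinary OCR instead of failing.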
In practice, the process would proceed according to this plan:
1) Light beams from external devices trained on a surface pinpoint a -TTT-, and the scanner instantly adjusts to focus on that area.
2) Elements in the -TTT- design provide a reference for calculating the size, angle, and position of a limited planar frame in which the scanning system searches for variation.
3) Any typeface data embedded in the -TTT- is extracted by the OCR software and used to interpret the text frame.
4) Significant features of the -TTT- symbol structure signal cybernetic systems to expect a specific type of content to pass through their subroutines soon, and steer the sorting and storage of those signals accordingly.
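Step 2 amounts to estimating a similarity transform (scale, rotation, translation) from the marker. A minimal sketch, assuming the scanner has already located the centres of the two framing "+" symbols in image pixels and that the printed distance between them is known; Frame, frame_from_tag and tag_width_mm are hypothetical names, and the model ignores perspective distortion, which a real system would need a full homography to correct.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Frame:
    origin: Point  # image position of the left "+" centre (pixels)
    scale: float   # pixels per millimetre of printed tag
    angle: float   # rotation of the tag's baseline (radians)

    def to_image(self, x_mm: float, y_mm: float) -> Point:
        """Map a point in tag-relative millimetres to image pixels."""
        cos_a, sin_a = math.cos(self.angle), math.sin(self.angle)
        ox, oy = self.origin
        # Rotate, scale, then translate (a similarity transform).
        return (ox + self.scale * (x_mm * cos_a - y_mm * sin_a),
                oy + self.scale * (x_mm * sin_a + y_mm * cos_a))


def frame_from_tag(left_px: Point, right_px: Point, tag_width_mm: float) -> Frame:
    """Estimate size, angle and position from the two framing '+' symbols.

    Assumes the printed distance between the two '+' centres is tag_width_mm.
    """
    dx = right_px[0] - left_px[0]
    dy = right_px[1] - left_px[1]
    return Frame(origin=left_px,
                 scale=math.hypot(dx, dy) / tag_width_mm,
                 angle=math.atan2(dy, dx))
```

With the "+" centres detected at (100, 200) and (130, 240) and a 10 mm tag, the scale works out to 5 px/mm, and to_image can then place the corners of a text frame that begins a known distance to the right of the tag.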
Using 'cross hairs' for precise spatial orientation adjustments has foundations in technology as ancient as the sextant and as modern as motion capture. Changes in typeface, size, and style routinely confound computer chips. Finding context in the characters is challenging, but conquest in this quest is profound. -TTT- is the answer for it all.
+..TT-E:p:h..+ Handwriting
Which is English:prose:handwriting followed by the handwritten word "Handwriting" that the scanner can use to calibrate for the handwritten text that follows.
Couldn't get this, sorry. Reading too many bakery ideas has destroyed my attention span.
This idea is plain enough. -TTT- represents the contrast of a sample against its medium, and conveys descriptors such as pixelation to the scanning and OCR software, much like a 3-D barcode carries several layers of information. Examples of OCR challenges are:
3-D #1: handwriting on paper
3-D #2: type on paper
3-D #3: image on paper
3-D #4: image on other media
I thought this was like pi, but the imaginary number version.
Am I understanding this right: the TTT is something printed with the text to aid later OCR?
In that case, why not simply print a microscopic bar code on the edges of each letter? Or, indeed, a single bar code containing the text of the whole page? Then the OCR system would just have to pick up the bar code.
You might be able to design a font that had some bar-code features built into it. But there is also another coding system out there that uses a rectangle full of small black-and-white squares to represent data. I think it might be better to design a font using that system than a bar-code system, because the characters in the font would look more "normal".
Unnecessary: the OCR program that came with my latest scanner (/printer/fax/copier) already formats the scanned text in Word the way it looks on the page.
As does my older model, [DrCurry], but mine only produces images from handwriting or from indecipherable text. I then have to resize them, and in most cases retype the overlying image as a line of text.
Oh, and stray "punctuation" strewn about the page? Could be much better.
Given that the OCR can't read my handwriting in the first place, how would it read the handwritten tags (and likely inaccurate alignment crosshairs) I add later?