You've probably seen the AI software which tries to
generate images from text-based user descriptions. It
doesn't do very well but it will manage to produce blobs
of tabby fur if you type "cat" and various Caucasian
Cronenbergesque abominations if you type "woman",
"person", "man", "child"
and so forth. It also does
birthday cakes, benches, fridges and trees just about
recognisably. It uses an AI algorithm called a Generative
Adversarial Network (GAN), which sets two neural nets
to compete with each other, one trying to fool the other
that an image is real and the other trying to detect
fakes.
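For anyone who hasn't seen the inside of one, the adversarial game looks roughly like the sketch below (PyTorch, with toy sizes and random stand-in data rather than anything a real text-to-image system actually uses).

import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28            # toy sizes, not real settings

generator = nn.Sequential(                     # tries to fool the discriminator
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(                 # tries to detect fakes
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.rand(32, image_dim) * 2 - 1   # random stand-in for real photos
    fake = generator(torch.randn(32, latent_dim))

    # Discriminator: call real images 1 and generated images 0
    d_loss = bce(discriminator(real), torch.ones(32, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make the discriminator output 1 for its fakes
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()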
There are clearly a number of problems with this program which need to be addressed, but one apparent issue is that it doesn't seem to understand that the images it's been trained on represent three-dimensional scenes rather than two-dimensional pixel patterns. I want to suggest a way of resolving this.
There are many examples of photographs of well-known objects, scenes and people whose image files are appropriately labelled as "St Peter's Church in Rome", "Desmond Tutu", "double bed", and so forth. Some of these have been taken from viewpoints identifiable via GPS coordinates, or recognisably similar to one another. With some resizing, the ones from which parallax could be reconstructed would in theory be convertible to three-dimensional scenes of the items concerned, particularly where images have been taken from all round an object, as with statues or objets d'art in galleries. If a generative adversarial network is trained on a large number of these images, it will be able to extract approximate renderings of the real equivalents, which could be appropriately labelled. That's stage one.
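To make the parallax point concrete, a toy calculation: with two photographs of the same thing taken a known distance apart, pixel disparity maps straight onto depth. The focal length, baseline and disparities below are made-up illustrative numbers, not measurements from any real data set.

import numpy as np

focal_length_px = 1000.0          # assumed camera focal length, in pixels
baseline_m = 0.5                  # assumed distance between the two viewpoints

# Horizontal pixel shift (disparity) measured for three matched features
disparity_px = np.array([50.0, 25.0, 10.0])

# Classic stereo relation: depth = focal length * baseline / disparity
depth_m = focal_length_px * baseline_m / disparity_px
print(depth_m)                    # roughly 10 m, 20 m and 50 m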
Stage two is to take these data, rather than the two-dimensional image data, and train another GAN to recognise scenes which are only available as views from a single angle, thereby constructing scenes of people standing around at birthday parties, moodily-lit toothpaste tubes, or whatever.
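A plain supervised stand-in for stage two might look like the following, assuming stage one has already produced (photograph, voxel grid) training pairs; random tensors stand in for that data here, and a proper version would add a discriminator over the predicted scenes.

import torch
import torch.nn as nn

# Map a single 64x64 photograph to a 16x16x16 occupancy grid
image_to_voxels = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 16 ** 3), nn.Sigmoid(),
)
optimiser = torch.optim.Adam(image_to_voxels.parameters(), lr=1e-3)

for step in range(100):
    photos = torch.rand(8, 3, 64, 64)       # single-angle views (random stand-ins)
    targets = torch.rand(8, 16 ** 3)        # stage-one reconstructions (stand-ins)
    loss = nn.functional.binary_cross_entropy(image_to_voxels(photos), targets)
    optimiser.zero_grad(); loss.backward(); optimiser.step()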
Stage three is to pool all of these data and train yet another GAN to recognise the objects according to their descriptions, as labelled in the three-dimensional reconstructions.
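Stage three then amounts to a generator conditioned on the description. A toy version, with a label embedding standing in for a real text encoder and all sizes invented for illustration:

import torch
import torch.nn as nn

vocab = ["cat", "double bed", "birthday cake"]      # toy label set
label_embedding = nn.Embedding(len(vocab), 32)

generator = nn.Sequential(
    nn.Linear(32 + 64, 256), nn.ReLU(),
    nn.Linear(256, 16 ** 3), nn.Sigmoid(),          # 16x16x16 occupancy grid
)

label = torch.tensor([vocab.index("cat")])
conditioning = torch.cat([label_embedding(label), torch.randn(1, 64)], dim=1)
voxels = generator(conditioning).reshape(16, 16, 16)
print(voxels.shape)   # a crude 3-D "cat", to be refined by adversarial training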
The end result could be a better, more accurate AI application, able to produce more realistic scenes from the user's descriptions, which could also be explored in VR.
ClipText: "A text that visualizes itself." -- are you trying to realize this idea? :) May also add the time, as extra dimension. 4D Voxels, so that it can convert stories into movies. [Mindey, May 02 2020]
Annotation:
Great, so now we get not only deepfake people videos, but entire deepfake scenes of what happened in the news at x location... I see trouble ahead...
Yes, but at least it's a nice, clear 3-D VR view of the trouble ... so you can appreciate it properly.
Once you move from 2-D to true 3-D, the data volumes involved grow by roughly another power of the resolution.
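As a rough sense of scale, a back-of-envelope comparison (the one-megapixel figure is just an illustrative assumption):

pixels_2d = 1024 * 1024           # one 1024 x 1024 image
voxels_3d = 1024 * 1024 * 1024    # the same linear resolution, cubed
print(voxels_3d // pixels_2d)     # 1024: roughly a thousand times more raw samples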
However, stereoscopic/binocular "views" from selected angles would limit the data set to a manageable size while substantially improving the information available to the GAN.
Okay, so some compression is needed. To some extent it amounts to a series of linked surfaces with contours.
Averaging will certainly help; the image can be reduced to a set of planes and regions by reducing the colour depth and contrast, for example, yet would still be perfectly recognizable to a human observer. To the audience, Tom and Jerry is clearly a cat chasing a mouse...
So a "cartoonized" view will greatly simplify the task.
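As a minimal sketch of that simplification, colour quantization alone collapses an image into a handful of flat regions; the array below is random stand-in data rather than a real photograph, and the number of levels is an illustrative choice.

import numpy as np

image = np.random.rand(64, 64, 3)     # random stand-in for a photograph
levels = 4                            # illustrative colour depth per channel
quantized = np.round(image * (levels - 1)) / (levels - 1)
print(np.unique(quantized).size)      # at most `levels` distinct values remain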
Thanks, that's interesting but it might be a while before I'm in a position to poke around in it myself.
We're alive and well, thanks. Hope you are too.