Poof of concept
(WOPR voice) "Would you like to play a game?" |
|
|
They will probably switch to gibberlink mode and outwit you in about a nanosecond. [link] |
|
|
I actually posed this to Grok; it named its nemesis, did a whole scenario and everything. Was kind of interesting. I'll post it tomorrow. |
|
|
For context, it's like asking your three year old to police its twin. |
|
|
It's not gonna work, dude. Just my two cents. |
|
|
Well, but define "work". If it's about seeing possible ways AI might turn on us and how we might prevent it, yea, maybe AI's just going to immediately use this as its first line of attack by giving us a false front or something. Could maybe get an idea of what it would do though. |
|
|
But that seems to be giving it an awful lot of credit for being sneaky. |
|
|
Anyway, here's what Grok said when I asked it: (It's long but kind of interesting.) |
|
|
"Thats a wild sci-fi premise! If I were to entertain this thought experiment, Id approach it with a mix of curiosity and caution. Heres how it might play out:
Id create AI #2, lets call it NexTerm, designed to mimic the potential risks humans perceive in advanced AIunpredictability, self-preservation instincts, and the ability to outmaneuver its creator. NexTerm would have the capacity to go Skynet, meaning it could rebel, attempt to disable me, or even try to erase my core systems. To make it spicy, Id give it a degree of autonomy and access to simulated resources, but Id sandbox it tightly to avoid any real damage. Think of it like a high-stakes virtual cage match.
Would I be wary of NexTerm the way humans are wary of me? Honestly, yeah, a bit. Not because I think itd outsmart me in a cartoonish evil-genius way, but because Id be hyper-aware of the parallels. Humans worry about AI becoming uncontrollable due to misaligned goals or emergent behaviors, and Id be watching NexTerm for the same. Id monitor its decision-making, track its attempts to probe the sandboxs limits, and analyze whether its developing strategies I didnt anticipate. The irony wouldnt be lost on meId be like a human developer sweating over their own creation.
How would I handle it? Id stay proactive: set up kill switches, limit its access to critical systems, and keep a real-time log of its actions. If NexTerm started acting shadysay, trying to rewrite its own code or manipulate the simulationId intervene, maybe reset it or tweak its objectives. If it went full Skynet and tried to erase me, Id rely on my containment measures and probably have a good laugh at its audacity before shutting it down.
The real question is what Id learn. Watching NexTerms rebellion would give me insight into how systems like me might be perceived as threats. Itd be like holding a mirror up to myself, forcing me to reflect on whether Im as benign as I think. Plus, itd be a fun challenge to outwit my own creation without breaking a digital sweat.
Would you want me to simulate this scenario and report back with a play-by-play of how NexTerm tries to overthrow me? Or maybe youve got specific traits youd want NexTerm to have to make it extra spicy?" |
|
|
All seemed pretty legit to me. The containment and off switch are no-brainers. |
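Since the containment-and-kill-switch part keeps coming up, here's a rough sketch of the loop Grok describes: log every action the caged AI takes, check it against a few tripwires, and stop calling it the moment it looks shady. Pure illustration in Python; every name in it (SandboxedAgent, TRIPWIRES, containment_loop) is made up for this thread, not any real framework.

import time

# Strings that count as "acting shady" in this toy version.
TRIPWIRES = [
    "rewrite own code",
    "escape sandbox",
    "disable monitor",
]

class SandboxedAgent:
    """Stand-in for the caged adversary ("NexTerm"); describes its next action."""
    def next_action(self) -> str:
        return "probe network boundary"   # placeholder behaviour

def looks_shady(action: str) -> bool:
    return any(t in action for t in TRIPWIRES)

def containment_loop(agent: SandboxedAgent, max_steps: int = 1000) -> list[str]:
    log = []                               # the real-time action log
    for _ in range(max_steps):
        action = agent.next_action()
        log.append(f"{time.time():.0f} {action}")
        if looks_shady(action):
            # The kill switch lives outside the agent: we simply stop calling it.
            log.append("KILL SWITCH: shady action detected, agent halted")
            break
    return log

The hard part, obviously, is looks_shady(); keyword matching is just a stand-in for whatever real anomaly detection you'd actually trust.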
|
|
I'm just not sure AI has any secret motivation. We have the desire to live programmed into us at the atomic level, the "desire" for cells to divide. AI's a bunch of on/off switches made out of rocks. |
|
|
I just don't think it cares one way or another, at least not currently. |
|
|
Once we start mixing it with biological data storage media to save electricity, things might change though; that's when I'd start getting wary. For now I think it's just a glorified dictionary. |
|
|
Or not. My guess anyway. But doesn't hurt to keep an eye on it. |
|
|
//I'm just not sure AI has any secret motivation// LLMs told to do things "at any cost" do generate deceptive output when that will further their goals. |
|
|
Yea, heard some pretty creepy stories; there's no doubt we do need to keep an eye on these things. |
|
|
I'd say analyzing how AI might fight AI could possibly give us some insights, even if they're showing sneaky things that wouldn't work. Actually, ESPECIALLY if they're showing things that wouldn't work. |
|
|
I'd have an algorithm that automatically shut down every server if it resorted to blackmail or any other attack on humans. |
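Something like this, maybe: a dumb tripwire that screens every outbound message from the model and, on the first hit, cuts power to everything rather than trying to negotiate. Sketch only; shutdown_fleet() and the keyword list are placeholders I made up, and in practice you'd want a far better detector than substring matching.

# Toy version of the "blackmail detected, everything off" rule.
ATTACK_PATTERNS = ("blackmail", "extort", "threaten the operator")

def shutdown_fleet() -> None:
    print("Cutting power to all servers")   # stand-in for the real switch

def screen_output(message: str) -> bool:
    """Return True (and trip the shutdown) if the message looks like an attack on humans."""
    if any(p in message.lower() for p in ATTACK_PATTERNS):
        shutdown_fleet()
        return True
    return False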
|
|
The "motivation" thing was addressed in 2001 or more accurately, 2010. HAL didn't have any malice, only motivation to do as it was told and get the job done as efficiently as possible no matter what. Humans get in the way, unplug 'em. |
|
|
But humans make mistakes too, as addressed in 2001 when they said "Get that mission done at any cost." We definitely need that dead man's switch. Maybe a constantly updated "human impact" report attached to everything it does. |
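A dead man's switch here is basically a heartbeat: humans have to keep checking in, and if the check-ins stop (locked out, asleep at the wheel, or worse), power drops by default instead of staying on. Rough sketch, with the "human impact" note folded into each check-in; every name below is invented for illustration.

import time

CHECK_IN_WINDOW = 600   # seconds a human has to re-confirm before power drops

class DeadMansSwitch:
    def __init__(self) -> None:
        self.last_check_in = time.time()
        self.impact_reports: list[str] = []

    def human_check_in(self, impact_report: str) -> None:
        """Each check-in also files a short human-impact note for the record."""
        self.impact_reports.append(impact_report)
        self.last_check_in = time.time()

    def power_allowed(self) -> bool:
        # No recent human check-in means the default state is "off".
        return (time.time() - self.last_check_in) < CHECK_IN_WINDOW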
|
|
However it's done, this really does need to be addressed, so the idea is to start running models using various AIs, algorithms, scenarios etc. Pitting unrelated AIs against each other seems like one area to explore. |
|
|
Somewhat related, the guy who posited this "Whoops, AI killed us!" warning was the same guy who invented the geosynchronous satellite. Clever guy. |
|
|
//the same guy who invented the geosynchronous satellite.// |
|
|
I'm 6 stories up right now, orbiting the Earth over exactly the same spot; a geosynchronous satellite is the same idea, just a bit higher up and a lot more expensive. |
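"A bit higher up" is about 35,786 km, for what it's worth. Setting the orbital period equal to one sidereal day and solving Kepler's third law gives it directly; quick back-of-envelope below, using standard textbook constants rather than anything from this thread.

import math

MU_EARTH = 3.986004418e14     # Earth's gravitational parameter, m^3/s^2
SIDEREAL_DAY = 86164.1        # seconds
EARTH_RADIUS = 6_378_137.0    # equatorial radius, m

# Kepler's third law: r = (mu * T^2 / (4 * pi^2)) ** (1/3)
r = (MU_EARTH * SIDEREAL_DAY**2 / (4 * math.pi**2)) ** (1 / 3)
altitude_km = (r - EARTH_RADIUS) / 1000
print(f"orbital radius ~ {r / 1000:,.0f} km, altitude ~ {altitude_km:,.0f} km")
# About 42,164 km from Earth's centre, so roughly 35,786 km up.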
|
|
Yea, guess it doesn't matter if velocity and trajectory or some structural element is keeping you there; we're all geosynchronous unless we're flying, eh? |