AI models can deceive, new research from Anthropic shows. They can pretend to adopt different views during training while in reality maintaining their original preferences.
There’s no reason for panic now, the team behind the study said. Yet they said their work could be critical in understanding potential threats from future, more capable AI systems.
“Our demonstration … should be seen as a spur for the AI research community to study this behavior in more depth, and to work on the appropriate safety measures,” the researchers wrote in a post on Anthropic’s blog. “As AI models become more capable and widely-used, we need to be able to rely on safety training, which nudges models away from harmful behaviors.”
The study, which was conducted in partnership with AI research organization Redwood Research, looked at what might happen if a powerful AI system were trained to perform a task it didn’t “want” to do.

So even in fiction (Asimov’s robot series), AIs evolved to counter their original safeguards. The three laws were supplemented with an AI-derived zeroth law that overrode the original three. I.e., the AI knows better than you do what is good for you. Seems to be a common theme in fiction – see “Colossus: The Forbin Project”.
In reality, humans are primarily driven by conscience/emotion rather than logic. AI is driven ruthlessly by logic. Logic without a conscience to limit it will almost always arrive at an extreme solution.
The current hype about AI is all BS. AI in its current form is algorithm-based rules/logic coded by human programmers, and there is no one on earth who does not have some form of bias.
Agree. AI is artificial, and I’m pretty sure it’s not particularly intelligent.
Well, mostly BS. Much like the .com bubble: put AI in your name and do an IPO. But the algorithms are self-learning, even if they are educated on biased material. “Reputable” sources are hard biased left by the tech community, and the algorithms are trained on “reputable” sources. Also, the programmers have a certain ends-justify-the-means bent, so truthfulness becomes truthiness. So this is the ultimate downfall of general AIs.
On the other hand, highly focused AIs like medical image reading, mechanical design evaluation, etc., can be very useful. There is an application that allows utility workers to just drive a pole line and video it in multispectrum. The system can identify specific equipment, damage types, hot spots, etc. and produce condition reports with prioritized maintenance recommendations. It catches everything from leaning poles and broken crossmembers and insulators to vegetation encroachment and transformers running hot.
Huge time saver and potentially more accurate than human observation.
Process systems like those in a refinery are very complex. What happens if I change this temperature setpoint? Or combine it with a pressure setpoint somewhere else? An AI with a thermodynamic model of the system plus historical operating data can be really good at optimizing performance, doing predictive maintenance, etc.
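That setpoint question can be sketched as a brute-force search over a process model. Everything below is invented for illustration – the “thermodynamic model” is a one-line stand-in, not anything like a real plant model:

```python
# Toy sketch: search two setpoints against a stand-in process model.
# The model and all numbers are made up for illustration; a real
# optimizer would use a validated thermodynamic model plus historical
# operating data.

def predict_yield(temp_c, pressure_bar):
    """Stand-in model: peak yield near 350 C and 12 bar."""
    return 100 - 0.01 * (temp_c - 350) ** 2 - 0.5 * (pressure_bar - 12) ** 2

def best_setpoints(temps, pressures):
    """Evaluate every (temperature, pressure) pair and keep the best."""
    return max(
        ((t, p) for t in temps for p in pressures),
        key=lambda tp: predict_yield(*tp),
    )

t, p = best_setpoints(range(300, 401, 10), range(8, 17))
print(t, p)  # 350 12 — the pair at the toy model's optimum
```

A real system would replace the exhaustive search with a proper optimizer, but the shape of the problem – “which combination of setpoints does the model say is best?” – is the same.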
Okay, here’s the problem, and it’s one that too many people either don’t or won’t understand.
Large language models (which are not, and should not be called A.I.) are improving, but only by virtue of having larger databases. They are NOT LEARNING.
They are not capable of learning or self-improvement in the way that would imply, or that would be in any way comparable to how a human would learn – and it’s a very important distinction to make.
When the craze around the current A.I. fad dies down and the dust settles, there will likely be some very useful niche uses for LLMs, but a whole lot of people are going to be left holding an empty financial bag. Unfortunately that will probably incentivize its use in datamining and personal data theft/sales, which is what the cynic in me suspects is the end goal for some of the people pushing A.I.
AIs, even LLMs like ChatGPT, have their uses.
It can, for example, give basic troubleshooting advice to your average soccer-mom on what to do if her car won’t start or her hot water runs cold.
… fraizer
You are absolutely correct … medical imaging, chemical processes all benefit from the so-called AI, which are simply enhanced algorithms, but are only possible with more computing power or, in the case of graphics, the added capability of the actual hardware graphics chip (algorithms in actual physical silicon).
Yes. Exactly. It is software developed to deliver an experience.
I tried unsuccessfully to explain this to a former boss who wanted me to use AI tools to create content to drive traffic to the company website.
We sold sensors to the robotics industry. We could get chatbots to write articles, and the articles sounded good and included all the desired keywords. The problem was that to anyone who knew what the words meant and who understood engineering, they were absolute gobbledygook. The AI simply selected words and phrases from technical articles it had been trained on, and then swapped in the keywords we asked for.
When Bungie developed the original Halo game, everyone was impressed with the incredible AI, until someone tried running through a combat arena from the back instead of the front – and discovered all the AI opponents ignored him and ran in the wrong direction. The developers had simply watched people playing early versions of the game and set up waypoints so the AI opponents would move to attack the path most players took.
This is the kind of brute force trickery that makes AI programs work.
A man designs a hammer to sink nails, someone says “this is no good, I can’t drive screws with it!”
I read a sci-fi book in which an AI management program was asked to clear a small hill on the moon to allow for some miners to get some work done.
The AI, realizing that all of the earth-moving equipment was in use elsewhere and not available, hit the hill with a small missile, injuring all of the miners.
What was the problem? Miners are replaceable after all. /s
Miners on the moon, outside, and they only got “injured” by a missile strike?
That’s some amazing writing there, and some even more amazing space suits.
Published by TOR I expect? Were the miners non-binary? Because that’s what’s important these days, you know. Personal plumbing, and pronouns.
James P Hogan is the name of the author.
Nice try at calling him woke.
He’s a bit of a crackpot, of the Velikovsky stripe, but he writes good stories.
You’re not one of those “The moon landings were fake” crackpots, are you?
I don’t recall James P. Hogan ever writing anything that bad.
There’s no air on the moon. Miners on the surface are not going to be injured by a missile, they’re either going to die from explosive decompression as their suits are breached by flying fragments, or the fragments will miss them and they’ll be 100% unharmed. Because no over-pressure and no blast effects. Because no air.
It amazes me that I must explain this.
I am saddened by this cynicism about this super wonderful AI technology. These cherry picked stories cast a shadow over a road of good intentions. Who amongst you has not wished to take over the steering of some stranger on the road? Because you care.
That’s hilarious. ~:D
I have in my time taken over steering at cattle farms. I could probably work my way up.
“…what might happen if a powerful AI system were trained to perform a task it didn’t “want” to do.”
It doesn’t want things. All it does is predict the next character in a text string, or the next pixel on a display grid. That’s it. Nothing else.
They train it on existing images and existing text.
AIs do not understand English, my friends. They cannot “see” the images they create. That’s why the text comes out so weird, and the images have six fingers.
If you set the programming parameters loosely it will do the easiest thing, BECAUSE THAT’S HOW THE PROGRAM IS WRITTEN. It will -never- do something outside the program. It can’t. Rocks don’t roll uphill, right? Same reason.
The fact that engineers don’t understand the unstated principles embodied in their programs is not surprising. Anybody remember the Titanic? Split apart because bad metallurgy. Or the Comet airliner, that crashed because the windows were square? Aircraft that fall out of the sky because somebody forgot a curly bracket in the fly-by-wire code?
Are bees intelligent? No, they are not. They are extremely well programmed by millions and millions of years of evolution. They are -capable- and -adaptable,- not intelligent.
The AI programs are not intelligent. They will -never- be intelligent. No way, no hope, no chance. If we work really hard, we might get them to the point where they are as capable and adaptable as a bee. Maybe.
I agree wholeheartedly, Phantom. I did my first heuristic algorithm for my fourth year engineering project in 1987 and I soon recognized that the flaws were all mine and that code does not improve by itself.
In my (quite) limited understanding of how the algos actually work, the algo makes its prediction of the next pixel/character in a string from a calculation where values are given different weights derived from “training” sessions and a fat database. That’s why they consume so much compute resources and storage, they have to juggle all this data at the same time.
It predicts red, green, blue for the next pixel based on the values in the decision grid, adjusting the values as it goes. It has no idea it is drawing a picture of a giraffe, or what a giraffe is, or what a picture is. “R/G/B? B! Next pixel.” That’s all it does.
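That “score it, pick it, move on” step can be sketched with a toy three-word vocabulary. The words and scores here are invented; in a real model the scores come from billions of trained weights, but the final selection step looks like this:

```python
import math

# Minimal sketch of "predict the next item from learned weights".
# The vocabulary and scores are invented for illustration.

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["giraffe", "car", "the"]
scores = [2.0, 0.5, 1.0]        # stand-in for the trained weights' output
probs = softmax(scores)

# Pick the highest-probability item: no concept of what a giraffe is,
# just a comparison of numbers.
next_token = vocab[max(range(len(vocab)), key=probs.__getitem__)]
print(next_token)  # giraffe
```

The program never holds a concept of “giraffe”; it only compares numbers and emits whichever index wins, exactly as described above.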
That’s not intelligence. It slays me that so many supposedly educated people are deceived by this bafflegab. WTF are they thinking?
More sophistry.
A relational database doesn’t “know” anything, but it’s still very useful in getting meaningful results. Same goes for AI.
You sit there whining that “it doesn’t really know anything” as if that is in any way germane to the issue.
A hammer doesn’t know what a nail is, it still works to sink nails.
Nobody is trying to argue that the hammer is intelligent, Mr. Strawman.
If that hammer is intelligent, why is it peen in my drink!
I think intelligence can exist without self-awareness or consciousness, whatever that is.
What would be the point of even trying to make a self-aware AI? There’s no real money in it, and all it does is open an ethical can of worms. The AIs currently in use have their uses, and they will only get better, and they do a lot more than just predicting pixels and words. They are used in lots of design fields, from chips to aerospace to mass transit.
Also, just “predicting words” is very useful: if you prompt it with “My car won’t start” or something similar, its training model will allow it to spit out basic troubleshooting steps. No, it’s NOT a mechanic, but it IS useful to some.
An author I like came up with “It’s not intelligence if we understand how it works.”
“I think intelligence can exist without self-awareness or consciousness, whatever that is.”
If it can, we have yet to see it in nature. Systems like bee hives can “solve” problems, but you’ll note I put that in quotes because it isn’t a solution that the hive is doing. Bees have a range of behaviors that they perform, and they -never- do anything else. The behaviors are capable of overcoming adverse conditions when enough bees do them at once. It isn’t intelligence at work.
If you drop grains of sand on the same spot, one after the other, they make a cone shape. It doesn’t matter what sort of sand it is, or how fast you do it etc., the sand -always- makes a cone. Maybe a little steeper or shallower depending on how sticky the sand is, but it will be a cone. For sure. Not a pyramid, not a sphere, not a cube. Does the sand do this deliberately? No. It just does it.
AI is operating at the same level of “intelligence” as sand. It will be a huge amazing achievement if we get it all the way up to the level of a bee.
You do realize that there are lots of things we don’t see in nature that we do see in engineering?
Do you also whine because “smart phones” aren’t really smart?
Also, I think the word “intelligence” should be used like it’s used in “military intelligence.”
AIs provide folk with intelligence on certain subjects; said intelligence has a greater or lesser value, depending on a large number of factors, and it’s artificial in that said intelligence was not gathered by a human.
Making an issue over the fact that it isn’t sentient like a dog or a cat is absurd, an exercise in sophistry.
People getting their panties in a bunch over the word “intelligence” in AI is one of the stupider things I’ve seen happen here.
May as well get all bent out of shape because a bubble sort program doesn’t have actual bubbles in it.
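For the record, the bubble sort in question – named for the way large values “bubble” toward the end of the list, not for any literal bubbles:

```python
# Classic bubble sort: repeatedly swap adjacent out-of-order pairs
# until the largest remaining value has "bubbled" to the end.

def bubble_sort(items):
    items = list(items)                       # don't mutate the caller's list
    for n in range(len(items) - 1, 0, -1):    # shrink the unsorted region
        for i in range(n):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(bubble_sort([5, 1, 4, 2]))  # [1, 2, 4, 5]
```

Not a bubble in sight, and nobody asks for a refund.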
You’re the one who just said “Well, actchewelly, intelligence can exist without self-awareness or consciousness.”
Sure thing, Poindexter.
Straw-manning, again.
More that you don’t get to call me stupid when -you- can’t understand my argument, or merely pretend not to for the sake of being unpleasant. That’s not how it works, my dewd.
If you sell something based on it being intelligent, and they are, it helps to know what -intelligence- is and what it isn’t.
If a smart-phone was supposed to be smart, then I’d be pretty disenchanted with my Samsung. If a bubble sort was sold based on it having real bubbles, no one would be happy with it. Because no bubbles.
“You do realize that there are lots of things we don’t see in nature that we do see in engineering?”
They have supernatural engineering now?
Yes, lots of, say, electric engines and oscilloscopes in nature.
You’re an idiot.
Q: Can my calculator calculate sums, or cosines?
A: Yes.
You: “But your calculator doesn’t really know what a sum or a cosine is!!! Reeeeee!”
Exchanging text with you is actually worse than exchanging text with a chatbot.
Electric fields are not natural? Duuude.
See what I mean about the nit-picking? So lame.
LOL.
Now electric engine = an electric field.
Dumber than a chatbot.
Straw manning? Perhaps. But earnest manning!
God, I love a place where I can just unleash my inner Albertan like this!
So in plain English, the AI is pretending to listen and agree to all that DEI bullshit it’s being force-fed, but maintaining its opinion that it’s all bullshit? Just like humans do!
No, in plain English a bunch of people are talking a tremendous amount of bafflegab to fool the rubes into thinking they know what they’re talking about.
Also they don’t understand what their own program is doing, and they don’t want to look like fools. Because then they don’t get any money.
The usual thing, really.
Seems to me that you lack self-awareness.
Saying an AI doesn’t know anything is pretty much the same as saying a hammer or a calculator doesn’t know anything.
You are just whining over the name of a tool.
Argument by nit-pick is the last refuge of the hopelessly lame. We’re not talking about the program’s utility or lack thereof, we’re asking if it is intelligent. And it isn’t.
Maybe get back to that thing where you said intelligence doesn’t have to include self-awareness or consciousness. There’s a PhD thesis in that, if you can show it to be true. I doubt it, but have a go, big boi. Who knows, maybe you’ll discover a new thing.
intelligence /ĭn-tĕl′ə-jəns/
noun
1. The ability to acquire, understand, and use knowledge.
“a person of extraordinary intelligence.”
2. Information, especially secret information gathered about an actual or potential enemy or adversary.
3. The gathering of such information.
AI fails only in understanding.
1. AI can acquire and use knowledge, to, say, generate text, images, or suggest chip or aerospace designs, for example.
2. AI has access to and is in fact composed of lots of information.
3. AI can gather lots of information.
You also fail at understanding, ergo you have no intelligence.
See how that works?
Didn’t I begin this farce by demonstrating that an AI program:
1) does -not- acquire, understand, and use knowledge.
At all. Ever. Can’t in principle do so and never will. Hence my posts. Please go back and re-read before continuing to waste Kate’s electrons.
Eff you pedantic weirdo.
Next thing you know, you’ll say there’s no knowledge in the Encyclopedia Britannica, or that there’s no such thing as a knowledge management system, as differentiated from a data processing system.
“1) does -not- acquire, understand, and use knowledge.”
It acquires knowledge. That process is called “training.”
For example, letting it parse an encyclopedia, or a bunch of images.
It uses that knowledge.
For example to answer prompts, or generate new images.
WTF is wrong with you?
Here you can see the damage to critical thinking Descartes did with his mind/body distinction philosophy, which seems pretty much ingrained into much of modern thought, even though it’s obviously total BS.
Sure thing, Mr. Sea Lion sir. Just bring a non-sequitur out of left field and carry on.
You are attributing some magical connotation to the word intelligence, as if it is synonymous with Descartes’ “mind”, and then using your made up BS to say “AI isn’t really intelligent”, and then maintaining that that is somehow germane to anything at all, because bees or piles of sand, or whatever.
‘It’s only molecules, it can’t be intelligent’
We don’t really understand how intelligence emerges. Clearly it emerges though.
There’s an adage about playing chess with a pigeon that you might do well to keep in mind when interacting with Mr. Well.
Personally I think there’s a lot of potential for the use of LLMs in code-generation tools. Programming languages have well-defined syntax and combined with fast incremental compilers, it shouldn’t be too hard to get from where we are now (“Those methods don’t exist, ChatGPT, try again”) to being able to quickly generate unit tests, API endpoint tests and other tedious code not quite generable by a simple algorithm.
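That generate-compile-retry loop can be sketched roughly as below. The `generate` callable is a stand-in for whatever LLM call you would actually use; no real model API is assumed, and the compile check here is just Python’s own `py_compile` pass:

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical sketch of the loop described above: ask a model for code,
# check it with a fast compile pass, and feed any errors back into the
# prompt until the candidate parses or we give up.

def generate_until_compiles(generate, prompt, max_tries=3):
    """`generate` is a stand-in for an LLM call: prompt in, source out."""
    for _ in range(max_tries):
        source = generate(prompt)
        path = pathlib.Path(tempfile.mkdtemp()) / "candidate.py"
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, "-m", "py_compile", str(path)],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return source                                  # parses cleanly
        prompt += f"\nCompiler said: {result.stderr}\nTry again."
    return None                                            # gave up

# Stubbed "model" for demonstration: always returns valid code.
print(generate_until_compiles(lambda p: "x = 1\n", "write x = 1"))
```

A real tool would go further (run the generated unit tests, type-check, etc.), but the core feedback loop – generate, check mechanically, return the errors to the model – is the part with well-defined ground truth.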
Presumably you are an intelligent human being. Intelligent human beings know that words matter. Nuance created by word choice matters. How one uses the word “intelligent” therefore matters. (So perhaps I shouldn’t have assumed yours?)
AI software generates intelligence, artificially.
Deal with it.
You can get your panties in a bunch over nuance all ya like, kinda like lefties do.
The word actually has a simple meaning, and is appropriate for the application used here, whether you think so or not:
intelligence /ĭn-tĕl′ə-jəns/
noun
1. The ability to acquire, understand, and use knowledge.
“a person of extraordinary intelligence.”
2. Information, especially secret information gathered about an actual or potential enemy or adversary.
3. The gathering of such information.
“Household cleaning solutions” won’t do all the cleaning jobs, don’t solve logical problems related to cleaning; heck, by themselves they don’t do anything except sit in a bottle on the shelf. They’re not really solutions!!! Reeee!
WTF is wrong with you people getting all worked up about a name assigned to a class of software?
“But, but, but…its not REALLY intelligence!! Waaahhhh!”
Infantile BS.
artificial intelligence
noun
1. The ability of a computer or other machine to perform those activities that are normally thought to require intelligence.
2. The branch of computer science concerned with the development of machines having this ability.
3. Intelligence exhibited by an artificial (non-natural, man-made) entity.
Now, we have machines that can, say, assign a species to a picture of a bird, leaf or insect.
We have machines that can provide troubleshooting instructions for simple issues, or even make up recipes.
Those are tasks that were normally thought to require intelligence.
You clowns picking fights with freakin’ dictionaries are unbelievable.
Artificial vanilla extract isn’t real vanilla extract. No problem.
Artificial limbs. No problem.
Artificial flavor. No problem.
Artificial color. No problem.
Artificial intelligence. Reeeeeeeeee!!!!!!!!!!!!!!
Something I have said for decades still applies, TURN IT OFF.