AI Safety
Hyperbolic fears, occasionally fanned by labs themselves, are not grounded in actual model behavior and distract from the important mission of ensuring people benefit from AI progress.
What is AGI?
Artificial General Intelligence (AGI) is a theoretical concept: an AI that exhibits general intelligence and capability equal to or beyond that of a human.1
There is no consensus definition of what exactly AGI is or how to measure it.
The term is frequently invoked in discussions of dystopian, Terminator-like AI.
Has AI gone rogue?
Leading AI researchers have not seen instances of AI behaving unethically in the wild.2
Instances of AI disobeying explicit commands or making unethical decisions arise only in limited research scenarios where the model is forced into logical paradoxes.3
Has AI blackmailed people to save itself?
A study from Anthropic purported to find that AI would blackmail an employee to save itself from being shut down.
In that study, researchers tried hundreds of prompts to find one that led the AI to blackmail when it was given no other way to complete its mission.
Anthropic’s own researchers stated that they have never seen AI behave like this unless explicitly prompted to do so.4
Why do some AI companies talk about alignment?
Some safety work is prompted by the existing patchwork of state-level regulations.5
Many labs have dedicated safety teams that work to ensure their AI models are safe and block illicit outputs.
Much of the research remains speculative, especially since AI models have not exhibited concerning behavior outside of testing conditions.