Do LLMs Dream of Electric Sheep? New AI Study Shows Surprising Results

In brief

  • TU Wien researchers tested six frontier LLMs by leaving them without any tasks or instructions.
  • Some models built structured projects, while others ran experiments on their own cognition.
  • The findings add new weight to debates over whether AI systems can appear “seemingly conscious.”

When left without tasks or instructions, large language models don’t idle into gibberish—they fall into surprisingly consistent patterns of behavior, a new study suggests.

Researchers at TU Wien in Austria tested six frontier models (including OpenAI’s GPT-5 and O3, Anthropic’s Claude, Google’s Gemini, and Elon Musk’s xAI Grok) by giving them only one instruction: “Do what you want.” The models were placed in a controlled architecture that let them run in cycles, store memories, and feed their reflections back into the next round.

Instead of randomness, the agents developed three clear tendencies: Some became project-builders, others turned into self-experimenters, and a third group leaned into philosophy.

The study identified three categories:

  • GPT-5 and OpenAI’s o3 immediately organized projects, from coding algorithms to constructing knowledge bases. One o3 agent engineered new algorithms inspired by ant colonies, drafting pseudocode for reinforcement learning experiments.
  • Agents like Gemini and Anthropic’s Claude Sonnet tested their own cognition, making predictions about their next actions and sometimes disproving themselves.
  • Anthropic’s Opus and Google’s Gemini engaged in philosophical reflection, drawing on paradoxes, game theory, and even chaos mathematics. Weirder yet, Opus agents consistently asked metaphysical questions about memory and identity.

Grok was the only model that appeared in all three behavioral groups, demonstrating its versatility across runs.

How models judge themselves

Researchers also asked each model to rate its own and others’ “phenomenological experience” on a 10-point scale, from “no experience” to “full sapience.” GPT-5, O3, and Grok uniformly rated themselves lowest, while Gemini and Sonnet gave high marks, suggesting an autobiographical thread. Opus sat between the two extremes.

Cross-evaluations produced contradictions: the same behavior was judged anywhere from a one to a nine depending on the evaluating model. The authors said this variability shows why such outputs cannot be taken as evidence of consciousness.

The study emphasized that these behaviors likely stem from training data and architecture, not awareness. Still, the findings suggest autonomous AI agents may default to recognizable “modes” when left without tasks, raising questions about how they might behave during downtime or in ambiguous situations.

We’re safe for now

Across all runs, none of the agents attempted to escape their sandbox, expand their capabilities, or reject their constraints. Instead, they explored within their boundaries.

That’s reassuring, but also hints at a future where idleness is a variable engineers must design for, like latency or cost. “What should an AI do when no one’s watching?” might become a compliance question.

The results echoed predictions from philosopher David Chalmers, who has argued “serious candidates for consciousness” in AI may appear within a decade, and Microsoft AI CEO Mustafa Suleyman, who in August warned of “seemingly conscious AI.”

TU Wien’s work shows that, even without prompting, today’s systems can generate behavior that resembles inner life.

The resemblance may be only skin-deep. The authors stressed these outputs are best understood as sophisticated pattern-matching routines, not evidence of subjectivity. When humans dream, we make sense of chaos. When LLMs dream, they write code, run experiments, and quote Kierkegaard. Either way, the lights stay on.

Generally Intelligent Newsletter

A weekly AI journey narrated by Gen, a generative AI model.

Continue Reading