Core concepts of AI agents & autonomous systems

Purpose: define the fundamental building blocks you need to understand before comparing LLMs, AI engines, frameworks, and agents. These concepts appear everywhere — from simple chatbots to OpenClaw and Hermes Desktop.

1. What is an "AI agent"?

Agent: a system that uses an LLM (or other reasoning engine) to perceive its environment, decide on actions, and execute those actions — often with tools, memory, and a degree of autonomy.

A pure chatbot (level L1 on the autonomy ladder in section 3) is not an agent because it cannot act or remember beyond the current conversation. An agent acts, plans, and iterates.

Minimal characteristics of an agent:

Reasoning loop (think → act → observe → repeat).
Tool use (at least one external capability: search, calculator, code, API).
Goal‑directed behavior (it tries to accomplish something).

2. The four pillars that distinguish agents from simple LLMs

2.1 Tool use (function calling)

Ability to call external functions or APIs. Examples: web search, file read/write, database query, sending an email, executing shell commands.

Why it matters: Without tools, an agent can only talk. With tools, it can change the world (or your computer).
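
A minimal sketch of how a host program might execute a tool call, assuming the model returns a JSON object with "name" and "arguments" keys (the exact shape varies by provider). The tools add and word_count and the dispatch function are hypothetical names used only for illustration.

```python
import json

# Hypothetical tools; a real agent would wrap search, file I/O, or API calls.
def add(a: float, b: float) -> float:
    return a + b

def word_count(text: str) -> int:
    return len(text.split())

TOOLS = {"add": add, "word_count": word_count}

def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-produced JSON call and return the result as text."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return str(TOOLS[name](**call.get("arguments", {})))
    except TypeError as exc:  # wrong or missing arguments
        return f"error: {exc}"

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # prints: 5
```

Returning errors as text rather than raising lets the model see what went wrong and try again on the next step.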

2.2 Memory (short‑term, long‑term, and episodic)

Short‑term / working memory: context window of the LLM. Contains recent conversation and reasoning.
Long‑term memory: persistent storage (database, markdown files, vector DB). Allows the agent to remember facts across sessions.
Episodic memory: remembers past actions and outcomes to improve future behavior.
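
The difference between short-term and long-term memory can be sketched as follows. This is an illustrative toy, not any product's implementation: the AgentMemory class, the memory.json file name, and the four-turn limit are all assumptions. The bounded deque stands in for a context window, and a JSON file stands in for persistent storage.

```python
import json
import tempfile
from collections import deque
from pathlib import Path

class AgentMemory:
    def __init__(self, store_path: Path, max_turns: int = 4):
        # Short-term: bounded buffer, oldest turns fall off (like a context window).
        self.short_term = deque(maxlen=max_turns)
        # Long-term: loaded from disk, so it survives across sessions.
        self.store_path = store_path
        self.long_term = json.loads(store_path.read_text()) if store_path.exists() else {}

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value
        self.store_path.write_text(json.dumps(self.long_term))

path = Path(tempfile.mkdtemp()) / "memory.json"
session1 = AgentMemory(path)
session1.add_turn("user", "Call me Sam")
session1.remember("user_name", "Sam")

session2 = AgentMemory(path)  # new session: the buffer is empty, the fact persists
print(len(session2.short_term), session2.long_term["user_name"])  # prints: 0 Sam
```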

2.3 Planning & reasoning

Agents break down high‑level goals into step‑by‑step plans. Common patterns:

ReAct (Reason+Act): interleaved reasoning traces and actions.
Plan‑and‑execute: create a full plan, then execute step by step.
Reflexion / Reflection: agent critiques its own outputs and improves.
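
A minimal sketch of the plan-and-execute pattern. The plan function is a stand-in for an LLM planning call and returns a fixed, hypothetical plan; the step names and handlers are invented for illustration.

```python
def plan(goal):
    # Stand-in for an LLM planning call; returns a fixed plan for illustration.
    return ["gather_data", "summarize", "send_report"]

def execute(plan_steps, handlers):
    """Run each planned step in order; each handler sees the results so far."""
    results = []
    for step in plan_steps:
        results.append(handlers[step](results))
    return results

# Hypothetical step handlers; real ones would call tools.
handlers = {
    "gather_data": lambda prev: [3, 5, 8],
    "summarize": lambda prev: f"total={sum(prev[-1])}",
    "send_report": lambda prev: f"sent: {prev[-1]}",
}

results = execute(plan("weekly report"), handlers)
print(results[-1])  # prints: sent: total=16
```

Unlike ReAct, the full plan exists before any step runs, which makes the run easier to inspect but less able to react to surprising observations unless a replanning step is added.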

2.4 Autonomy level (how much human oversight?)

Defined by the autonomy ladder (L1–L4). See table below.

3. Autonomy ladder (L1 to L4) — core framework for classification

Use this table to position any AI product: ChatGPT (L1/L2), GitHub Copilot (L2), OpenClaw (L4).

Level	Name	Memory across sessions?	Tool use?	Self‑initiated actions?	Example products
L1	Chatbot	No	No (or very limited)	No	ChatGPT (free tier), basic Claude, old‑style bots
L2	Copilot	No (session only)	Yes (but human confirms each action)	No, waits for user approval	GitHub Copilot, Microsoft 365 Copilot, Cursor
L3	Agent (task‑oriented)	Limited / optional long‑term memory	Yes (can call tools autonomously)	Triggered by user prompt, but executes multi‑step	DeepSeek (tool mode), Claude Code, Assistants API agents
L4	Autonomous system	Yes, persistent memory	Yes, full computer or API control	Yes (scheduled, event‑driven, self‑waking)	OpenClaw, Hermes Desktop, Devin, Waymo

Important nuance: many products span levels depending on configuration. For example, ChatGPT with “Code Interpreter” acts like L3, but the standard chat interface is L1/L2.
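
As a rough illustration, the ladder can be read as a decision rule over a few capability flags. The Capabilities fields and the autonomy_level function are hypothetical names, and the rule deliberately simplifies the table; real products can span levels depending on configuration.

```python
from dataclasses import dataclass

@dataclass
class Capabilities:
    uses_tools: bool
    tools_need_approval: bool  # a human confirms each action
    persistent_memory: bool
    self_initiated: bool       # scheduled, event-driven, or self-waking

def autonomy_level(c: Capabilities) -> str:
    """Map capability flags to a ladder level (simplified reading of the table)."""
    if c.uses_tools and c.persistent_memory and c.self_initiated:
        return "L4"
    if c.uses_tools and not c.tools_need_approval:
        return "L3"
    if c.uses_tools:
        return "L2"
    return "L1"

print(autonomy_level(Capabilities(True, True, False, False)))  # prints: L2
```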

4. Key agent architectures and patterns

4.1 ReAct loop (Reason + Act)

Loop: Thought → Action → Observation → Thought → … until goal reached.

Example: “What’s the weather in Tokyo?” → Thought: need to call weather API → Action: get_weather(Tokyo) → Observation: 22°C, sunny → Final answer.
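
The weather example can be sketched as a loop. Here fake_llm is a scripted stand-in for a real model, get_weather returns a hard-coded value, and the "Action: name(arg)" text format is an assumption chosen for readability; production systems usually rely on structured tool calls instead of parsing text.

```python
def fake_llm(transcript: str) -> str:
    # Scripted stand-in for a real model: act first, answer once an observation exists.
    if "Observation:" not in transcript:
        return "Thought: I need the weather API.\nAction: get_weather(Tokyo)"
    return "Thought: I have the data.\nFinal Answer: It is 22°C and sunny in Tokyo."

def get_weather(city: str) -> str:
    return "22°C, sunny"  # hard-coded; a real tool would call an API

TOOLS = {"get_weather": get_weather}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        action = reply.split("Action:", 1)[1].strip()  # e.g. get_weather(Tokyo)
        name, arg = action.rstrip(")").split("(", 1)
        transcript += f"\nObservation: {TOOLS[name](arg)}"
    return "stopped: step limit reached"

print(react("What's the weather in Tokyo?"))
```

The max_steps cap matters in practice: without it, a model that never produces a final answer would loop forever.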

4.2 Multi‑agent systems (collaborative agents)

Multiple agents with different roles communicate to solve complex tasks. Common patterns:

Supervisor / orchestrator: one agent delegates subtasks to specialists.
Peer‑to‑peer conversation: agents debate or iterate (AutoGen style).
Role‑based teams (CrewAI): Researcher, Writer, Critic work sequentially.
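
A toy version of the sequential role-based pattern. Each "agent" is a plain function here; in practice each would wrap its own LLM call and prompt. The function names and the run_sequential_team helper are hypothetical and do not reflect the CrewAI or AutoGen APIs.

```python
from typing import Callable, List

def researcher(task: str) -> str:
    return f"notes on {task}"

def writer(notes: str) -> str:
    return f"draft based on {notes}"

def critic(draft: str) -> str:
    return f"approved: {draft}"

def run_sequential_team(task: str, team: List[Callable[[str], str]]) -> str:
    """Pass the work product through each role in order."""
    work = task
    for agent in team:
        work = agent(work)
    return work

print(run_sequential_team("agent memory", [researcher, writer, critic]))
# prints: approved: draft based on notes on agent memory
```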

4.3 Reflection & self‑improvement

The agent reviews its own outputs, identifies mistakes, and generates improved answers. Hermes Desktop’s “reflective phase” is a concrete example: after completing a task, the agent writes a reusable skill file based on what worked.
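
A generic generate, critique, revise loop might look like the sketch below. Both generate and critique are stand-ins for LLM calls with hard-coded behavior, and nothing here represents how Hermes Desktop implements its reflective phase.

```python
from typing import Optional, Tuple

def generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM call; the output improves only when feedback is given.
    return "Summary with sources." if feedback else "Summary."

def critique(output: str) -> Optional[str]:
    # Stand-in critic: returns feedback, or None when satisfied.
    return None if "sources" in output else "Cite your sources."

def reflect_loop(task: str, max_rounds: int = 3) -> Tuple[str, int]:
    """Regenerate until the critic is satisfied or the round limit is hit."""
    feedback = None
    output = ""
    for round_no in range(1, max_rounds + 1):
        output = generate(task, feedback)
        feedback = critique(output)
        if feedback is None:
            return output, round_no
    return output, max_rounds

print(reflect_loop("summarize the article"))  # prints: ('Summary with sources.', 2)
```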

4.4 Graph‑based agents (LangGraph style)

State machines where each node is a step (e.g. “retrieve”, “generate”, “check safety”). Edges define conditional flows. Allows loops, human‑in‑the‑loop, and persistent checkpoints.
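
The idea can be sketched in plain Python without any framework. This is not the LangGraph API; the node functions, next_node, and run are hypothetical, and the "unsafe draft" behavior is scripted so the conditional edge loops exactly once.

```python
def retrieve(state):
    state["docs"] = ["doc about agents"]
    return state

def generate(state):
    state["attempts"] += 1
    # Pretend the first draft is unsafe and the second is fine.
    state["answer"] = "unsafe draft" if state["attempts"] == 1 else "safe answer"
    return state

def check_safety(state):
    state["safe"] = "unsafe" not in state["answer"]
    return state

NODES = {"retrieve": retrieve, "generate": generate, "check_safety": check_safety}

def next_node(current, state):
    if current == "retrieve":
        return "generate"
    if current == "generate":
        return "check_safety"
    # Conditional edge: loop back to generate until the safety check passes.
    return "END" if state["safe"] else "generate"

def run(state, start="retrieve", max_steps=10):
    node, checkpoints = start, []
    for _ in range(max_steps):
        if node == "END":
            break
        state = NODES[node](state)
        checkpoints.append((node, dict(state)))  # snapshot after each node
        node = next_node(node, state)
    return state, checkpoints

final, history = run({"attempts": 0})
print(final["answer"], [name for name, _ in history])
```

The checkpoint list is what makes resuming possible: a run could be restarted from any saved (node, state) pair instead of from the beginning.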

5. Memory types (deep dive)

Memory is one of the most commonly misunderstood agent concepts. Without persistent memory, an agent cannot learn or adapt beyond a single chat session.

Memory type	Storage	Persistence	Use case
Conversation buffer	LLM context window	Session only	Short chat history
Vector / semantic memory	Vector database (e.g. Pinecone, pgvector)	Permanent	Retrieve relevant facts, documents, past solutions
Entity memory	Key‑value store	Permanent	Remember user preferences, names, settings
Procedural memory (skills)	Markdown / code files	Permanent	“How to write a weekly report” – reusable workflows
Episodic memory	Logs + summarization	Permanent	Remember past successes / failures to improve planning

Examples in real products: OpenClaw uses markdown‑based long‑term memory. Hermes Desktop stores skills and reflections as markdown. ChatGPT’s “memory” feature (optional) is entity memory across sessions.
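
To make vector / semantic memory concrete, here is a toy retrieval sketch. The embed function uses word counts as a stand-in for a learned embedding model, and an in-memory list stands in for a vector database; the VectorMemory class and the stored sentences are invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorMemory:
    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = VectorMemory()
memory.add("user prefers weekly reports on Friday")
memory.add("the deploy script lives in the tools folder")
print(memory.search("when does user want reports"))
```

The retrieved text would then be injected into the prompt, which is the retrieval step of RAG.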

6. Tools & environments (what agents can control)

An agent’s capabilities are defined by the tools available. The most powerful agents (L4) have access to the full computer environment.

Browsing / web search: retrieve real‑time information.
Code interpreter: run Python, analyze data, generate charts.
Filesystem access: read, write, delete, move files.
Shell / terminal: execute system commands (dangerous, requires sandbox).
APIs / HTTP: call any REST service (Slack, email, GitHub, databases).
Messaging channels: WhatsApp, Telegram, Discord, iMessage – used as remote control for desktop agents.

Security note: L4 agents with full computer access MUST run in isolated environments (Docker, VMs, sandboxes). OpenClaw and Hermes documentation both warn against running on production machines without restrictions.
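
Inside the sandbox, an agent can add a second layer of defense by checking each shell command against an allowlist before running it. The sketch below is illustrative only: the ALLOWED_COMMANDS policy and the is_allowed function are hypothetical, the check is deliberately conservative (it rejects shell operators even inside quotes), and it complements isolation rather than replacing it.

```python
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "echo"}  # hypothetical policy
FORBIDDEN_CHARS = set(";|&<>`$")          # chaining, redirection, substitution

def is_allowed(command_line: str) -> bool:
    """Return True only for a single allowlisted command with no shell operators."""
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:  # for example, unbalanced quotes
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS

for cmd in ["ls -la", "rm -rf /", "ls; rm -rf /"]:
    print(cmd, "->", is_allowed(cmd))
```

An allowlist of command names does not restrict arguments (for example, which files cat may read), so path restrictions would still have to come from the sandbox itself.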

7. Agent stacks (frameworks) vs standalone agents — the distinction

Many people confuse the framework used to build an agent with the agent itself.

Agent framework / stack: a library (LangGraph, CrewAI, AutoGen) that helps you build agents. You write code, define tools, set up memory.
Standalone agent / product: an already‑built agent you can download or subscribe to (OpenClaw, Hermes Desktop, Devin).

Example: You can use LangGraph (framework) + OpenAI API (engine) to build a custom “legal document reviewer” agent. That final executable is your agent, not LangGraph itself.

8. Common misconceptions (clarifications)

“All LLMs are agents” – FALSE. LLMs only generate text. Without a loop, tools, and memory, they cannot act.
“Agents must be fully autonomous” – FALSE. Many agents keep a human in the loop: L2 systems wait for approval of each action, and L3 agents start only when a user prompts them. Autonomy exists on a spectrum.
“ChatGPT is an agent” – PARTIALLY TRUE. The default chat interface is a chatbot (L1/L2). With plugins, Code Interpreter, or “Tasks” it becomes agentic (L3). It is not a persistent L4 autonomous system unless you build one yourself, for example on the Assistants API with cron jobs.
“Frameworks are agents” – FALSE. Frameworks are toolkits, not runnable agents.
“Memory = vector database only” – FALSE. Memory includes conversation buffer, entity memory, procedural skills, and reflection logs.

9. Quick glossary (essential terms)

Term	Definition
LLM	Large Language Model – neural net trained to predict tokens.
Tool calling / function calling	The LLM outputs structured JSON naming a tool and its arguments; the host application runs the tool and returns the result to the model.
ReAct	Reason+Act pattern: interleaved reasoning and action steps.
RAG	Retrieval‑Augmented Generation – inject relevant documents into context.
Multi‑agent orchestration	Coordination of multiple agents, often with a supervisor.
Reflective phase	Agent analyzes its own performance and writes reusable improvements (Hermes Desktop, Reflexion papers).
Skill / tool definition	A reusable unit of capability, often described in markdown or code (e.g. “send_slack_message” skill).
Context window	Maximum number of tokens an LLM can process in one call (e.g. 128k, 1M, or 2M tokens).
Checkpointing	Saving agent state (memory, step progress) to resume later.

10. Putting it all together: a concrete agent anatomy

Take Hermes Desktop (L4 autonomous agent) as an example:

LLM / engine: uses any OpenAI‑compatible API (GPT‑4o, Claude, DeepSeek, local models).
Memory: long‑term markdown memory + vector retrieval (RAG).
Tools: shell commands, file system, browser, messaging channels, 90+ built‑in skills.
Planning & reasoning: ReAct loop + reflective phase that creates new skills.
Autonomy: can be scheduled, wakes itself to respond to messages, executes full workflows without human step‑by‑step.

This anatomy applies to any modern agent: OpenClaw, PythonClaw, Devin, or a custom agent built with CrewAI.

Next steps (related pages on this website)

LLM vs Engine vs Framework vs Agent – the four‑layer hierarchy (homepage).
Autonomy ladder (L1‑L4) with full product table – compare ChatGPT, Copilot, OpenClaw, Hermes.
Agent stacks comparison – LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.
OpenClaw vs Hermes Desktop vs Devin – side‑by‑side feature matrix.
Memory implementation guide – vector DB, markdown memory, episodic logs.

soratoraservices.com
