Advanced Overview: Reflexion, a Self-Reflecting Approach to Decision-Making for LLMs
Empowering Natural Language Agents Through Dynamic Memory and Self-Reflection
Recent decision-making large language models (LLMs) have demonstrated impressive performance across various benchmarks. However, challenges persist: high-quality training data is scarce, state spaces are rarely well defined, and agents lack the ability to reflect on their own behavior. The paper proposes Reflexion, an approach that endows an LLM agent with dynamic memory and self-reflection capabilities to improve its reasoning traces and action choices. Reflexion's effectiveness is demonstrated in the AlfWorld and HotPotQA environments, where it achieves success rates of 97% and 51%, respectively.
Big Language Brains Need Help Too: Challenges in Decision-Making for Natural Language Agents
Mastering decision-making and knowledge-intensive search tasks in novel environments is a crucial skill set for large-scale natural language agents. Models such as OpenAI's GPT-3 and Google's PaLM have achieved impressive results, but learning optimal policies for natural language RL agents remains challenging because their state spaces are vast and largely unbounded. Approaches such as Chain-of-Thought (CoT) reasoning and ReAct have been proposed to address these challenges, but they often lack the ability to learn from mistakes over long trajectories.
Reflexion: Agent Architecture and Methodology
Dynamic Memory and Reasoning Trace: Letting LLMs Keep Track of Their Thoughts and Actions
Reflexion incorporates dynamic memory to store and retrieve information about the agent's past actions and experiences. This memory allows the agent to reflect on its decision-making process and reasoning trace. Reflexion's reasoning trace encompasses the sequence of thoughts and inferences made by the agent during the problem-solving process.
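To make this concrete, here is a minimal sketch of such a memory structure in Python. It is an illustration under assumptions, not the paper's implementation: `ReflexionMemory` and its methods are hypothetical names, and the design simply keeps a per-trial trajectory alongside a list of reflections that persists across trials, as the paper describes.

```python
from dataclasses import dataclass, field

@dataclass
class ReflexionMemory:
    """Hypothetical memory structure: a per-trial trajectory plus
    self-reflections that persist across trials."""
    trajectory: list[str] = field(default_factory=list)   # current trial's thoughts/actions/observations
    reflections: list[str] = field(default_factory=list)  # lessons carried across trials

    def record(self, step: str) -> None:
        """Append one step (thought, action, or observation) to the trace."""
        self.trajectory.append(step)

    def add_reflection(self, reflection: str) -> None:
        """Store a lesson learned from a failed trial."""
        self.reflections.append(reflection)

    def reset_trial(self) -> None:
        """Start a fresh trial; reflections survive, the trajectory does not."""
        self.trajectory.clear()

    def as_prompt_context(self) -> str:
        """Render memory as prompt text, lessons first so the LLM sees them."""
        lessons = "\n".join(f"Reflection: {r}" for r in self.reflections)
        history = "\n".join(self.trajectory)
        return f"{lessons}\n{history}".strip()
```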
Hallucination Detection Heuristic: Busting the Myths and Improving Accuracy in LLMs
Hallucination, the generation of incorrect or imaginary information, poses a challenge for LLMs. Reflexion uses a heuristic to detect instances of hallucination and correct them. The heuristic assesses the plausibility of generated content and identifies discrepancies with prior knowledge or environmental states.
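The key signal behind the heuristic is repetition: an agent that keeps taking the same action and receiving the same observation is likely acting on an imagined state rather than the real one. Here is a minimal sketch of such a check; the function name and the `max_repeats` threshold are illustrative choices, not values from the paper.

```python
from collections import Counter

def detect_hallucination(history: list[tuple[str, str]],
                         max_repeats: int = 2) -> bool:
    """Flag a likely hallucination: the same action yielding the same
    observation over and over suggests the agent's beliefs have
    diverged from the environment's actual state."""
    counts = Counter(history)  # count identical (action, observation) pairs
    return any(n > max_repeats for n in counts.values())
```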
Action Sequence Optimization: Smartening Up LLMs with Experience-based Planning
To prevent cyclical and repetitive action choices, Reflexion optimizes action sequences based on feedback from the dynamic memory. It leverages past experiences to construct efficient action plans and avoid common pitfalls.
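As a sketch of what such experience-based guards might look like, the snippet below pairs a simple cycle detector with a filter that deprioritizes actions the memory has already marked as failures. Both functions and their parameters are hypothetical illustrations, not the paper's exact rules.

```python
def is_cyclic(actions: list[str], max_cycle_len: int = 3) -> bool:
    """Detect a repeating tail pattern (e.g. A, B, A, B), which signals
    the agent is stuck in a loop. `max_cycle_len` caps the cycle length
    checked and is an illustrative value."""
    for k in range(1, max_cycle_len + 1):
        if len(actions) >= 2 * k and actions[-k:] == actions[-2 * k:-k]:
            return True
    return False

def choose_action(candidates: list[str], failed_actions: set[str]) -> str:
    """Prefer candidate actions that memory has not already marked as
    failed; fall back to the first candidate if all have failed before."""
    for action in candidates:
        if action not in failed_actions:
            return action
    return candidates[0]
```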
Reflexion's Architecture: Building a Better Brain for Large Language Models
Reflexion is designed as an extension to existing decision-making approaches such as ReAct (Yao et al., 2023). The architecture of Reflexion involves the following steps:
Initial Query: The agent receives a task from the environment, which forms its initial query to the LLM.
Action Execution and Observation: The agent executes a series of actions generated by an LLM and receives observations and rewards from the environment. Rewards are constrained to a binary success signal (success/failure).
Heuristic Evaluation: After each action, the agent computes a heuristic "h" to determine whether self-reflection is recommended.
Self-Reflection: If self-reflection is recommended, the agent queries an LLM to reflect on its current task, trajectory history, and last reward. The reflection is added to the agent's memory.
Next Action: The agent adds the executed action and observation to its trajectory history and queries the LLM for the next action.
The process continues until the task is completed, the maximum number of trials is exceeded, or the agent fails to improve performance between consecutive trials.
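Putting the steps together, here is a minimal end-to-end sketch of that trial loop, reusing the `ReflexionMemory`, `detect_hallucination`, and `is_cyclic` helpers sketched above. The `env` object is assumed to expose a gym-style `reset()`/`step()` interface returning `(observation, reward, done)`, and `llm` is a hypothetical text-completion callable; neither interface comes from the paper.

```python
def run_reflexion(env, llm, memory: ReflexionMemory,
                  max_trials: int = 5, max_steps: int = 30) -> bool:
    """Attempt one task for up to `max_trials` trials, self-reflecting
    after each failure. Returns True if any trial succeeds."""
    for _ in range(max_trials):
        memory.reset_trial()
        task = env.reset()                        # initial query from the environment
        pairs: list[tuple[str, str]] = []         # (action, observation) history
        reward = 0
        for _ in range(max_steps):
            prompt = f"{task}\n{memory.as_prompt_context()}\nNext action:"
            action = llm(prompt)                  # LLM proposes the next action
            observation, reward, done = env.step(action)
            pairs.append((action, observation))
            memory.record(f"> {action}\n{observation}")
            if done:
                break
            # Heuristic h: end the trial early if the agent appears to be
            # hallucinating or stuck in a loop (helpers sketched above).
            if detect_hallucination(pairs) or is_cyclic([a for a, _ in pairs]):
                break
        if reward == 1:                           # binary success signal
            return True
        # Self-reflection: ask the LLM what went wrong and store the lesson.
        reflection = llm(
            f"Task: {task}\nTrajectory:\n{memory.as_prompt_context()}\n"
            "The attempt failed. In one sentence, what should be done differently?"
        )
        memory.add_reflection(reflection)
    return False
```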