Invention Title:

LARGE LANGUAGE MODEL AGENT FOR AUTOMATED GENE-EDITING EXPERIMENT DESIGN

Publication number:

US20250307658

Publication date:

2025-10-02

Section:

Physics

Class:

G06N3/126

Inventors:

Mengdi Wang, Princeton, NJ, United States

Kaixuan Huang, Princeton, NJ, United States

Yuanhao Qu, San Mateo, CA, United States

Le Cong, Mountain View, CA, United States

Assignees:

THE TRUSTEES OF PRINCETON UNIVERSITY, Princeton, NJ, United States

Stanford University, Stanford, CA, United States

Applicants:

The Trustees of Princeton University, Princeton, NJ, United States

STANFORD UNIVERSITY, Stanford, CA, United States

Smart overview of the Invention

The patent application describes a platform that automates the design of gene-editing experiments using an agent system built on a large language model. The platform comprises processing units and a computer-readable storage device holding instructions for the following operations: receiving a meta request for a gene-editing experiment, configuring an ordered list of tasks through a reasoning framework, and carrying out each task with a Task Executor module that interacts with external APIs and user inputs.
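The step from meta request to ordered task list can be sketched as follows. This is a minimal illustration, not the method claimed in the application: `plan_tasks`, the prompt wording, and the `fake_llm` stand-in (used so the example runs without a model) are all hypothetical, and the sketch assumes the model replies with a numbered list.

```python
import re
from typing import Callable, List


def plan_tasks(meta_request: str, llm: Callable[[str], str]) -> List[str]:
    """Ask a language model for a numbered plan and parse it into an ordered list."""
    prompt = f"Break this gene-editing request into numbered tasks:\n{meta_request}"
    reply = llm(prompt)
    tasks = []
    for line in reply.splitlines():
        # Accept lines such as "1. Do X" or "2) Do Y"; ignore anything else.
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            tasks.append(match.group(1).strip())
    return tasks


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (assumption, not a real API)."""
    return (
        "1. Select an editing system\n"
        "2. Design guide RNA\n"
        "3. Choose a delivery method"
    )


tasks = plan_tasks("Knock out gene X in cell line Y", fake_llm)
for position, task in enumerate(tasks, start=1):
    print(position, task)
```

Passing the model in as a callable keeps the parsing logic testable on its own; a real deployment would swap `fake_llm` for an actual model client.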

Functional Components

Key components of the platform include the Task Executor module and the User-Proxy Agent module. The Task Executor utilizes state machines to manage sub-goals, connects to external APIs, and processes user input. The User-Proxy Agent forms prompts based on current state instructions, user requests, interaction history, and API results to determine appropriate actions. Together, these components enable the platform to output recommendations responsive to the meta request.
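One way to picture how the two modules fit together is a small state machine whose current state supplies the instruction for the next prompt. The sketch below is an assumption-laden illustration: the class names, field names, prompt layout, and example states are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    """One sub-goal: an instruction plus a rule that picks the next state."""
    name: str
    instruction: str
    # Maps the user's reply to the next state name; None ends the task.
    next_state: Callable[[str], Optional[str]]


@dataclass
class TaskExecutor:
    states: Dict[str, State]
    current: Optional[str]
    history: List[str] = field(default_factory=list)

    def build_prompt(self, user_request: str, api_result: str = "") -> str:
        """Combine state instruction, user request, history, and API result."""
        state = self.states[self.current]
        return "\n".join([
            f"Instruction: {state.instruction}",
            f"User request: {user_request}",
            "History: " + " | ".join(self.history),
            f"API result: {api_result}",
        ])

    def step(self, user_input: str) -> Optional[str]:
        """Record the reply and advance to the next state."""
        state = self.states[self.current]
        self.history.append(f"{state.name}: {user_input}")
        self.current = state.next_state(user_input)
        return self.current


# Hypothetical two-state task, for illustration only.
states = {
    "choose_system": State("choose_system", "Ask which editing system to use.",
                           lambda reply: "pick_target"),
    "pick_target": State("pick_target", "Ask for the target gene.",
                         lambda reply: None),
}
executor = TaskExecutor(states=states, current="choose_system")
print(executor.build_prompt("Design a knockout experiment"))
```

In this sketch the prompt would be sent to the language model acting as the User-Proxy Agent, and the chosen action would be fed back through `step`.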

Features of the Reasoning Framework

The reasoning framework employs a large language model to decompose each meta request into an ordered task list. The model is trained on curated question-and-answer pairs drawn from gene-editing discussions and can be fine-tuned with either full-parameter fine-tuning or QLoRA. The framework also supports selecting a gene-editing delivery method: it extracts parameters from user inputs, performs a literature search, and ranks the candidate methods by citations.
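The citation-based ranking could look roughly like the sketch below. It assumes that each literature hit has already been labeled with a delivery method and a citation count, and that citations are summed per method; the summary does not say how citations are aggregated, so that choice, the function name, and the placeholder data are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_methods(hits: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum citations per delivery method and sort from most to least cited."""
    totals: Dict[str, int] = defaultdict(int)
    for method, citations in hits:
        totals[method] += citations
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Placeholder search results: (method, citation count). Values are invented.
hits = [
    ("electroporation", 10),
    ("lentivirus", 25),
    ("electroporation", 20),
]
for method, total in rank_methods(hits):
    print(method, total)
```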

Training and Inference Methods

The platform includes a method for training a gene-editing model on datasets collected from public forums. The method preprocesses the data to extract question-answer pairs, fine-tunes a pre-trained language model on those pairs, and stores the fine-tuned model for later use. The platform also provides a method for gene-editing inference that processes a query with the trained model, retrieves relevant information from knowledge bases, and synthesizes an answer from the processed query.
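The preprocessing step can be illustrated with a short sketch that turns forum threads into question-answer pairs. The thread layout (a `question` string plus a list of `replies`) and the cleaning rule are assumptions made for the example; the application summary does not specify either.

```python
import re
from typing import Dict, List


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def extract_qa_pairs(threads: List[Dict]) -> List[Dict[str, str]]:
    """Pair each thread's question with every non-empty reply."""
    pairs = []
    for thread in threads:
        question = clean(thread.get("question", ""))
        for reply in thread.get("replies", []):
            answer = clean(reply)
            if question and answer:
                pairs.append({"question": question, "answer": answer})
    return pairs


# Hypothetical thread, for illustration only.
threads = [
    {"question": "  How do I design\n a guide? ",
     "replies": ["Use a design tool.", "   "]},
]
print(extract_qa_pairs(threads))
```

The resulting list of dictionaries is in a shape that most fine-tuning pipelines can consume after being written out as JSON lines.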

Advanced Retrieval Techniques

For gene-editing inference, the platform retrieves relevant information by embedding queries and documents as semantic vectors and running a similarity search. It then summarizes the retrieved documents with respect to the query and synthesizes an answer by combining the processed query, the retrieved data, and responses from the fine-tuned language models. The approach is intended to produce concise answers that address the specific aspects raised in a gene-editing query.
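The retrieval step can be sketched with cosine similarity over vectors. To stay self-contained, the example below substitutes a toy bag-of-words vector for a learned semantic embedding; a real system would presumably call an embedding model instead. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a learned semantic vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, documents: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Placeholder knowledge-base snippets, for illustration only.
documents = [
    "guide rna design rules",
    "cell culture media recipes",
    "delivery of guide rna",
]
for score, doc in top_k("guide rna design", documents):
    print(round(score, 3), doc)
```

The top-ranked documents would then be summarized against the query and combined with the fine-tuned model's response to form the final answer.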