The University of Hong Kong has open-sourced AI-Researcher, a framework for writing papers with AI. The promotion looks very impressive (the AI can design its own experiments, implement the code for them, produce test results, organize the experimental data, and finally write up the paper). After actually trying it, though, it feels more like a pre-packaged meal: the barrier to entry is high and it is not intelligent at all. On top of that, the open-source documentation is very rough, and the official example simply cannot be run as given.
The promotional headline: "Open-sourced by a post-90s HKU researcher, an open alternative to OpenAI's $20,000 PhD-level AI agent! Autonomous research rivaling top-conference papers."
GitHub repository
https://github.com/HKUDS/AI-Researcher/blob/main/README.md
Experience and conclusions

Cons:
- Installation and startup are cumbersome (the Docker image is hard to pull, and a GPU is required). The original code has bugs, and the official example does not run as given.
- Usability is low: the required input is very demanding (you must supply extremely detailed parameters).
- Generality is limited: only a handful of paper templates are built into the code.
- The installation docs are incomplete: several things must be installed that the documentation never mentions.

Pros:
- The multi-agent architecture and overall workflow are well designed.
- It provides an example of initializing Docker from code.
- It offers a good takeaway: a research paper is only truly rigorous when it comes with complete data validation.
Environment setup

- Requires git
- Requires Docker (and access to the official Docker Hub, since the provided image is not available on any domestic mirror; the image is roughly 25 GB, and even over a proxy the download tends to break)
- Requires conda
- Requires a GPU

Clone the code and initialize the environment with conda:
```bash
git clone https://github.com/HKUDS/AI-Researcher.git
cd AI-Researcher
conda create -n ai-researcher python=3.10 -y
conda activate ai-researcher
conda install pip
pip install -e .
```
You also need to install Playwright (otherwise startup fails with an error):

```bash
playwright install
```
One more thing worth adding here: you also need a GitHub token, which the framework uses to look up source code on GitHub.

How to generate a GitHub token:
- Go to GitHub Developer Settings
- Select "Personal access tokens" > "Fine-grained tokens"
- Click "Generate new token" and tick the permissions you need
- Copy the new token and export it as an environment variable:
- `export GITHUB_TOKEN="your_new_token"`

Put your GitHub token into the variable above; a minimal sketch of how such a token can be used is shown below.
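The framework uses this token when its `Prepare Agent` looks up the reference papers' repositories on GitHub. As a quick sanity check that your token works, here is a minimal sketch of a repository search against the GitHub REST API; the query string and the printed fields are illustrative, not the framework's actual code.

```python
# Minimal sketch: use GITHUB_TOKEN to search repositories by a paper-related
# keyword. The query "vector quantized VAE" is only an example.
import os
import requests

token = os.environ["GITHUB_TOKEN"]
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "vector quantized VAE", "sort": "stars", "per_page": 3},
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()
for repo in resp.json()["items"]:
    print(repo["full_name"], repo["html_url"])
```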
Pull the Docker image (this step is mandatory):

```bash
docker pull tjbtech1/paperagent:latest
```
Launching a test run

Start it with this script (you can find it in the cloned source tree); this test is the one that generates research directions:
https://github.com/HKUDS/AI-Researcher/blob/main/research_agent/run_infer_level_1.sh
Run this script; once it is running, it starts a Docker container.
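One of the nicer ideas in the project (listed in the pros above) is that the runtime container is initialized from code rather than by hand. As a rough illustration of that idea, here is a minimal sketch using the docker Python SDK; only the image tag comes from the pull step above, while the container name, mount path, and GPU request are my assumptions, not the framework's exact settings.

```python
# Sketch: start a runtime container from code, in the spirit of what the
# launch script does. Names and paths below are illustrative assumptions.
import docker

client = docker.from_env()
container = client.containers.run(
    "tjbtech1/paperagent:latest",
    name="ai-researcher-workplace",      # hypothetical container name
    detach=True,
    tty=True,
    volumes={"/path/to/workplace": {"bind": "/workplace", "mode": "rw"}},
    device_requests=[                     # expose all GPUs to the container
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
)
print(container.name, container.status)
```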
Core logic walkthrough

Input parameters

The launch script above reads this input file:
https://github.com/HKUDS/AI-Researcher/blob/main/benchmark/final/vq/one_layer_vq.json
Look at this input file closely: it has to be extremely detailed. You must provide the XXX paper, plus very detailed reference papers related to it.

You also have to provide detailed instructions, for example:
markdown "task1": "1. **Task**: The proposed model is designed to address representation collapse in Vector Quantized (VQ) models, specifically in unsupervised representation learning and latent generative models applicable to modalities like image and audio data.\n\n2. **Core Techniques/Algorithms**: The methodology introduces a linear transformation layer applied to the code vectors in a reparameterization strategy that leverages a learnable latent basis, enhancing the optimization of the entire codebook rather than individual code vectors.\n\n3. **Purpose and Function of Major Technical Components**:\n - **Encoder (f_\u03b8)**: Maps input data (images or audio) into a continuous latent representation (z_e).\n - **Codebook (C)**: A collection of discrete code vectors used for quantizing the latent representations.\n - **Linear Transformation Layer (W)**: A learnable matrix that transforms the codebook vectors, optimizing the entire latent space jointly to improve codebook utilization during training.\n - **Decoder (g_\u03d5)**: Reconstructs the input data from the quantized representations.\n\n4. **Implementation Details**:\n - **Key Parameters**:\n - Learning rate (\u03b7): Commonly set to 1e-4.\n - Commitment weight (\u03b2): Adjust according to data modality, e.g., set to 1.0 for images and 1000.0 for audio.\n - **Input/Output Specifications**:\n - **Input**: Raw data instances, such as images of size 128x128 or audio frames. \n - **Output**: Reconstructed data (images or audio).\n - **Important Constraints**: The codebook size should be large enough to capture the data complexity; experiments indicate sizes like 65,536 or larger are beneficial.\n\n5. **Step-by-Step Description of Component Interaction**:\n - **Step 1**: Initialize the codebook (C) using a distribution (e.g., Gaussian) and freeze its parameters for initial training iterations.\n - **Step 2**: For each data instance (x), compute the latent representation (z_e) using the encoder (f_\u03b8).\n - **Step 3**: Perform nearest code search to find the closest codebook vector to z_e using the distance metric. Use the selected code vector for reconstruction.\n - **Step 4**: Reparameterize the selected code vector using the performed linear transformation (C * W), effectively treating both C and W in the optimization process.\n - **Step 5**: Calculate the loss, which combines reconstruction loss (MSE between original and decoded output) and commitment loss to ensure effective use of the codebook.\n - **Step 6**: Update only the linear layer (W) through gradient backpropagation, keeping C static throughout this phase to facilitate the joint training procedure.\n\n6. **Critical Implementation Details**:\n - To prevent representation collapse, it is crucial to carefully set the learning rate so that the transformation matrix W can adapt without compromising the usefulness of the latent space.\n - Keeping the codebook static during the initial phase speeds up the convergence while ensuring that the linear transformation can stretch and rotate the latent space effectively.\n - Regularly evaluate the utilization percentage of the codebook during training iterations, aiming for near-complete usage (ideally 100%) to combat representation collapse actively.", |
Frankly, just seeing this input file is enough to put most people off. The practical value is limited, and the bar for ordinary users is high. It suits someone who already has some research done on XXX and wants to use the tool to push it forward.
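For orientation, here is a hypothetical skeleton of what such an input file has to carry, inferred only from the "task1" text above and from the `metadata["task_instructions"]` / `metadata["source_papers"]` fields that run_infer_plan.py reads (shown later). Field names, nesting, and the structure of each source-paper entry in the real benchmark files may differ.

```python
# Hypothetical skeleton of a task input; field names are illustrative.
example_task = {
    "task_instructions": "Implementation-level description of the idea "
                         "(as detailed as the task1 text above)...",
    "source_papers": [
        "Title of a closely related reference paper",   # illustrative entry
        "Title of another reference paper",
        # ... typically several, each tied to a findable GitHub repository
    ],
}
```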
Generating research directions

The first headline capability of AI-Researcher is proposing research directions. The core source code is here:
https://github.com/HKUDS/AI-Researcher/blob/main/research_agent/run_infer_plan.py
The main steps: read the parameters described above, then initialize a runtime environment; at this point a Docker container has already been started.

Next comes the flow logic.
Still in this same file, look at the forward function.
It reads the JSON file above and, using the names of the reference papers listed in it, searches GitHub for the corresponding source repositories, obtaining the following information:

It then feeds the repository information found on GitHub, together with the paper list from the JSON, to the model and asks it to propose several research directions. Next it picks the relevant papers and downloads them, and finally selects a usable dataset from the preset dataset candidates.
```python
survey_query = f"""\
I have an innovative ideas related to machine learning: {metadata["task_instructions"]}
And a list of papers for your reference: {warp_source_papers(metadata["source_papers"])}
I have carefully gone through these papers' github repositories and found download some of them in my local machine, with the following information: {prepare_res}
And I have also downloaded the corresponding paper in the Tex format, with the following information: {download_res}
Your task is to do a comprehensive survey on the innovative ideas and the papers, and give me a detailed plan for the implementation.
Note that the math formula should be as complete as possible, and the code implementation should be as complete as possible. Don't use placeholder code.
"""
messages = [{"role": "user", "content": survey_query}]
context_variables["notes"] = []
survey_messages, context_variables = await self.survey_agent(messages, context_variables)
survey_res = survey_messages[-1]["content"]
context_variables["model_survey"] = survey_res

data_module = importlib.import_module(f"benchmark.process.dataset_candidate.{category}.metaprompt")
dataset_description = f"""\
You should select SEVERAL datasets as experimental datasets from the following description: {data_module.DATASET}
We have already selected the following baselines for these datasets: {data_module.BASELINE}
The performance comparison of these datasets: {data_module.COMPARISON}
And the evaluation metrics are: {data_module.EVALUATION}
{data_module.REF}
"""
```
The model is then asked to produce a detailed plan:
```python
plan_query = f"""\
I have an innovative ideas related to machine learning: {metadata["task_instructions"]}
And a list of papers for your reference: {warp_source_papers(metadata["source_papers"])}
I have carefully gone through these papers' github repositories and found download some of them in my local machine, with the following information: {prepare_res}
I have also explored the innovative ideas and the papers, with the following notes: {survey_res}
We have already selected the following datasets as experimental datasets: {dataset_description}
Your task is to carefully review the existing resources and understand the task, and give me a detailed plan for the implementation.
"""
```
AI-Researcher is driven by multiple agents; below is the manager prompt handed to the model (it lays out the complete pipeline needed to produce the paper):
```python
ml_dev_query = f"""\
INPUT:
You are given an innovative idea: {metadata["task_instructions"]}.
and the reference codebases chosen by the `Prepare Agent`: {prepare_res}
And I have conducted the comprehensive survey on the innovative idea and the papers, and give you the model survey notes: {survey_res}
You should carefully go through the math formula and the code implementation, and implement the innovative idea according to the plan and existing resources.
We have already selected the following datasets as experimental datasets: {dataset_description}

Your task is to implement the innovative idea after carefully reviewing the math formula and the code implementation in the paper notes and existing resources in the directory `/{workplace_name}`. You should select ONE most appropriate and lightweight dataset from the given datasets, and implement the idea by creating new model, and EXACTLY run TWO epochs of training and testing on the ACTUAL dataset on the GPU device. Note that EVERY atomic academic concept in model survey notes should be implemented in the project.

PROJECT STRUCTURE REQUIREMENTS:
1. Directory Organization
   - Data: `/{workplace_name}/project/data/`
     * Use the dataset selected by the `Plan Agent`
     * NO toy or random datasets
   - Model Components: `/{workplace_name}/project/model/`
     * All model architecture files
     * All model components as specified in survey notes
     * Dataset processing scripts and utilities
   - Training: `/{workplace_name}/project/training/`
     * Training loop implementation
     * Loss functions
     * Optimization logic
   - Testing: `/{workplace_name}/project/testing/`
     * Evaluation metrics
     * Testing procedures
   - Data processing: `/{workplace_name}/project/data_processing/`
     * Implement the data processing pipeline
   - Main Script: `/{workplace_name}/project/run_training_testing.py`
     * Complete training and testing pipeline
     * Configuration management
     * Results logging
2. Complete Implementation Requirements
   - MUST implement EVERY component from model survey notes
   - NO placeholder code (no `pass`, `...`, `raise NotImplementedError`)
   - MUST include complete logic and mathematical operations
   - Each component MUST be fully functional and tested
3. Dataset and Training Requirements
   - Select and download ONE actual dataset from references
   - Implement full data processing pipeline
   - Train for exactly 2 epochs
   - Test model performance after training
   - Log all metrics and results
4. Integration Requirements
   - All components must work together seamlessly
   - Clear dependencies between modules
   - Consistent coding style and documentation
   - Proper error handling and GPU support

EXECUTION WORKFLOW:
1. Dataset Setup
   - Choose appropriate dataset from references (You MUST use the actual dataset, not the toy or random datasets) [IMPORTANT!!!]
   - Download to data directory `/{workplace_name}/project/data`
   - Implement processing pipeline in `/{workplace_name}/project/data_processing/`
   - Verify data loading
2. Model Implementation
   - Study model survey notes thoroughly
   - Implement each component completely
   - Document mathematical operations
   - Add comprehensive docstrings
3. Training Implementation
   - Complete training loop
   - Loss function implementation
   - Optimization setup
   - Progress monitoring
4. Testing Setup
   - Implement evaluation metrics
   - Create testing procedures
   - Set up results logging
   - Error handling
5. Integration
   - Create run_training_testing.py
   - Configure for 2 epoch training
   - Add GPU support and OOM handling
   - Implement full pipeline execution

VERIFICATION CHECKLIST:
1. Project Structure
   - All directories exist and are properly organized
   - Each component is in correct location
   - Clear separation of concerns
2. Implementation Completeness
   - Every function is fully implemented
   - No placeholder code exists
   - All mathematical operations are coded
   - Documentation is complete
3. Functionality
   - Dataset downloads and loads correctly
   - Training runs for 2 epochs
   - Testing produces valid metrics
   - GPU support is implemented

Remember:
- MUST use actual dataset (no toy data, download according to the reference codebases) [IMPORTANT!!!]
- Implementation MUST strictly follow model survey notes
- ALL components MUST be fully implemented
- Project MUST run end-to-end without placeholders
- MUST complete 2 epochs of training and testing
"""
```
Experiment execution phase
https://github.com/HKUDS/AI-Researcher/blob/main/research_agent/run_infer_idea.py
There is one extra step here: the agent goes back and reads the detailed source code corresponding to the xxx paper.
It is then asked to write the code, combining the math formulas and code implementations it has gathered with the innovative idea:
```python
ml_dev_query = f"""\
INPUT:
You are given an innovative idea: {survey_res}.
and the reference codebases chosen by the `Prepare Agent`: {prepare_res}
And I have conducted the comprehensive survey on the innovative idea and the papers, and give you the model survey notes: {survey_res}
You should carefully go through the math formula and the code implementation, and implement the innovative idea according to the plan and existing resources.
We have already selected the following datasets as experimental datasets: {dataset_description}

Your task is to implement the innovative idea after carefully reviewing the math formula and the code implementation in the paper notes and existing resources in the directory `/{workplace_name}`. You should select ONE most appropriate and lightweight dataset from the given datasets, and implement the idea by creating new model, and EXACTLY run TWO epochs of training and testing on the ACTUAL dataset on the GPU device. Note that EVERY atomic academic concept in model survey notes should be implemented in the project.

PROJECT STRUCTURE REQUIREMENTS:
1. Directory Organization
   - Data: `/{workplace_name}/project/data/`
     * Use the dataset selected by the `Plan Agent`
     * NO toy or random datasets
   - Model Components: `/{workplace_name}/project/model/`
     * All model architecture files
     * All model components as specified in survey notes
     * Dataset processing scripts and utilities
   - Training: `/{workplace_name}/project/training/`
     * Training loop implementation
     * Loss functions
     * Optimization logic
   - Testing: `/{workplace_name}/project/testing/`
     * Evaluation metrics
     * Testing procedures
   - Data processing: `/{workplace_name}/project/data_processing/`
     * Implement the data processing pipeline
   - Main Script: `/{workplace_name}/project/run_training_testing.py`
     * Complete training and testing pipeline
     * Configuration management
     * Results logging
2. Complete Implementation Requirements
   - MUST implement EVERY component from model survey notes
   - NO placeholder code (no `pass`, `...`, `raise NotImplementedError`)
   - MUST include complete logic and mathematical operations
   - Each component MUST be fully functional and tested
3. Dataset and Training Requirements
   - Select and download ONE actual dataset from references
   - Implement full data processing pipeline
   - Train for exactly 2 epochs
   - Test model performance after training
   - Log all metrics and results
4. Integration Requirements
   - All components must work together seamlessly
   - Clear dependencies between modules
   - Consistent coding style and documentation
   - Proper error handling and GPU support

EXECUTION WORKFLOW:
1. Dataset Setup
   - Choose appropriate dataset from references (You MUST use the actual dataset, not the toy or random datasets) [IMPORTANT!!!]
   - Download to data directory `/{workplace_name}/project/data`
   - Implement processing pipeline in `/{workplace_name}/project/data_processing/`
   - Verify data loading
2. Model Implementation
   - Study model survey notes thoroughly
   - Implement each component completely
   - Document mathematical operations
   - Add comprehensive docstrings
3. Training Implementation
   - Complete training loop
   - Loss function implementation
   - Optimization setup
   - Progress monitoring
4. Testing Setup
   - Implement evaluation metrics
   - Create testing procedures
   - Set up results logging
   - Error handling
5. Integration
   - Create run_training_testing.py
   - Configure for 2 epoch training
   - Add GPU support and OOM handling
   - Implement full pipeline execution

VERIFICATION CHECKLIST:
1. Project Structure
   - All directories exist and are properly organized
   - Each component is in correct location
   - Clear separation of concerns
2. Implementation Completeness
   - Every function is fully implemented
   - No placeholder code exists
   - All mathematical operations are coded
   - Documentation is complete
3. Functionality
   - Dataset downloads and loads correctly
   - Training runs for 2 epochs
   - Testing produces valid metrics
   - GPU support is implemented

Remember:
- MUST use actual dataset (no toy data, download according to the reference codebases) [IMPORTANT!!!]
- Implementation MUST strictly follow model survey notes
- ALL components MUST be fully implemented
- Project MUST run end-to-end without placeholders
- MUST complete 2 epochs of training and testing
"""
```
Review suggestions are then produced:
```python
query = f"""\
INPUT:
You are given an innovative idea: {survey_res}
and the reference codebases chosen by the `Prepare Agent`: {prepare_res}
and the detailed coding plan: {plan_res}
The implementation of the project: {ml_dev_res}

Your task is to evaluate the implementation, and give a suggestion about the implementation. Note that you should carefully check whether the implementation meets the idea, especially the atomic academic concepts in the model survey notes one by one! If not, give comprehensive suggestions about the implementation.

[IMPORTANT] You should fully utilize the existing resources in the reference codebases as much as possible, including using the existing datasets, model components, and training process, but you should also implement the idea by creating new model components!
[IMPORTANT] You should recognize every key point in the innovative idea, and carefully check whether the implementation meets the idea one by one!
[IMPORTANT] Some tips about the evaluation:
1. The implementation should carefully follow the plan. Please check every component in the plan step by step.
2. The implementation should have the test process. All in all, you should train ONE dataset with TWO epochs, and finally test the model on the test dataset within one script. The test metrics should follow the plan.
3. The model should be train on GPU device. If you meet Out of Memory problem, you should try another specific GPU device.
"""
```
The code is then revised accordingly.
Next, the agent is asked to run the experiments:
```python
exp_planner_query = f"""\
You are given an innovative idea: {survey_res}
And the reference codebases chosen by the `Prepare Agent`: {prepare_res}
And the detailed coding plan: {plan_res}
You have conducted the experiments and get the experimental results: {submit_res}

Your task is to:
1. Analyze the experimental results and give a detailed analysis report about the results.
2. Analyze the reference codebases and papers, and give a further plan to let `Machine Learning Agent` to do more experiments based on the innovative idea. The further experiments could include but not limited to:
   - Modify the implementation to better fit the idea.
   - Add more experiments to prove the effectiveness and superiority of the idea.
   - Visualize the experimental results and give a detailed analysis report about the results.
   - ANY other experiments that exsiting concurrent reference papers and codebases have done.

DO NOT use the `case_resolved` function before you have carefully and comprehensively analyzed the experimental results and the reference codebases and papers.
"""
```
Finally, the experimental results are processed:
```python
refine_query = f"""\
You are given an innovative idea: {survey_res}
And the reference codebases chosen by the `Prepare Agent`: {prepare_res}
And the detailed coding plan: {plan_res}
You have conducted the experiments and get the experimental results: {submit_res}
And a detailed analysis report about the results are given by the `Experiment Planner Agent`: {analysis_report}

Your task is to refine the experimental results according to the analysis report by modifying existing code in the directory `/{workplace_name}/project`.
You should NOT stop util every experiment is done with ACTUAL results. If you encounter Out of Memory problem, you should try another specific GPU device. If you encounter ANY other problems, you should try your best to solve the problem by yourself.

Note that you should fully utilize the existing code in the directory `/{workplace_name}/project` as much as possible. If you want to add more experiments, you should add the python script in the directory `/{workplace_name}/project/`, like `run_training_testing.py`.
Select and output the important results during the experiments into the log files, do NOT output them all in the terminal.
"""
```