IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Authors

  • Qiyao Wang
    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing China
  • Hongbo Wang
    Dalian University of Technology, Dalian China
  • Jianguo Huang
    Shanghai Jiao Tong University, Shanghai China
  • Shule Lu
    Beihang University, Beijing China
  • Yuan Lin
    Dalian University of Technology, Dalian China
  • Kan Xu
    Dalian University of Technology, Dalian China
  • Liang Yang
    Dalian University of Technology, Dalian China
  • Hongfei Lin
    Dalian University of Technology, Dalian China

DOI:

https://doi.org/10.70891/JAIR.2025.040011

Keywords:

large language models, benchmark, intellectual property

Abstract

With the rapid development of Large Language Models (LLMs) in vertical domains, attempts have been made to apply them to the field of intellectual property (IP). However, there is currently no benchmark specifically designed to assess the understanding, application, and reasoning abilities of LLMs in the IP domain. To address this gap, we introduce IPEval, the first capability evaluation benchmark designed for IP agency and consulting tasks. IPEval consists of 2657 multiple-choice questions organized along four major capability dimensions: creation, application, protection, and management. These questions cover eight areas: patent rights (including inventions, utility models, and designs), trademarks, copyrights, trade secrets, integrated circuit layout design rights, geographical indications, and related laws. We evaluate seven kinds of LLMs with varying parameter sizes, each primarily using either English or Chinese, under three settings: zero-shot, 5-shot, and Chain of Thought (CoT). The results indicate that the GPT series and Qwen series models perform better on the English tests, while Chinese-centric LLMs, such as the Qwen series, outperform GPT-4 on the Chinese tests. Specialized legal-domain LLMs, such as fuzi-mingcha and MoZi, still lag significantly behind general-purpose LLMs of comparable parameter size on IP tasks. This highlights both the need and the substantial potential for developing more specialized LLMs with stronger IP abilities. We also analyze the models' capabilities with respect to the regional and temporal aspects of IP, emphasizing that IP-domain LLMs need to clearly understand how IP laws differ across regions and how they change over time. We hope IPEval can provide an accurate assessment of LLM capabilities in the IP domain and encourage researchers interested in IP to develop LLMs with richer IP knowledge.

Published

2025-08-02

Section

Articles

How to Cite

Wang, Q., Wang, H., Huang, J., Lu, S., Lin, Y., Xu, K., Yang, L., & Lin, H. (2025). IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models. Journal of Artificial Intelligence Research, 2(1), 9-27. https://doi.org/10.70891/JAIR.2025.040011