IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Qiyao Wang; Hongbo Wang; Jianguo Huang; Shule Lu; Yuan Lin; Kan Xu; Liang Yang; Hongfei Lin

doi:10.70891/JAIR.2025.040011

Authors

Qiyao Wang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China
Hongbo Wang
Dalian University of Technology, Dalian, China
Jianguo Huang
Shanghai Jiao Tong University, Shanghai, China
Shule Lu
Beihang University, Beijing, China
Yuan Lin
Dalian University of Technology, Dalian, China
Kan Xu
Dalian University of Technology, Dalian, China
Liang Yang
Dalian University of Technology, Dalian, China
Hongfei Lin
Dalian University of Technology, Dalian, China

DOI:

https://doi.org/10.70891/JAIR.2025.040011

Keywords:

large language models, benchmark, intellectual property

Abstract

With the rapid development of Large Language Models (LLMs) in vertical domains, attempts have been made to apply them to the field of intellectual property (IP). However, there is currently no evaluation benchmark specifically designed to assess the understanding, application, and reasoning abilities of LLMs in the IP domain. To address this gap, we introduce IPEval, the first capability evaluation benchmark designed for IP agency and consulting tasks. IPEval consists of 2657 multiple-choice questions, divided into four major capability dimensions: creation, application, protection, and management. These questions cover eight areas: patent rights (including inventions, utility models, and designs), trademarks, copyrights, trade secrets, integrated circuit layout design rights, geographical indications, and related laws. We evaluate seven kinds of LLMs with varying parameter sizes, whose primary language is either English or Chinese, under three settings: zero-shot, 5-shot (few-shot), and Chain of Thought (CoT). The results indicate that the GPT series and Qwen series models demonstrate stronger performance on the English tests, while Chinese-dominant LLMs, such as the Qwen series, outperform GPT-4 on the Chinese tests. Specialized legal-domain LLMs, such as fuzi-mingcha and MoZi, still lag significantly behind general-purpose LLMs of comparable parameter sizes in IP performance. This highlights the necessity of, and substantial potential for, developing more specialized LLMs with stronger IP abilities. We also analyze the models' capabilities with respect to the regional and temporal aspects of IP, emphasizing that IP-domain LLMs need to clearly understand how IP laws differ across regions and how they change over time. We hope IPEval can provide an accurate assessment of LLM capabilities in the IP domain and encourage researchers interested in IP to develop LLMs with richer IP knowledge.
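
As a rough illustration of the three evaluation settings named in the abstract, the following Python sketch builds zero-shot, few-shot, and CoT prompts for a multiple-choice question and scores predictions by accuracy. It is not the authors' code: the `Question` class, the prompt wording, the answer-extraction rule, and the sample question are all hypothetical, and it assumes each question has a single correct option labelled A to D.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Question:
    """Hypothetical container for one multiple-choice item."""
    stem: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # gold option letter (assumed single-answer)


def render(q: Question, tail: str = "Answer:") -> str:
    """Format the stem and options, ending with the given tail line."""
    lines = [q.stem] + [f"{k}. {v}" for k, v in sorted(q.options.items())]
    lines.append(tail)
    return "\n".join(lines)


def build_prompt(q: Question, mode: str,
                 exemplars: Sequence[Question] = ()) -> str:
    """Build a prompt for one setting; the wording is an assumption."""
    if mode == "zero-shot":
        return render(q)
    if mode == "few-shot":
        # Solved exemplars first (five of them in a 5-shot setting).
        shots = [render(e, tail=f"Answer: {e.answer}") for e in exemplars]
        return "\n\n".join(shots + [render(q)])
    if mode == "cot":
        return render(q, tail="Let's think step by step, "
                              "then state the final option letter.")
    raise ValueError(f"unknown mode: {mode}")


def extract_choice(response: str) -> Optional[str]:
    """Take the last standalone letter A-D in the response, if any."""
    letters = re.findall(r"\b([A-D])\b", response)
    return letters[-1] if letters else None


def accuracy(preds: Sequence[Optional[str]], golds: Sequence[str]) -> float:
    """Fraction of questions whose predicted letter matches the gold one."""
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


# Hypothetical sample question, for illustration only.
sample = Question(
    stem="Which right most directly protects a brand name?",
    options={"A": "Patent", "B": "Trademark",
             "C": "Copyright", "D": "Trade secret"},
    answer="B",
)
```

A model's raw response would be passed through `extract_choice` before scoring. Taking the last standalone letter is a simple heuristic that suits CoT outputs, where the final answer usually follows the reasoning, but it can misfire on responses that mention other option letters afterwards.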
