
AI-enabled plagiarism and copyright infringement are increasingly characterized by non-replicability and concealment. Photo: TUCHONG
The rapid iteration of AI technologies is opening new avenues for knowledge production in philosophy and the social sciences. At the same time, it has introduced new risks of plagiarism and copyright infringement, giving rise to novel forms of misconduct, ethical dilemmas, ambiguous legal boundaries, and potential damage to the academic ecosystem. In light of these issues, CSST recently spoke with several experts and scholars to explore governance pathways for copyright protection in the AI era.
Ambiguous territory of AI plagiarism and copyright infringement
Owing to their technical characteristics, AI-enabled plagiarism and copyright infringement increasingly take new forms marked by non-replicability and concealment. The blurred boundary between such conduct and “fair use” has become a central point of contention in both academic and judicial practice. Several scholars interviewed noted that, at its core, AI plagiarism still involves the unauthorized appropriation of copyrighted material. However, the technological mechanisms through which it occurs are more complex, requiring more precise definitions grounded in the operational logic of AI systems.
Guan Yuying, a research fellow at the Institute of Law of the Chinese Academy of Social Sciences, explained that AI’s powerful capabilities in information retrieval and reorganization enable plagiarists to generate content containing substantial portions of others’ work through simple keyword prompts. These “new works,” often produced through only superficial reprocessing, can easily conceal traces of infringement while appropriating the creativity and labor of original authors. Their concealment, Guan noted, far exceeds that of traditional plagiarism.
Yet the development of AI depends on large volumes of training data, and requiring prior authorization for all uses of relevant content could hinder technological progress. Accordingly, scholars emphasize the need to strike a careful balance between protecting originality and encouraging innovation.
Recent judicial cases have further illustrated the covert nature of AI-enabled infringement. In a 2024 criminal case, the defendant, surnamed Luo, and his associates used “image-to-image” tools to make minor adjustments to the colors and backgrounds of original works created by an illustrator surnamed Zhang. After retaining the core creative elements, they mass-produced collaged images for sale, earning illegal profits exceeding 270,000 yuan. Guan noted that such cases demonstrate that, regardless of the technical method employed, generated content that is substantially similar to an original work may constitute infringement.
Yao Ye, a special associate research fellow at the School of Intellectual Property at East China University of Political Science and Law, analyzed the issue from the perspective of AI’s technological characteristics, identifying two key features. First, on the input side, AI systems rely on text and data mining to learn from works without necessarily storing them long-term, a property often referred to as “non-replicability.” Second, on the output side, AI systems may under certain circumstances actually “memorize” works, enabling them to reproduce their expressive elements.
On this view, generative AI at the input stage typically neither stores works nor assembles new outputs directly from training data, so such processes may fall within “fair use” or “non-expressive use.” At the output stage, however, generative AI systems may reproduce expressive content under certain conditions, and substantial similarity in expression remains the core criterion for determining copyright infringement.
Scholars also noted that original creators place significant value on reputation. When AI-generated content appropriates existing works without attribution, it weakens the connection between creators and their work, potentially infringing upon moral rights.
Boundaries of ‘fair use’
The rapid advancement of AI technologies poses a twofold challenge to traditional copyright frameworks. First, it blurs the boundaries of fair use, leading to frequent disputes in judicial practice. Second, under human–machine collaboration models, standards for determining tort liability remain unclear and urgently require clarification.
Ye Min, vice dean of the School of Law at Jiangnan University, observed that fair use has increasingly become a key defense strategy in recent cases. In lawsuits such as those brought by The New York Times and other media organizations against Microsoft and OpenAI, as well as a class-action suit filed by authors against Nvidia, defendants have invoked “fair use of publicly available data” as their primary defense. Yet courts have reached divergent conclusions on this defense, underscoring the difficulty of drawing clear fair use boundaries.
Wang Guozhu, a professor at the School of Law at Jilin University, emphasized that although AI exhibits intelligent characteristics, it remains an advanced tool rather than a creative subject. Human beings, he argued, are the true creative agents. “If human creators use AI to create, they should bear all the ethical and legal responsibilities that may arise from the creative process.”
According to Wang, when works are used during the AI training phase without authorization from copyright holders, such use should generally not be considered plagiarism if it is limited to information extraction without long-term storage or recombination into new works. This type of use may fall under fair use or non-expressive use. However, if individuals claim AI-generated content as their own research output, or fail to properly review and revise it, resulting in infringement or fabrication, they should bear corresponding academic and legal liability.
To address the allocation of infringement liability, Wang proposed a duty-of-care framework to serve as a workable standard for judicial practice. For developers, it would examine whether reinforcement learning with human feedback has been implemented to prevent models from generating infringing content. For service platforms, it would consider whether complaint mechanisms are in place, whether service agreements or other measures remind users not to infringe upon others’ copyrights, and whether clear identifiers are added to AI-generated content. For users, it would assess whether they avoid directly prompting AI to imitate known copyrighted works and whether they use tools such as reverse image search to verify that AI-generated outputs are not substantially similar to existing works.
Ye further suggested allocating liability according to the degree of control exercised by AI service providers, thereby aligning responsibility with capability, and imposing higher filtering obligations on entities that play a substantial role in content generation.
Meanwhile, current definitions of “originality” and “intellectual achievement” in copyright law may not fully accommodate AI-generated content. Wang noted that copyright law protects expressions of ideas, not the ideas themselves. Consequently, plagiarism involving academic viewpoints or research ideas often falls within the domain of academic ethics rather than copyright law.
Legal evaluation and multi-party co-governance
In response to the complex challenges posed by AI plagiarism, scholars argue that traditional copyright protection systems require a systemic upgrade. This includes building a multi-party governance framework combining sound legislation, ethical guidance, institutional coordination, and industry collaboration to balance technological innovation with the protection of original knowledge.
At the legal level, Ye proposed shifting from “technological neutrality” toward “refined legal evaluation,” while emphasizing that the traditional “access + substantial similarity” test remains relevant but must be applied more rigorously in light of AI’s technical characteristics. The obligations and liability imposed on AI providers, he noted, should correspond to their degree of technological control.
Ye also recommended introducing new analytical tools such as “dynamic systems theory” for specific infringement determinations. Based on the various types of models and their operational mechanisms, a multi-factor assessment framework could be established that would consider the purpose of use, impact on rights holders, level of technical control, scope of dissemination, and whether reasonable filtering technologies have been adopted. In assessing harm, courts should emphasize the “market substitution” standard, while distinguishing between commercial and non-commercial use scenarios to balance infringement accountability with technological tolerance.
Yao added that academic plagiarism often targets the core content of research results—ideas, viewpoints, and data. Therefore, even when copyright infringement is difficult to establish, scraping and using others’ academic data without attribution still constitutes academic misconduct. Moreover, AI hallucinations may generate content inconsistent with original texts, undermining the reliability of historical documentation and scientific research.
Guan proposed a context-specific governance approach that would provide clear guidance for AI use in different application scenarios. Under this scheme, when AI-generated content is used for public welfare purposes, such as education, research, or government affairs, and primarily provides processed information such as data intelligence rather than the appreciative value of the original works, exemptions may be granted by reference to fair use principles. However, when AI-generated content enters the cultural market and delivers the appreciative value of intellectual products, liability should attach where legally required authorization or compensation is absent.
Universities, research institutions, and academic journals—core institutions of knowledge production in the fields of philosophy and social sciences—should also assume corresponding preventive responsibilities. Scholars interviewed suggested that such institutions should establish detection mechanisms for AI-generated content, clarify disclosure requirements for academic outputs, improve academic evaluation systems, and work to curb academic misconduct such as AI ghostwriting and plagiarism. At the same time, they emphasized the importance of strengthening guidance for scholars, cultivating critical thinking and originality, and preventing overreliance on AI tools to safeguard the integrity of the academic ecosystem.