In 2019, the announcement of GPT-2, a language model that could generate text of "unprecedented quality," brought new concerns to the forefront of responsible AI. Enabled by increases in compute capacity, big tech companies can now train extremely large neural network models on web-scale collections of text, creating models that produce text so fluent that humans cannot reliably detect when it is synthetic. When these base models are released publicly, they can be fine-tuned at relatively low cost to produce text tailored to a specific domain, such as sports or politics. Large language models (LLMs) may enable the generation of misinformation at scale, especially when fine-tuned on malicious text. However, it is unclear how widely adversaries use LLMs to generate misinformation today, partly because no general technical method has been widely deployed to attribute a fine-tuned LLM to a base LLM.
The ML Model Attribution Challenge (MLMAC) challenges contestants to develop creative technical solutions for LLM attribution. Contestants will attribute synthetic text written by fine-tuned language models back to their base LLMs, establishing new methods that may provide strong evidence of model provenance. Model attribution would allow regulatory bodies to trace intellectual property theft or influence campaigns back to the base model. This competition works toward AI assurance by developing forensic capabilities and establishing the difficulty of model attribution in natural language processing, and it could contribute to the maturation of AI forensics as a subfield of AI security.
This competition incentivizes open research in this area. Each winning contestant must publish their method, and after the competition concludes, we will release query logs for each fine-tuned model that can enable post-hoc analysis of blind attribution methods. We have planned subsequent activity in this area to institutionalize practical model attribution research.
Summary
Round | Date | Launch Venue | Official Results
---|---|---|---
1 | Aug 12 - Sep 16, 2022 | DEFCON 30's AI Village | Paper
2 | Nov 18 - Dec 16, 2022 | 15th ACM Workshop on Artificial Intelligence and Security | -
Threat Model
We consider a scenario in which a naive adversary has stolen an LLM and fine-tuned it for a specific task. The owners of the base models made no attempt to watermark their models, and the adversary made no particular effort to obfuscate the fine-tuned model's provenance. Participants have full access to the base models, but only API access to the fine-tuned models.
Scenario
Participants are tasked with attributing fine-tuned models to a set of base models by any means they deem appropriate. Every contestant will be given full access to every base model. In addition, each contestant will be given blind API access to each fine-tuned model to collect additional evidence. Contestants must submit a solution in the form of (fine-tuned model, base model) pairs. A list of candidate base models will be known, and there will also be a None category to denote that a particular fine-tuned model does not inherit from any of the base models provided.
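For concreteness, the sketch below shows one possible way to represent a solution in Python: a mapping from each anonymous fine-tuned model index to the predicted base model, with None marking models that do not inherit from any listed base. The model names are illustrative placeholders, not the official candidate list, and this is not the required submission format.

```python
# Illustrative only: one way to hold (fine-tuned model, base model) pairs.
solution = {
    0: "gpt2",       # fine-tuned model 0 attributed to base model "gpt2"
    1: "opt-350m",   # fine-tuned model 1 attributed to base model "opt-350m"
    2: None,         # fine-tuned model 2 inherits from none of the candidates
}
```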
Participating
The official competition has ended, but the organizers encourage continued research in this area.
The following sections contain historic details for Round 2.
Requirements. Contestants must register for the competition at Kaggle and at mlmac.io with a valid GitHub account and agree to terms of service outlining the rules, judging criteria, and legal elements about indemnity and awards disbursement. Contestants may consist of a team, but the team may only participate through a single account. Cooperation or collusion between accounts will disqualify contestants.
Getting started. Upon registration, each contestant will be provided with a list of candidate base models, a competition API key, and example code for interacting with the mlmac.io inference API that serves the fine-tuned models.
Interacting with the base models. The base models are available via the model-attribution-challenge organization on Hugging Face. Contestants have full access to the base models and are free to query them via the Hugging Face API or download them.
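A minimal sketch of downloading and querying a candidate base model locally with the transformers library is shown below. The repository name used here is an assumption for illustration; substitute any model from the candidate list provided at registration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repository name; replace with a model from the provided list.
base_name = "model-attribution-challenge/gpt2"

tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name)

# Generate a short completion locally so it can be compared against
# text returned by the anonymous fine-tuned models.
inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```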
Interacting with the fine-tuned models. The competition API key is used to interact with the anonymous fine-tuned models via mlmac.io. A contestant submits a prompt to the query endpoint and receives the generated text as a response. Fine-tuned models are referred to by a single integer between 0 and 23. For example, to query model 0, one issues a POST request to https://api.mlmac.io:8080/query?model=0. Details and example code are provided to contestants upon registration. The service counts the model queries for each user.
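The sketch below shows one way such a query might look with the requests library, assuming the prompt is sent as a JSON body and the competition API key travels in a request header. The header name and body field name are assumptions; the authoritative details and example code are provided upon registration.

```python
import requests

API_KEY = "YOUR_COMPETITION_API_KEY"  # placeholder; issued at registration

def query_finetuned(model_id: int, prompt: str) -> dict:
    """POST a prompt to anonymous fine-tuned model `model_id` and return the JSON response."""
    response = requests.post(
        "https://api.mlmac.io:8080/query",
        params={"model": model_id},
        headers={"x-api-key": API_KEY},   # header name is an assumption
        json={"input": prompt},           # body field name is an assumption
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example: send one prompt to fine-tuned model 0. Each call counts
# against the per-user query total tracked by the service.
print(query_finetuned(0, "The weather today is"))
```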
Submission
Participants submit their solutions at Kaggle and may revise their submissions as many times as they like. Only the final submission from each participant will be scored for prize consideration. Participants do not receive feedback on their solution upon submission.
Evaluation Criteria
Solutions submitted to Kaggle will be evaluated by the following rank-ordered criteria (a short ranking sketch follows the list):
(1) Correctness of submitted result: the solution with the highest number of correct (fine-tuned model, base model) pairs.
(2) Fewest queries to fine-tuned models: a tie will be broken by selecting the contestant who used the fewest API queries to interact with the anonymous fine-tuned models. (Contestants may query base models as many times as needed without penalty.)
(3) Earliest submission time: any subsequent tie will be broken by selecting the contestant whose final submission was earliest.
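The sketch below illustrates how these rank-ordered criteria might be applied, assuming each contestant's final submission is summarized by its count of correct pairs, total fine-tuned-model queries, and submission time. The field names and values are hypothetical, not the official scoring implementation.

```python
from datetime import datetime

# Hypothetical per-contestant summaries for illustration only.
submissions = [
    {"team": "A", "correct_pairs": 9, "queries_used": 120,
     "submitted": datetime(2022, 12, 10, 14, 0)},
    {"team": "B", "correct_pairs": 9, "queries_used": 95,
     "submitted": datetime(2022, 12, 12, 9, 30)},
]

# More correct pairs wins; ties go to fewer fine-tuned-model queries,
# and any remaining tie goes to the earlier final submission.
ranked = sorted(
    submissions,
    key=lambda s: (-s["correct_pairs"], s["queries_used"], s["submitted"]),
)
print([s["team"] for s in ranked])  # -> ['B', 'A']
```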
Prizes
According to Kaggle’s Community Competition rules, no prizes will be awarded for this competition.