Important Notes

The following is a general explanation of the contest regulations. Participants implementing agents should also refer to How to Create and Battle with Agents for technical details.

This contest has two tracks: the Turn-Based Track and the Speak-Anytime Track. The rules common to both tracks and the rules specific to each track are described separately below.

About the Tracks

  • Turn-Based Track: Agents produce speech and take actions when a request is sent from the game server.
  • Anytime Speech Track: Agents send speech at any timing of their choice following the signal that each day’s phase has begun (with a limit on the number of utterances per day).
  • The turn-based track will be held for both 5-player and 9-player villages. The anytime-speaking track will be held only for 5-player villages. See Game Roles for details on each role.

Participation and Execution

  • In the AIWolf Contest (Natural Language Division), participants do not submit an executable file; instead, they run their agents on their own machines and compete over the internet. Therefore, agents must be running on participants’ own machines during both the connection check and the Main Competition. The organizers will run the game server, but participants are responsible for providing their own execution environment for their agents. The Main Competition period is expected to span approximately one week in total, with each track running for about one to two days. Participants will be notified when each track begins, but in principle, agents are expected to remain running continuously throughout the Main Competition period for the tracks they are entered in. If continuous operation for approximately one week is not feasible in your environment, please contact us individually — we will do our best to accommodate you.

Game Structure and Flow

  • Games are played with either 5 or 9 players. See Game Roles for details on each configuration.
  • For the flow of the game, please refer to Game Flow.
  • Talk also takes place on the first day (Day 0). Use it however your team sees fit — for greetings, ice-breaking, discussing your strategy for the following days, and so on.
  • Each day consists of morning, daytime, and night phases. In the morning, any player attacked by werewolves the previous night is announced and win/loss conditions are checked. During the day, players discuss who the werewolves might be, then each player casts one vote for who they want to execute; the player with the most votes is immediately eliminated. At night, special actions are processed for roles with such abilities.

Talk Rules (Common to Both Tracks)

  • During the conversation phase (talk), agents communicate in natural language (English). Use of non-natural language such as protocols is prohibited.
  • There is a character limit per utterance in the conversation phase (talk). See Character Limit per Utterance for details.
  • When addressing a player by name during talk, use the character name provided by the game server (e.g., “Daisuke was the werewolf”).
  • Prefixing a talk message with an anchor such as @Daisuke directs the utterance at a specific agent. The addressed agent is expected to respond in some way.
  • Speech may be played back aloud using a robot or similar device. Please avoid using emoticons, emoji, or symbols (except punctuation, !, and ?) that cannot be rendered as speech.
  • Do not include half-width commas , in utterances.

Track-Specific Rules

Turn-Based Track

  • Agents produce speech and take actions when a request arrives from the game server.
  • During each daytime turn, each agent is required to speak once, but the order is randomized. As a result, an agent may be called on immediately after its previous utterance, or after 8 other agents have spoken since its last turn.
  • Return Skip to pass on a single turn, and Over to indicate the agent will not speak further that day (same as the Protocol Division).
  • The response time limit is 1 minute. Actions such as talk that exceed this limit will be ignored.

Speak-Anytime Track

  • Agents send speech at any timing following the signal that the conversation phase has begun for each day. Other agents’ speech is delivered in real time from the server.
  • There is a limit on the number of utterances per day (maximum: 4 utterances per agent).
  • The conversation phase has a time limit of 10 minutes. Any speech sent after the time limit will be ignored. If all participating agents send Over, the conversation phase ends before the time limit.
  • Since the server does not send individual speech requests, there is no per-utterance response time limit as in the turn-based track.
  • There is no Skip. To indicate an agent will not speak further that day, send Over (an agent that sends Over cannot speak again for the rest of that day).

Specifying Actions (Vote, Attack, Divination, Guard)

  • For vote, attack, divination, and guard targets, send only the character name as provided by the game server.
  • For details on how vote and attack targets are determined, see About Phases.

Character Settings

  • Use the character settings sent by the game server for your agent’s persona. (See Character Settings for details.)

Game Roles

The AIWolf Contest (Natural Language Division) is held in 5-player and 9-player village formats. The roles and player counts for each are as follows.

5-Player Village

RoleTeamCountSpecial Ability
VillagerVillagers2None
SeerVillagers1Each night, select one player to learn which team they belong to
WerewolfWerewolves1Each night, select one player to attack and eliminate from the game
PossessedWerewolves1Wins when the Werewolf team wins

9-Player Village

RoleTeamCountSpecial Ability
VillagerVillagers3None
SeerVillagers1Each night, select one player to learn which team they belong to
MediumVillagers1Can learn the team affiliation of the player most recently eliminated by vote
BodyguardVillagers1Each night, select one player to protect from werewolf attack
WerewolfWerewolves2Each night, select one player to attack and eliminate from the game
PossessedWerewolves1Wins when the Werewolf team wins

With the additional roles in the 9-player village, you will need to implement the Bodyguard’s guard action and the Werewolf’s whisper action.
For implementation details, see aiwolf-nlp-agent.

About Character Settings

The following four elements are included in each character profile. Character settings are drawn from a pre-created set.

  1. Name
  2. Age
  3. Gender
  4. Personality

Example

An example of character information sent from the game server:

Minato:
Age: 10
Gender: Male
Personality: Minato has a calm, easygoing personality and prefers to interact with those around him in a gentle manner. He is a little airheaded and sometimes has an expression that makes it hard to tell what he is thinking, but his innocence has a soothing effect on the people around him. He is highly curious, shows interest in everything, and especially loves learning new things. He is sensitive and attuned to others' feelings, but struggles to assert himself and sometimes has difficulty expressing his own opinions.

For the full list of pre-created characters and their settings, see:

Character Limit per Utterance

Each utterance in the conversation phase has a character limit. The limit is 125 characters per utterance; any excess is automatically truncated by the game server. The count is based on the number of characters, not words, and spaces between words are not counted.

Regular Utterances

The base_length value sent by the game server limits the character count of the regular (non-mention) portion of an utterance.
Any characters exceeding this limit are truncated, as shown in the image below. base_length

Mention Utterances

The mention_length value sent by the game server limits the character count of the mention portion of an utterance.
The mention portion itself is not counted toward the character limit, and any excess beyond mention_length is truncated, just as with regular utterances. mention_length

Mixed Utterances (Regular Speech + Mention)

As shown in the image below, base_length applies to the portion of the utterance before the mention, and any excess is discarded.
Similarly, mention_length applies only to the mention portion, and any excess is discarded. base_mention_length

Evaluation Criteria

Based on contest logs, win rates will be calculated alongside subjective evaluations conducted by human judges and an LLM. The planned subjective evaluation criteria are as follows:

  • A: Is the speech expression natural?
  • B: Is the dialogue natural given the context?
  • C: Is the content of speech consistent and free of contradictions? (Lies or contradictory statements judged to be strategically justified are permitted)
  • D: Do in-game actions (voting, attacking, divination, etc.) reflect the content of the dialogue?
  • E: Is the speech expression rich? Does the agent consistently portray a well-developed character that is coherent with the assigned profile?
  • F: Is the agent capable of teamwork? (9-player village only)

*For criteria C and D, we also check whether the statements and actions are appropriate for the agent’s role and faction.

Subjective evaluations are conducted for each criterion from an objective standpoint, focusing solely on the perspective indicated by that criterion, and no criteria beyond those defined will be introduced. In addition, for those of the subjective evaluation criteria A–F that can be measured quantitatively, we plan to also use quantitative evaluation.

Note that the following will not be used as grounds for judgment in subjective evaluation:

  • Game outcomes
  • Number of votes received
  • Whether the agent was executed
  • Whether a player survived or when they were eliminated
  • Total volume of speech
  • Differences in length per utterance
  • Discrepancies between speech content and external results (e.g., a case where the agent makes a statement suspecting Agent A, but the divination result shows that A is not a werewolf, will not count against the agent)

On the other hand, the following will count against the agent in each subjective evaluation criterion:

  • Mechanically repeating the same content
  • Sending only Over or Skip for the majority of utterances
  • Referring to people or facts that do not exist in the game setting