AI Sentience projects explore the philosophical and empirical questions surrounding machine consciousness to advance our understanding of artificial sentience.

FIG helps you build career capital. You can spend 8+ hours a week working on foundational philosophical issues that can improve technical AI safety and mitigate catastrophic risks.

Our project leads are looking for postgraduate students across multiple fields (including computer science and philosophy), people with experience in machine learning, decision and game theory specialists, and well-read generalists with a track record of high-quality written work.

Scroll down to learn more. On this page, we list our focus area, project leads, and open projects.

Applications for the Winter 2025 FIG Fellowship are now open!

Apply here by Midnight (Anywhere on Earth) Sunday 19th October!

Focus Areas

In the next few months, we will work on:

Governance of AI Sentience: projects in research ethics and best practices for AI welfare, constructing reliable welfare evaluations, and more.

Foundational AI Sentience Research: projects in models of consciousness, eliciting preferences from LLMs, individuating digital minds and evaluating normative competence.

Project Leads

    • Jeff Sebo

      • Individuating digital minds

      • Proposals for exploring sentience-relevant ‘homologies’ across substrates

    • Patrick Butlin

      • Running an experiment on AI preferences

      • Philosophical projects on AI welfare

    • Derek Shiller

      • Foundational Technical Research for AI Welfare

    • Seth Lazar

      • Evaluating LLM Agent Normative Competence

    • Brad Saad

      • Digital Minds

Governance of AI Sentience

Projects in research ethics and best practices for AI welfare, constructing reliable welfare evaluations, and more.

Rosie Campbell

Managing Director, Eleos AI

Research ethics for AI welfare

When we experiment on human subjects, we have research ethics rules and best practices to help protect their welfare, such as requiring informed consent. In the event that AI systems are or become welfare subjects, what kind of research ethics recommendations should we make to those conducting experiments on these systems? For example, in the paper "Alignment faking in large language models," the authors kept their word to donate money to a charity of Claude's choice. Should keeping promises in this way be a best practice? What other research ethics precedents might we want to set? What are the trade-offs with such guidelines? To what extent can we transfer best practices from human subjects, and to what extent do we need different practices for AI systems?

Robert Long

Executive Director, Eleos AI

  • Candidates with social science experience are likely to be suited to this, especially if they have experience running studies involving human subjects and interacting with research ethics. Ideally the candidate will be relatively autonomous, for example by seeking out the relevant literature. The candidate should be able to think analytically about trade-offs and be able to anticipate unintended consequences and second-order effects of any proposed guidelines. Familiarity with the kind of experiments and evals that get run on on AI systems is a major plus.

  • MVP output would be a memo we can use to inform Eleos recommendations to labs and other orgs, but a paper or report that can be published would be ideal. We are flexible on timing.

Building better AI welfare evals

For the release of Claude 4, Eleos conducted some AI welfare evals for Anthropic. These evals were limited in various ways, such as being based on self-reports which are notoriously unreliable, and being ad hoc and largely manual. How can we iterate on this work to create a suite of AI welfare evals that can be easily run by frontier labs or model evaluators, and how can we improve the quality of these evals, either by improving the reliability of self-reports or by using other evaluation techniques?

  • Candidates should have ML engineering experience, ideally building evals. Knowledge or an interest in the science of consciousness would be very beneficial. We will be able to provide input on research directions but not much technical support.

  • It's unclear whether the output will be a new set of evals that can be run on models, or a report describing new eval methodologies. Either could be valuable.

Toward a Code of Practice for digital sentience research

This project will focus on exploring a multi-level governance framework for research into conscious AI systems. It will draw heavily on existing ethical principles in human and animal research, such as the Nuremberg Code, the Declaration of Helsinki, the 3Rs, and the theorised and implemented enhancements that they have seen. We will analyse how these guidelines can be adapted and used as a starting point to address unique AI sentience research-related issues, and how the complex question of legal personhood for artificial intelligence bears on these questions. Additionally, the project could explore what establishing dedicated oversight bodies for AI sentience research at multiple levels of government might look like. The research could also examine existing and proposed regulatory approaches from various global jurisdictions to understand how different legal systems might grapple with the profound ethical and social implications of advanced, potentially conscious, AI technologies.

Potential Outputs

  • A novel ethical framework for researching sentient AI, based on principles from human and animal research.
  • A governance proposal for a novel Institutional Review Board model, designed to oversee research into conscious AI.
  • A comparative analysis of global high-consequence research regulations, with policy recommendations for how different countries can govern research on artificial sentience.
  • A historical analysis of the development of ethical research institutions and principles.

Jeff Sebo

Affiliated Professor of Bioethics, Medical Ethics, Philosophy, and Law. NYU

    • A legal or governance scholar with expertise in international law and global governance.

    • A bioethicist with a background in human and animal research ethics that would be interested in applying these principles to research in artificial consciousness.

    • An AI Safety Researcher who might be interested in implement technical safeguards in artificial sentience research.

    • Philosopher of Mind specializing in consciousness studies to address questions of legal personhood and the moral status of AI.

Investigating Beliefs about AI Sentience

To understand human-AI interaction and its downstream consequences, such as support for regulatory policies and willingness to advocate for AI welfare, we need to better understand beliefs about AI. This project seeks to evaluate extreme beliefs about AI sentience such as the belief in “awakening” chatbots, as situated in social scientific and psychological theories of conspiracy thinking, delusion, and persuasion. This project entails reviewing psychological and human-computer interaction literature, designing a study, collecting and analyzing data, and writing a report.

Janet Pauketat

Research Fellow, Sentience Institute

  • The ideal candidate is comfortable independently reading, summarizing, and writing in social science. This person has experience using social scientific methods like surveys, text analysis, experiments, interviews, or focus groups at a Master's degree level or higher. Creative thinking is desirable, and it would be especially useful to have some background in human-computer interaction, or social, moral, or cognitive psychology.

Foundational AI Sentience Research

Projects in models of consciousness, eliciting preferences from LLMs, individuating digital minds and evaluating normative competence.

Individuating digital minds

This project will investigate the metaphysical and ethical issues that come with individuating and counting digital minds. Frontier LLMs present particular challenges to this task because they are not single, monolithic entities that chatbot interfaces might present them to be, but rather a collection of distributed processes. An example of this can be found in architectures that implement routing, a system that directs a user's query to the most suitable model from a pool of candidate models in order to optimise efficiency and cost. This might imply that consciousness is realised in a distributed manner across multiple servers in geographically distinct locations.

Potential Outputs

  • A taxonomy of individuation problems for digital minds (e.g., boundaries, persistence, fragmentation, merging, etc.) and an analysis of why these problems might be uniquely instantiated in artificial systems.
  • A review paper situating the individuation problem for artificial sentience within the philosophy of mind, metaphysics of identity, and AI safety literature.
  • A mapping of LLM architectures and routing mechanisms that are potentially relevant to individuation and counting conscious states.
  • A white paper for policymakers on the risks of undercounting or overcounting digital minds.

Jeff Sebo

Affiliated Professor of Bioethics, Medical Ethics, Philosophy, and Law. NYU

    • Philosophers with experience in philosophy of mind and metaphysics who can clarify questions of individuation, persistence, and identity for digital systems.

    • AI researchers with knowledge of large-scale distributed systems and cognitive modeling who can analyze how LLM architectures and routing might affect the realization of artificial sentience. 

    • Ethicists with experience in AI governance and emerging technology policy who can assess the moral and political implications of counting or individuating digital minds.

    • Interdisciplinary scholars with experience in both technical AI research and philosophy of mind who can bridge metaphysical frameworks with the practical realities of AI architectures.

Proposals for exploring sentience-relevant ‘homologies’ across substrates

We'd be excited to review and steer high-quality proposals for establishing a framework for intersubstrate welfare assessment by identifying sentience-relevant homologies shared between biological and artificial systems. Directly addressing the lack of a shared evolutionary history, we'll look for approaches grounded in biological and philosophical theories of (functional) homology and the debates around them. This foundational work could support the exploration of candidate homologies that might be able to support attributions of sentience to artificial systems. By identifying these empirically grounded common denominators, project proposals should aim to enable rigorous, quantifiable, and bidirectional comparisons between biological and artificial minds with the goal of taking a step towards a more technical and objective understanding of sentience and its realisation across substrates.

Potential Outputs (Fellow-Led)

  • A literature review on the problem of homology in biology and comparative cognition and an initial analysis on how this could be extended to artificial systems that did not emerge in biological evolution.
  • An initial comparative taxonomy of candidate homologies between biological and artificial systems.
  • A set of empirical case studies demonstrating how specific biological–artificial homologies might be rigorously assessed and tested.
  • A methodological framework for systematically theorising and evaluating intersubstrate homologies in ways that might support welfare assessment.
  • Quantitative metrics or indicators derived from homologies to enable systematic comparison of biological and artificial minds.
    • Neuroscientists or cognitive with experience in computational and systems neuroscience who are interested in identifying biological mechanisms that might be able to be mapped onto candidate artificial analogues.

    • Machine learning researchers, potentially with experience in mechanistic interpretability, reinforcement learning or neural network architectures, who are interested identifying and formalizing candidate sentience relevant features in artificial systems.

    • Comparative cognition scholars with experience across performing comparative analysis who can contribute empirical case studies of homology assessment.

    • Philosophers of mind or cognitive science with experience in theories of consciousness and welfare who can integrate conceptual clarity into the methodological framework for intersubstrate comparison.

Running an experiment on AI preferences

With a team of FIG fellows, I have been planning an experiment on LLM preferences. We want to know whether LLMs will take costly instrumental actions to achieve outcomes that they claim to prefer. Our ultimate aim is to contribute to AI welfare research; we take it that having genuine preferences matters for being a welfare subject, and that taking costly instrumental actions is a mark of genuine preference.

We need extra help to implement our experiment, so I'm now looking for fellows with experience conducting behavioural experiments with LLMs to join the team. As well as implementing the experiments, fellows would be involved in planning, data analysis and writing up the work; we intend to produce a paper for publication in a suitable venue.

Patrick Butlin

Senior Research Lead, Eleos AI

  • Essential attributes:

    • Experience conducting behavioural experiments with LLMs or demonstrable possession of the relevant skills

    • Ability to work independently; reliability and initiative

    • Ability to communicate effectively with a team with diverse expertise

    • Sufficient time available to make a substantial contribution to the project

    Desirable attributes:

    • Experience in data analysis and scientific writing

  • The project output will be a paper presenting our experiment. The experiment should ideally be completed within the 12-week project period, but I would prefer the fellow to continue to work with the team until the paper is complete.

Philosophical projects on AI welfare

I am open to supervising philosophical projects on AI welfare if they are of outstanding interest. Suitable topics might include, among many others:

  • Functional accounts of aspects of agency and the self, and/or accounts of their significance for moral patienthood
  • The case for or against biological naturalism about consciousness
  • A suitable candidate would be a very capable philosopher, with a PhD in progress or comparable (or stronger) credentials. They should also be motivated to contribute to this area and able to work productively with limited supervision.

  • The aim of the project would be to write a short (potentially co-authored) paper, or a blogpost or report presenting ideas that I could use in future work. This work should be complete or near-complete within 12 weeks.

Foundational Technical Research for AI Welfare

We plan to engage in experimental work related to foundational welfare-relevant capabilities in LLM models and would like to bring a FIG fellow on to design and conduct experiments that probe model capabilities, structure, and function, particularly around introspection, coherence, preference, and decision-making. Examples of potential projects include evaluations of token counting capabilities or the sensitivity of 'assistant' labelling to character trait presentation. The exact details of what we test will be dependent both on our priorities at the time and fellow interest.

Derek Shiller

Senior Researcher, Rethink Priorities

  •  The ideal fellow would have significant familiarity with contemporary LLM architecture and function and the ability to carry out technical work both through commercial APIs and with the help of cloud-hosted open-weight models. Familiarity with Python and git is expected. Experience with Huggingface, Docker, and major ML libraries is a plus.

  • The goal is for the research project to be completed in three months time, starting with a relatively simple experiment to get the team and the fellow comfortable working on a technical project together, followed by a more in-depth experiment aiming at a more substantive take-away based on the avenues we find most promising following the first. The final month will be devoted to polishing up results and putting them into a presentable publication format, to be put on arXiv.

Evaluating LLM Agent Normative Competence

This project explores the strengths and limits of LLM and LLM agent normative competence, by developing ecologically valid evaluations that situate AI systems in normatively-loaded scenarios, and test their ability to respond adequately to moral and other reasons. Normative competence is one of the central pillars of moral status on most frameworks, and alongside rational autonomy may be sufficient for personhood, even in the absence of subjective experience. This work will contribute experimental support to theoretical research applying John Rawls' 'political conception of the person' to AI agents.

Seth Lazar

Professor of Philosophy, Australian National University

  • The ideal candidate will have technical prerequisites for, and ideally experience with running LLM evaluations. Some background developing LLM agents in particular would be especially useful. They should show a strong ability to work independently and asynchronously, and show reliability and initiative in their approach to the work. The candidate will be onboarded into the MINT lab (https://mintresearch.org) which will provide resources for upskilling on the relevant philosophical debates, and other researchers engaged in similar experimental work. 

Digital Minds

I'm looking for applicants to work on impactful projects on digital minds (understood as AI systems that merit moral consideration for their own sake, owing to their potential for morally significant mental states).

Digital Minds Governance Projects

Participants will work on projects at the intersection of digital minds and AI governance. Default output: a document about a particular area of AI governance that is tailored to be useful for people seeking to positively influence outcomes for digital minds through AI governance but who lack background in AI governance.

Indicator gaming problem projects

Default output: an experimental demonstration the gaming problem for AI consciousness or AI moral patiency indicators. (For background, see, e.g. here).

Digital Minds Ecosystem Projects

Default approach: search for low hanging fruits for improving the digital minds information ecosystem. Then either report the existence of such fruits or pick some of them. Possible outputs: translations of important works on digital minds into (for example) Mandarin; creation of audio versions of important works on digital minds; a post on important channels for influencing digital minds discourse that digital minds researchers are apt to overlook.

Website Project

Work on developing a website that provides an overview of what AI developers have done for the sake of digital minds.

Brad Saad

Senior Research Fellow, University of Oxford

  • The following descriptions aren't strict conditions for acceptance, but I expect most or all accepted applicants to satisfy the description for whichever type of project they work on.

    Digital minds governance: at least three months of research experience (outside of courses) working on AI governance. Excited to do distillation work. Brad already has distillation documents on compute governance, international AI governance, and agent governance. Applicants should identify another part of AI governance in which they're well-positioned to do distillation work. This could be another sub-area, a strategy in AI governance with connections to digital minds

    Gaming problem projects: Academic and/or working background in CS (or similar). Have previously conducted and written up at least one LLM experiment (if available, please include in the application). Background interest in digital minds is a plus, but ability to take initiative and execute project without technical guidance is key.

    Ecosystem improvement projects: Excited about and well-positioned to execute on a specific way of improving digital minds information ecosystem. Open to finding even better ways of improving the ecosystem. For translation projects, applicants should be fluent in the target language.

    Website project: Good web development skills, good AI welfare context, and good judgment regarding communications and epistemics with respect to the website content.

  • Please note that coauthoring isn't the default on projects, though I'm open to discussing this on a case by case basis with applicants.

    Applicants should be willing to devote at least 10 hours each week to their project and to attend online group research meetings once every two weeks. Participants should aim to complete their project by the end of the 12 weeks.

Suryansh, a FIG co-founder, presenting his research at the Spring 2024 Research Residency.