5 Comments
David Johnston

We hold humans to be morally responsible partly out of convention, and a lot of the reasons we've come up with to explain moral responsibility are motivated by a search to explain the convention. For this reason, I expect some of them to be uncompelling in a proper analysis. Actually, I'd be really surprised if most of the identified reasons were sound, because motivated search almost always turns up a lot of false positives. So I'd expect to get quite different results if we ask "can we justify holding LLMs morally responsible like we justify holding people morally responsible?" (which this article kind of does) vs. "does first principles reasoning lead us to the conclusion that we should hold LLMs morally responsible?". Because LLMs give us reason to ask both questions, I hope we see convergence between these lines of analysis over time.

Sean Duffy

Interesting. Related to moral responsibility is the concept of a self. There is disagreement about whether a person has a persistent self, or whether we start fresh every day. To what extent does an LLM have a self? Or is it different with every day and every query?

Tim Duffy

My view is that there's no black-and-white answer to the question "am I still the same person I was yesterday?". I guess I'd say that I'm mostly the same self I was yesterday and partly the same self I was when I was a child. And maybe from conversation to conversation an LLM is partly the same self too? It's interesting how many questions like this have analogous cases between humans and LLMs.

Tim Duffy

I agree that a lot of the identified reasons are probably wrong, and I mostly lean towards an instrumentalist/forward-looking view under which moral responsibility just is a useful convention without any deeper nature. I guess that's how I view a lot of things, tbh. Under that view, I think your latter question boils down to these questions:

- If we try to convince more intelligent AI systems that they should consider themselves to have moral responsibilities, are they actually likely to hold the view that they do, and will they think that sense of moral responsibility is good?

- If AIs do have a sense of moral responsibility, will that actually lead to a better world?

I feel like you might have something different in mind by bringing up first principles, so let me know if I'm not addressing the actual question you were posing.

David Johnston

I lean instrumentalist as well, and I think that's basically what I had in mind here. But I suppose one way in which instrumentalism isn't exactly first principles is that you seemingly need arbitrary-ish answers to "instrumental for what?". I think a strong first-principles answer would remain unchanged under a lot of different plausible choices of answer to that question. I doubt such an answer exists if you imagine yourself all-powerful, but maybe one does exist under more realistic pictures of how you can influence the world. E.g., under realistic pictures of influence there might be a "best viable equilibrium" that sort of vanishes if you imagine yourself all-powerful (rendering many equilibria viable, and also rendering equilibria less important because you just control the outcome).