bulbous-oar
We don’t know that no possible large neural network implements an optimizer. If one does, we could stumble into it via gradient descent, just as we’ve stumbled into large neural networks that have world models.
Sure, we don’t know that no possible neural network implements an optimizer. In fact, I’ll go further: if it’s possible to write an optimizer at all, which it probably is, then we should be able to encode one in a large neural network. It’s just a very flexible framework for a model.
(Of course, that flexibility is also the weakness: if it could be anything, it has no strong pressure to be any specific thing.)
But I can say something stronger and also dumber than that, right? If an optimizer can be written at all, then it’s possible that if I have my computer spit out 10TB of random bits and interpret them as the source code of a C program, it will compile to exactly that optimizer. It’s just…unlikely.
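If you want a sense of just how unlikely, here’s a toy, scaled-down sketch of that thought experiment (assuming gcc is on your PATH; the sizes and attempt counts are made up, and it only asks whether a random file even compiles, which is a vastly easier bar than the binary being a powerful optimizer):

```python
import os
import random
import subprocess
import tempfile

# Toy version of the thought experiment: generate random "source code" and see
# whether it even compiles. Scaled down from 10TB to a few KB per attempt.
ATTEMPTS = 1000          # arbitrary
SOURCE_BYTES = 4096      # arbitrary; the quip above says 10TB

compiled = 0
for _ in range(ATTEMPTS):
    # Draw printable ASCII so the "source file" is at least text.
    src = "".join(chr(random.randint(32, 126)) for _ in range(SOURCE_BYTES))
    with tempfile.TemporaryDirectory() as tmp:
        c_path = os.path.join(tmp, "maybe_optimizer.c")
        with open(c_path, "w") as f:
            f.write(src)
        # Count it if gcc accepts it. We're nowhere near asking whether the
        # resulting binary optimizes anything -- that's a far smaller target.
        result = subprocess.run(
            ["gcc", c_path, "-o", os.path.join(tmp, "a.out")],
            capture_output=True,
        )
        if result.returncode == 0:
            compiled += 1

print(f"{compiled}/{ATTEMPTS} random files even compiled")
```

(Spoiler: the expected result is 0 out of 1000, and that’s before asking the compiled program to do anything at all.)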
Large neural networks like GPT are not attempts to create optimizing agents. The training process is (outer-)optimizing for something, but the thing it’s optimizing for is very much not an (inner-)optimizer.
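To make the outer/inner distinction concrete, here’s a minimal sketch of what the outer loop actually optimizes (toy PyTorch model, fake data, purely illustrative): the only objective anywhere in the loop is prediction error, and nothing in it rewards the model for containing an optimizer of its own.

```python
import torch
import torch.nn as nn

# Toy next-token predictor: the *outer* optimizer (SGD) adjusts the weights to
# reduce prediction loss. That loss is the only thing being optimized for.
vocab_size, dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    tokens = torch.randint(0, vocab_size, (32,))   # fake "context" tokens
    targets = torch.randint(0, vocab_size, (32,))  # fake "next" tokens
    loss = loss_fn(model(tokens), targets)         # outer objective: prediction error
    opt.zero_grad()
    loss.backward()
    opt.step()
```

An inner optimizer would be something the weights themselves end up implementing; nothing here asks for that, which is exactly the point.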
So this argument is something like: maybe writing an extremely powerful optimizer is so easy that we’ll do it by accident. And then not only will we create one, but it will be so powerful that it can out-optimize us even if we’re trying to stop it, even though that is, if anything, counterproductive to what we were training the model to do. That’s not impossible, but it seems extremely unlikely.
(In contrast, I don’t think we “stumbled into” world models exactly; they are something we’ve been more or less explicitly trying to get for the past couple of years. This is part of Sarah’s point: people are trying to get world models, but they aren’t really trying to get ontological stability or to train inner optimizing loops at all.)