Tracing the thoughts of a large language model

Posted by Imicrowavebananas


  1. Imicrowavebananas on

    AI is an important topic and it has been discussed a lot here, but it is also a very technical topic at its heart. So I think posting some solid pieces that explain the basics is useful. A lot of the fundamental questions, copyright for example, depend quite directly on how these models actually work internally.

    Personally, I found the articles and research by Anthropic on this the most accessible. What I liked even more is that they seem technically serious. To really answer these questions, I do not think you can rely too much on metaphors. That is why I did not like Ted Chiang’s “blurry JPEG” article in *The New Yorker* that much. It is a good phrase, but you are not left with much new understanding if all you get is a vague analogy.

    There is a good Feynman bit from an old interview where he is asked why magnets repel each other. He basically says that “why” questions are much harder than they seem, because an explanation always depends on what you are allowed to take for granted. You can say someone went to the hospital because she slipped on the ice, but that only explains anything if the listener already knows what hospitals are, why broken hips are serious, how people call ambulances, and so on. Otherwise every answer just opens up the next why. Why is ice slippery? Why does pressure melt it? Why does water expand when it freezes? It keeps going. The nice part is where he says that explaining magnets by saying they are “like rubber bands” would be cheating. They are not rubber bands, and if the listener asked why rubber bands pull back together, you would end up having to explain the very same electrical forces you were trying to explain in the first place. That is roughly how I feel about AI explanations too. Metaphors are fine, but only up to a point. Eventually you have to say what is actually going on.

    Of course Anthropic is not a neutral actor. They are a company with interests. But it is hard to get a really good understanding of these technologies while avoiding the people who understand them best. So I would not treat the articles as gospel, but I do think they should be evaluated on their own merits.
