LLMs generate text by predicting the next word using attention to capture context and MLP layers to store learned patterns. Mechanistic interpretability shows these models build circuits of attention and features, and tools like sparse autoencoders and attribution graphs help unpack superposition, revealing how tasks are actually computed.