I was happy to see the second version of the OpenAI Model Spec released last week. Sharing my notes:
- One notable change is that each section is labeled with an authority level, from "platform" (can't be overridden by the user or developer) to "guideline" (can be easily overridden). This seems like a nice conceptual simplification of the notion of "defaults" in the previous version, unifying the authority levels of the spec itself with the levels of different messages.
- A couple lines are refreshingly honest. The objective "Maintain OpenAI's license to operate by protecting it from legal and reputational harm", and "[why chains of thought hidden] ... as well as for competitive reasons." This is the kind of thing that'd usually get watered down by comms/legal/policy teams at a typical company.
- The spec starts to cover a couple topics that weren't present before, such as multimodality (eg using accents, avoiding premature warnings) and agents (with a discussion of what it means for an agent to overstep when pursuing user-defined goals).
- There's a new untrusted_text feature, which presumably means there'll be an API feature for quoted text, where it's delimited by special tokens rather than leaving the developer to handle quoting and the model to interpret the quoting. This is useful for protecting against prompt injection.
- In a couple places, a point from the previous spec is derived more from first principles in this one. The most controversial part of the previous spec was "don't try to change anyone's mind", wrt users having false beliefs like "the earth is flat". Now this is justified as a special case of "highlight possible misalignments", following from "assume the user's long-term goals include learning, self-improvement, and truth-seeking".
- This is a subtle and debatable point, I like the emphasis on user freedom ("intellectual freedom" is used a few times), as opposed to more of a cost benefit analysis. It's like "maximize user freedom subject to constraints" as opposed to "do cost benefit analysis to enable beneficial use cases and prevent harmful ones". The latter would give the platform too much moral authority.
- Detail added in various places, e.g. more style guidelines about conversational behavior, more detail about privileged information in developer messages.
- Still no erotica
Feb 14, 2025 · 5:45 PM UTC
15
21
357
