By the end of 2024, steering foundation models in latent/activation space will outperform steering in token space ("prompt eng") in several large production deployments.
I was skeptical about this in summer '23, felt vaguely positive in January, and now think it's more likely than not. I'm also more optimistic than ever about the direction of my work since 2022: a joint exploration of interfaces and latent visualization/steering of foundation models. Anthropic's work published today is a milestone in a steady march toward this future that started in mid-2023.
Anthropic's and DeepMind's leadership in this area, combined with lots of community effort on things like steering vectors, better sparse autoencoders, and various image-editing UI prototypes, all push us toward this future. But once the technical foundations are there, interface will much more obviously be the bottleneck to utility, alignment, and capability.
I'm very excited about the interface possibilities this will open up, particularly for multimodal models and creative use cases. For a moment I thought dialogue might eat everything. I don't think so anymore. We'll see new universes of possibilities in both.
(And if frontier labs don't have serious interface research bets today, this would probably be a good time to reconsider :-)