Transformers can crack cryptographic primitives like pseudo-random number sequences. We explain how!
ICML 2025

Transformers can solve unseen problems by looking at in-context examples. We uncover the underlying mechanisms!
NeurIPS 2024 (oral)

Deep learning models often acquire abilities in steep jumps, a phenomenon called grokking. We explain how this occurs in modular arithmetic tasks!
ICLR 2024 · BGPT

Deep learning models can only be trained if initialized properly. We explore how to design architectures and initialize parameters for better training.
NeurIPS 2023 (spotlight) · AutoInit

Email: darshil.h.doshi@gmail.com