
List of 27 papers (supposedly) given to John Carmack by Ilya Sutskever: "If you really learn all of these, you’ll know 90% of what matters today." Taken from keshavchan 1787861946173186062 and arc folder D0472A20-9C20-4D3F-B145-D2865C0A9FEE.

It is not confirmed whether this is the actual list, but I think it's a good list anyway, so I decided to save it in case it gets lost in the dead internet.

The Annotated Transformer (nlp.seas.harvard.edu)
The First Law of Complexodynamics (scottaaronson.blog)
The Unreasonable Effectiveness of RNNs (karpathy.github.io)
Understanding LSTM Networks (colah.github.io)
Recurrent Neural Network Regularization (arxiv.org)
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (cs.toronto.edu)
Pointer Networks (arxiv.org)
ImageNet Classification with Deep CNNs (proceedings.neurips.cc)
Order Matters: Sequence to sequence for sets (arxiv.org)
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (arxiv.org)
Deep Residual Learning for Image Recognition (arxiv.org)
Multi-Scale Context Aggregation by Dilated Convolutions (arxiv.org)
Neural Message Passing for Quantum Chemistry (arxiv.org)
Attention Is All You Need (arxiv.org)
Neural Machine Translation by Jointly Learning to Align and Translate (arxiv.org)
Identity Mappings in Deep Residual Networks (arxiv.org)
A Simple NN Module for Relational Reasoning (arxiv.org)
Variational Lossy Autoencoder (arxiv.org)
Relational RNNs (arxiv.org)
Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton (arxiv.org)
Neural Turing Machines (arxiv.org)
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (arxiv.org)
Scaling Laws for Neural LMs (arxiv.org)
A Tutorial Introduction to the Minimum Description Length Principle (arxiv.org)
Machine Super Intelligence Dissertation (vetta.org)
PAGE 434 onwards: Kolmogorov Complexity (lirmm.fr)
CS231n Convolutional Neural Networks for Visual Recognition (cs231n.github.io)