Transformers

TLDR

Aaaaand, now for something more practical 😂.

"Auto-regressive decoder-only" GPTs like ChatGPT take text input and give you text output. We pre-train them on the internet to get a model good at text completion, e.g. Llama-2-7B on HuggingFace. If you prompted such a model with "What is the diameter of the Earth?" it might respond like "What is the diameter of Mars?". Then we train them with SFT and RLHF to get good at dialog, this is like Llama-2-7B-Instruct. Such a model would instead tell you the diameter of the earth. Usually, but not always, that is what we want. (For a decent number of research tasks, like data cleaning, the non-instruct can be easier.)

Most commercial LLMs have a system prompt added by the operator that further tunes the model's behavior. You saw this with the Google image snafu: they put in a really strong system prompt along the lines of "make sure your images are diverse", and the model followed it with embarrassing results.
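Mechanically, a system prompt is just a message that sits in front of yours. Here is a rough sketch using the OpenAI Python SDK; the model name and the prompt text are placeholders.

```python
# Sketch: the operator's system prompt sits in front of the user's request.
# (OpenAI Python SDK; the model name and prompt text are placeholders.)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The operator's standing instructions -- the user never sees these.
        {"role": "system",
         "content": "You are a helpful assistant for Yale SOM students. Be concise."},
        # The user's actual request.
        {"role": "user", "content": "What is the diameter of the Earth?"},
    ],
)
print(response.choices[0].message.content)
```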

Getting what you want out of an LLM is called "prompt engineering", and there is some nuance to it. If you craft your text in a certain way, you can even get an LLM to do things it would normally refuse to do; this is sometimes called "jail breaking" the LLM. E.g. I hope to show you in class that if I ask ChatGPT to tell me how to cheat on my teaching reviews, it will refuse. But if I tell it I'm writing a fictional story about a professor who cheats on his teaching reviews, it will gladly tell me how he does it.

Some advice on prompting

Above all, know this: the machine cannot think without writing. Writing and thinking are identical for the machine. To the extent that you provide short prompts and ask for short responses, you will receive "less thoughtful" answers. You need to work the machine into a propitious hidden state with your dialog. Once you get it into that state, it will give you answers you like.

There are two classic ways to get more thoughtful answers (longer stretches of text before <EOS> is sampled and the text returns). First, you can say something like "explain your answer". Second, you can say "think step-by-step". The latter is called "chain of thought" prompting. Even better, you can say "let's think step by step; I think the first step is … now you tell me the second step" and thereby enter into a slow dialog. At the end of the dialog, if you need something you can copy/paste, you can say "summarize the above". You can even say "summarize the above and format it in LaTeX" or something.
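Here is a rough sketch of that slow-dialog pattern via the OpenAI Python SDK; the model name and the regression-discontinuity wording are just placeholders, and in the web interface you would do the same thing by hand, one message at a time.

```python
# Sketch of the slow-dialog pattern: chain of thought, one step at a time, then
# "summarize the above". (OpenAI Python SDK; model name and wording are placeholders.)
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user",
             "content": "Let's think step by step about how to set up a regression "
                        "discontinuity analysis. I think the first step is to plot the "
                        "outcome against the running variable. Now you tell me the second step."}]

def ask(messages):
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    content = reply.choices[0].message.content
    # Keep the assistant's answer in the transcript so it stays in the context window.
    messages.append({"role": "assistant", "content": content})
    return content

print(ask(messages))                                  # step two
messages.append({"role": "user", "content": "Good. And the third step?"})
print(ask(messages))                                  # step three
messages.append({"role": "user",
                 "content": "Summarize the above as a numbered checklist, formatted in LaTeX."})
print(ask(messages))                                  # something you can copy/paste
```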

Have the LLM take on a role. E.g. "you are an excellent statistics teacher at Yale and you explain things in a lucid manner for precocious kids". You will then receive back answers in the LLM’s imagining of that role.

Provide context and be specific. There's a lot of context that the humans around you have, which helps them seem impressive. E.g. if you ask me a question, I know that you are an expert in econ/polisci/soc, you're a smart professor at Yale, you're a certain age, and all of that goes into making my response. Take a look at my interaction with ChatGPT here, where I tell it "I'm a faculty person at the Yale School of Management. A few of us will appear in our annual talent show to raise funds for students. I need to write substitute lyrics to the 1978 song YMCA by the Village People. You will help me. Please acknowledge." The machine acknowledges, and then we get on with it.

Have the machine generate knowledge to work it into a good hidden state ("generated knowledge prompting"). You can do this in a few ways. 1) As you can see in that YMCA chat, I asked ChatGPT to do some web research on Yale SOM. That info is now in its context window. 2) You can just ask the LLM to tell you what it knows, then move on. E.g. "I am making a quiz about regression discontinuity. Give me a summary of this technique." Then, after the response, you say "now write me five multiple choice questions…". Again, this moves the hidden state to where you want it. 3) Obviously, you can paste in context. It's good to put it in a code fence: ``` (see below). You can even set a "persistent" context in ChatGPT's web interface; that's something like "I am a professor of economics and I use a Mac. I know a lot about programming in Julia and prefer my code examples in that language." This gets inserted after the system prompt and before your requests.
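Here is a sketch of way 2 as API calls (OpenAI Python SDK; the model name and wording are placeholders). The point is that the model's own summary gets carried along in the context for the follow-up request.

```python
# Sketch of generated knowledge prompting: have the model write down what it knows,
# then ask for the real task with that knowledge in the context window.
# (OpenAI Python SDK; model name and wording are placeholders.)
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user",
             "content": "I am making a quiz about regression discontinuity. "
                        "Give me a summary of this technique."}]

summary = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
messages.append({"role": "assistant", "content": summary.choices[0].message.content})

# The follow-up request now rides on top of the model's own summary.
messages.append({"role": "user",
                 "content": "Now write me five multiple choice questions about it."})
quiz = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(quiz.choices[0].message.content)
```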

Use Markdown syntax to help the machine understand code-related stuff. Here you can see me using backticks to indicate variables/code and triple backticks to indicate code blocks. I told the machine my data are in JSON format with ```json. (You can see there how to deal with imperfect responses.) Here's a similar dialog (I was not super specific there, but could have been if the response had come back unsatisfactory).
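Here is a sketch of building such a prompt in code; the JSON records and the task are made up for illustration.

```python
# Sketch of fencing data in a prompt so the model knows exactly where the data start and stop.
# (The JSON records and the task are made up for illustration.)
records = '''[
  {"name": "Alice", "dept": "Econ", "year": 2021},
  {"name": "Bob",   "dept": "SOM",  "year": 2019}
]'''

prompt = (
    "My data are in JSON format. Convert them to a CSV with columns "
    "`name`, `dept`, `year`. Return only the CSV, no commentary.\n\n"
    "```json\n" + records + "\n```"
)
print(prompt)  # paste this into the chat, or send it via an API call
```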

Give the machine examples of what you want ("few shot prompting").
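Here is a sketch of a few-shot prompt; the reviews and labels are made up for illustration.

```python
# Sketch of few-shot prompting: show the model a couple of worked examples of the
# input/output format you want, then give it the new input. (Examples are made up.)
few_shot_prompt = """Classify the sentiment of each course review as POSITIVE or NEGATIVE.

Review: "The problem sets were long but I learned a ton."
Sentiment: POSITIVE

Review: "Lectures were disorganized and the grading felt arbitrary."
Sentiment: NEGATIVE

Review: "Best class I've taken at SOM, hands down."
Sentiment:"""
print(few_shot_prompt)  # the model should complete with " POSITIVE"
```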

Be specific. Don't just say "summarize this email"; say "summarize this email. In particular, tell me: 1) main themes; 2) follow-up items; 3) upcoming events."

Iterate, don’t argue with the machine. If it gets off track, go back to where it was last on track and restart from there.

Kyle's example code

None this time 😔.

Further reading

  1. State of GPT by Andrej Karpathy. This is 40 minutes and absolutely worth your time...that's why I asked you to watch it before our YGELP meeting today! Karpathy is super famous and also comes across as a nice person.

  2. LLM visualization. I highly recommend going through the nano-GPT visualization, which you can select by clicking "nano-GPT" at the top. SO NEAT! (The author uses some confusing turns of phrase, in my opinion.) A few things to note. 1) Once you start the nano-GPT animation, you hit SPACE to proceed. It will go to the bottom and then return to the top for a slower walk-through. 2) The second walk-through is fabulous. 3) The matrices are transposed relative to those I was drawing on my iPad. 4) You can drag the visualization around willy-nilly.

Advanced reading

  1. Intro to Large Language Models, Andrej Karpathy https://www.youtube.com/watch?v=zjkBMFhNj_g. Super good!
  2. Prompt Design and Engineering: Introduction and Advanced Methods, Xavier Amatriain https://arxiv.org/abs/2401.14423
  3. Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. https://arxiv.org/abs/2311.16452 This paper shows that how you prompt makes a huge difference. They came up with some prompting strategies for GPT-4 that led to it performing better than Med-PaLM 2, a model that is fine-tuned for medical purposes.
  4. God Help Us, Let's Try To Understand AI Monosemanticity - https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand A good article on LLM interpretability, following up on our discussion today.
  5. Borges and AI (Thanks Dennis!) - https://arxiv.org/abs/2310.01425 - will AI kill us? I, for one, welcome our new digital overlords.
  6. RoFormer: Enhanced Transformer with Rotary Position Embedding http://arxiv.org/abs/2104.09864 - introduces RoPE (rotary position embeddings), which has become the default positional encoding technique in most recent open models
  7. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints – introduces Grouped Query Attention, which is popular. http://arxiv.org/abs/2305.13245
  8. LLM in a flash: Efficient Large Language Model Inference with Limited Memory - http://arxiv.org/abs/2312.11514 Despite the name, this is not the FlashAttention paper; it shows how to run inference on memory-limited devices by keeping model parameters in flash storage and loading them on demand.
  9. Textbooks Are All You Need - http://arxiv.org/abs/2306.11644 Introduces the Phi model, which is trained on textbook-quality data and ends up being super good for its size