U1_Lesson3_Input & Training Data.

Purpose: Students will explore how people recognize and categorize words, and then compare that process to how AI systems perform similar tasks.

Vocabulary: Large Language Models, Prompt, Tokens

Activity: OpenAI Language Model Tokenization

OpenAI has a website that lets you preview how its models break text into tokens: https://platform.openai.com/tokenizer. This can be a helpful visualization of how words are broken apart and interpreted by large language models. Most common words will likely be interpreted as a single token, but longer or less common words can be split into two or more tokens.
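For classes that want to see the idea in code, the sketch below splits words into smaller pieces by always taking the longest piece it recognizes. It is only a toy illustration: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for this lesson, and the real OpenAI tokenizer uses a much larger vocabulary and a different method, so its results will not match what the website shows.

```python
# Toy vocabulary, invented for this lesson. A real model's vocabulary
# contains tens of thousands of pieces learned from text.
TOY_VOCAB = {"cat", "the", "un", "believ", "able", "token", "ization", "s"}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split one word into pieces using greedy longest-match."""
    word = word.lower()
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shorter ones.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here, so fall back to one letter.
            pieces.append(word[start])
            start += 1
    return pieces


print(toy_tokenize("cat"))           # ['cat']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(toy_tokenize("tokenization"))  # ['token', 'ization']
```

A short word that is in the vocabulary stays whole, while a longer word is broken into several pieces. Students can compare this with what they see on the tokenizer website.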

Activity: Training Data... (Handout)

Using the Handout, students will help train a Large Language Model to notice words that are more common in different situations.
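As an optional extension, the sketch below shows one very simple way a program could notice which words are more common in different situations: it counts the words in labeled example sentences, then guesses the situation of a new sentence from those counts. The example sentences, the category names, and the function names are all invented here and are not taken from the handout. Real large language models are trained in a far more complex way; this only illustrates the counting idea.

```python
from collections import Counter

# Hypothetical labeled examples, made up for illustration.
TRAINING_DATA = [
    ("weather", "the rain and wind will bring cold weather"),
    ("weather", "sunny weather with light wind"),
    ("sports", "the team will win the game"),
    ("sports", "the coach and the team played a great game"),
]


def count_words_by_category(examples):
    """Count how often each word appears in each category."""
    counts = {}
    for category, sentence in examples:
        words = sentence.lower().split()
        counts.setdefault(category, Counter()).update(words)
    return counts


def guess_category(sentence, counts):
    """Pick the category whose training words best match the sentence."""
    words = sentence.lower().split()
    scores = {
        category: sum(word_counts[w] for w in words)
        for category, word_counts in counts.items()
    }
    return max(scores, key=scores.get)


counts = count_words_by_category(TRAINING_DATA)
print(counts["weather"]["wind"])                 # 2
print(guess_category("wind and rain", counts))   # weather
print(guess_category("the team game", counts))   # sports
```

Students can add their own sentences to `TRAINING_DATA` and see how the guesses change as the program is given more examples.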