# RustGPT: Transformer-Based Large Language Model in Pure Rust

RustGPT is a complete implementation of a transformer-based Large Language Model (LLM) written entirely in Rust. It is built from scratch, without external machine learning frameworks, using only the `ndarray` crate for matrix operations.

---

## Overview

- **Language:** 100% Rust
- **Architecture:** transformer-based LLM
- **Key features:** pre-training, instruction tuning, an interactive chat mode, and full backpropagation with gradient clipping
- **Dependencies:** no external ML frameworks such as PyTorch or TensorFlow
- **Goals:** clarity, modular design, and educational value

---

## What This Is

This project demonstrates building a transformer language model from scratch, including:

- Pre-training on factual text to build foundational world knowledge
- Instruction tuning for conversational behavior
- An interactive chat mode for testing the trained model
- Complete forward and backward passes, including all training logic
- A modular architecture for clean separation of concerns

---

## Key Files to Explore

- `src/main.rs`: the training pipeline, data preparation, and interactive mode logic
- `src/llm.rs`: the core LLM implementation, covering the architecture, forward and backward passes, and training

---

## Architecture

Data flows through the model as follows:

- Three transformer blocks, each pairing self-attention with a feed-forward network
- Custom tokenization with punctuation handling
- Greedy decoding for text generation
- Gradient clipping for training stability

---

## Project Structure

The repository centers on the two files described above: `src/main.rs` (the training and chat entry point) and `src/llm.rs` (the model implementation).

---

## What The Model Learns

Training proceeds in two phases:

1. **Pre-training (100 epochs).** The model learns basic factual statements that ground its language modeling, for example "The sun rises in the east" and "Water flows downhill due to gravity."
2. **Instruction tuning (100 epochs).** The model learns conversational behaviors and patterns, handling common exchanges such as greetings and simple explanations.

---

## Quick Start

As a standard Cargo binary crate, the project should build and run with `cargo run` from the repository root, which kicks off training (see `src/main.rs`) and then enters interactive mode.

---

## Interactive Mode Example

After both training phases finish, the trained model can be queried interactively from the terminal.

---

## Technical Implementation

### Model Configuration

- Vocabulary size: dynamic, built from the training data at runtime
- Embedding dimension: 128
- Hidden dimension: 256
- Maximum input sequence length: 80 tokens
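As a minimal sketch, these hyperparameters could be written as Rust constants. The names below are hypothetical and may not match the identifiers actually used in `src/llm.rs`:

```rust
// Hypothetical names for the hyperparameters listed above; the
// actual constants in src/llm.rs may be named differently.
pub const EMBEDDING_DIM: usize = 128; // width of token embeddings
pub const HIDDEN_DIM: usize = 256;    // feed-forward hidden layer width
pub const MAX_SEQ_LEN: usize = 80;    // maximum input sequence length
pub const NUM_BLOCKS: usize = 3;      // number of transformer blocks

// The vocabulary size is deliberately not a constant: it is built
// from the training corpus at runtime.
```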
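The architecture relies on custom tokenization with punctuation handling. A minimal sketch of that idea, splitting on whitespace and peeling punctuation off as separate tokens (illustrative only; the project's actual rules may differ):

```rust
/// Split text on whitespace, emitting punctuation characters as
/// their own tokens. Illustrative sketch, not the project's tokenizer.
fn tokenize(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    for word in text.split_whitespace() {
        let mut current = String::new();
        for ch in word.chars() {
            if ch.is_ascii_punctuation() {
                // Flush the word accumulated so far, then emit the
                // punctuation mark as its own token.
                if !current.is_empty() {
                    tokens.push(std::mem::take(&mut current));
                }
                tokens.push(ch.to_string());
            } else {
                current.push(ch);
            }
        }
        if !current.is_empty() {
            tokens.push(current);
        }
    }
    tokens
}
```

With this scheme, `tokenize("Hello, world!")` yields `["Hello", ",", "world", "!"]`.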
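Generation uses greedy decoding: at each step the model emits the single token with the highest score. The core selection step might look like the following `ndarray` sketch (the function name and the surrounding generation loop are assumptions, not the project's actual API):

```rust
use ndarray::Array1;

/// Return the index of the largest logit, i.e. the greedy choice.
/// Panics if `logits` is empty or contains NaN.
/// Illustrative sketch, not the project's actual API.
fn greedy_pick(logits: &Array1<f32>) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(idx, _)| idx)
        .expect("logits must be non-empty")
}
```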
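Training stability comes from gradient clipping. One common variant clips by global L2 norm, as sketched below; the README does not specify which variant RustGPT uses, so treat this as one plausible approach rather than the project's actual scheme:

```rust
use ndarray::Array2;

/// Scale a gradient matrix down so its L2 norm never exceeds `max_norm`.
/// One common clipping variant; the project's exact scheme may differ.
fn clip_grad(grad: &mut Array2<f32>, max_norm: f32) {
    let norm = grad.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        grad.mapv_inplace(|g| g * scale);
    }
}
```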