sequenceDiagram
    participant Trainer
    participant Model
    participant Dataset
    participant LossFunction
    participant Optimizer

    Note over Trainer: Initialization Phase
    Trainer->>Model: Load model<br/>Initialize weights
    Trainer->>Dataset: Prepare dataset
    Note right of Dataset: Sequences of tokens (numerical IDs)<br/>Labels (next token/class)

    loop Until Converged
        Note over Trainer: Mini-batch Sampling
        Trainer->>Dataset: Sample random batch
        Dataset->>Trainer: Batch inputs and labels

        Note over Trainer: Forward Pass
        Trainer->>Model: Predict with batch inputs
        Model->>Trainer: Output logits

        Note over Trainer: Loss Calculation
        Trainer->>LossFunction: Compute loss (logits + labels)
        LossFunction->>Trainer: Loss value

        Note over Trainer: Backward Pass
        Trainer->>Model: Backpropagate loss
        Note right of Model: Compute gradients using chain rule

        Note over Trainer: Weight Update
        Trainer->>Optimizer: Update parameters (learning rate)
        Optimizer->>Model: Adjust weights
    end
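
The loop in the diagram can be sketched in plain Python. This is a minimal illustration, not any particular framework's API: the "model" is a single weight w in a linear model y = w * x, so the forward pass, the chain-rule gradient, and the SGD-style weight update can all be written by hand. All names (dataset, lr, grad) are illustrative.

```python
import random

# Initialization Phase: "load model" / initialize weights
w = 0.0    # the model's single parameter
lr = 0.05  # learning rate used in the weight update

# Prepare dataset: inputs with labels generated by y = 3x,
# standing in for (token IDs, next-token/class labels)
dataset = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]

random.seed(0)
for step in range(200):  # loop Until Converged (fixed step budget here)
    # Mini-batch Sampling: sample a random batch of inputs and labels
    batch = random.sample(dataset, 2)

    # Forward Pass: model predicts outputs for the batch inputs
    preds = [w * x for x, _ in batch]

    # Loss Calculation: mean squared error between predictions and labels
    loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, batch)) / len(batch)

    # Backward Pass: gradient of the loss w.r.t. w via the chain rule,
    # d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

    # Weight Update: the optimizer adjusts the weight (plain SGD)
    w -= lr * grad
```

After the loop, w has converged to the slope 3.0 that generated the labels; in a real setting, the same four phases repeat with a neural network, a cross-entropy loss over logits, and an optimizer such as SGD or Adam.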