sequenceDiagram
participant Trainer
participant Model
participant Dataset
participant LossFunction
participant Optimizer
Note over Trainer: Initialization Phase
Trainer->>Model: Load model<br/>Initialize weights
Trainer->>Dataset: Prepare dataset
Note right of Dataset: Token sequences (numerical IDs)<br/>Labels (next token / class)
loop Until Converged
Note over Trainer: Mini-batch Sampling
Trainer->>Dataset: Sample random batch
Dataset->>Trainer: Batch inputs and labels
Note over Trainer: Forward Pass
Trainer->>Model: Predict with batch inputs
Model->>Trainer: Output logits
Note over Trainer: Loss Calculation
Trainer->>LossFunction: Compute loss (logits vs. labels)
LossFunction->>Trainer: Loss value
Note over Trainer: Backward Pass
Trainer->>Model: Backpropagate loss
Note right of Model: Compute gradients using chain rule
Note over Trainer: Weight Update
Trainer->>Optimizer: Apply gradients (scaled by learning rate)
Optimizer->>Model: Adjust weights
end