Challenge

Frameworks like Hugging Face hide the 'magic'. I wanted to peel back the layers and build everything myself.

Solution

A transparent library of model implementations focused on readability and mathematical correctness.

Project Layout

backprop/
  model.py
  train.py
  data.py
shared/
  config.py
model.py
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Multi-layer perceptron for regression."""

    def __init__(self, input_dim: int, hidden_dims: list[int], dropout: float):
        super().__init__()
        layers = []
        in_dim = input_dim
        # Stack a Linear -> ReLU -> Dropout block per hidden width.
        for hidden_dim in hidden_dims:
            layers.append(nn.Linear(in_dim, hidden_dim))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(p=dropout))
            in_dim = hidden_dim
        # Single output unit for scalar regression.
        layers.append(nn.Linear(in_dim, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.network(x)

def initialize_weights(model: MLP, strategy: str):
    for module in model.modules():
        if isinstance(module, nn.Linear):
            if strategy == "kaiming":
                # Fan-in scaling tuned for ReLU keeps activation variance stable.
                nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
            elif strategy == "xavier":
                nn.init.xavier_uniform_(module.weight)
            elif strategy == "normal":
                nn.init.normal_(module.weight, mean=0.0, std=0.01)
            else:
                raise ValueError(f"Unknown init strategy: {strategy!r}")
            if module.bias is not None:
                nn.init.zeros_(module.bias)
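
A quick usage sketch. The hidden widths here are hypothetical, not necessarily the configuration behind the 18,305-parameter figure; only the input width of 8 (California Housing's feature count) is fixed.

model = MLP(input_dim=8, hidden_dims=[128, 64], dropout=0.1)
initialize_weights(model, "kaiming")
print(sum(p.numel() for p in model.parameters()))  # total parameter count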

MLP & Weight Init

My first dive into building training loops from scratch to see how initialization affects convergence.

Learner Insight

I learned that Kaiming init is critical for ReLU networks: with a poorly scaled init, activations shrink layer by layer and gradients die early in training.
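
A minimal sketch of how to check this, using the MLP and initialize_weights from model.py above: push a random batch through the untrained network and compare the activation scale under each strategy (the layer sizes here are illustrative).

import torch
from backprop.model import MLP, initialize_weights  # module path per the layout above

torch.manual_seed(0)
x = torch.randn(256, 8)  # random batch with California Housing's 8 features

for strategy in ("normal", "kaiming"):
    model = MLP(input_dim=8, hidden_dims=[64, 64, 64, 64], dropout=0.0).eval()
    initialize_weights(model, strategy)
    with torch.no_grad():
        h = model.network[:-1](x)  # activations just before the output layer
    print(strategy, h.std().item())
# normal(0, 0.01) shrinks the activations toward zero within a few layers,
# so the gradients flowing back are vanishingly small; kaiming keeps them ~O(1).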

Project Architecture

Dataset: California Housing
Parameters: 18,305
Training time: < 5 min
Core Concept: Weight Initialization
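
For context, a minimal sketch of what the data.py loader could look like, using scikit-learn's built-in dataset; the split ratio and standardization are my assumptions, not confirmed details of the project.

import torch
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def load_data(test_size=0.2, seed=42):
    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # Fit the scaler on training data only to avoid leakage into validation.
    scaler = StandardScaler().fit(X_train)
    as_tensor = lambda a: torch.as_tensor(a, dtype=torch.float32)
    return (
        as_tensor(scaler.transform(X_train)), as_tensor(y_train).unsqueeze(1),
        as_tensor(scaler.transform(X_val)), as_tensor(y_val).unsqueeze(1),
    )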

Core Algorithms

  • AdamW
  • Cosine LR w/ Warmup
  • Kaiming Init
  • Gradient Accumulation
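
A minimal sketch of how these four pieces could fit together in a training step; the hyperparameters and step counts are illustrative, not the project's actual settings.

import math
import torch

def train(model, loader, total_steps=2000, warmup_steps=100, accum_steps=4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

    def lr_lambda(step):
        # Linear warmup to the base LR, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    loss_fn = torch.nn.MSELoss()
    model.train()

    step = 0
    while step < total_steps:
        for i, (x, y) in enumerate(loader):
            # Gradient accumulation: sum gradients over accum_steps
            # micro-batches before each optimizer update.
            loss = loss_fn(model(x), y) / accum_steps
            loss.backward()
            if (i + 1) % accum_steps == 0:
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                step += 1
                if step >= total_steps:
                    break

Dividing each micro-batch loss by accum_steps keeps the accumulated gradient equal to the gradient of one larger averaged batch.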