Machine learning primitives for building complex neural networks. Includes activation functions, layers, optimizers, and training methods, and doubles as an educational resource for optimization and visualization techniques.
This project started as a small attempt to recreate some machine learning primitives from
scratch in Kotlin.
While doing research, I quickly realized that I could fairly easily build a small set of primitives,
combine them into larger components, and scale up complexity rapidly. While learning about
different optimization and visualization techniques for my own use, I also realized that
they would make fantastic learning tools for others.
Normalizes each input by the root mean square (RMS) across all inputs
LayerNorm†
† - Planned/Experimental
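As a rough illustration of the root-mean-square normalization described above, the following Kotlin sketch divides each input by the RMS of the whole input vector. The function name `rmsNormalize` and the `epsilon` stabilizer are assumptions made for this example and are not taken from the project's code.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper, not part of the project's API.
// Divides every input by the root mean square (RMS) of all inputs.
// `epsilon` is an assumed small constant that guards against division by zero.
fun rmsNormalize(inputs: List<Double>, epsilon: Double = 1e-8): List<Double> {
    require(inputs.isNotEmpty()) { "inputs must not be empty" }
    // Mean of the squared inputs.
    val meanSquare = inputs.sumOf { it * it } / inputs.size
    // Root of the (stabilized) mean square.
    val rms = sqrt(meanSquare + epsilon)
    return inputs.map { it / rms }
}
```

With inputs `3.0` and `4.0`, the RMS is about 3.54, so the normalized values come out near 0.85 and 1.13.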
classDiagram
class Tensor {
dimensions: List~Int~
get(indices: List~Int~): Double
}
class ActivationFunction {
calculate(value: Double): Double
}
class Layer {
weights: Tensor
biases: Tensor
activation: ActivationFunction
}
Layer *--> Tensor
Layer *--> ActivationFunction
note for Network "A neural network composed of neurons."
class Network
class SequentialNetwork {
layers: List~Layer~
}
SequentialNetwork *--> "1..*" Layer
Network <|-- SequentialNetwork
class Loss {
calculate(expected: Tensor, actual: Tensor): Double
}
note for Optimizer "An optimizer is responsible for <br>adjusting the weights and biases <br>of a layer based on the error <br>gradient."
class Optimizer {
batch()
update()
}
class EpochStatistics {
onEpochStart()
onEpochEnd()
}
class BatchStatistics {
onBatchStart()
onBatchEnd()
}
class SampleStatistics {
onSampleStart()
onSampleEnd()
}
class LayerStatistics {
onLayerStart()
onLayerEnd()
}
class GradientDescent
GradientDescent --|> Optimizer
GradientDescent --|> SampleStatistics
class StochasticGradientDescent
StochasticGradientDescent --|> Optimizer
StochasticGradientDescent --|> BatchStatistics
class Adam
Adam --|> Optimizer
Adam --|> BatchStatistics
note for StudyCase "A study case associates an <br>input with an expected output."
class StudyCase {
input: Tensor
output: Tensor
}
note for SequentialTrainer "A trainer presents cases to <br>a network and tracks gradients <br>for back-propagation."
class SequentialTrainer {
network: SequentialNetwork
cases: List~StudyCase~
lossFunction: Loss
optimizer: Optimizer
}
SequentialTrainer *--> SequentialNetwork
SequentialTrainer *--> "1..*" StudyCase
SequentialTrainer *--> Loss
SequentialTrainer *--> Optimizer
class LearningTensor
class SimpleTensor
Tensor <|-- LearningTensor
Tensor <|-- SimpleTensor
class ReLU
class Sigmoid
class Linear
ActivationFunction <|-- ReLU
ActivationFunction <|-- Sigmoid
ActivationFunction <|-- Linear
class MeanSquaredError {
}
Loss <|-- MeanSquaredError
class StochasticGradientDescent {
}
Optimizer <|-- StochasticGradientDescent
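To make the roles in the diagram more concrete, here is a minimal sketch of one gradient-descent update for a single dense layer with a linear activation and a mean squared error loss. It is an illustration under stated assumptions rather than the project's implementation: `DenseLayer`, `meanSquaredError`, and `gradientDescentStep` are hypothetical names, plain arrays stand in for `Tensor`, and only `StudyCase` mirrors a name from the diagram.

```kotlin
// A study case associates an input with an expected output (name taken from the diagram).
data class StudyCase(val input: List<Double>, val output: List<Double>)

// Hypothetical dense layer with a linear activation.
// weights[o][i] connects input i to output o.
class DenseLayer(val weights: Array<DoubleArray>, val biases: DoubleArray) {
    fun forward(input: List<Double>): List<Double> =
        weights.indices.map { o ->
            biases[o] + weights[o].indices.sumOf { i -> weights[o][i] * input[i] }
        }
}

// Mean squared error between the expected and actual outputs of one case.
fun meanSquaredError(expected: List<Double>, actual: List<Double>): Double {
    require(expected.isNotEmpty() && expected.size == actual.size) { "size mismatch" }
    var sum = 0.0
    for (k in expected.indices) {
        val diff = actual[k] - expected[k]
        sum += diff * diff
    }
    return sum / expected.size
}

// One update: accumulate gradients over every case in `batch`, then adjust the
// weights and biases once. Returns the mean loss measured before the update.
fun gradientDescentStep(layer: DenseLayer, batch: List<StudyCase>, learningRate: Double): Double {
    require(batch.isNotEmpty()) { "batch must not be empty" }
    val outputs = layer.biases.size
    val weightGradients = Array(outputs) { o -> DoubleArray(layer.weights[o].size) }
    val biasGradients = DoubleArray(outputs)
    var totalLoss = 0.0
    for (case in batch) {
        val actual = layer.forward(case.input)
        totalLoss += meanSquaredError(case.output, actual)
        for (o in 0 until outputs) {
            // Derivative of this case's MSE with respect to actual[o].
            val delta = 2.0 * (actual[o] - case.output[o]) / outputs
            biasGradients[o] += delta
            for (i in layer.weights[o].indices) {
                weightGradients[o][i] += delta * case.input[i]
            }
        }
    }
    // Step against the gradient averaged over the batch.
    for (o in 0 until outputs) {
        layer.biases[o] -= learningRate * biasGradients[o] / batch.size
        for (i in layer.weights[o].indices) {
            layer.weights[o][i] -= learningRate * weightGradients[o][i] / batch.size
        }
    }
    return totalLoss / batch.size
}
```

Passing the full list of cases on every call corresponds to updating once per pass over all samples. Passing a randomly drawn subset instead, for example `cases.shuffled().take(batchSize)`, would be one way to approximate a stochastic variant.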
Project Structure
The structure of this project is based on Clean Architecture applied to Android's MVVM architecture.
UI and Data implementations occupy the outermost frameworks/drivers layer. ViewModels,
Repository implementations, and Data Source interfaces occupy the adapter/interfaces layer.
Interactors and Repository interfaces occupy the application business layer.
Entities occupy the enterprise business layer.
graph
subgraph apps
:app-android --> :app-shared
:app-desktop --> :app-shared
end
apps --> features
subgraph features
:feature-training --> :feature-simulation
:feature-simulation-training --> :feature-simulation
:feature-simulation-training --> :feature-training
:feature-training
:feature-evolution --> :feature-simulation
end
features --> core
subgraph core
:core-ui --> :core-viewmodels --> :core-interactors --> :core-data --> :core-entities --> :core-common
end
Layers
Dense
Linear (Fully Connected)
Convolutional†
Networks
Sequential Networks
Networks composed entirely of layers, each receiving a single input from the previous layer
Residual Networks
Layer-based networks that allow skip connections
Cost/Loss Functions
Mean Squared Error
Cross Entropy Loss
Converts the output predictions to
Optimizers
Gradient Descent
Computes gradients across all samples before updating parameters.
Stochastic (Gradient Descent)
Computes gradients across stochastically selected samples before updating parameters.
Adam†
Momentum-based; performs multiple passes over samples and parameter updates in a single epoch
Training
Sequential Trainer
Trainer for a Sequential Network
AutoDiff Trainer
Trainer for any network that produces a computation graph.
Organic Trainer†
Trainer that modifies a network according to statistics; mimics neurogenesis and ablation at alternating stages
Research Citations
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. Layer Normalization. arXiv:1607.06450v1, 2016.
Tianqi Chen, Ian Goodfellow, Jonathon Shlens. Net2Net: Accelerating Learning Via Knowledge Transfer. arXiv:1511.05641v4, 2016.
David Ha, Jürgen Schmidhuber. World Models. arXiv:1803.10122v4, 2018.
Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen. LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685v2, 2021.
Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980v9, 2017.
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. Visualizing the Loss Landscape of Neural Nets. arXiv:1712.09913v3, 2018.
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. Improving Language Understanding by Generative Pre-Training. OpenAI, 2018.
Ravid Shwartz-Ziv, Naftali Tishby. Opening the Black Box of Deep Neural Networks via Information. arXiv:1703.00810v3, 2017.
Sathya Krishnan Suresh, Shunmugapriya P. Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications. arXiv:2404.14462v4, 2024.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention is All You Need. arXiv:IDvN, 2017.