The Foundations of Artificial Intelligence: A 30-Day Crash Course

Day 1 What It Means for a System to Learn

We'll cover the origins of artificial intelligence as a field, the early symbolic vision of intelligence, and the later shift toward systems that learn from data. We'll talk about the difference between programmed behavior and learned behavior, training versus inference, memorization versus generalization, and narrow AI versus general intelligence.

Day 2 The Stack and the Team: Models, Training, Inference, Roles, and Responsibilities

Since we'll already know where AI started, this day jumps forward to the present to see what AI looks like in practice. We'll look at the full architecture that exists around production systems, including data pipelines, training infrastructure, model artifacts, inference services, APIs, application layers, monitoring, and deployment environments. Then we'll cover the different types of engineers and engineering-adjacent roles on a production team, and how each of them contributes to the final product.

Day 3 Statistical Prediction and Generalization

This day is all about the act of prediction under uncertainty: how we use samples to reason about larger populations, and the statistical foundations that make learning from examples possible. We'll be talking about probability distributions, sampling variation, estimation, uncertainty, generalization, distribution shift, and more.

Day 4 Data, Labels, Leakage, and Dataset Design

On this day we'll dive into how exactly you turn raw records into labeled examples, how you define targets, how you manage noisy or ambiguous labels, and how you prevent leakage between training and evaluation data. We'll talk about sampling bias, class imbalance, weak supervision, train-validation-test splitting, dataset provenance, documentation, and versioning.

Day 5 Loss Functions, Metrics, and Model Evaluation

Model evaluation is the discipline of deciding whether a trained system is useful, reliable, and appropriate for its intended task. On this day we'll talk about the distinction between loss functions and evaluation metrics, regression and classification measures, threshold selection, calibration, cross-validation, benchmarking, subgroup analysis, and how to choose good metrics.

Day 6 The Mathematics of Optimization

Optimization is, in essence, the mathematical problem of choosing parameters that make an objective function as small or large as possible, subject to the shape of the search space and any constraints on valid solutions. So this is our day to flesh out how all that works at the lowest level: parameter spaces, feasible regions, local and global minima, convexity, curvature, gradients, Hessians, regularized objectives, constrained optimization, and more.

Day 7 Gradient Descent and Learning by Iterative Improvement

Gradient descent is the engine of iterative learning: parameters are adjusted step by step using local gradient information, rather than being solved for in one pass. This is our day to talk about batch, stochastic, and mini-batch updates, learning rates, epochs, convergence and divergence, noisy gradients, momentum, adaptive methods, learning-rate schedules, early stopping, numerical precision, and the practical structure of a training loop.
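
As a small preview of the step-by-step update described above, here is a minimal sketch of gradient descent on a toy one-parameter objective. The function names, the objective f(x) = (x - 3)^2, and the default learning rate are illustrative assumptions, not part of the course materials.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move a small distance in the direction that lowers the objective.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # lands very close to the minimizer at x = 3
```

With this objective, a learning rate above 1.0 makes each step overshoot by more than it corrects, so the iterates diverge instead of converging. That is the convergence-versus-divergence behavior this day examines.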

Day 8 Supervised Learning: Classification and Regression

Supervised learning is the problem of learning a mapping from inputs to outputs using labeled examples, so that the mapping can make predictions on new inputs. On this day we'll talk about classification and regression, hard labels versus soft labels, scores versus predicted outputs, decision boundaries, separability, probabilistic prediction, nearest-neighbor methods, Naive Bayes, support vector machines, baselines, and structured prediction.

Day 9 Overfitting, Regularization, and Generalization

Overfitting and underfitting are failures of generalization. One model learns too much of the training data’s accidental structure, while another fails to learn enough useful structure. We'll flesh out those ideas on this day. We'll talk about model capacity, training-validation-test errors, learning curves, validation curves, bias and variance, hyperparameter tuning, regularization, spurious correlations, shortcut learning, domain shift, and the limits of validation-based model selection.

Day 10 Feature Engineering and Data Preparation

Feature engineering is the transformation of raw inputs into clean, consistent, usable representations. We'll be talking about data-type inspection, missing-value handling, outlier treatment, scaling, encoding, feature crossing, binning, time-based features, aggregate features, feature selection processes, preprocessing pipelines, fit-transform separation, training-serving consistency, and feature stores.

Day 11 Linear Models and Logistic Regression

Linear models are predictive systems built from weighted feature combinations. Linear regression is used for continuous values and logistic regression is used for probabilities and classification. To wrap our heads around how these work we'll need to dive into coefficients, intercepts, residuals, least-squares fitting, logits, sigmoid and softmax transformations, linear decision boundaries, multicollinearity, coefficient interpretation, and more.

Day 12 Decision Trees, Random Forests, and Boosting

Decision trees are models that make predictions by recursively splitting data into increasingly specific regions. Tree ensembles are methods that combine many trees to improve stability and performance. On this day we'll talk about splitting criteria, pruning, nonlinear interactions, bagging, random forests, feature subsampling, out-of-bag evaluation, feature importance, weak learners, AdaBoost, gradient boosting, and the tradeoff between interpretability and ensemble power.

Day 13 Unsupervised Learning: Clustering and Dimensionality Reduction

Unsupervised learning is the search for structure in data that doesn't have target labels. On this day we'll talk about what similarity is, what distance metrics are, how clustering objectives work, how k-means works, and what hierarchical clustering is. We'll also discuss DBSCAN, Gaussian mixture models, anomaly detection, PCA, manifold learning, t-SNE, UMAP, latent structure, feature discovery, and more.

Day 14 Reinforcement Learning and Reward Systems

In reinforcement learning, the goal is to train systems that learn to choose actions from rewards, delayed consequences, and interaction with an environment. To understand that, we'll need to dive into states, actions, policies, trajectories, returns, Markov decision processes, value functions, Q-values, exploration, exploitation, credit assignment, reward design, model-free learning, model-based learning, simulation, and the practical risks of reward hacking.

Day 15 Anatomy of a Neural Network

Artificial neural networks (ANNs) are layered computational systems built from artificial neurons, weights, biases, activation functions, and composed transformations. On this day we'll talk about what a layer is and what input, hidden, and output layers are. We'll also cover forward propagation, nonlinear activations, depth, width, output heads, parameter count, capacity, core components, and why initialization affects how signals move through a network.

Day 16 Training & Backpropagation

We'll talk about neural-network training as a two-phase computation: a forward pass that computes predictions and loss, and a backward pass that propagates gradients through the computational graph so the weights can be updated. We'll talk about computational graphs, automatic differentiation, loss gradients, activation gradients, weight gradients, optimizer state, mini-batches, vanishing and exploding gradients, normalization, residual connections, checkpointing, diagnostics, and training instability, among other things.
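
To make the two-phase picture concrete, here is a minimal sketch of a forward and backward pass for the smallest possible "network": a single linear unit trained with mean squared error. The gradients are written out by hand rather than produced by automatic differentiation, and the data, the step size of 0.05, and the function names are assumptions chosen for illustration.

```python
def forward(w, b, xs, ys):
    """Forward pass: compute predictions and the mean squared error loss."""
    preds = [w * x + b for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    return preds, loss


def backward(xs, ys, preds):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    n = len(xs)
    # Gradient of the loss with respect to each prediction.
    d_preds = [2.0 * (p - y) / n for p, y in zip(preds, ys)]
    # Each prediction is w * x + b, so d(pred)/dw = x and d(pred)/db = 1.
    grad_w = sum(d * x for d, x in zip(d_preds, xs))
    grad_b = sum(d_preds)
    return grad_w, grad_b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
for _ in range(2000):
    preds, loss = forward(w, b, xs, ys)
    grad_w, grad_b = backward(xs, ys, preds)
    # Plain gradient-descent update with an assumed step size of 0.05.
    w -= 0.05 * grad_w
    b -= 0.05 * grad_b

print(w, b, loss)  # w approaches 2 and b approaches 1
```

Real frameworks build the backward pass automatically from the computational graph, but the structure of the loop (forward, backward, update) is the same.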

Day 17 Deep Learning and Representation Learning

Networks aren't just functions. They don't merely map inputs to outputs; they also learn intermediate feature spaces that make complex prediction tasks easier. Deep learning is what happens when those intermediate layers start to grow in complexity and number. On this day we'll talk about all the things that become possible when that occurs: hierarchical features, latent representations, end-to-end learning, transfer learning, pretrained models, autoencoders, contrastive learning, self-supervised learning, multitask learning, representation probing, and representation collapse.

Day 18 Graphs and Graph Neural Networks

Graphs are a way to model entities and relationships, from social networks to molecules to citation networks. On this day we'll walk through how they're constructed, and how the models that interpret them operate. We'll cover nodes, edges, adjacency, neighborhoods, paths, homophily, heterophily, node classification, link prediction, graph classification, graph embeddings, message passing, graph convolution, GraphSAGE, attention over neighbors, and the limits of deep GNNs.

Day 19 Computer Vision

Computer vision is the problem of turning pixel arrays into useful knowledge. We'll talk about image tensors, color channels, resolution, edges, corners, texture, shape, illumination, viewpoint, occlusion, image classification, localization, scene understanding, image datasets, labeling, preprocessing, augmentation, normalization, and why visual recognition is generally so difficult for machines.

Day 20 Convolutional Neural Networks

Convolutional neural networks are image-focused architectures built around local connectivity, shared filters, feature maps, pooling, receptive fields, and spatial hierarchies. On this day we'll talk about what convolution is and how it works, as well as kernels, stride, padding, channel depth, translation equivariance, and both the classic CNN and the residual architectures.

Day 21 Object Detection and Segmentation

Think of these tasks as the shift from recognizing what is in an image to identifying where objects are and which pixels belong to them. We'll talk about bounding boxes, class labels, confidence scores, IoU, anchor boxes, region proposals, one-stage and two-stage detectors, non-maximum suppression, semantic segmentation, instance segmentation, panoptic segmentation, mask prediction, evaluation metrics, annotation cost, and real-time constraints.
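
One of the measures listed above, IoU (intersection over union), is simple enough to preview in a few lines. This is a minimal sketch that assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples; the function name and the box format are illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes get an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap of 1 over a union of 7
```

Detectors typically compare a predicted box against a labeled box this way, and non-maximum suppression uses the same quantity to decide when two predictions cover the same object.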

Day 22 GANs, Diffusion Models and Synthetic Media

Generative modeling is, broadly speaking, the process of sampling new images, audio, video, or other media from learned data distributions. On this day we'll talk about how that works: latent spaces, autoencoders, variational autoencoders, GANs, adversarial training, mode collapse, conditional generation, image-to-image translation, diffusion, denoising, guidance, and text-to-image systems. We'll also touch on deepfakes, watermarking, and synthetic media detection.

Day 23 Natural Language Processing

Think of NLP as the problem of converting human language into computational representations that can be searched, classified, tagged, translated, summarized, or generated. To understand how NLP systems work, we'll dive into ideas like corpora, subwords, tokenization, vocabulary construction, syntax, semantics, pragmatics, part-of-speech tagging, named entity recognition, sentiment analysis, information extraction, machine translation, summarization, ambiguity, and context dependence.

Day 24 Sequence Models and Recurrent Neural Networks

Sequence models are, generally speaking, systems that are designed for ordered data, where each input depends on position, history, and/or temporal context. To get how they work we'll need to talk about time steps, hidden state, recurrence, sequence-to-one and sequence-to-sequence tasks, recurrent neural networks, backpropagation through time and temporal dependencies. At that point we'll have the foundation to understand the precursors to LLMs: LSTMs, GRUs, and encoder-decoder models. Then we'll get on to teacher forcing, exposure bias, time-series prediction, speech modeling, text modeling before transformers, and the scaling limits of recurrence.

Day 25 Embeddings and Semantic Representations

Embeddings are dense vector representations that place words, sentences, documents, images, or any other inputs into spaces where their similarity to each other can be computed. On this day we'll talk about how that works: embedding tables, semantic similarity, cosine distance, sparse versus dense representations, distributional semantics, Word2Vec, GloVe, contextual embeddings, user-item embeddings, multimodal embeddings, shared spaces, embedding dimensionality, bias, evaluation, and retrieval infrastructure.
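
As a preview of how similarity is computed in an embedding space, here is a minimal sketch of cosine similarity. The three-dimensional vectors below are made-up values for illustration only; real embeddings are learned and usually have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not taken from any trained model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(cat_dog > cat_car)  # the two animal vectors point in more similar directions
```

Cosine distance is commonly defined as one minus this similarity, so vectors pointing the same way have distance zero regardless of their lengths.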

Day 26 Attention Mechanisms and Transformers

Attention mechanisms are a way for models to choose which parts of a sequence or source representation matter for each computation. Transformers are the architecture that makes attention its central organizing principle. On this day we’ll break down how they both work: queries, keys, values, attention scores, attention weights, self-attention, cross-attention, multi-head attention, positional encoding, transformer encoder and decoder blocks, residual pathways, normalization, masked attention, causal attention, context windows, parallel sequence processing, and the computational cost of all of it.
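
To show how queries, keys, values, scores, and weights fit together, here is a minimal single-head sketch of scaled dot-product attention in plain Python. It leaves out masking, multiple heads, and learned projections, and the function names and toy inputs are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Attention scores: how well this query matches each key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        # Attention weights: scores normalized with softmax.
        weights = softmax(scores)
        # Output: weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy inputs: the query lines up with the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print(weights[0], outputs[0])  # most of the weight goes to the first value
```

Every query is compared against every key, which is where the quadratic cost in sequence length comes from.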

Day 27 Large Language Models

We'll talk about large language models at a high level, then zoom in on how exactly they work and how they're built: next-token prediction, in-context learning, prompting, instruction tuning, supervised fine-tuning, pretraining corpora, scaling laws, preference data, RLHF, human-in-the-loop (HITL) workflows, chat formatting, context windows, compression, quantization, and more.

Day 28 Retrieval-Augmented Generation

RAG is a design pattern that connects language models to external documents, databases, and knowledge sources at inference time. On this day we'll explain how that works: ingestion, parsing, chunking, metadata, keyword retrieval, dense retrieval, hybrid search, vector databases, query rewriting, reranking, context assembly, grounding, citations, freshness, access control, failure modes, and more.

Day 29 Co-Pilots & Autonomous Agents

These are systems, often grounded in LLMs, that move beyond single-turn responses into tool use, action selection, planning loops, memory, orchestration, and monitored execution. On this day, we'll talk about how all of that actually works: function calling, tool schemas, observation-action cycles, task decomposition, ReAct, Toolformer, MRKL systems, human approval gates, multi-agent role specialization, browser agents, code agents, robotic agents, reliability failures, error accumulation, monitoring, and intervention.

Day 30 The Four Pillars of AI Safety: Alignment, Robustness, Transparency and Accountability

AI safety is the discipline of reducing harmful behavior in deployed systems through alignment, robustness, transparency, and accountability. To understand how, we'll start by talking about the technical side: specification failures, misaligned objectives, reward misspecification, adversarial examples, jailbreaks, prompt injection, data poisoning, hallucination, explainability, feature attribution, model cards, and dataset cards. But we'll end with the human and policy perspectives: bias, privacy leakage, oversight, governance, risk management, incident reporting, and the limits of our current safety methods.
