AI Terms Glossary
Artificial Intelligence (AI)
The simulation of human intelligence processes by machines, particularly computer systems. These processes include learning, reasoning, and self-correction.
Machine Learning (ML)
A subset of AI that enables systems to learn and improve from experience without being explicitly programmed.
Deep Learning
A subset of machine learning based on artificial neural networks with multiple layers that can learn representations of data.
Neural Network
A computing system inspired by biological neural networks, consisting of interconnected nodes that process and transmit information.
Epoch
One complete pass through the entire training dataset during model training. Multiple epochs help the model learn patterns in the data.
Batch Size
The number of training examples used in one iteration of model training. Larger batch sizes can lead to faster training but may require more memory.
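The relationship between epochs and batch size can be sketched in a few lines: one epoch makes a full pass over the dataset, processed in batch-size-sized chunks. The dataset and sizes below are arbitrary illustrations, not from any particular framework.

```python
# Toy sketch of the epoch/batch relationship: each epoch is one full
# pass over the dataset, split into fixed-size batches. A dataset of
# 10 examples with batch_size=4 yields batches of 4, 4, and 2.

def batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

dataset = list(range(10))          # 10 training examples
for epoch in range(3):             # 3 epochs = 3 full passes
    for batch in batches(dataset, batch_size=4):
        pass  # one training iteration would process this batch here
```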
Learning Rate
A hyperparameter that controls how much to adjust the model in response to errors. Higher rates mean faster learning but potential instability.
Gradient Descent
An optimization algorithm used to minimize the loss function by iteratively moving toward the minimum value.
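A minimal sketch of gradient descent, showing how the learning rate scales each update step. The function minimized here, f(x) = (x - 3)^2 with gradient f'(x) = 2(x - 3), and the chosen starting point and learning rate are arbitrary illustrations.

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose gradient is
# f'(x) = 2 * (x - 3). Each step moves opposite the gradient, scaled
# by the learning rate, approaching the minimum at x = 3.

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Iteratively step against the gradient to approach a minimum."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move opposite the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# x_min is very close to 3.0, the true minimum
```

A learning rate that is too large makes the updates overshoot (here, any rate above 1.0 would diverge), which is the instability the definition above refers to.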
Large Language Model (LLM)
An AI model trained on vast amounts of text data to understand and generate human-like text.
Transformer
A neural network architecture that uses self-attention mechanisms, forming the basis of modern LLMs.
Token
The basic unit of text that LLMs process, typically representing parts of words, words, or characters.
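The idea that tokens are often sub-word pieces can be illustrated with a toy greedy longest-match tokenizer. The vocabulary below is made up for illustration; real LLM tokenizers (such as BPE-based ones) learn their vocabularies from data and are considerably more sophisticated.

```python
# Toy greedy longest-match tokenizer over a made-up vocabulary,
# illustrating that tokens are often sub-word pieces, not whole words.

def tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        # take the longest vocabulary entry matching at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character as its own token
            i += 1
    return tokens

vocab = {"un", "believ", "able", " "}
tokenize("unbelievable", vocab)  # ['un', 'believ', 'able']
```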
Context Window
The maximum number of tokens an LLM can process in a single forward pass, determining how much text it can "remember" and analyze at once.
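A common consequence of a fixed context window is that older tokens must be dropped once the limit is reached. The sketch below illustrates this with simple truncation; the token list and window size are arbitrary, and real systems may use more elaborate strategies such as summarization.

```python
# Toy illustration of a context window: a model that can attend to at
# most `window` tokens keeps only the most recent ones.

def fit_to_context(tokens, window):
    return tokens[-window:]  # drop the oldest tokens beyond the limit

history = ["the", "quick", "brown", "fox", "jumps"]
fit_to_context(history, window=3)  # ['brown', 'fox', 'jumps']
```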
Prompt Engineering
The practice of designing and optimizing input text to get desired outputs from language models.
Fine-tuning
The process of further training a pre-trained model on specific data to adapt it for particular tasks.
Graphics Processing Unit (GPU)
A specialized processor designed to accelerate graphics and parallel computing operations.
CUDA
NVIDIA's parallel computing platform and programming model for general computing on GPUs.
Tensor Core
Specialized cores in NVIDIA GPUs designed to accelerate matrix multiplication and convolution operations.
VRAM
Video Random Access Memory, the dedicated memory on a GPU used to store model weights, activations, and other data during processing.
TFLOPS
Trillion Floating Point Operations Per Second, a measure of computational performance particularly relevant for AI workloads.
Quantization
The process of reducing the precision of model weights and activations (e.g., from FP32 to INT8) to improve performance and reduce memory usage.
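A minimal sketch of symmetric int8 quantization: weights are scaled by their largest magnitude so every value fits in [-127, 127], then rounded to integers. The weight values are arbitrary, and real schemes (per-channel scales, zero points, calibration) are more involved.

```python
# Symmetric int8 quantization sketch: scale FP32 weights by the
# largest magnitude so rounded values fit in [-127, 127]. Each int8
# value takes a quarter of the memory of an FP32 value, at the cost
# of a small rounding error.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.1, -0.5, 0.25, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)  # close to, but not exactly, the originals
```

The rounding error per weight is at most half the scale, which is why quantized models trade a little accuracy for large memory and speed gains.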
Mixed Precision Training
A technique that uses both FP32 and FP16 datatypes during training to reduce memory usage while maintaining model accuracy.
TPU
Tensor Processing Unit, Google's custom-developed ASIC for neural network machine learning.
NVLink
NVIDIA's high-bandwidth GPU interconnect technology for multi-GPU systems, enabling faster data transfer between GPUs.
PCIe
Peripheral Component Interconnect Express, the standard interface for connecting GPUs to the system.
GPU Clustering
Connecting multiple GPUs together to work on a single task, enabling training of larger models or faster inference.
PyTorch
An open-source machine learning library developed by Meta's (formerly Facebook's) AI Research lab, popular for deep learning research and development.
TensorFlow
An open-source machine learning framework developed by Google, widely used in production ML systems.
Docker
A platform for developing, shipping, and running applications in containers, ensuring consistent environments across different systems.
Kubernetes
An open-source container orchestration platform for automating deployment, scaling, and management of containerized applications.
GPU Instance
A cloud computing instance equipped with one or more GPUs for accelerated computing.
Spot Instance
Cloud instances available at a lower price but with potential interruption, useful for fault-tolerant workloads.
Auto Scaling
Automatically adjusting computational resources based on demand, optimizing cost and performance.
Model Serving
The process of making trained models available for inference through APIs or other interfaces.