AI Terms Glossary
Artificial Intelligence (AI)
The simulation of human intelligence processes by machines, particularly computer systems. These processes include learning, reasoning, and self-correction.
Machine Learning (ML)
A subset of AI that enables systems to learn and improve from experience without being explicitly programmed.
Deep Learning
A subset of machine learning based on artificial neural networks with multiple layers that can learn representations of data.
Neural Network
A computing system inspired by biological neural networks, consisting of interconnected nodes that process and transmit information.
Epoch
One complete pass through the entire training dataset during model training. Multiple epochs help the model learn patterns in the data.
Batch Size
The number of training examples used in one iteration of model training. Larger batch sizes can lead to faster training but may require more memory.
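The relationship between epochs and batch size can be sketched in a few lines: one epoch makes a full pass over the dataset, processed in batch-size-sized chunks. The dataset and sizes below are arbitrary illustrations, not from any particular framework.

```python
# Toy sketch of the epoch/batch relationship: each epoch is one full
# pass over the dataset, split into fixed-size batches. A dataset of
# 10 examples with batch_size=4 yields batches of 4, 4, and 2.

def batches(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

dataset = list(range(10))          # 10 training examples
for epoch in range(3):             # 3 epochs = 3 full passes
    for batch in batches(dataset, batch_size=4):
        pass  # one training iteration would process this batch here
```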
Learning Rate
A hyperparameter that controls how much to adjust the model in response to errors. Higher rates mean faster learning but potential instability.
Gradient Descent
An optimization algorithm used to minimize the loss function by iteratively moving toward the minimum value.
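A minimal sketch of gradient descent, showing how the learning rate scales each update step. The function minimized here, f(x) = (x - 3)^2 with gradient f'(x) = 2(x - 3), and the chosen starting point and learning rate are arbitrary illustrations.

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose gradient is
# f'(x) = 2 * (x - 3). Each step moves opposite the gradient, scaled
# by the learning rate, approaching the minimum at x = 3.

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Iteratively step against the gradient to approach a minimum."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move opposite the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# x_min is very close to 3.0, the true minimum
```

A learning rate that is too large makes the updates overshoot (here, any rate above 1.0 would diverge), which is the instability the definition above refers to.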
Large Language Model (LLM)
An AI model trained on vast amounts of text data to understand and generate human-like text.
Transformer
A neural network architecture that uses self-attention mechanisms, forming the basis of modern LLMs.
Token
The basic unit of text that LLMs process, typically representing parts of words, words, or characters.
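The idea that tokens are often sub-word pieces can be illustrated with a toy greedy longest-match tokenizer. The vocabulary below is made up for illustration; real LLM tokenizers (such as BPE-based ones) learn their vocabularies from data and are considerably more sophisticated.

```python
# Toy greedy longest-match tokenizer over a made-up vocabulary,
# illustrating that tokens are often sub-word pieces, not whole words.

def tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        # take the longest vocabulary entry matching at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character as its own token
            i += 1
    return tokens

vocab = {"un", "believ", "able", " "}
tokenize("unbelievable", vocab)  # ['un', 'believ', 'able']
```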
Context Window
The maximum number of tokens an LLM can process in a single forward pass, determining how much text it can "remember" and analyze at once.
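A common consequence of a fixed context window is that older tokens must be dropped once the limit is reached. The sketch below illustrates this with simple truncation; the token list and window size are arbitrary, and real systems may use more elaborate strategies such as summarization.

```python
# Toy illustration of a context window: a model that can attend to at
# most `window` tokens keeps only the most recent ones.

def fit_to_context(tokens, window):
    return tokens[-window:]  # drop the oldest tokens beyond the limit

history = ["the", "quick", "brown", "fox", "jumps"]
fit_to_context(history, window=3)  # ['brown', 'fox', 'jumps']
```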
Prompt Engineering
The practice of designing and optimizing input text to get desired outputs from language models.
Fine-tuning
The process of further training a pre-trained model on specific data to adapt it for particular tasks.
Graphics Processing Unit (GPU)
A specialized processor designed to accelerate graphics and parallel computing operations.
CUDA
NVIDIA's parallel computing platform and programming model for general computing on GPUs.
Tensor Core
Specialized cores in NVIDIA GPUs designed to accelerate matrix multiplication and convolution operations.
VRAM
Video Random Access Memory, the dedicated memory on a GPU used to store model weights, activations, and other data during processing.
TFLOPS
Trillion Floating Point Operations Per Second, a measure of computational performance particularly relevant for AI workloads.
Quantization
The process of reducing the precision of model weights and activations (e.g., from FP32 to INT8) to improve performance and reduce memory usage.
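A minimal sketch of symmetric int8 quantization: weights are scaled by their largest magnitude so every value fits in [-127, 127], then rounded to integers. The weight values are arbitrary, and real schemes (per-channel scales, zero points, calibration) are more involved.

```python
# Symmetric int8 quantization sketch: scale FP32 weights by the
# largest magnitude so rounded values fit in [-127, 127]. Each int8
# value takes a quarter of the memory of an FP32 value, at the cost
# of a small rounding error.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.1, -0.5, 0.25, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)  # close to, but not exactly, the originals
```

The rounding error per weight is at most half the scale, which is why quantized models trade a little accuracy for large memory and speed gains.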
Mixed Precision Training
A technique that uses both FP32 and FP16 datatypes during training to reduce memory usage while maintaining model accuracy.
TPU
Tensor Processing Unit, Google's custom-developed ASIC for neural network machine learning.
NVLink
NVIDIA's high-bandwidth GPU interconnect technology for multi-GPU systems, enabling faster data transfer between GPUs.
PCIe
Peripheral Component Interconnect Express, the standard interface for connecting GPUs to the system.
GPU Clustering
Connecting multiple GPUs together to work on a single task, enabling training of larger models or faster inference.
PyTorch
An open-source machine learning library developed by Meta's (formerly Facebook's) AI Research lab, popular for deep learning research and development.
TensorFlow
An open-source machine learning framework developed by Google, widely used in production ML systems.
Docker
A platform for developing, shipping, and running applications in containers, ensuring consistent environments across different systems.
Kubernetes
An open-source container orchestration platform for automating deployment, scaling, and management of containerized applications.
GPU Instance
A cloud computing instance equipped with one or more GPUs for accelerated computing.
Spot Instance
Cloud instances available at a lower price but with potential interruption, useful for fault-tolerant workloads.
Auto Scaling
Automatically adjusting computational resources based on demand, optimizing cost and performance.
Model Serving
The process of making trained models available for inference through APIs or other interfaces.