Terminology in Federated Learning can be complex and context-specific. This glossary gives concise definitions of key concepts and technical terms to promote consistent understanding across disciplines.

| Term | Description | Category |
|---|---|---|
| Active Learning | Model selects most informative data points for labeling. | General ML |
| Adverse Event | Unintended medical occurrence during treatment or study. | Clinical/Healthcare |
| Algorithmic Fairness | Ensuring machine learning models avoid biased or discriminatory outcomes. | General ML |
| Alignment | Arranging sequences (DNA, RNA, or protein) against each other or a reference to identify regions of similarity. | Bioinformatics |
| Allele Frequency | Proportion of a specific allele among all alleles in a population. | Bioinformatics |
| Anonymization | Removing or masking personal identifiers from data. | Data & Privacy |
| Asynchronous Federated Learning | FL where clients update models at different times, unsynchronized. | Federated Learning |
| AutoFE in Federated Learning | Automated feature engineering adapted for federated learning settings. | Federated Learning |
| AutoML | Automated process of model selection, training, and tuning. | General ML |
| Batch Effect | Systematic differences between data batches, often in biomedical data. | Bioinformatics |
| Bias Mitigation | Techniques to reduce bias in machine learning models. | General ML |
| Bioinformatics | Application of computational tools to biological data. | Bioinformatics |
| Biomarker | Biological molecule indicating a process, condition, or disease. | Clinical/Healthcare |
| Blockchain in FL | Using blockchain for secure, transparent model updates in FL. | Federated Learning |
| Bootstrapping | Resampling technique for estimating statistics or model performance. | Analytics |
| BYOD (Bring Your Own Data) | Participants contribute their own data to collaborative analysis. | Data & Privacy |
| Byzantine-Robust Aggregation | Aggregation methods resilient to malicious or faulty clients in FL. | Federated Learning |
| Casemix | Measuring clinical activity based on patient characteristics for reimbursement. | Clinical/Healthcare |
| Centralized Learning | Model training with all data collected in one location. | General ML |
| ChIP-Seq | Technique to analyze protein interactions with DNA. | Bioinformatics |
| Client Clustering | Grouping clients with similar data distributions in FL to enhance performance. | Federated Learning |
| Clinical Decision Support System (CDSS) | System providing clinicians with knowledge to enhance patient care decisions. | Clinical/Healthcare |
| Clinical Trial | Research study to evaluate medical, surgical, or behavioral interventions. | Clinical/Healthcare |
| Cohort Study | Observational study following a group over time. | Clinical/Healthcare |
| Common Data Model (CDM) | Standardized structure for organizing data to facilitate sharing and analysis. | Data & Privacy |
| Communication-Efficient Algorithms | FL algorithms designed to minimize communication overhead. | Federated Learning |
| Consent Management | Handling patient permissions for data use and sharing. | Data & Privacy |
| Continuous Learning | Model updates as new data arrives, without retraining from scratch. | General ML |
| Cross-Silo Federated Learning | FL among organizations (e.g., hospitals) with large datasets. | Federated Learning |
| Cross-Validation | Splitting data into folds to assess model performance. | Analytics |
| Data Acquisition | Gathering data from various sources for analysis. | Data & Privacy |
| Data Anonymization | Removing identifiable information from datasets to protect privacy. | Data & Privacy |
| Data Augmentation | Creating new data samples by modifying existing ones. | General ML |
| Data Cleaning | Correcting or removing erroneous data to improve quality. | Data & Privacy |
| Data Dictionary | Descriptive list of data elements in a system or database. | Data & Privacy |
| Data Drift | Change in data distribution over time, affecting model performance. | Analytics |
| Data Federation | Sharing data from distributed sources without centralization. | Data & Privacy |
| Data Governance | Managing data availability, usability, integrity, and security. | Data & Privacy |
| Data Harmonization | Standardizing data from multiple sources to a common format. | Data & Privacy |
| Data Imputation | Filling in missing data values using statistical methods. | Analytics |
| Data Integration | Combining data from different sources into a unified view. | Data & Privacy |
| Data Lake | Centralized repository for storing raw data in structured, semi-structured, or unstructured form. | Data & Privacy |
| Data Leakage | Information from outside the training dataset unintentionally influencing a model, inflating performance estimates. | Data & Privacy |
| Data Lineage | Tracking data origin and transformations throughout its lifecycle. | Data & Privacy |
| Data Minimization | Limiting data collection to only what is necessary. | Data & Privacy |
| Data Preprocessing | Preparing data for analysis through normalization and encoding. | Analytics |
| Data Provenance | Documentation of data origins and processing history. | Data & Privacy |
| Data Quality | Measure of data’s accuracy, completeness, and reliability. | Data & Privacy |
| Data Stewardship | Overseeing data assets to ensure quality and compliance. | Data & Privacy |
| Data Use Agreement (DUA) | Contract governing data sharing and usage between parties. | Data & Privacy |
| Data Wrangling | Cleaning and transforming raw data into a usable format. | Analytics |
| Deep Learning | Machine learning using neural networks with multiple layers. | General ML |
| De-identification | Removing or obscuring personal identifiers from data. | Data & Privacy |
| Descriptive Analytics | Examining data to understand past events and trends. | Analytics |
| Differential Expression | Identifying genes expressed differently between conditions. | Bioinformatics |
| Differential Privacy | Mathematical framework bounding how much any single individual's data can affect an output, typically via calibrated noise. | Security/Privacy |
| Digital Biomarker | Digital data indicating health status or disease progression. | Clinical/Healthcare |
| Digital Pathology | Analysis of digitized pathology slides using computational methods. | Clinical/Healthcare |
| Digital Twin | Virtual representation of a patient or system for simulation. | Clinical/Healthcare |
| Distributed Learning | Model training across multiple locations or devices. | General ML |
| DNA Sequencing | Determining the order of nucleotides in DNA. | Bioinformatics |
| Edge Computing in FL | Performing FL computations on edge devices to reduce latency. | Federated Learning |
| Electronic Health Record (EHR) | Digital record of a patient’s medical history. | Clinical/Healthcare |
| Electronic Medical Record (EMR) | Digital version of a patient’s paper chart. | Clinical/Healthcare |
| Ensemble Learning | Combining multiple models to improve prediction accuracy. | General ML |
| Ethics Board | Committee overseeing ethical aspects of research and data use. | Clinical/Healthcare |
| Explainable AI (XAI) | AI systems whose decisions can be understood by humans. | General ML |
| Exploratory Data Analysis (EDA) | Summarizing main characteristics of datasets through analysis. | Analytics |
| FAIR Principles | Guidelines for making data Findable, Accessible, Interoperable, Reusable. | Data & Privacy |
| Federated Analytics | Analyzing distributed data without moving or centralizing it. | Federated Learning |
| Federated Averaging (FedAvg) | FL algorithm averaging local model parameters for global updates. | Federated Learning |
| Federated Feature Engineering | Feature engineering in FL without sharing raw data. | Federated Learning |
| Federated Learning | ML training across decentralized devices or sites that exchange model updates rather than raw data. | Federated Learning |
| Federated One-Shot Analysis | Single-round federated analysis without iterative communication. | Federated Learning |
| Federated Query | Querying distributed datasets without centralizing data. | Federated Learning |
| FedProx | FL algorithm adding a proximal term to local objectives to stabilize training on non-IID data. | Federated Learning |
| FHIR | Fast Healthcare Interoperability Resources; HL7 standard for electronic healthcare information exchange. | Clinical/Healthcare |
| Generalization | Model’s ability to perform well on unseen data. | General ML |
| Genome-Wide Association Study (GWAS) | Study associating genetic variants with traits or diseases. | Bioinformatics |
| Genotype | Genetic makeup of an organism. | Bioinformatics |
| Gradient Leakage | Attack reconstructing training data from shared gradients. | Security/Privacy |
| Health Information Exchange (HIE) | Electronic sharing of health-related information among organizations. | Clinical/Healthcare |
| HL7 | Standards for transferring clinical and administrative data. | Clinical/Healthcare |
| Homomorphic Encryption | Encryption allowing computations on encrypted data without decryption. | Security/Privacy |
| Horizontal Federated Learning | FL with same features but different samples across clients. | Federated Learning |
| Horizontally Partitioned Data | Data with different rows stored in different locations. | Data & Privacy |
| Hyperparameter | Parameter set before training, not learned from data. | General ML |
| ICD-10 | International classification system for diseases and health conditions. | Clinical/Healthcare |
| Imbalanced Data | Datasets where some classes are underrepresented. | Analytics |
| Informed Consent | Patient agreement for data use in research. | Clinical/Healthcare |
| Interoperability | Ability of systems to exchange and use information. | Data & Privacy |
| k-anonymity | Ensuring each record is indistinguishable from at least k-1 others on its quasi-identifiers. | Security/Privacy |
| Key Performance Indicators (KPIs) | Metrics evaluating organizational or activity success. | Analytics |
| Label Noise | Incorrect or inconsistent labels in training data. | Analytics |
| Label Propagation | Spreading labels from labeled to unlabeled data points. | General ML |
| Latency | Delay between input and response in a system. | Analytics |
| Local Differential Privacy | Privacy protection applied at the data source before sharing. | Security/Privacy |
| Longitudinal Study | Research collecting data from the same subjects over time. | Clinical/Healthcare |
| Machine Learning | Enabling computers to learn from data without explicit programming. | General ML |
| Medical Imaging | Creating visual representations of the interior of a body. | Clinical/Healthcare |
| Membership Inference | Attack determining whether a specific record was used in training. | Security/Privacy |
| Meta-Learning in Federated Learning | Meta-learning for fast adaptation of FL global models. | Federated Learning |
| Metabolomics | Study of chemical processes involving metabolites. | Bioinformatics |
| mHealth | Using mobile devices for medicine and public health. | Clinical/Healthcare |
| Minimum Data Set | Smallest set of data elements for a specific purpose. | Data & Privacy |
| Model Compression | Reducing model size for efficiency. | General ML |
| Model Deployment | Integrating machine learning models into production environments. | General ML |
| Model Drift | Model performance degrades due to changing data. | Analytics |
| Model Evaluation | Assessing model performance using metrics like accuracy. | Analytics |
| Model Explainability | Ability to interpret and understand model predictions. | General ML |
| Model Personalization | Adapting FL global models to individual client data. | Federated Learning |
| Model Poisoning | Malicious client updates degrading FL global models. | Security/Privacy |
| Model Selection | Choosing the best machine learning model for a task. | General ML |
| Model Training | Teaching machine learning models using data. | General ML |
| Multi-Omics | Integrative analysis of multiple omics data types. | Bioinformatics |
| Multi-Task Learning | Training models on multiple related tasks simultaneously. | General ML |
| Neural Network | Computational model inspired by the human brain. | General ML |
| Next-Generation Sequencing (NGS) | High-throughput DNA sequencing technologies. | Bioinformatics |
| Non-IID Data | Data not independently and identically distributed across clients. | Federated Learning |
| OHDSI | Observational Health Data Sciences and Informatics; community developing standards for observational health data. | Clinical/Healthcare |
| Omics Data | Large-scale datasets from genomics, proteomics, etc. | Bioinformatics |
| One-Shot Federated Learning | FL training global model in a single communication round. | Federated Learning |
| Ontology | Structured vocabulary for a domain, enabling data integration. | Bioinformatics |
| Overfitting | Model fits training data too closely, including noise, and performs poorly on new data. | General ML |
| Pathology Informatics | Application of informatics in pathology for data management and analysis. | Clinical/Healthcare |
| Patient Cohort | Group of patients sharing common characteristics. | Clinical/Healthcare |
| Patient Similarity Learning | Identifying similar patients for diagnosis or treatment planning. | Clinical/Healthcare |
| Personal Health Record (PHR) | Health record managed and controlled by the patient. | Clinical/Healthcare |
| Personalized Federated Learning (PFL) | FL customizing models for each client’s data. | Federated Learning |
| Personally Identifiable Information (PII) | Data that can identify an individual. | Data & Privacy |
| Pharmacogenomics | Study of how genes affect drug response. | Bioinformatics |
| Phenotype | Observable characteristics of an organism. | Bioinformatics |
| Predictive Analytics | Predicting future events using data analysis. | Analytics |
| Prescriptive Analytics | Recommending actions for optimal outcomes using data. | Analytics |
| Privacy by Design | Incorporating privacy into system design from the start. | Security/Privacy |
| Privacy-Preserving Computation | Computations that protect private data. | Security/Privacy |
| Proteomics | Study of the structure and function of proteins. | Bioinformatics |
| Pseudonymization | Replacing direct identifiers with artificial ones; re-identification remains possible with separately held information. | Security/Privacy |
| Quality Assurance (QA) | Ensuring data and processes meet defined quality standards. | Analytics |
| Quality Control (QC) | Operational techniques to fulfill quality requirements. | Analytics |
| Real-World Data (RWD) | Data collected from routine clinical practice. | Clinical/Healthcare |
| Real-World Evidence (RWE) | Clinical evidence from real-world data analysis. | Clinical/Healthcare |
| Reproducibility | Ability to obtain consistent results using the same data and methods. | Analytics |
| SCAFFOLD | FL algorithm reducing client drift using control variates. | Federated Learning |
| Secure Aggregation | Protocol ensuring only aggregated updates are visible to the server. | Security/Privacy |
| Secure Enclave | Hardware-based secure area for sensitive computations. | Security/Privacy |
| Secure Multi-Party Computation | Cryptographic protocols letting parties jointly compute a function without revealing their private inputs. | Security/Privacy |
| Semi-Supervised Learning | ML using both labeled and unlabeled data. | General ML |
| SHAP Values | Shapley-value-based feature attributions explaining individual model predictions. | Analytics |
| Single-Cell Analysis | Study of gene expression at the single-cell level. | Bioinformatics |
| SNOMED CT | Standardized clinical terminology for electronic health records. | Clinical/Healthcare |
| Supervised Learning | ML using labeled data to train models. | General ML |
| Swarm Learning | Decentralized ML using blockchain for coordination. | Federated Learning |
| Synthetic Data | Artificially generated data resembling real data. | Data & Privacy |
| Telemedicine | Remote diagnosis and treatment via telecommunications. | Clinical/Healthcare |
| Test Data | Dataset for evaluating trained model performance. | Analytics |
| Tokenization | Converting sensitive data into non-sensitive tokens. | Security/Privacy |
| Training Data | Dataset used to train machine learning models. | Analytics |
| Transcriptomics | Study of RNA transcripts produced by the genome. | Bioinformatics |
| Transfer Learning | Reusing a pre-trained model for a new task. | General ML |
| Trusted Execution Environment (TEE) | Secure area of a processor for sensitive computations. | Security/Privacy |
| Underfitting | Model too simple to capture data patterns. | General ML |
| Unsupervised Learning | ML finding patterns in unlabeled data. | General ML |
| Validation Data | Dataset for tuning hyperparameters to prevent overfitting. | Analytics |
| Variant Calling | Identifying genetic variants from sequence data. | Bioinformatics |
| Vertical Federated Learning | FL with different features for the same samples across clients. | Federated Learning |
| Vertically Partitioned Data | Data with different columns stored in different locations. | Data & Privacy |
| Zero-Knowledge Proof | Proving knowledge of information without revealing it. | Security/Privacy |
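
To make the Federated Averaging (FedAvg) entry concrete, the sketch below shows only the server-side aggregation step: client parameter vectors are averaged, weighted by each client's local sample count. This is a minimal illustration under stated assumptions, not a reference implementation. The names `fed_avg`, `hospital_updates`, and `hospital_sizes` are hypothetical, the numbers are made up, and local training and communication are omitted.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send vectors of the same length")

    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            global_weights[i] += value * n / total
    return global_weights


# Hypothetical cross-silo example: three hospitals send two-parameter models.
hospital_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
hospital_sizes = [100, 100, 200]  # assumed local sample counts
global_model = fed_avg(hospital_updates, hospital_sizes)
print(global_model)
```

Weighting by sample count means the third hospital, which holds half of the data in this example, contributes half of the global model. Only parameters and counts reach the server, never raw records.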