Publications - Robert Hoehndorf

LEP-AD: language embedding of proteins and attention to drugs predicts drug-target interactions

Alsulami, Reem, Lehmann, Robert, Daga, Anuj, Khan, Sumeer A., Grünberg, Raik, Abogosh, Ahmed, Cabrero, David Gomez, Arold, Stefan T., Hoehndorf, Robert, Tegner, Jesper and Kiani, Narsis A.

Journal of Cheminformatics (2026)

Biomedical informatics

INTRODUCTION: Predicting drug-target interactions remains a significant challenge in drug development and lead optimization. Recent advances have leveraged machine learning algorithms to model drug-target interactions from molecular and sequence data. MATERIALS AND METHODS: In this work, we use Evolutionary Scale Modeling (ESM-3) to construct a transformer-based protein language representation for drug-target interaction prediction. We introduce LEP-AD (Language Embedding of Proteins and Attention to Drugs), a modular architecture that combines pretrained protein language models with graph-based molecular encoders to predict binding affinity values. RESULTS: We systematically benchmark LEP-AD alongside a range of established deep learning methods across multiple datasets: Davis, KIBA, DTC, Metz, ToxCast, and STITCH. To assess predictive validity, we compare model-derived rankings of drug-target interactions with experimental results reported in the literature. In addition, we perform new experimental assays to evaluate the binding of three ATP-competitive Src kinase inhibitors (Dasatinib, UM-164, and Saracatinib), where experimentally measured IC₅₀ and pKᵢ values are consistent with the predicted rankings. CONCLUSION: In summary, our benchmark highlights the strengths and limitations of current drug-target interaction models across diverse datasets and evaluation settings. The results emphasize the impact of pretrained protein and molecular representations on predictive performance and illustrate the persistent challenges of generalization, while the modular LEP-AD framework provides a flexible reference point for comparative evaluation. SCIENTIFIC CONTRIBUTION: This study presents LEP-AD, a modular deep learning framework for drug-target interaction prediction that integrates pretrained protein language representations with graph-based molecular encoders.
Beyond introducing the architecture, we provide a systematic benchmark under similarity-aware evaluation settings and experimental validation, highlighting the impact of pretrained protein embeddings on predictive behavior across diverse datasets.

@article{Alsulami2026,
  abstract = {{INTRODUCTION: Predicting drug-target interactions remains a significant challenge in drug development and lead optimization. Recent advances have leveraged machine learning algorithms to model drug-target interactions from molecular and sequence data. MATERIALS AND METHODS: In this work, we use Evolutionary Scale Modeling (ESM-3) to construct a transformer-based protein language representation for drug-target interaction prediction. We introduce LEP-AD (Language Embedding of Proteins and Attention to Drugs), a modular architecture that combines pretrained protein language models with graph-based molecular encoders to predict binding affinity values. RESULTS: We systematically benchmark LEP-AD alongside a range of established deep learning methods across multiple datasets: Davis, KIBA, DTC, Metz, ToxCast, and STITCH. To assess predictive validity, we compare model-derived rankings of drug-target interactions with experimental results reported in the literature. In addition, we perform new experimental assays to evaluate the binding of three ATP-competitive Src kinase inhibitors (Dasatinib, UM-164, and Saracatinib), where experimentally measured IC₅₀ and pKᵢ values are consistent with the predicted rankings. CONCLUSION: In summary, our benchmark highlights the strengths and limitations of current drug-target interaction models across diverse datasets and evaluation settings. The results emphasize the impact of pretrained protein and molecular representations on predictive performance and illustrate the persistent challenges of generalization, while the modular LEP-AD framework provides a flexible reference point for comparative evaluation. SCIENTIFIC CONTRIBUTION: This study presents LEP-AD, a modular deep learning framework for drug-target interaction prediction that integrates pretrained protein language representations with graph-based molecular encoders.
Beyond introducing the architecture, we provide a systematic benchmark under similarity-aware evaluation settings and experimental validation, highlighting the impact of pretrained protein embeddings on predictive behavior across diverse datasets.}},
  author = {Alsulami,  Reem and Lehmann,  Robert and Daga,  Anuj and Khan,  Sumeer A. and Gr\"{u}nberg,  Raik and Abogosh,  Ahmed and Cabrero,  David Gomez and Arold,  Stefan T. and Hoehndorf,  Robert and Tegner,  Jesper and Kiani,  Narsis A.},
  doi = {10.1186/s13321-026-01167-9},
  issn = {1758-2946},
  journal = {Journal of Cheminformatics},
  month = {April},
  publisher = {Springer Science and Business Media LLC},
  title = {LEP-AD: language embedding of proteins and attention to drugs predicts drug-target interactions},
  url = {http://dx.doi.org/10.1186/s13321-026-01167-9},
  year = {2026}
}

Molecular basis and cellular effects of Janus-class–driven cytoplasmic PYK2 coacervates

Colombo, Giovanni, Salem, Israa, Szczepski, Kacper, Yu, Piao, Alfaiyz, Shaden, Guzmán-Vega, Francisco Javier, Abogosh, Ahmed, Kulmanov, Maxat, Al-Harthi, Samah, Kadaré, Gress, Hoehndorf, Robert, Girault, Jean-Antoine, Jaremko, Łukasz, Momin, Afaque A. and Arold, Stefan T.

Communications Biology (2026)

Drug mechanisms, Bioengineering

@article{Colombo2026,
  abstract = {{Kinase activity is increasingly linked to biomolecular phase separation. Focal adhesion kinase (FAK) forms membrane-associated condensates with paxillin to promote adhesion. Here we show that its paralogue, proline-rich tyrosine kinase 2 (PYK2), undergoes phase separation via a distinct mechanism. PYK2 forms cytoplasmic condensates primarily driven by its kinase-FAT linker (KFL) region. Overexpression of PYK2 induces condensates enriched in its autophosphorylated form, which sequester paxillin from focal adhesions and impair cell adhesion. We uncover an autoregulatory mechanism involving the KFL, linking self-association, autophosphorylation, and condensation. Uncommon among known phase separation drivers, KFL condensation is phosphorylation-independent and its sequence belongs to the "Janus" class. Using a transformer-based protein language model, we identified non-homologous sequences with similar features, many from adhesion and cytoskeletal regulators. We validated the phase-separating potential of several of these sequences in cells. These findings reveal a mechanism linking phase separation with kinase activation, and demonstrate distinct condensation behavior in homologs. Our results also highlight how protein concentration modulates condensate function, with implications for disease, and expand the landscape of phase separation drivers.}},
  author = {Colombo,  Giovanni and Salem,  Israa and Szczepski,  Kacper and Yu,  Piao and Alfaiyz,  Shaden and Guzmán-Vega,  Francisco Javier and Abogosh,  Ahmed and Kulmanov,  Maxat and Al-Harthi,  Samah and Kadaré,  Gress and Hoehndorf,  Robert and Girault,  Jean-Antoine and Jaremko,  Łukasz and Momin,  Afaque A. and Arold,  Stefan T.},
  doi = {10.1038/s42003-025-09463-0},
  issn = {2399-3642},
  journal = {Communications Biology},
  month = {January},
  publisher = {Springer Science and Business Media LLC},
  title = {Molecular basis and cellular effects of Janus-class–driven cytoplasmic PYK2 coacervates},
  url = {http://dx.doi.org/10.1038/s42003-025-09463-0},
  year = {2026}
}

VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale

Guzmán-Vega, Francisco J., Cardona-Londoño, Kelly J., González-Álvarez, Ana C., Peña-Guerra, Karla A., Althagafi, Azza, Khan, Tanisha, Hoehndorf, Robert and Arold, Stefan T.

Journal of Biological Chemistry, vol. 302(2), pp. 111071 (2026)

Genomics, Biomedical informatics

@article{GuzmnVega2026,
  abstract = {{Missense variant pathogenicity often arises from disruptions to protein structural features. The integration of large-scale genetic sequencing into clinical workflows, and the availability of accurate artificial intelligence-based protein structure predictions present an opportunity to assess the structure-function relationship of missense variants at a population scale. To harness this potential, we developed VarLand, a computational pipeline that extracts 29 structural and biophysical features from AlphaFold-predicted protein models and nine complementary annotation tools. We applied VarLand to pathogenic missense variants from ClinVar and a population-specific dataset of rare Middle Eastern variants, comparing their feature profiles to high-frequency benign variants from the Genome Aggregation Database (gnomAD). Our analysis confirms that pathogenic variants are significantly enriched in ordered regions, buried residues, and sites with high intramolecular contact density, whereas benign variants preferentially occur in disordered, solvent-exposed regions. However, VarLand also uncovered feature landscape variations across protein functional classes and disease categories, suggesting differences in underlying disease mechanisms. Furthermore, variants from the artificial intelligence-based AlphaMissense database showed a stronger association between structural order and pathogenicity than clinical datasets, indicating residual bias from structure-centric training. These findings demonstrate the effectiveness of multidimensional structural profiling by VarLand to uncover not only broad structure-pathogenicity relationships but also dataset-specific and class-specific deviations, offering deeper insight into disease mechanisms.}},
  author = {Guzmán-Vega,  Francisco J. and Cardona-Londoño,  Kelly J. and González-Álvarez,  Ana C. and Peña-Guerra,  Karla A. and Althagafi,  Azza and Khan,  Tanisha and Hoehndorf,  Robert and Arold,  Stefan T.},
  doi = {10.1016/j.jbc.2025.111071},
  issn = {0021-9258},
  journal = {Journal of Biological Chemistry},
  month = {February},
  number = {2},
  pages = {111071},
  publisher = {Elsevier BV},
  title = {VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale},
  url = {http://dx.doi.org/10.1016/j.jbc.2025.111071},
  volume = {302},
  year = {2026}
}

DELE: Deductive EL++ Embeddings for Knowledge Base Completion

Mashkova, Olga, Zhapa-Camacho, Fernando and Hoehndorf, Robert

Neurosymbolic Artificial Intelligence, vol. 2 (2026)

Biomedical informatics

@article{Mashkova2026,
  abstract = {{Ontology embeddings map classes, roles, and individuals in ontologies into \(\mathbb{R}^n\), and within \(\mathbb{R}^n\) similarity between entities can be computed or new axioms inferred. For ontologies in the Description Logic \(\mathcal{EL}^{++}\), several optimization-based embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations; they do not distinguish between statements that are unprovable and provably false, and therefore they may use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for \(\mathcal{EL}^{++}\) ontologies, incorporating several modifications that aim to make use of the ontology deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and different types of negatives and formulated evaluation methods for knowledge base completion. We demonstrate that our embedding methods improve over the baseline ontology embedding in the task of knowledge base or ontology completion.}},
  author = {Mashkova,  Olga and Zhapa-Camacho,  Fernando and Hoehndorf,  Robert},
  doi = {10.1177/29498732261420011},
  issn = {2949-8732},
  journal = {Neurosymbolic Artificial Intelligence},
  month = {January},
  publisher = {SAGE Publications},
  title = {DELE: Deductive EL++ Embeddings for Knowledge Base Completion},
  url = {http://dx.doi.org/10.1177/29498732261420011},
  volume = {2},
  year = {2026}
}

INDIGENA: inductive prediction of disease–gene associations using phenotype ontologies

Zhapa-Camacho, Fernando and Hoehndorf, Robert

Bioinformatics (2026)

Disease genetics, Neuro-symbolic AI, Bioinformatics

@article{ZhapaCamacho2026indigena,
  abstract = {{MOTIVATION: Predicting gene-disease associations (GDAs) can be framed as a ranking problem where genes are ranked for a query disease based on features such as phenotypic similarity. By describing phenotypes using phenotype ontologies, ontology-based semantic similarity measures can be used. However, traditional semantic similarity measures use only the ontology taxonomy. Recent methods based on ontology embeddings compare phenotypes in latent space; these methods can use all ontology axioms as well as a supervised signal, but are inherently transductive, i.e., query entities must already be known at the time of learning embeddings, and therefore these methods do not generalize to novel diseases (sets of phenotypes) at inference time. RESULTS: We developed INDIGENA, an inductive disease-gene association method for ranking genes based on a set of phenotypes. Our method first uses a graph projection to map axioms from phenotype ontologies to a graph structure, and then uses graph embeddings to create latent representations of phenotypes. We use an explicit aggregation strategy to combine phenotype embeddings into representations of genes or diseases, allowing us to generalize to novel sets of phenotypes. We also develop a method to make the phenotype embeddings and the similarity measure task-specific by including a supervised signal from known gene-disease associations. We apply our method to mouse models of human disease and demonstrate that we can significantly improve over the inductive semantic similarity baseline measures, and reach a performance similar to transductive methods for predicting gene-disease associations while being more general. AVAILABILITY AND IMPLEMENTATION: https://github.com/bio-ontology-research-group/indigena.}},
  author = {Zhapa-Camacho,  Fernando and Hoehndorf,  Robert},
  doi = {10.1093/bioinformatics/btag325},
  issn = {1367-4811},
  journal = {Bioinformatics},
  month = {May},
  publisher = {Oxford University Press (OUP)},
  title = {{INDIGENA: inductive prediction of disease–gene associations using phenotype ontologies}},
  url = {https://doi.org/10.1093/bioinformatics/btag325},
  year = {2026}
}

SIDEKICK: A Semantically Integrated Resource for Drug Effects, Indications, and Contraindications

Ashhad, Mohammad, Mashkova, Olga, Henao, Ricardo and Hoehndorf, Robert

The Semantic Web -- ESWC 2026, pp. 253-276 (2026)

Biomedical informatics

@inbook{Ashhad2026,
  abstract = {{Pharmacovigilance and clinical decision support systems utilize structured drug safety data to guide medical practice. However, existing datasets frequently depend on terminologies such as MedDRA, which limits their semantic reasoning capabilities and their interoperability with Semantic Web ontologies and knowledge graphs. To address this gap, we developed SIDEKICK, a knowledge graph that standardizes drug indications, contraindications, and adverse reactions from FDA Structured Product Labels. We developed and used a workflow based on Large Language Model (LLM) extraction and Graph-Retrieval Augmented Generation (Graph RAG) for ontology mapping. We processed over 50,000 drug labels and mapped terms to the Human Phenotype Ontology (HPO), the MONDO Disease Ontology, and RxNorm. Our semantically integrated resource outperforms the SIDER and ONSIDES databases when applied to the task of drug repurposing by side effect similarity. We serialized the dataset as a Resource Description Framework (RDF) graph and employed the Semanticscience Integrated Ontology (SIO) as an upper-level ontology to further improve interoperability. Consequently, SIDEKICK enables automated safety surveillance and phenotype-based similarity analysis for drug repurposing.}},
  author = {Ashhad,  Mohammad and Mashkova,  Olga and Henao,  Ricardo and Hoehndorf,  Robert},
  booktitle = {The Semantic Web -- {ESWC} 2026},
  doi = {10.1007/978-3-032-25159-6_14},
  isbn = {9783032251596},
  issn = {1611-3349},
  pages = {253--276},
  publisher = {Springer Nature Switzerland},
  title = {{SIDEKICK}: A Semantically Integrated Resource for Drug Effects, Indications, and Contraindications},
  url = {http://dx.doi.org/10.1007/978-3-032-25159-6_14},
  year = {2026}
}

Robust Knowledge Graph Embedding via Denoising

Song, Tengwei, Ma, Xudong, Liu, Yang, Luo, Jie and Hoehndorf, Robert

The Semantic Web -- ESWC 2026, pp. 417-435 (2026)

Biomedical informatics

@inbook{Song2026,
  abstract = {{Knowledge graph embedding models have achieved remarkable success in link prediction and reasoning tasks, yet they remain highly vulnerable to perturbations in the embedding space. Such perturbations, whether introduced by noisy triples, representation drift or adversarial manipulation, can lead to severe degradation in prediction stability and significantly affect downstream multi-hop reasoning processes. To address this challenge, we propose a unified robustness enhancement framework named Robust Knowledge Graph Embedding via Denoising. The framework explicitly incorporates denoising as an auxiliary learning signal and views knowledge graph embedding models as energy-based systems, allowing us to exploit the theoretical connection between denoising objectives and score matching. This enables the model to learn stable gradients with respect to perturbed representations and improves resilience against embedding-level noise. In addition, we introduce certified robustness metrics for knowledge graph embedding based on randomized smoothing, offering a principled way to measure the certified radius within which model predictions remain unchanged. Extensive experiments on widely used benchmark datasets demonstrate that the proposed framework consistently improves both predictive performance and robustness across various categories of knowledge graph embedding models. The results further show that our method is effective under substantial perturbations and offers meaningful gains in multi-hop reasoning scenarios, highlighting its potential as a general robustness enhancement strategy for knowledge graph representation learning. Our code is available at https://github.com/tewiSong/RKGE .}},
  author = {Song,  Tengwei and Ma,  Xudong and Liu,  Yang and Luo,  Jie and Hoehndorf,  Robert},
  booktitle = {The Semantic Web -- {ESWC} 2026},
  doi = {10.1007/978-3-032-25156-5_22},
  isbn = {9783032251565},
  issn = {1611-3349},
  pages = {417--435},
  publisher = {Springer Nature Switzerland},
  title = {Robust Knowledge Graph Embedding via Denoising},
  url = {http://dx.doi.org/10.1007/978-3-032-25156-5_22},
  year = {2026}
}

Fully Geometric Multi-hop Reasoning on Knowledge Graphs with Transitive Relations

Zhapa-Camacho, Fernando and Hoehndorf, Robert

The Semantic Web -- ESWC 2026, pp. 258-277 (2026)

Biomedical informatics

@inbook{ZhapaCamacho2026,
  abstract = {{Multi-hop logical reasoning on knowledge graphs requires faithfully mapping the logical semantics to latent space. Current geometric embedding methods have been shown to be useful for this task by mapping entities to geometric regions and logical operations to latent transformations. While a geometric embedding can provide a direct interpretability framework for query answering, current methods have only leveraged the geometric construction of entities, failing to map logical operations to pure geometric transformations and, instead, using neural components to learn these operations. On the other hand, purely neural-based methods outperform geometric methods, but they lack interpretability in the latent space. We introduce GeometrE, a geometric embedding method for multi-hop reasoning that maps every logical operation to a purely geometric operation in the latent space. Additionally, we introduce a transitive loss function and show that, unlike existing methods, it can preserve the logical rule \(\forall a,b,c: r(a,b) \wedge r(b,c) \rightarrow r(a,c)\). Our experiments show that GeometrE outperforms current state-of-the-art geometric methods and remains competitive with existing neural-based methods on standard benchmark datasets.}},
  author = {Zhapa-Camacho,  Fernando and Hoehndorf,  Robert},
  booktitle = {The Semantic Web -- {ESWC} 2026},
  doi = {10.1007/978-3-032-25156-5_14},
  isbn = {9783032251565},
  issn = {1611-3349},
  pages = {258--277},
  publisher = {Springer Nature Switzerland},
  title = {Fully Geometric Multi-hop Reasoning on Knowledge Graphs with Transitive Relations},
  url = {http://dx.doi.org/10.1007/978-3-032-25156-5_14},
  year = {2026}
}

Genomic diversity and antimicrobial resistance of Staphylococcus aureus in Saudi Arabia: a nationwide study using whole-genome sequencing

Alarawi, Mohammed S., Altammami, Musaad, Abutarboush, Mohammed, Kulmanov, Maxat, Alkuraithy, Dalal M., Kafkas, Senay, Radley, Robert, Abdelhakim, Marwa, Aldakhil, Hind, Bawazeer, Reema A., Alolayan, Mohammed A., Alnafjan, Basel M., Huraysi, Abdulaziz A., Almaabadi, Amani, Suliman, Bandar A., Aljohani, Areej G., Hemeg, Hassan A., Almogbel, Mohammed S., Alazmi, Meshari, Bazaid, Abdulrahman S., Abujamel, Turki S., Hashem, Anwar M., Al-Zahrani, Ibrahim A., Abdoh, Mohammed S., Hobani, Haya I., Felemban, Rakan F., Alhazmi, Wafaa A., Hong, Pei-Ying, Alghoribi, Majed F., Aljohani, Sameera, Balkhy, Hanan, Alswaji, Abdulrahman, Alzayer, Maha, Alalwan, Bassam, Kaaki, Mai M., Hala, Sharif M., Fallatah, Omniya Ahmad, Bahitham, Wesam, Zakri, Samer, Alshehri, Mohammad A., Kameli, Nader, Algaissi, Abdullah, Alamer, Edrous, Alhazmi, Abdulaziz, Shajri, Amjad A., Darraj, Majid Ahmed, Kameli, Bandar, Sufyani, O. O., Rahama, Badreldin S., Bakr, Abrar A., Alhoshani, Fahad M., Alquait, Azzam A., Somily, Ali M., Albarrag, Ahmed M., Alosaimi, Lamia, Aldakeel, Sumayh A., Bahwerth, Fayez S., Khan, Mushtaq A., Abdelrahman, Tamir T., Fanning, Séamus, Tawfik, Essam A., Alyamani, Essam J., Gojobori, Takashi, Miyazaki, Satoru, Al-Fageeh, Mohammed B. and Hoehndorf, Robert

Microbial Genomics, vol. 11(11) (2025)

Genomics, Biomedical informatics

@article{Alarawi2025,
  abstract = {{Methicillin-resistant Staphylococcus aureus (MRSA) surveillance in regions with mass gatherings presents unique challenges for public health systems. Saudi Arabia, hosting millions of pilgrims annually, provides a distinctive setting for studying how human mobility shapes bacterial populations, yet comprehensive genomic surveillance data from this region remain limited. Here, we present an integrated analysis of S. aureus isolates collected across seven Saudi Arabian regions, combining whole-genome sequencing with extensive antimicrobial susceptibility testing and standardized metadata following findability, accessibility, interoperability and reusability data principles. Our analysis revealed striking differences between pilgrimage and non-pilgrimage cities. Pilgrimage cities showed significantly higher genetic diversity and antimicrobial resistance rates, harbouring numerous international strains, including recognized clones from diverse geographic origins. Reported lineage dynamics are changing, expanding toward community clones. While genomic prediction of antimicrobial resistance showed high accuracy for some antibiotics, particularly beta-lactams, with varying performance for others, it highlights the necessity for phenotypic testing in clinical settings. Our findings demonstrate how mass gatherings drive bacterial population structures and emphasize the importance of integrated surveillance approaches in regions with significant global connectivity and travel.}},
  author = {Alarawi,  Mohammed S. and Altammami,  Musaad and Abutarboush,  Mohammed and Kulmanov,  Maxat and Alkuraithy,  Dalal M. and Kafkas,  Senay and Radley,  Robert and Abdelhakim,  Marwa and Aldakhil,  Hind and Bawazeer,  Reema A. and Alolayan,  Mohammed A. and Alnafjan,  Basel M. and Huraysi,  Abdulaziz A. and Almaabadi,  Amani and Suliman,  Bandar A. and Aljohani,  Areej G. and Hemeg,  Hassan A. and Almogbel,  Mohammed S. and Alazmi,  Meshari and Bazaid,  Abdulrahman S. and Abujamel,  Turki S. and Hashem,  Anwar M. and Al-Zahrani,  Ibrahim A. and Abdoh,  Mohammed S. and Hobani,  Haya I. and Felemban,  Rakan F. and Alhazmi,  Wafaa A. and Hong,  Pei-Ying and Alghoribi,  Majed F. and Aljohani,  Sameera and Balkhy,  Hanan and Alswaji,  Abdulrahman and Alzayer,  Maha and Alalwan,  Bassam and Kaaki,  Mai M. and Hala,  Sharif M. and Fallatah,  Omniya Ahmad and Bahitham,  Wesam and Zakri,  Samer and Alshehri,  Mohammad A. and Kameli,  Nader and Algaissi,  Abdullah and Alamer,  Edrous and Alhazmi,  Abdulaziz and Shajri,  Amjad A. and Darraj,  Majid Ahmed and Kameli,  Bandar and Sufyani,  O. O. and Rahama,  Badreldin S. and Bakr,  Abrar A. and Alhoshani,  Fahad M. and Alquait,  Azzam A. and Somily,  Ali M. and Albarrag,  Ahmed M. and Alosaimi,  Lamia and Aldakeel,  Sumayh A. and Bahwerth,  Fayez S. and Khan,  Mushtaq A. and Abdelrahman,  Tamir T. and Fanning,  Séamus and Tawfik,  Essam A. and Alyamani,  Essam J. and Gojobori,  Takashi and Miyazaki,  Satoru and Al-Fageeh,  Mohammed B. and Hoehndorf,  Robert},
  doi = {10.1099/mgen.0.001540},
  issn = {2057-5858},
  journal = {Microbial Genomics},
  month = {November},
  number = {11},
  publisher = {Microbiology Society},
  title = {Genomic diversity and antimicrobial resistance of Staphylococcus aureus in Saudi Arabia: a nationwide study using whole-genome sequencing},
  url = {http://dx.doi.org/10.1099/mgen.0.001540},
  volume = {11},
  year = {2025}
}

CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs)

Aspromonte, Maria Cristina, Del Conte, Alessio, Zhu, Shaowen, Tan, Wuwei, Shen, Yang, Zhang, Yexian, Li, Qi, Wang, Maggie Haitian, Babbi, Giulia, Bovo, Samuele, Martelli, Pier Luigi, Casadio, Rita, Althagafi, Azza, Toonsi, Sumyyah, Kulmanov, Maxat, Hoehndorf, Robert, Katsonis, Panagiotis, Williams, Amanda, Lichtarge, Olivier, Xian, Su, Surento, Wesley, Pejaver, Vikas, Mooney, Sean D., Sunderam, Uma, Srinivasan, Rajgopal, Murgia, Alessandra, Piovesan, Damiano, Tosatto, Silvio C. E. and Leonardi, Emanuela

Human Genetics, vol. 144(2–3), pp. 227–242 (2025)

Rare disease, Phenotype informatics

@article{Aspromonte2025,
  abstract = {{
The Genetics of Neurodevelopmental Disorders Lab in Padua provided a new intellectual disability (ID) Panel challenge for computational methods to predict patient phenotypes and their causal variants in the context of the Critical Assessment of the Genome Interpretation, 6th edition (CAGI6). Eight research teams submitted a total of 30 models to predict phenotypes based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infant age. Here, we assess the ability and accuracy of computational methods to predict comorbid phenotypes based on clinical features described in each patient and their causal variants. We also evaluated predictions for possible genetic causes in patients without a clear genetic diagnosis. Like the previous ID Panel challenge in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia), and variants (Pathogenic/Likely Pathogenic, Variants of Uncertain Significance and Risk Factors) were provided. The phenotypic traits and variant data of 150 patients from the CAGI5 ID Panel Challenge were provided as training set for predictors. The CAGI6 challenge confirms CAGI5 results that predicting phenotypes from gene panel data is highly challenging, with AUC values close to random, and no method able to predict relevant variants with both high accuracy and precision. However, a significant improvement is noted for the best method, with recall increasing from 66% to 82%. Several groups also successfully predicted difficult-to-detect variants, emphasizing the importance of variants initially excluded by the Padua NDD Lab.}},
  author = {Aspromonte,  Maria Cristina and Del Conte,  Alessio and Zhu,  Shaowen and Tan,  Wuwei and Shen,  Yang and Zhang,  Yexian and Li,  Qi and Wang,  Maggie Haitian and Babbi,  Giulia and Bovo,  Samuele and Martelli,  Pier Luigi and Casadio,  Rita and Althagafi,  Azza and Toonsi,  Sumyyah and Kulmanov,  Maxat and Hoehndorf,  Robert and Katsonis,  Panagiotis and Williams,  Amanda and Lichtarge,  Olivier and Xian,  Su and Surento,  Wesley and Pejaver,  Vikas and Mooney,  Sean D. and Sunderam,  Uma and Srinivasan,  Rajgopal and Murgia,  Alessandra and Piovesan,  Damiano and Tosatto,  Silvio C. E. and Leonardi,  Emanuela},
  doi = {10.1007/s00439-024-02722-w},
  issn = {1432-1203},
  journal = {Human Genetics},
  month = {January},
  number = {2–3},
  pages = {227–242},
  publisher = {Springer Science and Business Media LLC},
  title = {CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs)},
  url = {http://dx.doi.org/10.1007/s00439-024-02722-w},
  volume = {144},
  year = {2025}
}

Whole genome transcriptomic profiling reveals distinct sex-specific responses to heat stroke

Bouchama, Abderrezak, Gomez, Maria, Abdullah, Mashan L., Al Mahri, Saeed, Malik, Shuja Shafi, Yezli, Saber, Mohammad, Sameer, Lehe, Cynthia, Abuyassin, Bisher and Hoehndorf, Robert

Journal of Applied Physiology, vol. 138(4), pp. 964–978 (2025)

Genomics, Bioengineering

Heat-related mortality remains a health challenge exacerbated by climate change, with sex-based differences in outcomes, yet underlying mechanisms remain poorly understood. This study examined transcriptomic responses to heat exposure in peripheral blood mononuclear cells from 19 patients with heat stroke (HS; 8 males, mean age 64.8 ± 6.6 yr; 11 females, mean age 49.7 ± 11 yr) and 19 controls (11 males, mean age 48.9 ± 9.6 yr; 8 females, mean age 44.9 ± 11.8 yr). At admission, gene expression revealed upregulation of heat shock protein genes, and pathway analysis demonstrated activation of heat shock and unfolded protein responses across both sexes consistent with proteotoxic stress. However, distinct metabolic, oxidative stress, cell cycle control, and immune responses were observed within each sex. Females displayed inhibition of protein synthesis, oxidative phosphorylation, and metabolic pathways, including glucose metabolism, indicative of a hypometabolic state. Males maintained metabolic activity precooling and enhanced adenosine triphosphate production postcooling. Females activated nuclear factor erythroid 2-related factor 2 (NRF2)-mediated oxidative stress responses and inhibited DNA replication and mitosis, potentially mitigating genomic instability, whereas these pathways showed limited regulation in males. Females promoted innate immunity via interleukin (IL)-6, inflammasome, and triggering receptor expressed on myeloid cells 1 (TREM1) signaling, whereas males showed suppression of both innate and adaptive immunity, including IL-12, Th1, and T-cell receptor pathways. Upstream analysis identified over 100 transcription factors in both sexes. Males primarily relied on transcriptional mechanisms, whereas females also exhibited translational regulation via La ribonucleoprotein 1 (LARP1), fragile X messenger ribonucleoprotein 1 (FMR1), insulin-like growth factor 2 mRNA binding protein 1 (IGF2BP1), and eukaryotic translation initiation factor 6 (EIF6).
These findings suggest distinct, sex-specific molecular adaptations to heat stroke, underscoring the need for targeted therapeutic strategies to mitigate heat-induced morbidity and mortality. NEW & NOTEWORTHY: Heat-related mortality continues to rise with climate change. Our transcriptomic analysis reveals distinct sex-specific metabolic strategies to heat stroke: females enter a hypometabolic state, an evolutionary adaptation that conserves energy, whereas males sustain metabolic activity. Transcription factors and a subset of translation regulators in females modulate proteostasis and bioenergetics, driving these sex-specific pathways. These novel findings highlight the critical need to consider sex-specific differences in heat-related illnesses and inform carefully targeted interventions to improve patient outcomes.

@article{Bouchama2025,
  abstract = {{Heat-related mortality remains a health challenge exacerbated by climate change, with sex-based differences in outcomes, yet the underlying mechanisms remain poorly understood. This study examined transcriptomic responses to heat exposure in peripheral blood mononuclear cells from 19 patients with heat stroke (HS; 8 males, mean age 64.8 ± 6.6 yr; 11 females, mean age 49.7 ± 11 yr) and 19 controls (11 males, mean age 48.9 ± 9.6 yr; 8 females, mean age 44.9 ± 11.8 yr). At admission, gene expression revealed upregulation of heat shock protein genes, and pathway analysis demonstrated activation of heat shock and unfolded protein responses across both sexes consistent with proteotoxic stress. However, distinct metabolic, oxidative stress, cell cycle control, and immune responses were observed within each sex. Females displayed inhibition of protein synthesis, oxidative phosphorylation, and metabolic pathways, including glucose metabolism, indicative of a hypometabolic state. Males maintained metabolic activity precooling and enhanced adenosine triphosphate production postcooling. Females activated nuclear factor erythroid 2-related factor 2 (NRF2)-mediated oxidative stress responses and inhibited DNA replication and mitosis, potentially mitigating genomic instability, whereas these pathways showed limited regulation in males. Females promoted innate immunity via interleukin (IL)-6, inflammasome, and triggering receptor expressed on myeloid cells 1 (TREM1) signaling, whereas males showed suppression of both innate and adaptive immunity, including IL-12, Th1, and T-cell receptor pathways. Upstream analysis identified over 100 transcription factors in both sexes. Males primarily relied on transcriptional mechanisms, whereas females also exhibited translational regulation via La ribonucleoprotein 1 (LARP1), fragile X messenger ribonucleoprotein 1 (FMR1), insulin-like growth factor 2 mRNA binding protein 1 (IGF2BP1), and eukaryotic translation initiation factor 6 (EIF6). These findings suggest distinct, sex-specific molecular adaptations to heat stroke, underscoring the need for targeted therapeutic strategies to mitigate heat-induced morbidity and mortality. NEW & NOTEWORTHY: Heat-related mortality continues to rise with climate change. Our transcriptomic analysis reveals distinct sex-specific metabolic strategies to heat stroke: females enter a hypometabolic state, an evolutionary adaptation that conserves energy, whereas males sustain metabolic activity. Transcription factors and a subset of translation regulators in females modulate proteostasis and bioenergetics, driving these sex-specific pathways. These novel findings highlight the critical need to consider sex-specific differences in heat-related illnesses and inform carefully targeted interventions to improve patient outcomes.}},
  author = {Bouchama, Abderrezak and Gomez, Maria and Abdullah, Mashan L. and Al Mahri, Saeed and Malik, Shuja Shafi and Yezli, Saber and Mohammad, Sameer and Lehe, Cynthia and Abuyassin, Bisher and Hoehndorf, Robert},
  doi = {10.1152/japplphysiol.00001.2025},
  issn = {1522-1601},
  journal = {Journal of Applied Physiology},
  month = {April},
  number = {4},
  pages = {964--978},
  publisher = {American Physiological Society},
  title = {Whole genome transcriptomic profiling reveals distinct sex-specific responses to heat stroke},
  url = {http://dx.doi.org/10.1152/japplphysiol.00001.2025},
  volume = {138},
  year = {2025}
}

Ontology Embedding: A Survey of Methods, Applications and Resources

Chen, Jiaoyan, Mashkova, Olga, Zhapa-Camacho, Fernando, Hoehndorf, Robert, He, Yuan and Horrocks, Ian

IEEE Transactions on Knowledge and Data Engineering, vol. 37(7), pp. 4193–4212 (2025)

Neuro-symbolic AI

@article{Chen2025,
  abstract = {{Ontologies are widely used for representing domain knowledge and metadata, playing an increasingly important role in Information Systems, the Semantic Web, Bioinformatics and many other domains. However, the logical reasoning that ontologies can directly support is quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine learning. To this end, automatically learning vector representations for the knowledge of an ontology, i.e., ontology embedding, has been widely investigated. Numerous papers have been published on ontology embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field. To bridge this gap, we write this survey paper, which first introduces different kinds of semantics of ontologies and formally defines ontology embedding as well as its property of faithfulness. Based on this, it systematically categorizes and analyses a relatively complete set of over 80 papers, according to the ontologies they aim at and their technical solutions including geometric modeling, sequence modeling and graph propagation. This survey also introduces the applications of ontology embedding in ontology engineering, machine learning augmentation and life sciences, presents a new library mOWL and discusses the challenges and future directions.}},
  author = {Chen, Jiaoyan and Mashkova, Olga and Zhapa-Camacho, Fernando and Hoehndorf, Robert and He, Yuan and Horrocks, Ian},
  doi = {10.1109/tkde.2025.3559023},
  issn = {2326-3865},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  month = {July},
  number = {7},
  pages = {4193--4212},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  title = {Ontology Embedding: A Survey of Methods, Applications and Resources},
  url = {http://dx.doi.org/10.1109/TKDE.2025.3559023},
  volume = {37},
  year = {2025}
}

Age-related differences in gene expression and pathway activation following heatstroke

Gomez, Maria, Al Mahri, Saeed, Abdullah, Mashan, Malik, Shuja Shafi, Yezli, Saber, Yassin, Yara, Khan, Anas, Lehe, Cynthia, Mohammad, Sameer, Hoehndorf, Robert and Bouchama, Abderrezak

Physiological Genomics, vol. 57(2), pp. 65–79 (2025)

Biomedical informatics, Bioengineering

This study investigates the molecular responses to heatstroke in young and old patients by comparing whole-genome transcriptomes between age groups. We analyzed transcriptomic profiles from patients categorized into two age-defined cohorts: young (mean age = 44.9 ± 6 yr) and old (mean age = 66.1 ± 4 yr). Control subjects, exposed to similar environmental heat conditions but without developing heatstroke, were also included in the analysis to provide a baseline for comparison. Despite uniform heatstroke severity at admission, as indicated by core body temperature, consciousness level, and organ damage markers, notable gene expression differences emerged. Old patients showed 37% fewer differentially expressed genes compared with young patients at admission, with a shift toward gene upregulation, deviating from the usual downregulation seen in heat stress responses. Both age groups exhibited increased heat shock protein gene expression and activated the heat stress and unfolded protein responses, indicating comparable proteotoxic stress. Nonetheless, age-specific differences were evident in critical regulatory pathways like Sirtuin, mTOR, and p53 signaling, along with key pathways related to proteostasis, energy metabolism, oxidative stress, and immune responses. Following cooling, older adults exhibited a decline in the heat stress response and a cessation of the unfolded protein response, in contrast to the sustained responses seen in younger individuals. This pattern suggests an age-related adaptability or a diminished protective response capacity with aging. These findings provide insights into the biological mechanisms that may contribute to age-specific vulnerabilities to heat. NEW & NOTEWORTHY: Our study reveals distinct molecular responses to heatstroke across age groups, with older adults showing fewer differentially expressed genes and an atypical pattern of gene upregulation, contrasting with the downregulation in usual heat stress responses. It also uncovers a reduced heat stress response and an abbreviated unfolded protein response in older adults, likely impairing their cellular repair mechanisms. This contributes to increased vulnerability during severe heat waves, underscoring the urgent need for age-specific interventions.

@article{Gomez2025,
  abstract = {{This study investigates the molecular responses to heatstroke in young and old patients by comparing whole-genome transcriptomes between age groups. We analyzed transcriptomic profiles from patients categorized into two age-defined cohorts: young (mean age = 44.9 ± 6 yr) and old (mean age = 66.1 ± 4 yr). Control subjects, exposed to similar environmental heat conditions but without developing heatstroke, were also included in the analysis to provide a baseline for comparison. Despite uniform heatstroke severity at admission, as indicated by core body temperature, consciousness level, and organ damage markers, notable gene expression differences emerged. Old patients showed 37% fewer differentially expressed genes compared with young patients at admission, with a shift toward gene upregulation, deviating from the usual downregulation seen in heat stress responses. Both age groups exhibited increased heat shock protein gene expression and activated the heat stress and unfolded protein responses, indicating comparable proteotoxic stress. Nonetheless, age-specific differences were evident in critical regulatory pathways like Sirtuin, mTOR, and p53 signaling, along with key pathways related to proteostasis, energy metabolism, oxidative stress, and immune responses. Following cooling, older adults exhibited a decline in the heat stress response and a cessation of the unfolded protein response, in contrast to the sustained responses seen in younger individuals. This pattern suggests an age-related adaptability or a diminished protective response capacity with aging. These findings provide insights into the biological mechanisms that may contribute to age-specific vulnerabilities to heat. NEW & NOTEWORTHY: Our study reveals distinct molecular responses to heatstroke across age groups, with older adults showing fewer differentially expressed genes and an atypical pattern of gene upregulation, contrasting with the downregulation in usual heat stress responses. It also uncovers a reduced heat stress response and an abbreviated unfolded protein response in older adults, likely impairing their cellular repair mechanisms. This contributes to increased vulnerability during severe heat waves, underscoring the urgent need for age-specific interventions.}},
  author = {Gomez, Maria and Al Mahri, Saeed and Abdullah, Mashan and Malik, Shuja Shafi and Yezli, Saber and Yassin, Yara and Khan, Anas and Lehe, Cynthia and Mohammad, Sameer and Hoehndorf, Robert and Bouchama, Abderrezak},
  doi = {10.1152/physiolgenomics.00053.2024},
  issn = {1531-2267},
  journal = {Physiological Genomics},
  month = {February},
  number = {2},
  pages = {65--79},
  publisher = {American Physiological Society},
  title = {Age-related differences in gene expression and pathway activation following heatstroke},
  url = {http://dx.doi.org/10.1152/physiolgenomics.00053.2024},
  volume = {57},
  year = {2025}
}

The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients

Kafkas, Şenay, Abdelhakim, Marwa, Althagafi, Azza, Toonsi, Sumyyah, Alghamdi, Malak, Schofield, Paul N. and Hoehndorf, Robert

Scientific Reports, vol. 15(1) (2025)

Rare disease, Biomedical informatics

@article{Kafkas2025,
  abstract = {{Computational methods for identifying gene-disease associations can use both genomic and phenotypic information to prioritize genes and variants that may be associated with genetic diseases. Phenotype-based methods commonly rely on comparing phenotypes observed in a patient with databases of genotype-to-phenotype associations using measures of semantic similarity. They are constrained by the quality and completeness of these resources as well as the quality and completeness of patient phenotype annotation. Genotype-to-phenotype associations used by these methods are largely derived from the literature and coded using phenotype ontologies. Large Language Models (LLMs) have been trained on large amounts of text and data and have shown their potential to answer complex questions across multiple domains. Here, we evaluate the effectiveness of LLMs in prioritizing disease-associated genes compared to existing bioinformatics methods. We show that LLMs can prioritize disease-associated genes as well as, or better than, dedicated bioinformatics methods relying on pre-defined phenotype similarity, when gene sets range from 5 to 100 candidates. We apply our approach to a cohort of undiagnosed patients with rare diseases and show that LLMs can be used to provide diagnostic support that helps in identifying plausible candidate genes. Our results show that LLMs may offer an alternative to traditional bioinformatics methods to prioritize disease-associated genes based on disease phenotypes. They may, therefore, potentially enhance diagnostic accuracy and simplify the process for rare genetic diseases.}},
  author = {Kafkas, Şenay and Abdelhakim, Marwa and Althagafi, Azza and Toonsi, Sumyyah and Alghamdi, Malak and Schofield, Paul N. and Hoehndorf, Robert},
  doi = {10.1038/s41598-025-99539-y},
  issn = {2045-2322},
  journal = {Scientific Reports},
  month = {April},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients},
  url = {http://dx.doi.org/10.1038/s41598-025-99539-y},
  volume = {15},
  year = {2025}
}

Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia

Kulmanov, Maxat, Ashouri, Saeideh, Liu, Yang, Abdelhakim, Marwa, Alsolme, Ebtehal, Nagasaki, Masao, Ohkawa, Yasuyuki, Suzuki, Yutaka, Tawfiq, Rund, Tokunaga, Katsushi, Katayama, Toshiaki, Abedalthagafi, Malak S., Hoehndorf, Robert and Kawai, Yosuke

Scientific Data, vol. 12(1) (2025)

Genomics

@article{Kulmanov2025,
  abstract = {{The selection of a reference sequence in genome analysis is critical, as it serves as the foundation for all downstream analyses. Recently, the pangenome graph has been proposed as a data model that incorporates haplotypes from multiple individuals. Here we present JaSaPaGe, a pangenome graph reference for Saudi Arabian and Japanese populations, both of which have been significantly underrepresented in previous genomic studies. We constructed JaSaPaGe from high-quality phased diploid assemblies which were made utilizing PacBio high-fidelity long reads, Nanopore long reads, and Hi-C short reads of 9 Saudi and 10 Japanese individuals. Quality evaluation of the pangenome graph by variant calling showed that our pangenome outperformed earlier linear reference genomes (GRCh38 and T2T-CHM13) and showed comparable performance to the pangenome graph provided by the Human Pangenome Reference Consortium (HPRC), with more variants found in Japanese and Saudi samples using their population-specific pangenomes. This pangenome reference will serve as a valuable resource for both the research and clinical communities in Japan and Saudi Arabia.}},
  author = {Kulmanov, Maxat and Ashouri, Saeideh and Liu, Yang and Abdelhakim, Marwa and Alsolme, Ebtehal and Nagasaki, Masao and Ohkawa, Yasuyuki and Suzuki, Yutaka and Tawfiq, Rund and Tokunaga, Katsushi and Katayama, Toshiaki and Abedalthagafi, Malak S. and Hoehndorf, Robert and Kawai, Yosuke},
  doi = {10.1038/s41597-025-05652-y},
  issn = {2052-4463},
  journal = {Scientific Data},
  month = {August},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {Phased genome assemblies and pangenome graphs of human populations of Japan and Saudi Arabia},
  url = {http://dx.doi.org/10.1038/s41597-025-05652-y},
  volume = {12},
  year = {2025}
}

Genomic landscape of retinoblastoma: Insights into risk stratification and precision pediatric Neuro-Oncology

Maktabi, Azza, Liu, Yang, Almesfer, Saleh, Abdelhakim, Marwa, Aldakhil, Hind, Kulmanov, Maxat, Edward, Deepak P, Hoehndorf, Robert and Abedalthagafi, Malak

Neuro-Oncology Pediatrics (2025)

Genomics, Rare disease

@article{Maktabi2025,
  abstract = {{Background: Retinoblastoma is the most common intraocular malignancy of childhood, yet its genomic landscape remains incompletely defined, particularly in understudied populations. Beyond RB1 loss, the contribution of additional somatic and germline alterations to disease heterogeneity and clinical behavior is unclear. Methods: We performed whole-exome sequencing of 166 retinoblastoma samples from 166 patients with matched germline DNA, representing the largest cohort analyzed to date. Clinical data were available for 160 patients. Variant calling, copy number alteration (CNA) profiling, and integrative analyses were performed to characterize genetic drivers and their associations with clinical features. Results: Pathogenic RB1 variants were identified in 120 patients, and MYCN amplification in 6 patients. Additional recurrent alterations involved BCOR, CCND3, ERBB2, and PDGFRB. Copy number gains of 6p (41.3%) and 17q (8.1%) were significantly associated with high-risk features including rubeosis, subretinal seeding, and tumor extension beyond the lamina cribrosa. Germline ERBB2 variants correlated with orbital invasion, while germline PDGFRB variants were associated with second primary cancers. Together, these findings underscore the genetic heterogeneity of retinoblastoma and reveal novel genotype–phenotype correlations. Conclusions: This study provides the most comprehensive genomic characterization of retinoblastoma to date, expands the known mutational spectrum, and identifies biomarkers with direct clinical relevance. These insights have the potential to refine risk stratification, inform precision therapeutic strategies, and improve long-term outcomes for children with retinoblastoma.}},
  author = {Maktabi, Azza and Liu, Yang and Almesfer, Saleh and Abdelhakim, Marwa and Aldakhil, Hind and Kulmanov, Maxat and Edward, Deepak P and Hoehndorf, Robert and Abedalthagafi, Malak},
  doi = {10.1093/neuped/wuaf017},
  issn = {2977-4454},
  journal = {Neuro-Oncology Pediatrics},
  month = {December},
  publisher = {Oxford University Press (OUP)},
  title = {Genomic landscape of retinoblastoma: Insights into risk stratification and precision pediatric Neuro-Oncology},
  url = {http://dx.doi.org/10.1093/neuped/wuaf017},
  year = {2025}
}

NanoDesigner: resolving the complex-CDR interdependency with iterative refinement

Rios Zertuche, Melissa Maria, Kafkas, Şenay, Renn, Dominik, Rueping, Magnus and Hoehndorf, Robert

Journal of Cheminformatics, vol. 17(1) (2025)

Bioengineering, Drug mechanisms

@article{RiosZertuche2025,
  abstract = {{Camelid heavy-chain only antibodies consist of two heavy chains and single variable domains (VHHs), which retain antigen-binding functionality even when isolated. The term “nanobody” is now more generally used for describing small, single-domain antibodies. Several antibody generative models have been developed for the sequence and structure co-design of the complementarity-determining regions (CDRs) based on the binding interface with a target antigen. However, these models are not tailored for nanobodies and are often constrained by their reliance on experimentally determined antigen–antibody structures, which are labor-intensive to obtain. Here, we introduce NanoDesigner, a tool for nanobody design and optimization based on generative AI methods. NanoDesigner integrates key stages (structure prediction, docking, CDR generation, and side-chain packing) into an iterative framework based on an expectation maximization (EM) algorithm. The algorithm effectively tackles an interdependency challenge where accurate docking presupposes a priori knowledge of the CDR conformation, while effective CDR generation relies on accurate docking outputs to guide its design. NanoDesigner approximately doubles the success rate of de novo nanobody designs through continuous refinement of docking and CDR generation.}},
  author = {Rios Zertuche, Melissa Maria and Kafkas, Şenay and Renn, Dominik and Rueping, Magnus and Hoehndorf, Robert},
  doi = {10.1186/s13321-025-01069-2},
  issn = {1758-2946},
  journal = {Journal of Cheminformatics},
  month = {August},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {NanoDesigner: resolving the complex-CDR interdependency with iterative refinement},
  url = {http://dx.doi.org/10.1186/s13321-025-01069-2},
  volume = {17},
  year = {2025}
}

Causal knowledge graph analysis identifies adverse drug effects

Toonsi, Sumyyah, Schofield, Paul N and Hoehndorf, Robert

Bioinformatics, vol. 42(1), In: Lu, Zhiyong (Ed.) (2025)

Drug mechanisms

@article{Toonsi2025,
  abstract = {{The data is available through https://github.com/bio-ontology-research-group/Mediation-Analysis-using-Causal-Knowledge-Graph.}},
  author = {Toonsi, Sumyyah and Schofield, Paul N and Hoehndorf, Robert},
  doi = {10.1093/bioinformatics/btaf661},
  editor = {Lu, Zhiyong},
  issn = {1367-4811},
  journal = {Bioinformatics},
  month = {December},
  number = {1},
  publisher = {Oxford University Press (OUP)},
  title = {Causal knowledge graph analysis identifies adverse drug effects},
  url = {http://dx.doi.org/10.1093/bioinformatics/btaf661},
  volume = {42},
  year = {2025}
}

Lattice-Based ALC Ontology Embeddings With Saturation

Zhapa-Camacho, Fernando and Hoehndorf, Robert

Neurosymbolic Artificial Intelligence, vol. 1 (2025)

Biomedical informatics

@article{ZhapaCamacho2025,
  abstract = {{Generating vector representations (embeddings) of OWL ontologies is a growing task due to its applications in predicting missing facts and knowledge-enhanced learning in fields such as bioinformatics. The underlying semantics of OWL ontologies are expressed using Description Logics (DLs). Initial approaches to generate embeddings relied on constructing a graph out of ontologies, neglecting the semantics of the logic therein. Recent semantic-preserving embedding methods often target lightweight DL languages such as $\mathcal{EL}^{++}$, ignoring more expressive information in ontologies. Although some approaches aim to embed more descriptive DLs such as $\mathcal{ALC}$, those methods require the existence of individuals, while many real-world ontologies are devoid of them. We propose an ontology embedding method for the $\mathcal{ALC}$ DL language that considers the lattice structure of concept descriptions. We use connections between DL and Category Theory to materialize the lattice structure and embed it using an order-preserving embedding method. We show that our method outperforms state-of-the-art methods in several knowledge base completion tasks. This is an extended version of our previous work, where we incorporate saturation procedures that increase the information within the constructed lattices. We make our code and data available at https://github.com/bio-ontology-research-group/catE.}},
  author = {Zhapa-Camacho, Fernando and Hoehndorf, Robert},
  doi = {10.1177/29498732251340186},
  issn = {2949-8732},
  journal = {Neurosymbolic Artificial Intelligence},
  month = {June},
  publisher = {SAGE Publications},
  title = {Lattice-Based ALC Ontology Embeddings With Saturation},
  url = {http://dx.doi.org/10.1177/29498732251340186},
  volume = {1},
  year = {2025}
}

LLM Agent Based Protein Function Prediction

Zhapa-Camacho, Fernando, Mashkova, Olga, Hoehndorf, Robert and Kulmanov, Maxat

Biocomputing 2026, pp. 508–519 (2025)

Protein function

@inproceedings{ZhapaCamacho2025llm,
  author = {Zhapa-Camacho, Fernando and Mashkova, Olga and Hoehndorf, Robert and Kulmanov, Maxat},
  booktitle = {Biocomputing 2026},
  doi = {10.1142/9789819824755_0036},
  month = {December},
  pages = {508--519},
  publisher = {World Scientific},
  title = {LLM Agent Based Protein Function Prediction},
  url = {http://dx.doi.org/10.1142/9789819824755_0036},
  year = {2025}
}

Sa1216: Development of colorectal cancer and matched healthy organoids from Saudi patients: a case study

Alhattab, Dana, Barakeh, Duna, Khoja, Basma, Elhadi, Ahmad, Miro, Jameel, Alessy, Saleh A., Alharbi, Ahmed, Bokhary, Manal, Alzahrani, May, Ali, Saga, Almohamdi, Wadha, Hefni, Lama, Moretti, Manola, Liu, Yang, Abdelhakim, Marwa, Abdullah, Abeer, Alomaim, Waleed, Hoehndorf, Robert, Hauser, Charlotte and Alqahtani, Saleh A.

Gastroenterology, vol. 169(1), pp. S-400 (2025)

Bioengineering, Biomedical informatics

@article{Alhattab2025,
  author = {Alhattab, Dana and Barakeh, Duna and Khoja, Basma and Elhadi, Ahmad and Miro, Jameel and Alessy, Saleh A. and Alharbi, Ahmed and Bokhary, Manal and Alzahrani, May and Ali, Saga and Almohamdi, Wadha and Hefni, Lama and Moretti, Manola and Liu, Yang and Abdelhakim, Marwa and Abdullah, Abeer and Alomaim, Waleed and Hoehndorf, Robert and Hauser, Charlotte and Alqahtani, Saleh A.},
  doi = {10.1016/s0016-5085(25)01866-9},
  issn = {0016-5085},
  journal = {Gastroenterology},
  month = {May},
  number = {1},
  pages = {S--400},
  publisher = {Elsevier BV},
  title = {Sa1216: Development of colorectal cancer and matched healthy organoids from Saudi patients: a case study},
  url = {http://dx.doi.org/10.1016/S0016-5085(25)01866-9},
  volume = {169},
  year = {2025}
}

Su1295: Chemically defined peptide-based matrices enabling the development of colorectal organoid models for therapeutic applications and disease modeling

Alhattab, Dana, Barakeh, Duna, Khoja, Basma, Elhadi, Ahmad, Miro, Jameel, Alessy, Saleh A., Alharbi, Ahmed, Bokhary, Manal, Alzahrani, May, Ali, Saga, Almohamdi, Wadha, Hefni, Lama, Moretti, Manola, Abdullah, Abeer, Alomaim, Waleed, Hoehndorf, Robert, Hauser, Charlotte and Alqahtani, Saleh A.

Gastroenterology, vol. 169(1), pp. S-734 (2025)

Bioengineering, Drug mechanisms

@article{Alhattab2025b,
  author = {Alhattab, Dana and Barakeh, Duna and Khoja, Basma and Elhadi, Ahmad and Miro, Jameel and Alessy, Saleh A. and Alharbi, Ahmed and Bokhary, Manal and Alzahrani, May and Ali, Saga and Almohamdi, Wadha and Hefni, Lama and Moretti, Manola and Abdullah, Abeer and Alomaim, Waleed and Hoehndorf, Robert and Hauser, Charlotte and Alqahtani, Saleh A.},
  doi = {10.1016/s0016-5085(25)02643-5},
  issn = {0016-5085},
  journal = {Gastroenterology},
  month = {May},
  number = {1},
  pages = {S--734},
  publisher = {Elsevier BV},
  title = {Su1295: Chemically defined peptide-based matrices enabling the development of colorectal organoid models for therapeutic applications and disease modeling},
  url = {http://dx.doi.org/10.1016/S0016-5085(25)02643-5},
  volume = {169},
  year = {2025}
}

Neuro-Symbolic AI in Life Sciences

Hoehndorf, Robert, Pesquita, Catia and Zhapa-Camacho, Fernando

Handbook on Neurosymbolic AI and Knowledge Graphs (2025)

Neuro-symbolic AI

@inbook{Hoehndorf2025,
  abstract = {{Life sciences have a long history of driving advancements in various disciplines, including mathematics, philosophy, and logic. In recent years, life sciences have also become a significant application area for Artificial Intelligence (AI) technologies, including neuro-symbolic AI methods. The life sciences knowledge infrastructure, characterized by its widespread use of ontologies, complex annotation models, large size, and community standards, presents unique challenges and opportunities for neuro-symbolic AI. We outline how neuro-symbolic methods have been applied and developed to address these challenges. We describe semantic similarity measures, knowledge graph embeddings, ontology embeddings, and knowledge-enhanced learning in the context of formalized life science knowledge. While there has been significant progress, we also outline multiple remaining challenges that provide opportunities for future research.}},
  author = {Hoehndorf, Robert and Pesquita, Catia and Zhapa-Camacho, Fernando},
  booktitle = {Handbook on Neurosymbolic AI and Knowledge Graphs},
  doi = {10.3233/faia250239},
  isbn = {9781643685793},
  issn = {1879-8314},
  month = {March},
  publisher = {IOS Press},
  title = {Neuro-Symbolic AI in Life Sciences},
  url = {http://dx.doi.org/10.3233/faia250239},
  year = {2025}
}

Computational prediction of protein functional annotations

Kulmanov, Maxat and Hoehndorf, Robert

Protein Function Prediction, pp. 3–28 (2025)

Protein function

@inbook{Kulmanov2025x,
  abstract = {{Protein function prediction is a crucial task in bioinformatics and computational biology, as it enables the understanding of disease mechanisms, development of new therapeutics, and improvement of crop yields. Despite significant advances, the majority of protein functions remain unknown or poorly annotated, hindering our understanding of biological systems. This review provides a comprehensive overview of the available methods for protein function prediction, categorizing them into eight classes based on the sources of information they use. We examine over 35 methods, including traditional sequence-based approaches and recent advances in machine learning and natural language processing. We also discuss the incorporation of background knowledge in Gene Ontology and zero-shot predictions. To improve protein function prediction, we highlight the need for developing more accurate and robust methods that integrate multiple sources of information. We provide several practical notes for choosing and interpreting the results of protein function prediction methods.}},
  author = {Kulmanov,  Maxat and Hoehndorf,  Robert},
  booktitle = {Protein Function Prediction},
  doi = {10.1007/978-1-0716-4662-5_1},
  isbn = {9781071646625},
  issn = {1940-6029},
  month = {February},
  pages = {3--28},
  publisher = {Springer US},
  title = {Computational prediction of protein functional annotations},
  url = {http://dx.doi.org/10.1007/978-1-0716-4662-5_1},
  year = {2025}
}

The informatics of developmental phenotypes

Schofield, Paul N., Hoehndorf, Robert, Gkoutos, Georgios V. and Smith, Cynthia L.

Kaufman’s Atlas of Mouse Development Supplement, pp. 457–470 (2025)

Phenotype informatics, Biomedical informatics

@inbook{Schofield2025,
  author = {Schofield,  Paul N. and Hoehndorf,  Robert and Gkoutos,  Georgios V. and Smith,  Cynthia L.},
  booktitle = {Kaufman’s Atlas of Mouse Development Supplement},
  doi = {10.1016/b978-0-443-23739-3.00012-2},
  isbn = {9780443237393},
  pages = {457--470},
  publisher = {Elsevier},
  title = {The informatics of developmental phenotypes},
  url = {http://dx.doi.org/10.1016/B978-0-443-23739-3.00012-2},
  year = {2025}
}

Annotating genomes with DeepGO protein function prediction tools

Tawfiq, Rund, Niu, Kexin, Kulmanov, Maxat and Hoehndorf, Robert

Protein Function Prediction, pp. 171-189 (2025)

Protein function, Genomics

@inbook{Tawfiq2025,
  abstract = {{This chapter explores the evolution of DeepGO, a suite of deep learning-based tools for protein function prediction, in the form of Gene Ontology (GO) terms, and their applications in genome annotation. We provide a comprehensive overview of the different versions of DeepGO, highlighting key advancements introduced by each method. To demonstrate the practical application of these tools, we present a case study on the annotation of a bacterial genome using the latest DeepGO model, DeepGO-SE. We showcase the efficiency and accuracy of DeepGO-SE in predicting protein functions and discuss the model’s parameters. This chapter serves as a guide for researchers looking to enhance their genomic analyses using deep learning-based function prediction methods.}},
  author = {Tawfiq,  Rund and Niu,  Kexin and Kulmanov,  Maxat and Hoehndorf,  Robert},
  booktitle = {Protein Function Prediction},
  doi = {10.1007/978-1-0716-4662-5_10},
  isbn = {9781071646625},
  issn = {1940-6029},
  month = {February},
  pages = {171--189},
  publisher = {Springer US},
  title = {Annotating genomes with DeepGO protein function prediction tools},
  url = {http://dx.doi.org/10.1007/978-1-0716-4662-5_10},
  year = {2025}
}

Predicting protein functions using positive-unlabeled ranking with ontology-based priors

Zhapa-Camacho, Fernando, Tang, Zhenwei, Kulmanov, Maxat and Hoehndorf, Robert

Bioinformatics, vol. 40(Supplement_1), pp. i401-i409 (2024)

Protein function, Neuro-symbolic AI

@article{10.1093/bioinformatics/btae237,
  abstract = {Automated protein function prediction is a crucial and widely studied problem in bioinformatics. Computationally, protein function is a multilabel classification problem where only positive samples are defined and there is a large number of unlabeled annotations. Most existing methods rely on the assumption that the unlabeled set of protein function annotations are negatives, inducing the false negative issue, where potential positive samples are trained as negatives. We introduce a novel approach named PU-GO, wherein we address function prediction as a positive-unlabeled ranking problem. We apply empirical risk minimization, i.e. we minimize the classification risk of a classifier where class priors are obtained from the Gene Ontology hierarchical structure. We show that our approach is more robust than other state-of-the-art methods on similarity-based and time-based benchmark datasets. Data and code are available at https://github.com/bio-ontology-research-group/PU-GO.},
  author = {Zhapa-Camacho, Fernando and Tang, Zhenwei and Kulmanov, Maxat and Hoehndorf, Robert},
  doi = {10.1093/bioinformatics/btae237},
  issn = {1367-4811},
  journal = {Bioinformatics},
  month = {June},
  number = {Supplement\_1},
  pages = {i401--i409},
  title = {Predicting protein functions using positive-unlabeled ranking with ontology-based priors},
  url = {https://doi.org/10.1093/bioinformatics/btae237},
  volume = {40},
  year = {2024}
}

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning

Althagafi, Azza, Zhapa-Camacho, Fernando and Hoehndorf, Robert

Bioinformatics, vol. 40(5) (2024)

Rare disease, Neuro-symbolic AI

@article{Althagafi2024,
  abstract = {{EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.}},
  author = {Althagafi,  Azza and Zhapa-Camacho,  Fernando and Hoehndorf,  Robert},
  doi = {10.1093/bioinformatics/btae301},
  issn = {1367-4811},
  journal = {Bioinformatics},
  month = {May},
  number = {5},
  publisher = {Oxford University Press (OUP)},
  title = {Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning},
  url = {http://dx.doi.org/10.1093/bioinformatics/btae301},
  volume = {40},
  year = {2024}
}

An open source knowledge graph ecosystem for the life sciences

Callahan, Tiffany J., Tripodi, Ignacio J., Stefanski, Adrianne L., Cappelletti, Luca, Taneja, Sanya B., Wyrwa, Jordan M., Casiraghi, Elena, Matentzoglu, Nicolas A., Reese, Justin, Silverstein, Jonathan C., Hoyt, Charles Tapley, Boyce, Richard D., Malec, Scott A., Unni, Deepak R., Joachimiak, Marcin P., Robinson, Peter N., Mungall, Christopher J., Cavalleri, Emanuele, Fontana, Tommaso, Valentini, Giorgio, Mesiti, Marco, Gillenwater, Lucas A., Santangelo, Brook, Vasilevsky, Nicole A., Hoehndorf, Robert, Bennett, Tellen D., Ryan, Patrick B., Hripcsak, George, Kahn, Michael G., Bada, Michael, Baumgartner, William A. and Hunter, Lawrence E.

Scientific Data, vol. 11(1) (2024)

Ontology engineering

@article{Callahan2024,
  abstract = {{Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.}},
  author = {Callahan,  Tiffany J. and Tripodi,  Ignacio J. and Stefanski,  Adrianne L. and Cappelletti,  Luca and Taneja,  Sanya B. and Wyrwa,  Jordan M. and Casiraghi,  Elena and Matentzoglu,  Nicolas A. and Reese,  Justin and Silverstein,  Jonathan C. and Hoyt,  Charles Tapley and Boyce,  Richard D. and Malec,  Scott A. and Unni,  Deepak R. and Joachimiak,  Marcin P. and Robinson,  Peter N. and Mungall,  Christopher J. and Cavalleri,  Emanuele and Fontana,  Tommaso and Valentini,  Giorgio and Mesiti,  Marco and Gillenwater,  Lucas A. and Santangelo,  Brook and Vasilevsky,  Nicole A. and Hoehndorf,  Robert and Bennett,  Tellen D. and Ryan,  Patrick B. and Hripcsak,  George and Kahn,  Michael G. and Bada,  Michael and Baumgartner,  William A. and Hunter,  Lawrence E.},
  doi = {10.1038/s41597-024-03171-w},
  issn = {2052-4463},
  journal = {Scientific Data},
  month = {April},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {An open source knowledge graph ecosystem for the life sciences},
  url = {http://dx.doi.org/10.1038/s41597-024-03171-w},
  volume = {11},
  year = {2024}
}

Protein function prediction as approximate semantic entailment

Kulmanov, Maxat, Guzmán-Vega, Francisco J., Duek Roggli, Paula, Lane, Lydie, Arold, Stefan T. and Hoehndorf, Robert

Nature Machine Intelligence, vol. 6(2), pp. 220–228 (2024)

Biomedical informatics

@article{Kulmanov2024,
  abstract = {{The Gene Ontology (GO) is a formal, axiomatic theory with over 100,000 axioms that describe the molecular functions, biological processes and cellular locations of proteins in three subontologies. Predicting the functions of proteins using the GO requires both learning and reasoning capabilities in order to maintain consistency and exploit the background knowledge in the GO. Many methods have been developed to automatically predict protein functions, but effectively exploiting all the axioms in the GO for knowledge-enhanced learning has remained a challenge. We have developed DeepGO-SE, a method that predicts GO functions from protein sequences using a pretrained large language model. DeepGO-SE generates multiple approximate models of GO, and a neural network predicts the truth values of statements about protein functions in these approximate models. We aggregate the truth values over multiple models so that DeepGO-SE approximates semantic entailment when predicting protein functions. We show, using several benchmarks, that the approach effectively exploits background knowledge in the GO and improves protein function prediction compared to state-of-the-art methods.}},
  author = {Kulmanov,  Maxat and Guzmán-Vega,  Francisco J. and Duek Roggli,  Paula and Lane,  Lydie and Arold,  Stefan T. and Hoehndorf,  Robert},
  doi = {10.1038/s42256-024-00795-w},
  issn = {2522-5839},
  journal = {Nature Machine Intelligence},
  month = {February},
  number = {2},
  pages = {220--228},
  publisher = {Springer Science and Business Media LLC},
  title = {Protein function prediction as approximate semantic entailment},
  url = {http://dx.doi.org/10.1038/s42256-024-00795-w},
  volume = {6},
  year = {2024}
}

A reference quality, fully annotated diploid genome from a Saudi individual

Kulmanov, Maxat, Tawfiq, Rund, Liu, Yang, Al Ali, Hatoon, Abdelhakim, Marwa, Alarawi, Mohammed, Aldakhil, Hind, Alhattab, Dana, Alsolme, Ebtehal A., Althagafi, Azza, Angelov, Angel, Bougouffa, Salim, Driguez, Patrick, Park, Changsook, Putra, Alexander, Reyes-Ramos, Ana M., Hauser, Charlotte A. E., Cheung, Ming Sin, Abedalthagafi, Malak S. and Hoehndorf, Robert

Scientific Data, vol. 11(1) (2024)

Genomics

@article{Kulmanov2024diploid,
  author = {Kulmanov,  Maxat and Tawfiq,  Rund and Liu,  Yang and Al Ali,  Hatoon and Abdelhakim,  Marwa and Alarawi,  Mohammed and Aldakhil,  Hind and Alhattab,  Dana and Alsolme,  Ebtehal A. and Althagafi,  Azza and Angelov,  Angel and Bougouffa,  Salim and Driguez,  Patrick and Park,  Changsook and Putra,  Alexander and Reyes-Ramos,  Ana M. and Hauser,  Charlotte A. E. and Cheung,  Ming Sin and Abedalthagafi,  Malak S. and Hoehndorf,  Robert},
  doi = {10.1038/s41597-024-04121-2},
  issn = {2052-4463},
  journal = {Scientific Data},
  month = {November},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {A reference quality, fully annotated diploid genome from a Saudi individual},
  url = {http://dx.doi.org/10.1038/s41597-024-04121-2},
  volume = {11},
  year = {2024}
}

Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project

Stenton, Sarah L., O’Leary, Melanie C., Lemire, Gabrielle, VanNoy, Grace E., DiTroia, Stephanie, Ganesh, Vijay S., Groopman, Emily, O’Heir, Emily, Mangilog, Brian, Osei-Owusu, Ikeoluwa, Pais, Lynn S., Serrano, Jillian, Singer-Berk, Moriel, Weisburd, Ben, Wilson, Michael W., Austin-Tse, Christina, Abdelhakim, Marwa, Althagafi, Azza, Babbi, Giulia, Bellazzi, Riccardo, Bovo, Samuele, Carta, Maria Giulia, Casadio, Rita, Coenen, Pieter-Jan, De Paoli, Federica, Floris, Matteo, Gajapathy, Manavalan, Hoehndorf, Robert, Jacobsen, Julius O. B., Joseph, Thomas, Kamandula, Akash, Katsonis, Panagiotis, Kint, Cyrielle, Lichtarge, Olivier, Limongelli, Ivan, Lu, Yulan, Magni, Paolo, Mamidi, Tarun Karthik Kumar, Martelli, Pier Luigi, Mulargia, Marta, Nicora, Giovanna, Nykamp, Keith, Pejaver, Vikas, Peng, Yisu, Pham, Thi Hong Cam, Podda, Maurizio S., Rao, Aditya, Rizzo, Ettore, Saipradeep, Vangala G., Savojardo, Castrense, Schols, Peter, Shen, Yang, Sivadasan, Naveen, Smedley, Damian, Soru, Dorian, Srinivasan, Rajgopal, Sun, Yuanfei, Sunderam, Uma, Tan, Wuwei, Tiwari, Naina, Wang, Xiao, Wang, Yaqiong, Williams, Amanda, Worthey, Elizabeth A., Yin, Rujie, You, Yuning, Zeiberg, Daniel, Zucca, Susanna, Bakolitsa, Constantina, Brenner, Steven E., Fullerton, Stephanie M., Radivojac, Predrag, Rehm, Heidi L. and O’Donnell-Luria, Anne

Human Genomics, vol. 18(1) (2024)

Rare disease, Genomics

@article{Stenton2024,
  abstract = {{Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.}},
  author = {Stenton,  Sarah L. and O’Leary,  Melanie C. and Lemire,  Gabrielle and VanNoy,  Grace E. and DiTroia,  Stephanie and Ganesh,  Vijay S. and Groopman,  Emily and O’Heir,  Emily and Mangilog,  Brian and Osei-Owusu,  Ikeoluwa and Pais,  Lynn S. and Serrano,  Jillian and Singer-Berk,  Moriel and Weisburd,  Ben and Wilson,  Michael W. and Austin-Tse,  Christina and Abdelhakim,  Marwa and Althagafi,  Azza and Babbi,  Giulia and Bellazzi,  Riccardo and Bovo,  Samuele and Carta,  Maria Giulia and Casadio,  Rita and Coenen,  Pieter-Jan and De Paoli,  Federica and Floris,  Matteo and Gajapathy,  Manavalan and Hoehndorf,  Robert and Jacobsen,  Julius O. B. and Joseph,  Thomas and Kamandula,  Akash and Katsonis,  Panagiotis and Kint,  Cyrielle and Lichtarge,  Olivier and Limongelli,  Ivan and Lu,  Yulan and Magni,  Paolo and Mamidi,  Tarun Karthik Kumar and Martelli,  Pier Luigi and Mulargia,  Marta and Nicora,  Giovanna and Nykamp,  Keith and Pejaver,  Vikas and Peng,  Yisu and Pham,  Thi Hong Cam and Podda,  Maurizio S. and Rao,  Aditya and Rizzo,  Ettore and Saipradeep,  Vangala G. and Savojardo,  Castrense and Schols,  Peter and Shen,  Yang and Sivadasan,  Naveen and Smedley,  Damian and Soru,  Dorian and Srinivasan,  Rajgopal and Sun,  Yuanfei and Sunderam,  Uma and Tan,  Wuwei and Tiwari,  Naina and Wang,  Xiao and Wang,  Yaqiong and Williams,  Amanda and Worthey,  Elizabeth A. and Yin,  Rujie and You,  Yuning and Zeiberg,  Daniel and Zucca,  Susanna and Bakolitsa,  Constantina and Brenner,  Steven E. and Fullerton,  Stephanie M. and Radivojac,  Predrag and Rehm,  Heidi L. and O’Donnell-Luria,  Anne},
  doi = {10.1186/s40246-024-00604-w},
  issn = {1479-7364},
  journal = {Human Genomics},
  month = {April},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project},
  url = {http://dx.doi.org/10.1186/s40246-024-00604-w},
  volume = {18},
  year = {2024}
}

DeepGOMeta for functional insights into microbial communities using deep learning-based protein function prediction

Tawfiq, Rund, Niu, Kexin, Hoehndorf, Robert and Kulmanov, Maxat

Scientific Reports, vol. 14(1) (2024)

Microbial communities, Protein function

@article{Tawfiq2024,
  abstract = {{Analyzing microbial samples remains computationally challenging due to their diversity and complexity. The lack of robust de novo protein function prediction methods exacerbates the difficulty in deriving functional insights from these samples. Traditional prediction methods, dependent on homology and sequence similarity, often fail to predict functions for novel proteins and proteins without known homologs. Moreover, most of these methods have been trained on largely eukaryotic data, and have not been evaluated on or applied to microbial datasets. This research introduces DeepGOMeta, a deep learning model designed for protein function prediction as Gene Ontology (GO) terms, trained on a dataset relevant to microbes. The model is applied to diverse microbial datasets to demonstrate its use for gaining biological insights. Data and code are available at https://github.com/bio-ontology-research-group/deepgometa}},
  author = {Tawfiq,  Rund and Niu,  Kexin and Hoehndorf,  Robert and Kulmanov,  Maxat},
  doi = {10.1038/s41598-024-82956-w},
  issn = {2045-2322},
  journal = {Scientific Reports},
  month = {December},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {DeepGOMeta for functional insights into microbial communities using deep learning-based protein function prediction},
  url = {http://dx.doi.org/10.1038/s41598-024-82956-w},
  volume = {14},
  year = {2024}
}

Causal relationships between diseases mined from the literature improve the use of polygenic risk scores

Toonsi, Sumyyah, Gauran, Iris Ivy, Ombao, Hernando, Schofield, Paul N and Hoehndorf, Robert

Bioinformatics, vol. 40(11) (2024)

Biomedical informatics, Rare disease

@article{Toonsi2024,
  abstract = {{The data are available through https://github.com/bio-ontology-research-group/causal-relations-between-diseases.}},
  author = {Toonsi,  Sumyyah and Gauran,  Iris Ivy and Ombao,  Hernando and Schofield,  Paul N and Hoehndorf,  Robert},
  doi = {10.1093/bioinformatics/btae639},
  issn = {1367-4811},
  journal = {Bioinformatics},
  month = {October},
  number = {11},
  publisher = {Oxford University Press (OUP)},
  title = {Causal relationships between diseases mined from the literature improve the use of polygenic risk scores},
  url = {http://dx.doi.org/10.1093/bioinformatics/btae639},
  volume = {40},
  year = {2024}
}

Semantic units: organizing knowledge graphs into semantically meaningful units of representation

Vogt, Lars, Kuhn, Tobias and Hoehndorf, Robert

Journal of Biomedical Semantics, vol. 15(1) (2024)

Applied Ontology

BACKGROUND: In today’s landscape of data management, the importance of knowledge graphs and ontologies is escalating as critical mechanisms aligned with the FAIR Guiding Principles, ensuring data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs. RESULTS: We introduce “semantic units” as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource.
CONCLUSIONS: Semantic units, applicable in RDF/OWL and labeled property graphs, offer support for making statements about statements and facilitate graph-alignment, subgraph-matching, knowledge graph profiling, and for management of access restrictions to sensitive data. Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.

@article{Vogt2024,
  abstract = {{Background
In today’s landscape of data management, the importance of knowledge graphs and ontologies is escalating as critical mechanisms aligned with the FAIR Guiding Principles—ensuring data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs.

Results
We introduce “semantic units” as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource.

Conclusions
Semantic units, applicable in RDF/OWL and labeled property graphs, offer support for making statements about statements and facilitate graph-alignment, subgraph-matching, knowledge graph profiling, and for management of access restrictions to sensitive data. Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.}},
  author = {Vogt,  Lars and Kuhn,  Tobias and Hoehndorf,  Robert},
  doi = {10.1186/s13326-024-00310-5},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {May},
  number = {1},
  publisher = {Springer Science and Business Media LLC},
  title = {Semantic units: organizing knowledge graphs into semantically meaningful units of representation},
  url = {http://dx.doi.org/10.1186/s13326-024-00310-5},
  volume = {15},
  year = {2024}
}

Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction

Ghunaim, Yasir and Hoehndorf, Robert

Neural-Symbolic Learning and Reasoning, pp. 89-97 (2024)

Neuro-symbolic AI

@inbook{Ghunaim2024,
  abstract = {{Pre-training machine learning models on molecular properties has proven effective for generating robust and generalizable representations, which is critical for advancements in drug discovery and materials science. While recent work has primarily focused on data-driven approaches, the KANO model introduces a novel paradigm by incorporating knowledge-enhanced pre-training. In this work, we expand upon KANO by integrating the large-scale ChEBI knowledge graph, which includes 2,840 functional groups – significantly more than the original 82 used in KANO. We explore two approaches, Replace and Integrate, to incorporate this extensive knowledge into the KANO framework. Our results demonstrate that including ChEBI leads to improved performance on 9 out of 14 molecular property prediction datasets. This highlights the importance of utilizing a larger and more diverse set of functional groups to enhance molecular representations for property predictions.}},
  author = {Ghunaim,  Yasir and Hoehndorf,  Robert},
  booktitle = {Neural-Symbolic Learning and Reasoning},
  doi = {10.1007/978-3-031-71170-1_10},
  isbn = {9783031711701},
  issn = {1611-3349},
  pages = {89--97},
  publisher = {Springer Nature Switzerland},
  title = {Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction},
  url = {http://dx.doi.org/10.1007/978-3-031-71170-1_10},
  year = {2024}
}

Enhancing Geometric Ontology Embeddings for $\mathcal{E}\mathcal{L}^{++}$ with Negative Sampling and Deductive Closure Filtering

Mashkova, Olga, Zhapa-Camacho, Fernando and Hoehndorf, Robert

Neural-Symbolic Learning and Reasoning, pp. 331-354 (2024)

Biomedical informatics

@inbook{Mashkova2024,
  abstract = {{Ontology embeddings map classes, relations, and individuals in ontologies into \(\mathbb{R}^n\), and within \(\mathbb{R}^n\) similarity between entities can be computed or new axioms inferred. For ontologies in the Description Logic \(\mathcal{E}\mathcal{L}^{++}\), several embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations; they do not distinguish between statements that are unprovable and provably false, and therefore they may use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for \(\mathcal{E}\mathcal{L}^{++}\) ontologies based on high-dimensional ball representation of concept descriptions, incorporating several modifications that aim to make use of the ontology deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and different types of negatives. We demonstrate that our embedding methods improve over the baseline ontology embedding in the task of knowledge base or ontology completion.}},
  author = {Mashkova,  Olga and Zhapa-Camacho,  Fernando and Hoehndorf,  Robert},
  booktitle = {Neural-Symbolic Learning and Reasoning},
  doi = {10.1007/978-3-031-71167-1_18},
  isbn = {9783031711671},
  issn = {1611-3349},
  pages = {331--354},
  publisher = {Springer Nature Switzerland},
  title = {Enhancing Geometric Ontology Embeddings for $\mathcal{E}\mathcal{L}^{++}$ with Negative Sampling and Deductive Closure Filtering},
  url = {http://dx.doi.org/10.1007/978-3-031-71167-1_18},
  year = {2024}
}

Lattice-Preserving $\mathcal{ALC}$ Ontology Embeddings

Zhapa-Camacho, Fernando and Hoehndorf, Robert

Neural-Symbolic Learning and Reasoning, pp. 355-369 (2024)

Biomedical informatics

@inbook{ZhapaCamacho2024,
  abstract = {{Generating vector representations (embeddings) of OWL ontologies is a growing task due to its applications in predicting missing facts and knowledge-enhanced learning in fields such as bioinformatics. The underlying semantics of OWL ontologies is expressed using Description Logics (DLs). Initial approaches to generate embeddings relied on constructing a graph out of ontologies, neglecting the semantics of the logic therein. Recent semantic-preserving embedding methods often target lightweight DL languages like \(\mathcal{E}\mathcal{L}^{++}\), ignoring more expressive information in ontologies. Although some approaches aim to embed more descriptive DLs like \(\mathcal{ALC}\), those methods require the existence of individuals, while many real-world ontologies are devoid of them. We propose an ontology embedding method for the \(\mathcal{ALC}\) DL language that considers the lattice structure of concept descriptions. We use connections between DL and Category Theory to materialize the lattice structure and embed it using an order-preserving embedding method. We show that our method outperforms state-of-the-art methods in several knowledge base completion tasks. We make our code and data available at https://github.com/bio-ontology-research-group/catE.}},
  author = {Zhapa-Camacho,  Fernando and Hoehndorf,  Robert},
  booktitle = {Neural-Symbolic Learning and Reasoning},
  doi = {10.1007/978-3-031-71167-1_19},
  isbn = {9783031711671},
  issn = {1611-3349},
  pages = {355--369},
  publisher = {Springer Nature Switzerland},
  title = {Lattice-Preserving $\mathcal {ALC}$ Ontology Embeddings},
  url = {http://dx.doi.org/10.1007/978-3-031-71167-1_19},
  year = {2024}
}

The Impact of Mechanical Cues on the Metabolomic and Transcriptomic Profiles of Human Dermal Fibroblasts Cultured in Ultrashort Self-Assembling Peptide 3D Scaffolds

Sherin Abdelrahman, Rui Ge, Hepi H. Susapto, Yang Liu, Faris Samkari, Manola Moretti, Xinzhi Liu, Robert Hoehndorf, Abdul-Hamid Emwas, Mariusz Jaremko, Ranim H. Rawas and Charlotte A. E. Hauser

ACS Nano, vol. 17(15), pp. 14508-14531 (2023)

Bioengineering, Genomics

@article{Abdelrahman2023,
  abstract = {{Cells' interactions with their microenvironment influence their morphological features and regulate crucial cellular functions including proliferation, differentiation, metabolism, and gene expression. Most biological data available are based on in vitro two-dimensional (2D) cellular models, which fail to recapitulate the three-dimensional (3D) in vivo systems. This can be attributed to the lack of cell-matrix interaction and the limitless access to nutrients and oxygen, in contrast to in vivo systems. Despite the emergence of a plethora of 3D matrices to address this challenge, there are few reports offering a proper characterization of these matrices or studying how the cell-matrix interaction influences cellular metabolism in correlation with gene expression. In this study, two tetrameric ultrashort self-assembling peptide sequences, FFIK and FIIK, were used to create in vitro 3D models using well-described human dermal fibroblast cells. The peptide sequences are derived from naturally occurring amino acids that are capable of self-assembling into stable hydrogels without UV or chemical cross-linking. Our results showed that 2D cultured fibroblasts exhibited distinct metabolic and transcriptomic profiles compared to 3D cultured cells. The observed changes in the metabolomic and transcriptomic profiles were closely interconnected and influenced several important metabolic pathways including the TCA cycle, glycolysis, MAPK signaling cascades, and hemostasis. Data provided here may lead to clearer insights into the influence of the surrounding microenvironment on human dermal fibroblast metabolic patterns and molecular mechanisms, underscoring the importance of utilizing efficient 3D in vitro models to study such complex mechanisms.}},
  author = {Sherin Abdelrahman and Rui Ge and Hepi H. Susapto and Yang Liu and Faris Samkari and Manola Moretti and Xinzhi Liu and Robert Hoehndorf and Abdul-Hamid Emwas and Mariusz Jaremko and Ranim H. Rawas and Charlotte A. E. Hauser$^*$},
  doi = {10.1021/acsnano.3c01176},
  journal = {{ACS} Nano},
  month = {July},
  number = {15},
  pages = {14508--14531},
  publisher = {American Chemical Society ({ACS})},
  title = {The Impact of Mechanical Cues on the Metabolomic and Transcriptomic Profiles of Human Dermal Fibroblasts Cultured in Ultrashort Self-Assembling Peptide 3D Scaffolds},
  url = {https://doi.org/10.1021/acsnano.3c01176},
  volume = {17},
  year = {2023}
}

Improving the classification of cardinality phenotypes using collections

Sarah M. Alghamdi and Robert Hoehndorf

Journal of Biomedical Semantics, vol. 14(1) (2023)

Applied Ontology

@article{Alghamdi2023,
  abstract = {{We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.}},
  author = {Sarah M. Alghamdi and Robert Hoehndorf$^*$},
  doi = {10.1186/s13326-023-00290-y},
  journal = {Journal of Biomedical Semantics},
  month = {August},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {Improving the classification of cardinality phenotypes using collections},
  url = {https://doi.org/10.1186/s13326-023-00290-y},
  volume = {14},
  year = {2023}
}

Genomic landscape in Saudi patients with hepatocellular carcinoma using whole-genome sequencing: a pilot study

Mazen Hassanain, Yang Liu, Weam Hussain, Albandri Binowayn, Duna Barakeh, Ebtehal Alsolme, Faisal AlSaif, Ghaida Almasaad, Mohammed AlSwayyed, Maram Alaqel, Rana Aljunidel, Sherin Abdelrahman, Charlotte A. E. Hauser, Saleh Alqahtani, Robert Hoehndorf and Malak Abedalthagafi

Frontiers in Gastroenterology, vol. 2 (2023)

Genomics, Biomedical informatics

@article{Hassanain2023,
  abstract = {{Our findings indicate that most of the HCC patients possess cancer-related genetic variants, and the altered pathways in these patients exhibit similarities. Notably, resistant patients exhibit a higher frequency of aberrations in sorafenib-related genes than do sensitive patients. Specifically, 4 out of 10 resistant individuals demonstrated 13 somatic mutations, whereas none of the three sensitive patients exhibited any. Similarly, 7 out of 10 resistant patients possessed 30 germline mutations, while none were observed in the sensitive group (two-sided Fisher's exact test; somatic: p=0.50, germline: 0.07). These results contribute to our understanding of the genetic landscape of HCC and highlight potential therapeutic targets that could aid in overcoming treatment resistance.}},
  author = {Mazen Hassanain and Yang Liu and Weam Hussain and Albandri Binowayn and Duna Barakeh and Ebtehal Alsolme and Faisal AlSaif and Ghaida Almasaad and Mohammed AlSwayyed and Maram Alaqel and Rana Aljunidel and Sherin Abdelrahman and Charlotte A. E. Hauser and Saleh Alqahtani and Robert Hoehndorf and Malak Abedalthagafi$^*$},
  doi = {10.3389/fgstr.2023.1205415},
  journal = {Frontiers in Gastroenterology},
  month = {August},
  publisher = {Frontiers Media {SA}},
  title = {Genomic landscape in Saudi patients with hepatocellular carcinoma using whole-genome sequencing: a pilot study},
  url = {https://doi.org/10.3389/fgstr.2023.1205415},
  volume = {2},
  year = {2023}
}

Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes

Șenay Kafkas, Marwa Abdelhakim, Mahmut Uludag, Azza Althagafi, Malak Alghamdi and Robert Hoehndorf

BMC Bioinformatics, vol. 24(1) (2023)

Rare disease, Biomedical informatics

@article{Kafkas2023,
  abstract = {{Abstract
Background
Identifying variants associated with diseases is a challenging task in medical genetics research. Current studies that prioritize variants within individual genomes generally rely on known variants, evidence from literature and genomes, and patient symptoms and clinical signs. The functionalities of the existing tools, which rank variants based on given patient symptoms and clinical signs, are restricted to the coverage of ontologies such as the Human Phenotype Ontology (HPO). However, most clinicians do not limit themselves to HPO while describing patient symptoms/signs and their associated variants/genes. There is thus a need for an automated tool that can prioritize variants based on freely expressed patient symptoms and clinical signs.

Results
STARVar is a Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes. STARVar uses patient symptoms and clinical signs, either linked to HPO or expressed in free text format. It returns a ranked list of variants based on a combined score from two classifiers utilizing evidence from genomics and literature. STARVar improves over related tools on a set of synthetic patients. In addition, we demonstrated its distinct contribution to the domain on another synthetic dataset covering publicly available clinical genotype–phenotype associations by using symptoms and clinical signs expressed in free text format.

Conclusions
STARVar stands as a unique and efficient tool that has the advantage of ranking variants with flexibly expressed patient symptoms in free-form text. Therefore, STARVar can be easily integrated into bioinformatics workflows designed to analyze disease-associated genomes.

Availability
STARVar is freely available from https://github.com/bio-ontology-research-group/STARVar.}},
  author = {Șenay Kafkas and Marwa Abdelhakim and Mahmut Uludag and Azza Althagafi and Malak Alghamdi and Robert Hoehndorf$^*$},
  doi = {10.1186/s12859-023-05406-w},
  journal = {{BMC} Bioinformatics},
  month = {July},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes},
  url = {https://doi.org/10.1186/s12859-023-05406-w},
  volume = {24},
  year = {2023}
}

Klarigi: Characteristic explanations for semantic biomedical data

Luke T. Slater, John A. Williams, Paul N. Schofield, Sophie Russell, Samantha C. Pendleton, Andreas Karwath, Hilary Fanning, Simon Ball, Robert Hoehndorf and Georgios V. Gkoutos

Computers in Biology and Medicine, pp. 106425 (2023)

Ontology engineering

@article{Slater2022,
  abstract = {{Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. 
We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.}},
  author = {Luke T. Slater$^*$ and John A. Williams and Paul N. Schofield and Sophie Russell and Samantha C. Pendleton and Andreas Karwath and Hilary Fanning and Simon Ball and Robert Hoehndorf and Georgios V. Gkoutos},
  doi = {10.1016/j.compbiomed.2022.106425},
  journal = {Computers in Biology and Medicine},
  month = {December},
  pages = {106425},
  publisher = {Elsevier {BV}},
  title = {Klarigi: Characteristic explanations for semantic biomedical data},
  url = {https://doi.org/10.1016/j.compbiomed.2022.106425},
  year = {2023}
}

mOWL: Python library for machine learning with biomedical ontologies

Fernando Zhapa-Camacho, Maxat Kulmanov and Robert Hoehndorf

Bioinformatics, vol. 39(1) (2023)

Neuro-symbolic AI

@article{ZhapaCamacho2022,
  abstract = {{Supplementary data are available at Bioinformatics online.}},
  author = {Fernando Zhapa-Camacho and Maxat Kulmanov and Robert Hoehndorf},
  doi = {10.1093/bioinformatics/btac811},
  journal = {Bioinformatics},
  month = {December},
  number = {1},
  publisher = {Oxford University Press ({OUP})},
  title = {{mOWL}: Python library for machine learning with biomedical ontologies},
  url = {https://doi.org/10.1093/bioinformatics/btac811},
  volume = {39},
  year = {2023}
}

Exploring the Use of Ontology Components for Distantly-Supervised Disease and Phenotype Named Entity Recognition

Sumyyah Toonsi, Senay Kafkas and Robert Hoehndorf

Proceedings of the International Conference on Biomedical Ontologies 2023 together with the Workshop on Ontologies for Infectious and Immune-Mediated Disease Data Science (OIIDDS 2023) and the FAIR Ontology Harmonization and TRUST Data Interoperability Workshop (FOHTI 2023), Brasília, Brazil, August 28 - September 1, 2023, vol. 3603, pp. 13-24, In: Fernanda Farinelli and Amanda Damasceno de Souza and Eduardo Ribeiro Felipe (Ed.) (2023)

Applied Ontology

@inproceedings{DBLP:conf/icbo/ToonsiKH23,
  abstract = {{The lack of curated corpora is one of the major obstacles for Named Entity Recognition (NER). With the advancements in deep learning and development of robust language models, distant supervision utilizing weakly labelled data is often used to alleviate this problem. Previous approaches utilized weakly labeled corpora from Wikipedia or from the literature. However, to the best of our knowledge, none of them explored the use of the different ontology components for disease/phenotype NER under the distant supervision scheme. In this study, we explored whether different ontology components can be used to develop a distantly supervised disease/phenotype entity recognition model. We trained different models by considering ontology labels, synonyms, definitions, axioms and their combinations in addition to a model trained on literature. Results showed that content from the disease/phenotype ontologies can be exploited to develop a NER model performing at the state-of-the-art level. In particular, models that utilised both the ontology definitions and axioms showed competitive performance compared to the model trained on literature. This relieves the need of finding and annotating external corpora. Furthermore, models trained using ontology components made zero-shot predictions on the test datasets which were not observed by the models training on the literature based datasets.}},
  author = {Sumyyah Toonsi and
Senay Kafkas and
Robert Hoehndorf},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/conf/icbo/ToonsiKH23.bib},
  booktitle = {Proceedings of the International Conference on Biomedical Ontologies
2023 together with the Workshop on Ontologies for Infectious and Immune-Mediated
Disease Data Science {(OIIDDS} 2023) and the {FAIR} Ontology Harmonization
and {TRUST} Data Interoperability Workshop {(FOHTI} 2023), Bras{\'{\i}}lia,
Brazil, August 28 - September 1, 2023},
  editor = {Fernanda Farinelli and
Amanda Damasceno de Souza and
Eduardo Ribeiro Felipe},
  pages = {13--24},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  timestamp = {Sat, 13 Jan 2024 01:49:11 +0100},
  title = {Exploring the Use of Ontology Components for Distantly-Supervised
Disease and Phenotype Named Entity Recognition},
  url = {https://ceur-ws.org/Vol-3603/Paper2.pdf},
  volume = {3603},
  year = {2023}
}

From Axioms over Graphs to Vectors, and Back Again: Evaluating the Properties of Graph-based Ontology Embeddings

Fernando Zhapa-Camacho and Robert Hoehndorf

Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, La Certosa di Pontignano, Siena, Italy, July 3-5, 2023, vol. 3432, pp. 85-102, In: Artur S. d'Avila Garcez and Tarek R. Besold and Marco Gori and Ernesto Jiménez-Ruiz (Ed.) (2023)

Neuro-symbolic AI

@inproceedings{DBLP:conf/nesy/Zhapa-CamachoH23,
  abstract = {{Several approaches have been developed that generate embeddings for Description Logic ontologies and use these embeddings in machine learning. One approach to generating ontology embeddings is by first embedding the ontologies into a graph structure, i.e., introducing a set of nodes and edges for named entities and logical axioms, and then applying a graph embedding to embed the graph in $\mathbb{R}^n$. Methods that embed ontologies in graphs (graph projections) have different formal properties related to the type of axioms they can utilize, whether the projections are invertible or not, and whether they can be applied to asserted axioms or their deductive closure. We analyze, qualitatively and quantitatively, several graph projection methods that have been used to embed ontologies, and we demonstrate the effect of the properties of graph projections on the performance of predicting axioms from ontology embeddings. We find that there are substantial differences between different projection methods, and both the projection of axioms into nodes and edges as well as ontological choices in representing knowledge will impact the success of using ontology embeddings to predict axioms.}},
  author = {Fernando Zhapa{-}Camacho and
Robert Hoehndorf},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/conf/nesy/Zhapa-CamachoH23.bib},
  booktitle = {Proceedings of the 17th International Workshop on Neural-Symbolic
Learning and Reasoning, La Certosa di Pontignano, Siena, Italy, July
3-5, 2023},
  editor = {Artur S. d'Avila Garcez and
Tarek R. Besold and
Marco Gori and
Ernesto Jim{\'{e}}nez{-}Ruiz},
  pages = {85--102},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  timestamp = {Tue, 11 Jul 2023 17:14:10 +0200},
  title = {From Axioms over Graphs to Vectors, and Back Again: Evaluating the
Properties of Graph-based Ontology Embeddings},
  url = {https://ceur-ws.org/Vol-3432/paper7.pdf},
  volume = {3432},
  year = {2023}
}

Evaluating Different Methods for Semantic Reasoning Over Ontologies

Fernando Zhapa-Camacho and Robert Hoehndorf

Joint Proceedings of Scholarly QALD 2023 and SemREC 2023 co-located with 22nd International Semantic Web Conference ISWC 2023, Athens, Greece, November 6-10, 2023, vol. 3592, In: Debayan Banerjee and Ricardo Usbeck and Nandana Mihindukulasooriya and Gunjan Singh and Raghava Mutharaju and Pavan Kapanipathi (Ed.) (2023)

Ontology engineering

@inproceedings{DBLP:conf/semweb/Zhapa-CamachoH23,
  abstract = {{Reasoning over knowledge bases such as Semantic Web ontologies enables the discovery of new facts from existing knowledge. Knowledge-enhanced machine learning has motivated the development of neuro-symbolic reasoners, which enable faster but approximate computation of new facts or entailments. Neuro-symbolic methods generate vector representations (embeddings) of entities in a knowledge base. We analyze some ontology embedding methods, by implementing them as neuro-symbolic reasoners and evaluating their predictive performance on the datasets and tasks provided by the Semantic Reasoning Evaluation Challenge 2023. We explore two types of embedding methods: graph-based and model-theoretic. Regarding graph-based embeddings, we evaluated the impact of different combinations of graph representation of ontologies with knowledge graph embedding methods. For model-theoretic embeddings, which create models for theories, we evaluate the impact of using several models, enabling approximate semantic entailment.}},
  author = {Fernando Zhapa{-}Camacho and
Robert Hoehndorf},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/conf/semweb/Zhapa-CamachoH23.bib},
  booktitle = {Joint Proceedings of Scholarly {QALD} 2023 and SemREC 2023 co-located
with 22nd International Semantic Web Conference {ISWC} 2023, Athens,
Greece, November 6-10, 2023},
  editor = {Debayan Banerjee and
Ricardo Usbeck and
Nandana Mihindukulasooriya and
Gunjan Singh and
Raghava Mutharaju and
Pavan Kapanipathi},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  timestamp = {Tue, 02 Jan 2024 17:44:44 +0100},
  title = {Evaluating Different Methods for Semantic Reasoning Over Ontologies},
  url = {https://ceur-ws.org/Vol-3592/paper9.pdf},
  volume = {3592},
  year = {2023}
}

Updating the CEMO ontology for future epidemiological challenges

Núria Queralt-Rosinach, Paul N. Schofield, Marco Roos and Robert Hoehndorf

14th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences (SWAT4HCLS 2023), Basel, Switzerland, February 13-16, 2023, vol. 3415, pp. 151-152, In: Atsuko Yamaguchi and Andrea Splendiani and M. Scott Marshall and Chris Baker and Jerven T. Bolleman and Albert Burger and Leyla Jael Castro and Ole Eigenbrod and Sabine Österle and Martin Romacker and Andra Waagmeester (Ed.) (2023)

Applied Ontology

@inproceedings{DBLP:conf/swat4ls/Queralt-Rosinach23,
  abstract = {{The COVID-19 epidemiology and monitoring ontology (CEMO) is an OWL ontology built during the COVID-19 pandemic for better exchange, integration and reuse of epidemiological information. Here, we present an update of the development of the ontology and future directions in order to make it usable under different scenarios and new challenges.}},
  author = {N{\'{u}}ria Queralt{-}Rosinach and
Paul N. Schofield and
Marco Roos and
Robert Hoehndorf},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/conf/swat4ls/Queralt-Rosinach23.bib},
  booktitle = {14th International Conference on Semantic Web Applications and Tools
for Health Care and Life Sciences {(SWAT4HCLS} 2023), Basel, Switzerland,
February 13-16, 2023},
  editor = {Atsuko Yamaguchi and
Andrea Splendiani and
M. Scott Marshall and
Chris Baker and
Jerven T. Bolleman and
Albert Burger and
Leyla Jael Castro and
Ole Eigenbrod and
Sabine {\"{O}}sterle and
Martin Romacker and
Andra Waagmeester},
  pages = {151--152},
  publisher = {CEUR-WS.org},
  series = {{CEUR} Workshop Proceedings},
  timestamp = {Mon, 28 Aug 2023 17:23:07 +0200},
  title = {Updating the {CEMO} ontology for future epidemiological challenges},
  url = {https://ceur-ws.org/Vol-3415/paper-34.pdf},
  volume = {3415},
  year = {2023}
}

Neural Multi-hop Logical Query Answering with Concept-Level Answers

Tang, Zhenwei, Pei, Shichao, Peng, Xi, Zhuang, Fuzhen, Zhang, Xiangliang and Hoehndorf, Robert

The Semantic Web – ISWC 2023, pp. 522-540, In: Payne, Terry R., Presutti, Valentina, Qi, Guilin, Poveda-Villalón, María, Stoilos, Giorgos, Hollink, Laura, Kaoudi, Zoi, Cheng, Gong and Li, Juanzi (Eds.) (2023)

Neuro-symbolic AI

@inproceedings{tang_neural_2023,
  abstract = {Neural multi-hop logical query answering ({LQA}) is a fundamental task to explore relational data such as knowledge graphs, which aims at answering multi-hop queries with logical operations based on distributed representations of queries and answers. Although previous {LQA} methods can give specific instance-level answers, they are not able to provide descriptive concept-level answers, where each concept is a description of a set of instances. Concept-level answers are more comprehensible to users and are of great usefulness in the field of applied ontology. In this work, we formulate the problem of {LQA} with concept-level answers ({LQAC}), solving which needs to address challenges in incorporating, representing, and operating on concepts. We propose an original solution for {LQAC}. Firstly, we incorporate description logic-based ontological axioms to provide the source of concepts. Then, we represent concepts and queries as fuzzy sets, i.e., sets whose elements have degrees of membership, to bridge concepts and queries with instances. Moreover, we design operators involving concepts on top of fuzzy set representation of concepts and queries for optimization and inference. Extensive experimental results on three real-world datasets demonstrate the effectiveness of our method for {LQAC}. In particular, we show that our method is promising in discovering complex logical biomedical facts.},
  author = {Tang, Zhenwei and Pei, Shichao and Peng, Xi and Zhuang, Fuzhen and Zhang, Xiangliang and Hoehndorf$^*$, Robert},
  booktitle = {The Semantic Web – {ISWC} 2023},
  date = {2023},
  editor = {Payne, Terry R. and Presutti, Valentina and Qi, Guilin and Poveda-Villalón, María and Stoilos, Giorgos and Hollink, Laura and Kaoudi, Zoi and Cheng, Gong and Li, Juanzi},
  isbn = {978-3-031-47240-4},
  location = {Cham},
  pages = {522--540},
  publisher = {Springer Nature Switzerland},
  title = {Neural Multi-hop Logical Query Answering with Concept-Level Answers},
  year = {2023}
}

Contribution of model organism phenotypes to the computational identification of human disease genes

Sarah Alghamdi, Paul N. Schofield and Robert Hoehndorf

Disease Models & Mechanisms, vol. 15(7) (2022)

Rare disease, Phenotype informatics, Semantic similarity

@article{Alghamdi2022,
  abstract = {{Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype-phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene-disease associations. We found that mouse genotype-phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work impacts on the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper.}},
  author = {Sarah Alghamdi and Paul N. Schofield and Robert Hoehndorf$^*$},
  doi = {10.1242/dmm.049441},
  journal = {Disease Models \& Mechanisms},
  month = {July},
  number = {7},
  publisher = {The Company of Biologists},
  title = {Contribution of model organism phenotypes to the computational identification of human disease genes},
  url = {https://doi.org/10.1242/dmm.049441},
  volume = {15},
  year = {2022}
}

Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications

Mona Alshahrani, Abdullah Almansour, Asma Alkhaldi, Maha A. Thafar, Mahmut Uludag, Magbubah Essack and Robert Hoehndorf

PeerJ, vol. 10, pp. e13061 (2022)

Drug mechanisms, Neuro-symbolic AI

@article{Alshahrani2022,
  abstract = {{Biomedical knowledge is represented in structured databases and published in biomedical literature, and different computational approaches have been developed to exploit each type of information in predictive models. However, the information in structured databases and literature is often complementary. We developed a machine learning method that combines information from literature and databases to predict drug targets and indications. To effectively utilize information in published literature, we integrate knowledge graphs and published literature using named entity recognition and normalization before applying a machine learning model that utilizes the combination of graph and literature. We then use supervised machine learning to show the effects of combining features from biomedical knowledge and published literature on the prediction of drug targets and drug indications. We demonstrate that our approach using datasets for drug-target interactions and drug indications is scalable to large graphs and can be used to improve the ranking of targets and indications by exploiting features from either structured or unstructured information alone.}},
  author = {Mona Alshahrani and Abdullah Almansour and Asma Alkhaldi and Maha A. Thafar and Mahmut Uludag and Magbubah Essack and Robert Hoehndorf$^*$},
  doi = {10.7717/peerj.13061},
  journal = {{PeerJ}},
  month = {April},
  pages = {e13061},
  publisher = {{PeerJ}},
  title = {Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications},
  url = {https://doi.org/10.7717/peerj.13061},
  volume = {10},
  year = {2022}
}

DeepSVP: integration of genotype and phenotype for structural variant prioritization using deep learning

Azza Althagafi, Lamia Alsubaie, Nagarajan Kathiresan, Katsuhiko Mineta, Taghrid Aloraini, Fuad Al Mutairi, Majid Alfadhel, Takashi Gojobori, Ahmad Alfares and Robert Hoehndorf

Bioinformatics, vol. 38(6), pp. 1677-1684 (2022)

Rare disease, Genomics

@article{Althagafi2021,
  abstract = {{Supplementary data are available at Bioinformatics online.}},
  addendum = {IF: 6.94},
  author = {Azza Althagafi and Lamia Alsubaie and Nagarajan Kathiresan and Katsuhiko Mineta and Taghrid Aloraini and Fuad Al Mutairi and Majid Alfadhel and Takashi Gojobori and Ahmad Alfares and Robert Hoehndorf$^*$},
  doi = {10.1093/bioinformatics/btab859},
  issue = {6},
  journal = {Bioinformatics},
  month = {December},
  pages = {1677--1684},
  publisher = {Oxford University Press ({OUP})},
  title = {{DeepSVP}: integration of genotype and phenotype for structural variant prioritization using deep learning},
  url = {https://doi.org/10.1093/bioinformatics/btab859},
  volume = {38},
  year = {2022}
}

A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology

Yongqun He, Hong Yu, Anthony Huffman, Asiyah Yu Lin, Darren A. Natale, John Beverley, Ling Zheng, Yehoshua Perl, Zhigang Wang, Yingtong Liu, Edison Ong, Yang Wang, Philip Huang, Long Tran, Jinyang Du, Zalan Shah, Easheta Shah, Roshan Desai, Hsin-hui Huang, Yujia Tian, Eric Merrell, William D. Duncan, Sivaram Arabandi, Lynn M. Schriml, Jie Zheng, Anna Maria Masci, Liwei Wang, Hongfang Liu, Fatima Zohra Smaili, Robert Hoehndorf, Zoë May Pendlington, Paola Roncaglia, Xianwei Ye, Jiangan Xie, Yi-Wei Tang, Xiaolin Yang, Suyuan Peng, Luxia Zhang, Luonan Chen, Junguk Hur, Gilbert S. Omenn, Brian Athey and Barry Smith

Journal of Biomedical Semantics, vol. 13(1) (2022)

Applied Ontology

@article{He2022,
  abstract = {{Abstract
Background
The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that in the interest of developing effective and safe vaccines and drugs and to better understand coronaviruses and associated disease mechanisms it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standard-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020.

Results
As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19 related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative host-coronavirus PPIs and drug-target knowledge has also been used to support drug repurposing for COVID-19 treatment.

Conclusion
CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.}},
  author = {Yongqun He$^*$ and Hong Yu$^*$ and Anthony Huffman and Asiyah Yu Lin and Darren A. Natale and John Beverley and Ling Zheng and Yehoshua Perl and Zhigang Wang and Yingtong Liu and Edison Ong and Yang Wang and Philip Huang and Long Tran and Jinyang Du and Zalan Shah and Easheta Shah and Roshan Desai and Hsin-hui Huang and Yujia Tian and Eric Merrell and William D. Duncan and Sivaram Arabandi and Lynn M. Schriml and Jie Zheng and Anna Maria Masci and Liwei Wang and Hongfang Liu and Fatima Zohra Smaili and Robert Hoehndorf and Zoë May Pendlington and Paola Roncaglia and Xianwei Ye and Jiangan Xie and Yi-Wei Tang and Xiaolin Yang and Suyuan Peng and Luxia Zhang and Luonan Chen and Junguk Hur and Gilbert S. Omenn and Brian Athey and Barry Smith},
  doi = {10.1186/s13326-022-00279-z},
  journal = {Journal of Biomedical Semantics},
  month = {October},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {A comprehensive update on {CIDO}: the community-based coronavirus infectious disease ontology},
  url = {https://doi.org/10.1186/s13326-022-00279-z},
  volume = {13},
  year = {2022}
}

DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms

Maxat Kulmanov and Robert Hoehndorf

Bioinformatics, vol. 38(Supplement_1), pp. i238-i245 (2022)

Protein function, Neuro-symbolic AI

@article{Kulmanov2022,
  abstract = {{Supplementary data are available at Bioinformatics online.}},
  author = {Maxat Kulmanov and Robert Hoehndorf$^*$},
  doi = {10.1093/bioinformatics/btac256},
  journal = {Bioinformatics},
  month = {June},
  number = {Supplement{\_}1},
  pages = {i238--i245},
  publisher = {Oxford University Press ({OUP})},
  title = {{DeepGOZero}: improving protein function prediction from sequence and zero-shot learning based on ontology axioms},
  url = {https://doi.org/10.1093/bioinformatics/btac256},
  volume = {38},
  year = {2022}
}

Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

Luke T. Slater, Sophie Russell, Silver Makepeace, Alexander Carberry, Andreas Karwath, John A. Williams, Hilary Fanning, Simon Ball, Robert Hoehndorf and Georgios V. Gkoutos

BMC Medical Informatics and Decision Making, vol. 22(1) (2022)

Semantic similarity

@article{Slater2022b,
  abstract = {{Abstract

Background
Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area.


Methods
We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the medical information mart for intensive care (MIMIC-III).


Results
300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations when measured by area under receiver operating characteristic curve and Top Ten Accuracy used term-specificity and annotation-frequency measures.


Conclusion
We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.}},
  author = {Luke T. Slater and Sophie Russell and Silver Makepeace and Alexander Carberry and Andreas Karwath and John A. Williams and Hilary Fanning and Simon Ball and Robert Hoehndorf and Georgios V. Gkoutos$^*$},
  doi = {10.1186/s12911-022-01770-4},
  journal = {{BMC} Medical Informatics and Decision Making},
  month = {February},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {Evaluating semantic similarity methods for comparison of text-derived phenotype profiles},
  url = {https://doi.org/10.1186/s12911-022-01770-4},
  volume = {22},
  year = {2022}
}

Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase

Ali Syed, Şenay Kafkas, Maxat Kulmanov and Robert Hoehndorf

Proceedings of the 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022 (2022)

Ontology engineering, Applied Ontology

@inproceedings{Syed2022UsingST,
  author = {Ali Syed and Şenay Kafkas and Maxat Kulmanov and Robert Hoehndorf$^*$},
  booktitle = {Proceedings of the 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022},
  title = {Using SPARQL to Unify Queries over Data, Ontologies, and Machine Learning Models in the PhenomeBrowser Knowledgebase},
  year = {2022}
}

Positive-Unlabeled Learning with Adversarial Data Augmentation for Knowledge Graph Completion

Zhenwei Tang, Shichao Pei, Zhao Zhang, Yongchun Zhu, Fuzhen Zhuang, Robert Hoehndorf and Xiangliang Zhang

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (2022)

Neuro-symbolic AI

@inproceedings{Tang2022,
  abstract = {{Most real-world knowledge graphs (KG) are far from complete and comprehensive. This problem has motivated efforts in predicting the most plausible missing facts to complete a given KG, i.e., knowledge graph completion (KGC). However, existing KGC methods suffer from two main issues: 1) the false negative issue, i.e., the sampled negative training instances may include potential true facts; and 2) the data sparsity issue, i.e., true facts account for only a tiny part of all possible facts. To this end, we propose positive-unlabeled learning with adversarial data augmentation (PUDA) for KGC. In particular, PUDA tailors a positive-unlabeled risk estimator for the KGC task to deal with the false negative issue. Furthermore, to address the data sparsity issue, PUDA achieves a data augmentation strategy by unifying adversarial training and positive-unlabeled learning under the positive-unlabeled minimax game. Extensive experimental results on real-world benchmark datasets demonstrate the effectiveness and compatibility of our proposed method.}},
  author = {Zhenwei Tang and Shichao Pei and Zhao Zhang and Yongchun Zhu and Fuzhen Zhuang and Robert Hoehndorf and Xiangliang Zhang$^*$},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence},
  doi = {10.24963/ijcai.2022/312},
  month = {July},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  title = {Positive-Unlabeled Learning with Adversarial Data Augmentation for Knowledge Graph Completion},
  url = {https://doi.org/10.24963/ijcai.2022/312},
  year = {2022}
}

DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes

Liu-Wei, Wang, Kafkas, Şenay, Chen, Jun, Dimonaco, Nicholas J, Tegnér, Jesper and Hoehndorf, Robert

Bioinformatics, vol. 37(17), pp. 2722-2729 (2021)

Drug mechanisms, Biomedical informatics

@article{10.1093/bioinformatics/btab147,
  abstract = {{Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus–host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. We developed DeepViral, a deep learning based method that predicts protein–protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.}},
  addendum = {IF: 6.94},
  author = {Liu-Wei, Wang and Kafkas, Şenay and Chen, Jun and Dimonaco, Nicholas J and Tegnér, Jesper and Hoehndorf$^*$, Robert},
  doi = {10.1093/bioinformatics/btab147},
  issn = {1367-4803},
  number = {17},
  journal = {Bioinformatics},
  month = {03},
  pages = {2722--2729},
  title = {{DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes}},
  url = {https://doi.org/10.1093/bioinformatics/btab147},
  volume = {37},
  year = {2021}
}

Predicting candidate genes from phenotypes, functions and anatomical site of expression

Jun Chen, Azza Althagafi and Robert Hoehndorf

Bioinformatics, vol. 37(6), pp. 853-860 (2021)

Rare disease, Biomedical informatics

@article{Chen2020,
  abstract = {{Supplementary data are available at Bioinformatics online.}},
  author = {Jun Chen and Azza Althagafi and Robert Hoehndorf$^*$},
  doi = {10.1093/bioinformatics/btaa879},
  journal = {Bioinformatics},
  month = {October},
  number = {6},
  pages = {853--860},
  publisher = {Oxford University Press ({OUP})},
  title = {Predicting candidate genes from phenotypes, functions and anatomical site of expression},
  url = {https://doi.org/10.1093/bioinformatics/btaa879},
  volume = {37},
  year = {2021}
}

DTI-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug–target interactions

Tilman Hinnerichs and Robert Hoehndorf

Bioinformatics, vol. 37(24), pp. 4835-4843 (2021)

Biomedical informatics

@article{Hinnerichs2021,
  abstract = {{Supplementary data are available at Bioinformatics online.}},
  author = {Tilman Hinnerichs and Robert Hoehndorf},
  doi = {10.1093/bioinformatics/btab548},
  journal = {Bioinformatics},
  month = {July},
  number = {24},
  pages = {4835--4843},
  publisher = {Oxford University Press ({OUP})},
  title = {{DTI}-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug{\textendash}target interactions},
  url = {https://doi.org/10.1093/bioinformatics/btab548},
  volume = {37},
  year = {2021}
}

DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web

Maxat Kulmanov, Fernando Zhapa-Camacho and Robert Hoehndorf

Nucleic Acids Research, vol. 49(W1), pp. W140-W146 (2021)

Protein function, Ontology engineering

@article{Kulmanov2021,
  abstract = {{Understanding the functions of proteins is crucial to understand biological processes on a molecular level. Many more protein sequences are available than can be investigated experimentally. DeepGOPlus is a protein function prediction method based on deep learning and sequence similarity. DeepGOWeb makes the prediction model available through a website, an API, and through the SPARQL query language for interoperability with databases that rely on Semantic Web technologies. DeepGOWeb provides accurate and fast predictions and ensures that predicted functions are consistent with the Gene Ontology; it can provide predictions for any protein and any function in Gene Ontology. DeepGOWeb is freely available at https://deepgo.cbrc.kaust.edu.sa/.}},
  addendum = {IF: 16.97},
  author = {Maxat Kulmanov and Fernando Zhapa-Camacho and Robert Hoehndorf$^*$},
  doi = {10.1093/nar/gkab373},
  journal = {Nucleic Acids Research},
  month = {May},
  number = {W1},
  pages = {W140--W146},
  publisher = {Oxford University Press ({OUP})},
  title = {{DeepGOWeb}: fast and accurate protein function prediction on the (Semantic) Web},
  url = {https://doi.org/10.1093/nar/gkab373},
  volume = {49},
  year = {2021}
}

Multi-faceted semantic clustering with text-derived phenotypes

Luke T. Slater, John A. Williams, Andreas Karwath, Hilary Fanning, Simon Ball, Paul N. Schofield, Robert Hoehndorf and Georgios V. Gkoutos

Computers in Biology and Medicine, vol. 138, pp. 104904 (2021)

Biomedical informatics, Phenotype informatics

@article{Slater2021,
  abstract = {{Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses. However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a unitary similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. Moreover, it renders finding a biological explanation for similarity very difficult; the black box problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus identification of, clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.}},
  addendum = {IF: 4.589},
  author = {Luke T. Slater$^*$ and John A. Williams and Andreas Karwath and Hilary Fanning and Simon Ball and Paul N. Schofield and Robert Hoehndorf and Georgios V. Gkoutos},
  doi = {10.1016/j.compbiomed.2021.104904},
  journal = {Computers in Biology and Medicine},
  month = {November},
  pages = {104904},
  publisher = {Elsevier {BV}},
  title = {Multi-faceted semantic clustering with text-derived phenotypes},
  url = {https://doi.org/10.1016/j.compbiomed.2021.104904},
  volume = {138},
  year = {2021}
}

Towards similarity-based differential diagnostics for common diseases

Luke T. Slater, Andreas Karwath, John A. Williams, Sophie Russell, Silver Makepeace, Alexander Carberry, Robert Hoehndorf and Georgios V. Gkoutos

Computers in Biology and Medicine, vol. 133, pp. 104360 (2021)

Biomedical informatics

@article{Slater2021b,
  abstract = {{Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.}},
  addendum = {IF: 4.59},
  author = {Luke T. Slater$^*$ and Andreas Karwath and John A. Williams and Sophie Russell and Silver Makepeace and Alexander Carberry and Robert Hoehndorf and Georgios V. Gkoutos},
  doi = {10.1016/j.compbiomed.2021.104360},
  journal = {Computers in Biology and Medicine},
  month = {June},
  pages = {104360},
  publisher = {Elsevier {BV}},
  title = {Towards similarity-based differential diagnostics for common diseases},
  url = {https://doi.org/10.1016/j.compbiomed.2021.104360},
  volume = {133},
  year = {2021}
}

Improved characterisation of clinical text through ontology-based vocabulary expansion

Luke T. Slater, William Bradlow, Simon Ball, Robert Hoehndorf and Georgios V Gkoutos

Journal of Biomedical Semantics, vol. 12(1) (2021)

Biomedical informatics, Phenotype informatics

@article{Slater2021expansion,
  abstract = {{Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.}},
  author = {Luke T. Slater and William Bradlow and Simon Ball and Robert Hoehndorf and Georgios V Gkoutos},
  doi = {10.1186/s13326-021-00241-5},
  journal = {Journal of Biomedical Semantics},
  month = {April},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {Improved characterisation of clinical text through ontology-based vocabulary expansion},
  url = {https://doi.org/10.1186/s13326-021-00241-5},
  volume = {12},
  year = {2021}
}

A fast, accurate, and generalisable heuristic-based negation detection algorithm for clinical text

Luke T. Slater, William Bradlow, Dino FA. Motti, Robert Hoehndorf, Simon Ball and Georgios V. Gkoutos

Computers in Biology and Medicine, vol. 130, pp. 104216 (2021)

Biomedical informatics

@article{Slater2021c,
  abstract = {{Semantic similarity is a useful approach for comparing patient phenotypes, and holds the potential of an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and report a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.}},
  addendum = {IF: 3.43},
  author = {Luke T. Slater$^*$ and William Bradlow and Dino FA. Motti and Robert Hoehndorf and Simon Ball and Georgios V. Gkoutos},
  doi = {10.1016/j.compbiomed.2021.104216},
  journal = {Computers in Biology and Medicine},
  month = {January},
  pages = {104216},
  publisher = {Elsevier {BV}},
  title = {A fast, accurate, and generalisable heuristic-based negation detection algorithm for clinical text},
  url = {https://doi.org/10.1016/j.compbiomed.2021.104216},
  volume = {130},
  year = {2021}
}

Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity

Luke T. Slater, Andreas Karwath, Robert Hoehndorf and Georgios V. Gkoutos

Frontiers in Digital Health, vol. 3 (2021)

Biomedical informatics, Semantic similarity

@article{Slater2021uncertainty,
  abstract = {{Semantic similarity is a useful approach for comparing patient phenotypes, and holds the potential of an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and report a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.}},
  author = {Luke T. Slater and Andreas Karwath and Robert Hoehndorf and Georgios V. Gkoutos$^*$},
  doi = {10.3389/fdgth.2021.781227},
  journal = {Frontiers in Digital Health},
  month = {December},
  publisher = {Frontiers Media {SA}},
  title = {Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity},
  url = {https://doi.org/10.3389/fdgth.2021.781227},
  volume = {3},
  year = {2021}
}

Semantic similarity and machine learning with ontologies

Kulmanov, Maxat, Smaili, Fatima Zohra, Gao, Xin and Hoehndorf, Robert

Briefings in Bioinformatics (2020)

Semantic similarity, Applied Ontology, Neuro-symbolic AI

@article{10.1093/bib/bbaa199,
  abstract = {{Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.}},
  addendum = {IF: 11.62},
  author = {Kulmanov, Maxat and Smaili, Fatima Zohra and Gao, Xin and Hoehndorf$^*$, Robert},
  doi = {10.1093/bib/bbaa199},
  issn = {1477-4054},
  journal = {Briefings in Bioinformatics},
  month = {10},
  title = {{Semantic similarity and machine learning with ontologies}},
  url = {https://doi.org/10.1093/bib/bbaa199},
  year = {2020}
}

DeepGOPlus: improved protein function prediction from sequence

Kulmanov, Maxat and Hoehndorf, Robert

Bioinformatics, vol. 36(2), pp. 422-429 (2020)

Protein function, Neuro-symbolic AI

@article{10.1093/bioinformatics/btz595,
  abstract = {Protein function prediction is one of the major tasks of bioinformatics that can help in a wide range of biological problems such as understanding disease mechanisms or finding drug targets. Many methods are available for predicting protein functions from sequence based features, protein–protein interaction networks, protein structure or literature. However, other than sequence, most of the features are difficult to obtain or not available for many proteins, thereby limiting their scope. Furthermore, the performance of sequence-based function prediction methods is often lower than methods that incorporate multiple features and predicting protein functions may require a lot of time. We developed a novel method for predicting protein functions from sequence alone which combines a deep convolutional neural network (CNN) model with sequence similarity based predictions. Our CNN model scans the sequence for motifs which are predictive for protein functions and combines this with functions of similar proteins (if available). We evaluate the performance of DeepGOPlus using the CAFA3 evaluation measures and achieve an Fmax of 0.390, 0.557 and 0.614 for BPO, MFO and CCO evaluations, respectively. These results would have made DeepGOPlus one of the three best predictors in CCO and the second best performing method in the BPO and MFO evaluations. We also compare DeepGOPlus with state-of-the-art methods such as DeepText2GO and GOLabeler on another dataset. DeepGOPlus can annotate around 40 protein sequences per second on common hardware, thereby making fast and accurate function predictions available for a wide range of proteins. http://deepgoplus.bio2vec.net/. Supplementary data are available at Bioinformatics online.},
  addendum = {IF: 6.94},
  author = {Kulmanov, Maxat and Hoehndorf, Robert$^*$},
  issn = {1367-4803},
  journal = {Bioinformatics},
  number = {2},
  pages = {422--429},
  title = {DeepGOPlus: improved protein function prediction from sequence},
  url = {https://doi.org/10.1093/bioinformatics/btz595},
  volume = {36},
  year = {2020}
}

DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier

Kulmanov, Maxat and Hoehndorf, Robert

PLOS Computational Biology, vol. 16(11), pp. 1-22 (2020)

Rare disease, Phenotype informatics, Neuro-symbolic AI

@article{10.1371/journal.pcbi.1008453,
  abstract = {Author summary Gene–phenotype associations can help to understand the underlying mechanisms of many genetic diseases. However, experimental identification, often involving animal models, is time consuming and expensive. Computational methods that predict gene–phenotype associations can be used instead. We developed DeepPheno, a novel approach for predicting the phenotypes resulting from a loss of function of a single gene. We use gene functions and gene expression as information to predict phenotypes. Our method uses a neural network classifier that is able to account for hierarchical dependencies between phenotypes. We extensively evaluate our method and compare it with related approaches, and we show that DeepPheno results in better performance in several evaluations. Furthermore, we found that many of the new predictions made by our method have been added to phenotype association databases released one year later. Overall, DeepPheno simulates some aspects of human physiology and how molecular and physiological alterations lead to abnormal phenotypes.},
  addendum = {IF: 4.43},
  author = {Kulmanov, Maxat AND Hoehndorf$^*$, Robert},
  doi = {10.1371/journal.pcbi.1008453},
  journal = {PLOS Computational Biology},
  month = {11},
  number = {11},
  pages = {1--22},
  publisher = {Public Library of Science},
  title = {DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier},
  url = {https://doi.org/10.1371/journal.pcbi.1008453},
  volume = {16},
  year = {2020}
}

Formal axioms in biomedical ontologies improve analysis and interpretation of associated data

Smaili, Fatima Z., Gao, Xin and Hoehndorf, Robert

Bioinformatics, vol. 36(7), pp. 2229-2236 (2020)

Applied Ontology, Ontology engineering

@article{10754/631015,
  abstract = {Motivation: There are now over 500 ontologies in the life sciences. Over the past years, significant resources have been invested into formalizing these biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns, and encode domain background knowledge. At the same time, ontologies have extended their amount of human-readable information such as labels and definitions as well as other meta-data. As a consequence, biomedical ontologies now form large formalized domain knowledge bases and have a potential to improve ontology-based data analysis by providing background knowledge and relations between biological entities that are not otherwise connected. Results: We evaluate the contribution of formal axioms and ontology meta-data to the ontology-based prediction of protein-protein interactions and gene-disease associations. We find that the formal axioms that have been created for the Gene Ontology and several other ontologies significantly improve ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute in varying degrees to improving data analysis. Our results have major implications on the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings clearly motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies.},
  addendum = {IF: 6.94},
  author = {Smaili, Fatima Z. and Gao$^*$, Xin and Hoehndorf$^*$, Robert},
  journal = {Bioinformatics},
  number = {7},
  pages = {2229--2236},
  publisher = {Oxford University Press},
  title = {Formal axioms in biomedical ontologies improve analysis and interpretation of associated data},
  volume = {36},
  year = {2020}
}

DDIEM: drug database for inborn errors of metabolism

Marwa Abdelhakim, Eunice McMurray, Ali Raza Syed, Senay Kafkas, Allan Anthony Kamau, Paul N Schofield and Robert Hoehndorf

Orphanet Journal of Rare Diseases, vol. 15(1) (2020)

Drug mechanisms, Rare disease

@article{Abdelhakim2020,
  abstract = {{Background: Inborn errors of metabolism (IEM) represent a subclass of rare inherited diseases caused by a wide range of defects in metabolic enzymes or their regulation. Of over a thousand characterized IEMs, only about half are understood at the molecular level, and overall the development of treatment and management strategies has proved challenging. An overview of the changing landscape of therapeutic approaches is helpful in assessing strategic patterns in the approach to therapy, but the information is scattered throughout the literature and public data resources. Results: We gathered data on therapeutic strategies for 300 diseases into the Drug Database for Inborn Errors of Metabolism (DDIEM). Therapeutic approaches, including both successful and ineffective treatments, were manually classified by their mechanisms of action using a new ontology. Conclusions: We present a manually curated, ontologically formalized knowledgebase of drugs, therapeutic procedures, and mitigated phenotypes. DDIEM is freely available through a web interface and for download at http://ddiem.phenomebrowser.net.}},
  addendum = {IF: 4.03},
  author = {Marwa Abdelhakim and Eunice McMurray and Ali Raza Syed and Senay Kafkas and Allan Anthony Kamau and Paul N Schofield and Robert Hoehndorf$^*$},
  doi = {10.1186/s13023-020-01428-2},
  journal = {Orphanet Journal of Rare Diseases},
  month = {June},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {{DDIEM}: drug database for inborn errors of metabolism},
  url = {https://doi.org/10.1186/s13023-020-01428-2},
  volume = {15},
  year = {2020}
}

What is the right sequencing approach? Solo VS extended family analysis in consanguineous populations

Ahmed Alfares, Lamia Alsubaie, Taghrid Aloraini, Aljoharah Alaskar, Azza Althagafi, Ahmed Alahmad, Mamoon Rashid, Abdulrahman Alswaid, Ali Alothaim, Wafaa Eyaid, Faroug Ababneh, Mohammed Albalwi, Raniah Alotaibi, Mashael Almutairi, Nouf Altharawi, Alhanouf Alsamer, Marwa Abdelhakim, Senay Kafkas, Katsuhiko Mineta, Nicole Cheung, Abdallah M. Abdallah, Stine Büchmann-Møller, Yoshinori Fukasawa, Xiang Zhao, Issaac Rajan, Robert Hoehndorf, Fuad Al Mutairi, Takashi Gojobori and Majid Alfadhel

BMC Medical Genomics, vol. 13(1) (2020)

Genomics, Rare disease

@article{Alfares2020,
  abstract = {{There was no difference in the hit rate between solo and extended family members. Trio-based analysis was a better approach than sibship testing, even in a consanguineous population. Finally, each additional family member helped to narrow down the number of variants by 50-75%. Our findings could help clinicians, researchers and testing laboratories select the most cost-effective and appropriate sequencing approach for their patients. Furthermore, using extended family analysis is a very useful tool for complex cases with novel genes.}},
  addendum = {IF: 3.19},
  author = {Ahmed Alfares and Lamia Alsubaie and Taghrid Aloraini and Aljoharah Alaskar and Azza Althagafi and Ahmed Alahmad and Mamoon Rashid and Abdulrahman Alswaid and Ali Alothaim and Wafaa Eyaid and Faroug Ababneh and Mohammed Albalwi and Raniah Alotaibi and Mashael Almutairi and Nouf Altharawi and Alhanouf Alsamer and Marwa Abdelhakim and Senay Kafkas and Katsuhiko Mineta and Nicole Cheung and Abdallah M. Abdallah and Stine B\"{u}chmann-M{\o}ller and Yoshinori Fukasawa and Xiang Zhao and Issaac Rajan and Robert Hoehndorf and Fuad Al Mutairi and Takashi Gojobori and Majid Alfadhel$^*$},
  doi = {10.1186/s12920-020-00743-8},
  journal = {{BMC} Medical Genomics},
  month = {July},
  number = {1},
  publisher = {Springer Science and Business Media {LLC}},
  title = {What is the right sequencing approach? Solo {VS} extended family analysis in consanguineous populations},
  url = {https://doi.org/10.1186/s12920-020-00743-8},
  volume = {13},
  year = {2020}
}

Combining lexical and context features for automatic ontology extension

Sara Althubaiti, Senay Kafkas, Marwa Abdelhakim and Robert Hoehndorf

Journal of Biomedical Semantics, vol. 11, pp. 1 (2020)

Ontology engineering, Biomedical informatics

@article{ontoextend,
  addendum = {IF: 1.99},
  author = {Sara Althubaiti and Senay Kafkas and Marwa Abdelhakim and Robert Hoehndorf$^*$},
  journal = {Journal of Biomedical Semantics},
  pages = {1},
  title = {Combining lexical and context features for automatic ontology extension},
  volume = {11},
  year = {2020}
}

Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies

Luke T. Slater, Georgios V. Gkoutos and Robert Hoehndorf

BMC Medical Informatics and Decision Making, vol. 20(S10) (2020)

Ontology engineering, Applied Ontology

Background: Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains together may reveal previously hidden contradictions. Methods: We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when combined with other ontologies. For this purpose, we combined sets of ontologies and use automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and use these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies. Results: We tested the mutual consistency of the OBO Foundry and the OBO ontologies and find that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that only 117 axioms could be removed to account for all cases of unsatisfiability across all OBO ontologies. Conclusions: We identified a large set of hidden unsatisfiability across a broad range of biomedical ontologies, and we find that this large set of unsatisfiable classes is the result of a relatively small amount of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.

@article{Slater2020,
  abstract = {{Background: Ontologies are widely used throughout the biomedical domain. These ontologies formally represent the classes and relations assumed to exist within a domain. As scientific domains are deeply interlinked, so too are their representations. While individual ontologies can be tested for consistency and coherency using automated reasoning methods, systematically combining ontologies of multiple domains together may reveal previously hidden contradictions. Methods: We developed a method that tests for hidden unsatisfiabilities in an ontology that arise when combined with other ontologies. For this purpose, we combined sets of ontologies and use automated reasoning to determine whether unsatisfiable classes are present. In addition, we designed and implemented a novel algorithm that can determine justifications for contradictions across extremely large and complicated ontologies, and use these justifications to semi-automatically repair ontologies by identifying a small set of axioms that, when removed, result in a consistent and coherent set of ontologies. Results: We tested the mutual consistency of the OBO Foundry and the OBO ontologies and find that the combined OBO Foundry gives rise to at least 636 unsatisfiable classes, while the OBO ontologies give rise to more than 300,000 unsatisfiable classes. We also applied our semi-automatic repair algorithm to each combination of OBO ontologies that resulted in unsatisfiable classes, finding that only 117 axioms could be removed to account for all cases of unsatisfiability across all OBO ontologies. Conclusions: We identified a large set of hidden unsatisfiability across a broad range of biomedical ontologies, and we find that this large set of unsatisfiable classes is the result of a relatively small amount of axiomatic disagreements. Our results show that hidden unsatisfiability is a serious problem in ontology interoperability; however, our results also provide a way towards more consistent ontologies by addressing the issues we identified.}},
  addendum = {IF: 2.75},
  author = {Luke T. Slater$^*$ and Georgios V. Gkoutos and Robert Hoehndorf},
  doi = {10.1186/s12911-020-01336-2},
  journal = {{BMC} Medical Informatics and Decision Making},
  month = {December},
  number = {S10},
  publisher = {Springer Science and Business Media {LLC}},
  title = {Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies},
  url = {https://doi.org/10.1186/s12911-020-01336-2},
  volume = {20},
  year = {2020}
}

EMC10 homozygous variant identified in a family with global developmental delay, mild intellectual disability, and speech delay

Muhammad Umair, Mariam Ballow, Abdulaziz Asiri, Yusra Alyafee, Abeer Tuwaijri, Kheloud M. Alhamoudi, Taghrid Aloraini, Marwa Abdelhakim, Azza Thamer Althagafi, Senay Kafkas, Lamia Alsubaie, Muhammad Talal Alrifai, Robert Hoehndorf, Ahmed Alfares and Majid Alfadhel

Clinical Genetics, vol. 98(6), pp. 555-561 (2020)

Rare disease, Genomics

@article{Umair2020,
  abstract = {{In recent years, several genes have been implicated in the variable disease presentation of global developmental delay (GDD) and intellectual disability (ID). The endoplasmic reticulum membrane protein complex (EMC) family is known to be involved in GDD and ID. Homozygous variants of EMC1 are associated with GDD, scoliosis, and cerebellar atrophy, indicating the relevance of this pathway for neurogenetic disorders. EMC10 is a bone marrow-derived angiogenic growth factor that plays an important role in infarct vascularization and promoting tissue repair. However, this gene has not been previously associated with human disease. Herein, we describe a Saudi family with two individuals segregating a recessive neurodevelopmental disorder. Both of the affected individuals showed mild ID, speech delay, and GDD. Whole-exome sequencing (WES) and Sanger sequencing were performed to identify candidate genes. Further, to elucidate the functional effects of the variant, quantitative real-time PCR (RT-qPCR)-based expression analysis was performed. WES revealed a homozygous splice acceptor site variant (c.679-1G>A) in EMC10 (chromosome 19q13.33) that segregated perfectly within the family. RT-qPCR showed a substantial decrease in the relative EMC10 gene expression in the patients, indicating the pathogenicity of the identified variant. For the first time in the literature, the EMC10 gene variant was associated with mild ID, speech delay, and GDD. Thus, this gene plays a key role in developmental milestones, with the potential to cause neurodevelopmental disorders in humans.}},
  addendum = {IF: 4.10},
  author = {Muhammad Umair and Mariam Ballow and Abdulaziz Asiri and Yusra Alyafee and Abeer Tuwaijri and Kheloud M. Alhamoudi and Taghrid Aloraini and Marwa Abdelhakim and Azza Thamer Althagafi and Senay Kafkas and Lamia Alsubaie and Muhammad Talal Alrifai and Robert Hoehndorf and Ahmed Alfares and Majid Alfadhel$^*$},
  doi = {10.1111/cge.13842},
  journal = {Clinical Genetics},
  month = {September},
  number = {6},
  pages = {555--561},
  publisher = {Wiley},
  title = {{EMC}10 homozygous variant identified in a family with global developmental delay, mild intellectual disability, and speech delay},
  url = {https://doi.org/10.1111/cge.13842},
  volume = {98},
  year = {2020}
}

BioHackathon 2015: Semantics of data for life sciences and reproducible research

Rutger A. Vos, Toshiaki Katayama, Hiroyuki Mishima, Shin Kawano, Shuichi Kawashima, Jin-Dong Kim, Yuki Moriya, Toshiaki Tokimatsu, Atsuko Yamaguchi, Yasunori Yamamoto, Hongyan Wu, Peter Amstutz, Erick Antezana, Nobuyuki P. Aoki, Kazuharu Arakawa, Jerven T. Bolleman, Evan Bolton, Raoul J. P. Bonnal, Hidemasa Bono, Kees Burger, Hirokazu Chiba, Kevin B. Cohen, Eric W. Deutsch, Jesualdo T. Fernández-Breis, Gang Fu, Takatomo Fujisawa, Atsushi Fukushima, Alexander García, Naohisa Goto, Tudor Groza, Colin Hercus, Robert Hoehndorf, Kotone Itaya, Nick Juty, Takeshi Kawashima, Jee-Hyub Kim, Akira R. Kinjo, Masaaki Kotera, Kouji Kozaki, Sadahiro Kumagai, Tatsuya Kushida, Thomas Lütteke, Masaaki Matsubara, Joe Miyamoto, Attayeb Mohsen, Hiroshi Mori, Yuki Naito, Takeru Nakazato, Jeremy Nguyen-Xuan, Kozo Nishida, Naoki Nishida, Hiroyo Nishide, Soichi Ogishima, Tazro Ohta, Shujiro Okuda, Benedict Paten, Jean-Luc Perret, Philip Prathipati, Pjotr Prins, Núria Queralt-Rosinach, Daisuke Shinmachi, Shinya Suzuki, Tsuyosi Tabata, Terue Takatsuki, Kieron Taylor, Mark Thompson, Ikuo Uchiyama, Bruno Vieira, Chih-Hsuan Wei, Mark Wilkinson, Issaku Yamada, Ryota Yamanaka, Kazutoshi Yoshitake, Akiyasu C. Yoshizawa, Michel Dumontier, Kenjiro Kosaki and Toshihisa Takagi

F1000Research, vol. 9, pp. 136 (2020)

Ontology engineering, Biomedical informatics

@article{Vos2020,
  abstract = {{We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.}},
  author = {Rutger A. Vos and Toshiaki Katayama and Hiroyuki Mishima and Shin Kawano and Shuichi Kawashima and Jin-Dong Kim and Yuki Moriya and Toshiaki Tokimatsu and Atsuko Yamaguchi and Yasunori Yamamoto and Hongyan Wu and Peter Amstutz and Erick Antezana and Nobuyuki P. Aoki and Kazuharu Arakawa and Jerven T. Bolleman and Evan Bolton and Raoul J. P. Bonnal and Hidemasa Bono and Kees Burger and Hirokazu Chiba and Kevin B. Cohen and Eric W. Deutsch and Jesualdo T. Fern{\'{a}}ndez-Breis and Gang Fu and Takatomo Fujisawa and Atsushi Fukushima and Alexander Garc{\'{\i}}a and Naohisa Goto and Tudor Groza and Colin Hercus and Robert Hoehndorf and Kotone Itaya and Nick Juty and Takeshi Kawashima and Jee-Hyub Kim and Akira R. Kinjo and Masaaki Kotera and Kouji Kozaki and Sadahiro Kumagai and Tatsuya Kushida and Thomas L\"{u}tteke and Masaaki Matsubara and Joe Miyamoto and Attayeb Mohsen and Hiroshi Mori and Yuki Naito and Takeru Nakazato and Jeremy Nguyen-Xuan and Kozo Nishida and Naoki Nishida and Hiroyo Nishide and Soichi Ogishima and Tazro Ohta and Shujiro Okuda and Benedict Paten and Jean-Luc Perret and Philip Prathipati and Pjotr Prins and N{\'{u}}ria Queralt-Rosinach and Daisuke Shinmachi and Shinya Suzuki and Tsuyosi Tabata and Terue Takatsuki and Kieron Taylor and Mark Thompson and Ikuo Uchiyama and Bruno Vieira and Chih-Hsuan Wei and Mark Wilkinson and Issaku Yamada and Ryota Yamanaka and Kazutoshi Yoshitake and Akiyasu C. Yoshizawa and Michel Dumontier and Kenjiro Kosaki and Toshihisa Takagi$^*$},
  doi = {10.12688/f1000research.18236.1},
  journal = {F1000Research},
  month = {February},
  pages = {136},
  publisher = {F1000 Research Ltd},
  title = {{BioHackathon} 2015: Semantics of data for life sciences and reproducible research},
  url = {https://doi.org/10.12688/f1000research.18236.1},
  volume = {9},
  year = {2020}
}

JOWO 2020: The Joint Ontology Workshops : Proceedings of the Joint Ontology Workshops co-located with the Bolzano Summer of Knowledge (BOSK 2020)

no. 2708, In: Hammar, Karl, Kutz, Oliver, Dimou, Anastasia, Hahmann, Torsten, Hoehndorf, Robert, Masolo, Claudio and Vita, Randi (Eds.) (2020)

Applied Ontology, Ontology engineering

@proceedings{1479305,
  editor = {Hammar, Karl and Kutz, Oliver and Dimou, Anastasia and Hahmann, Torsten and Hoehndorf, Robert and Masolo, Claudio and Vita, Randi},
  institution = {School of Engineering, Jönköping University, JTH, Jönköping AI Lab (JAIL)},
  number = {2708},
  publisher = {CEUR-WS},
  series = {CEUR Workshop Proceedings},
  title = {JOWO 2020: The Joint Ontology Workshops : Proceedings of the Joint Ontology Workshops co-located with the Bolzano Summer of Knowledge (BOSK 2020)},
  url = {http://ceur-ws.org/Vol-2708/},
  year = {2020}
}

BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services

Katayama, T, Kawashima, S, Micklem, G, Kawano, S, Kim, JD, Kocbek, S, Okamoto, S, Wang, Y, Wu, H, Yamaguchi, A, Yamamoto, Y, Antezana, E, Aoki-Kinoshita, KF, Arakawa, K, Banno, M, Baran, J, Bolleman, JT, Bonnal, RJP, Bono, H, Fernández-Breis, JT, Buels, R, Campbell, MP, Chiba, H, Cock, PJA, Cohen, KB, Dumontier, M, Fujisawa, T, Fujiwara, T, Garcia, L, Gaudet, P, Hattori, E, Hoehndorf, R, Itaya, K, Ito, M, Jamieson, D, Jupp, S, Juty, N, Kalderimis, A, Kato, F, Kawaji, H, Kawashima, T, Kinjo, AR, Komiyama, Y, Kotera, M, Kushida, T, Malone, J, Matsubara, M, Mizuno, S, Mizutani, S, Mori, H, Moriya, Y, Murakami, K, Nakazato, T, Nishide, H, Nishimura, Y, Ogishima, S, Ohta, T, Okuda, S, Ono, H, Perez-Riverol, Y, Shinmachi, D, Splendiani, A, Strozzi, F, Suzuki, S, Takehara, J, Thompson, M, Tokimatsu, T, Uchiyama, I, Verspoor, K, Wilkinson, MD, Wimalaratne, S, Yamada, I, Yamamoto, N, Yarimizu, M, Kawamoto, S and Takagi, T

F1000Research, vol. 8(1677) (2019)

Ontology engineering, Biomedical informatics

@article{10.12688,
  author = { Katayama$^*$, T and Kawashima, S and Micklem, G and Kawano, S and Kim, JD and Kocbek, S and Okamoto, S and Wang, Y and Wu, H and Yamaguchi, A and Yamamoto, Y and Antezana, E and Aoki-Kinoshita, KF and Arakawa, K and Banno, M and Baran, J and Bolleman, JT and Bonnal, RJP and Bono, H and Fernández-Breis, JT and Buels, R and Campbell, MP and Chiba, H and Cock, PJA and Cohen, KB and Dumontier, M and Fujisawa, T and Fujiwara, T and Garcia, L and Gaudet, P and Hattori, E and Hoehndorf, R and Itaya, K and Ito, M and Jamieson, D and Jupp, S and Juty, N and Kalderimis, A and Kato, F and Kawaji, H and Kawashima, T and Kinjo, AR and Komiyama, Y and Kotera, M and Kushida, T and Malone, J and Matsubara, M and Mizuno, S and Mizutani, S and Mori, H and Moriya, Y and Murakami, K and Nakazato, T and Nishide, H and Nishimura, Y and Ogishima, S and Ohta, T and Okuda, S and Ono, H and Perez-Riverol, Y and Shinmachi, D and Splendiani, A and Strozzi, F and Suzuki, S and Takehara, J and Thompson, M and Tokimatsu, T and Uchiyama, I and Verspoor, K and Wilkinson, MD and Wimalaratne, S and Yamada, I and Yamamoto, N and Yarimizu, M and Kawamoto, S and Takagi$^*$, T},
  journal = {F1000Research},
  number = {1677},
  title = {BioHackathon series in 2013 and 2014: improvements of semantic interoperability in life science data and services},
  volume = {8},
  year = {2019}
}

A Machine Learning Based Approach for Similarity Search on Biodiversity Knowledge Graphs

Claus Weiland, Maxat Kulmanov, Marco Schmidt and Robert Hoehndorf

Biodiversity Information Science and Standards, vol. 3, pp. e37048 (2019)

Semantic similarity, Neuro-symbolic AI

Mass biodiversity data from scientific collections will be provided by world-wide digitization efforts like iDigBio in the U.S. and DiSSCo in Europe. This opens up an increasing amount of data on wild type organisms, which enables the building of large biodiversity knowledge graphs comprising, inter alia, sequence, trait and occurrence data. Knowledge graphs model information in the form of entities and their relationships expressed in good practice as ontology-based annotations. Based on ontological descriptions, semantic similarity analysis makes linking of wild type data to genomic and proteomic data of model organisms possible and thus supports knowledge discovery of crop wild relatives and underutilized species of interest for medicine, breeding and agriculture. Since classical similarity measurements focus on recording differences between character states (aiming to describe disease phenotypes), but not the character states in the sense of trait variations itself, new methods for similarity search are required. Machine learning algorithms operate on feature vectors, which are numeric representations of data (images, class labels etc.) in n-dimensional vector space. We established a machine learning based workflow for similarity search on biodiversity entities using feature learning on ontologies and an associated RDF knowledge graph to project structured trait data into vector space. Vectors are then compared applying a similarity function (e.g. cosine similarity) to determine similarity between taxa based on trait semantics. We will present an application example of machine learning on biodiversity knowledge graphs using a pipeline built upon OPA2Vec, a method to generate feature vectors from the logical content of ontologies (Smaili et al. 2018), to successfully cluster plant species for life form and ecotype (e.g. tree vs. perennial plant) on the basis of their annotations with the Flora Phenotype Ontology (Hoehndorf et al. 2016).

@article{10.3897/biss.3.37048,
  abstract = {Mass biodiversity data from scientific collections will be provided by world-wide digitization efforts like iDigBio in the U.S. and DiSSCo in Europe. This opens up an increasing amount of data on wild type organisms, which enables the building of large biodiversity knowledge graphs comprising, inter alia, sequence, trait and occurrence data. Knowledge graphs model information in the form of entities and their relationships expressed in good practice as ontology-based annotations. Based on ontological descriptions, semantic similarity analysis makes linking of wild type data to genomic and proteomic data of model organisms possible and thus supports knowledge discovery of crop wild relatives and underutilized species of interest for medicine, breeding and agriculture. Since classical similarity measurements focus on recording differences between character states (aiming to describe disease phenotypes), but not the character states in the sense of trait variations itself, new methods for similarity search are required. Machine learning algorithms operate on feature vectors, which are numeric representations of data (images, class labels etc.) in n-dimensional vector space. We established a machine learning based workflow for similarity search on biodiversity entities using feature learning on ontologies and an associated RDF knowledge graph to project structured trait data into vector space. Vectors are then compared applying a similarity function (e.g. cosine similarity) to determine similarity between taxa based on trait semantics. We will present an application example of machine learning on biodiversity knowledge graphs using a pipeline built upon OPA2Vec, a method to generate feature vectors from the logical content of ontologies (Smaili et al. 2018), to successfully cluster plant species for life form and ecotype (e.g. tree vs. perennial plant) on the basis of their annotations with the Flora Phenotype Ontology (Hoehndorf et al. 2016).},
  author = {Claus Weiland$^*$ and Maxat Kulmanov and Marco Schmidt and Robert Hoehndorf$^*$},
  journal = {Biodiversity Information Science and Standards},
  pages = {e37048},
  publisher = {Pensoft Publishers},
  title = {A Machine Learning Based Approach for Similarity Search on Biodiversity Knowledge Graphs},
  volume = {3},
  year = {2019}
}

Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies

Sarah M. Alghamdi, Beth A. Sundberg, John P. Sundberg, Paul N. Schofield and Robert Hoehndorf

Scientific Reports, vol. 9, pp. 4025 (2019)

Applied Ontology, Ontology engineering

@article{aging-ontologies,
  abstract = {Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.},
  addendum = {IF: 4.00},
  author = {Sarah M. Alghamdi and Beth A. Sundberg and John P. Sundberg and Paul N. Schofield and Robert Hoehndorf$^*$},
  journal = {Scientific Reports},
  month = {March},
  pages = {4025},
  title = {Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies},
  volume = {9},
  year = {2019}
}

Ontology-based prediction of cancer driver genes

Althubaiti, Sara, Karwath, Andreas, Dallol, Ashraf, Noor, Adeeb, Alkhayyat, Shadi Salem, Alwassia, Rolina, Mineta, Katsuhiko, Gojobori, Takashi, Beggs, Andrew D, Schofield, Paul N, Gkoutos, Georgios V and Hoehndorf, Robert

Scientific Reports, vol. 9, pp. 17405 (2019)

Applied Ontology, Biomedical informatics

@article{Althubaiti561480,
  abstract = {Identifying and distinguishing cancer driver genes among thousands of candidate mutations remains a major challenge. Accurate identification of driver genes and driver mutations is critical for advancing cancer research and personalizing treatment based on accurate stratification of patients. Due to inter-tumor genetic heterogeneity, many driver mutations within a gene occur at low frequencies, which make it challenging to distinguish them from non-driver mutations. We have developed a novel method for identifying cancer driver genes. Our approach utilizes multiple complementary types of information, specifically cellular phenotypes, cellular locations, functions, and whole body physiological phenotypes as features. We demonstrate that our method can accurately identify known cancer driver genes and distinguish between their role in different types of cancer. In addition to confirming known driver genes, we identify several novel candidate driver genes. We demonstrate the utility of our method by validating its predictions in nasopharyngeal cancer and colorectal cancer using whole exome and whole genome sequencing.},
  addendum = {IF: 4.00},
  author = {Althubaiti, Sara and Karwath, Andreas and Dallol, Ashraf and Noor, Adeeb and Alkhayyat, Shadi Salem and Alwassia, Rolina and Mineta, Katsuhiko and Gojobori, Takashi and Beggs, Andrew D and Schofield, Paul N and Gkoutos, Georgios V and Hoehndorf$^*$, Robert},
  journal = {Scientific Reports},
  pages = {17405},
  publisher = {Springer-Nature},
  title = {Ontology-based prediction of cancer driver genes},
  url = {https://www.biorxiv.org/content/early/2019/02/27/561480},
  volume = {9},
  year = {2019}
}

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Zhou, Naihui, Jiang, Yuxiang, Bergquist, Timothy R, Lee, Alexandra J, Kacsoh, Balint Z, Crocker, Alex W, Lewis, Kimberley A, Georghiou, George, Nguyen, Huy N, Hamid, Md Nafiz, Davis, Larry, Dogan, Tunca, Atalay, Volkan, Rifaioglu, Ahmet S, Dalkiran, Alperen, Cetin-Atalay, Rengul, Zhang, Chengxin, Hurto, Rebecca L, Freddolino, Peter L, Zhang, Yang, Bhat, Prajwal, Supek, Fran, Fernández, José M, Gemovic, Branislava, Perovic, Vladimir R, Davidović, Radoslav S, Sumonja, Neven, Veljkovic, Nevena, Asgari, Ehsaneddin, Mofrad, Mohammad RK, Profiti, Giuseppe, Savojardo, Castrense, Martelli, Pier Luigi, Casadio, Rita, Boecker, Florian, Kahanda, Indika, Thurlby, Natalie, McHardy, Alice C, Renaux, Alexandre, Saidi, Rabie, Gough, Julian, Freitas, Alex A, Antczak, Magdalena, Fabris, Fabio, Wass, Mark N, Hou, Jie, Cheng, Jianlin, Hou, Jie, Wang, Zheng, Romero, Alfonso E, Paccanaro, Alberto, Yang, Haixuan, Goldberg, Tatyana, Zhao, Chenguang, Holm, Liisa, Törönen, Petri, Medlar, Alan J, Zosa, Elaine, Borukhov, Itamar, Novikov, Ilya, Wilkins, Angela, Lichtarge, Olivier, Chi, Po-Han, Tseng, Wei-Cheng, Linial, Michal, Rose, Peter W, Dessimoz, Christophe, Vidulin, Vedrana, Dzeroski, Saso, Sillitoe, Ian, Das, Sayoni, Lees, Jonathan Gill, Jones, David T, Wan, Cen, Cozzetto, Domenico, Fa, Rui, Torres, Mateo, Vesztrocy, Alex Wiarwick, Rodriguez, Jose Manuel, Tress, Michael L, Frasca, Marco, Notaro, Marco, Grossi, Giuliano, Petrini, Alessandro, Re, Matteo, Valentini, Giorgio, Mesiti, Marco, Roche, Daniel B, Reeb, Jonas, Ritchie, David W, Aridhi, Sabeur, Alborzi, Seyed Ziaeddin, Devignes, Marie-Dominique, Emily Koo, Da Chen, Bonneau, Richard, Gligorijević, Vladimir, Barot, Meet, Fang, Hai, Toppo, Stefano, Lavezzo, Enrico, Falda, Marco, Berselli, Michele, Tosatto, Silvio CE, Carraro, Marco, Piovesan, Damiano, Rehman, Hafeez Ur, Mao, Qizhong, Zhang, Shanshan, Vucetic, Slobodan, Black, Gage S, Jo, Dane, Larsen, Dallas J, Omdahl, Ashton R, Sagers, Luke W, Suh, Erica, Dayton, Jonathan B, 
McGuffin, Liam J, Brackenridge, Danielle A, Babbitt, Patricia C, Yunes, Jeffrey M, Fontana, Paolo, Zhang, Feng, Zhu, Shanfeng, You, Ronghui, Zhang, Zihan, Dai, Suyang, Yao, Shuwei, Tian, Weidong, Cao, Renzhi, Chandler, Caleb, Amezola, Miguel, Johnson, Devon, Chang, Jia-Ming, Liao, Wen-Hung, Liu, Yi-Wei, Pascarelli, Stefano, Frank, Yotam, Hoehndorf, Robert, Kulmanov, Maxat, Boudellioua, Imane, Politano, Gianfranco, Di Carlo, Stefano, Benso, Alfredo, Hakala, Kai, Ginter, Filip, Mehryary, Farrokh, Kaewphan, Suwisa, Björne, Jari, Moen, Hans, Tolvanen, Martti E E, Salakoski, Tapio, Kihara, Daisuke, Jain, Aashish, Šmuc, Tomislav, Altenhoff, Adrian, Ben-Hur, Asa, Rost, Burkhard, Brenner, Steven E, Orengo, Christine A, Jeffery, Constance J, Bosco, Giovanni, Hogan, Deborah A, Martin, Maria J, O’Donovan, Claire, Mooney, Sean D, Greene, Casey S, Radivojac, Predrag and Friedberg, Iddo

Genome Biology, vol. 20, pp. 244 (2019)

Protein function

@article{cafa3,
  addendum = {IF: 10.81},
  author = {Zhou, Naihui and Jiang, Yuxiang and Bergquist, Timothy R and Lee, Alexandra J and Kacsoh, Balint Z and Crocker, Alex W and Lewis, Kimberley A and Georghiou, George and Nguyen, Huy N and Hamid, Md Nafiz and Davis, Larry and Dogan, Tunca and Atalay, Volkan and Rifaioglu, Ahmet S and Dalkiran, Alperen and Cetin-Atalay, Rengul and Zhang, Chengxin and Hurto, Rebecca L and Freddolino, Peter L and Zhang, Yang and Bhat, Prajwal and Supek, Fran and Fern{\'a}ndez, Jos{\'e} M and Gemovic, Branislava and Perovic, Vladimir R and Davidovi{\'c}, Radoslav S and Sumonja, Neven and Veljkovic, Nevena and Asgari, Ehsaneddin and Mofrad, Mohammad RK and Profiti, Giuseppe and Savojardo, Castrense and Martelli, Pier Luigi and Casadio, Rita and Boecker, Florian and Kahanda, Indika and Thurlby, Natalie and McHardy, Alice C and Renaux, Alexandre and Saidi, Rabie and Gough, Julian and Freitas, Alex A and Antczak, Magdalena and Fabris, Fabio and Wass, Mark N and Hou, Jie and Cheng, Jianlin and Hou, Jie and Wang, Zheng and Romero, Alfonso E and Paccanaro, Alberto and Yang, Haixuan and Goldberg, Tatyana and Zhao, Chenguang and Holm, Liisa and T{\"o}r{\"o}nen, Petri and Medlar, Alan J and Zosa, Elaine and Borukhov, Itamar and Novikov, Ilya and Wilkins, Angela and Lichtarge, Olivier and Chi, Po-Han and Tseng, Wei-Cheng and Linial, Michal and Rose, Peter W and Dessimoz, Christophe and Vidulin, Vedrana and Dzeroski, Saso and Sillitoe, Ian and Das, Sayoni and Lees, Jonathan Gill and Jones, David T and Wan, Cen and Cozzetto, Domenico and Fa, Rui and Torres, Mateo and Vesztrocy, Alex Wiarwick and Rodriguez, Jose Manuel and Tress, Michael L and Frasca, Marco and Notaro, Marco and Grossi, Giuliano and Petrini, Alessandro and Re, Matteo and Valentini, Giorgio and Mesiti, Marco and Roche, Daniel B and Reeb, Jonas and Ritchie, David W and Aridhi, Sabeur and Alborzi, Seyed Ziaeddin and Devignes, Marie-Dominique and Emily Koo, Da Chen and Bonneau, Richard and Gligorijevi{\'c}, Vladimir and Barot, 
Meet and Fang, Hai and Toppo, Stefano and Lavezzo, Enrico and Falda, Marco and Berselli, Michele and Tosatto, Silvio CE and Carraro, Marco and Piovesan, Damiano and Rehman, Hafeez Ur and Mao, Qizhong and Zhang, Shanshan and Vucetic, Slobodan and Black, Gage S and Jo, Dane and Larsen, Dallas J and Omdahl, Ashton R and Sagers, Luke W and Suh, Erica and Dayton, Jonathan B and McGuffin, Liam J and Brackenridge, Danielle A and Babbitt, Patricia C and Yunes, Jeffrey M and Fontana, Paolo and Zhang, Feng and Zhu, Shanfeng and You, Ronghui and Zhang, Zihan and Dai, Suyang and Yao, Shuwei and Tian, Weidong and Cao, Renzhi and Chandler, Caleb and Amezola, Miguel and Johnson, Devon and Chang, Jia-Ming and Liao, Wen-Hung and Liu, Yi-Wei and Pascarelli, Stefano and Frank, Yotam and Hoehndorf, Robert and Kulmanov, Maxat and Boudellioua, Imane and Politano, Gianfranco and Di Carlo, Stefano and Benso, Alfredo and Hakala, Kai and Ginter, Filip and Mehryary, Farrokh and Kaewphan, Suwisa and Bj{\"o}rne, Jari and Moen, Hans and Tolvanen, Martti E E and Salakoski, Tapio and Kihara, Daisuke and Jain, Aashish and {\v S}muc, Tomislav and Altenhoff, Adrian and Ben-Hur, Asa and Rost, Burkhard and Brenner, Steven E and Orengo, Christine A and Jeffery, Constance J and Bosco, Giovanni and Hogan, Deborah A and Martin, Maria J and O{\textquoteright}Donovan, Claire and Mooney, Sean D and Greene, Casey S and Radivojac, Predrag and Friedberg$^*$, Iddo},
  journal = {Genome Biology},
  pages = {244},
  title = {The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens},
  volume = {20},
  year = {2019}
}

Hyaline Arteriolosclerosis in 30 Strains of Aged Inbred Mice

Timothy K. Cooper, Kathleen A. Silva, Victoria E. Kennedy, Sarah M. Alghamdi, Robert Hoehndorf, Beth A. Sundberg, Paul N. Schofield and John P. Sundberg

Veterinary Pathology, pp. 0300985819844822 (2019)

Phenotype informatics, Biomedical informatics

@article{Cooper2019,
  abstract = { During a screen for vascular phenotypes in aged laboratory mice, a unique discrete phenotype of hyaline arteriolosclerosis of the intertubular arteries and arterioles of the testes was identified in several inbred strains. Lesions were limited to the testes and did not occur as part of any renal, systemic, or pulmonary arteriopathy or vasculitis phenotype. There was no evidence of systemic or pulmonary hypertension, and lesions did not occur in ovaries of females. Frequency was highest in males of the SM/J (27/30, 90\%) and WSB/EiJ (19/26, 73\%) strains, aged 383 to 847 days. Lesions were sporadically present in males from several other inbred strains at a much lower (<20\%) frequency. The risk of testicular hyaline arteriolosclerosis is at least partially underpinned by a genetic predisposition that is not associated with other vascular lesions (including vasculitis), separating out the etiology of this form and site of arteriolosclerosis from other related conditions that often co-occur in other strains of mice and in humans. Because of their genetic uniformity and controlled dietary and environmental conditions, mice are an excellent model to dissect the pathogenesis of human disease conditions. In this study, a discrete genetically driven phenotype of testicular hyaline arteriolosclerosis in aging mice was identified. These observations open the possibility of identifying the underlying genetic variant(s) associated with the predisposition and therefore allowing future interrogation of the pathogenesis of this condition. },
  addendum = {IF: 2.11},
  author = {Timothy K. Cooper and Kathleen A. Silva and Victoria E. Kennedy and Sarah M. Alghamdi and Robert Hoehndorf and Beth A. Sundberg and Paul N. Schofield and John P. Sundberg$^*$},
  doi = {10.1177/0300985819844822},
  journal = {Veterinary Pathology},
  month = {May},
  pages = {0300985819844822},
  title = {Hyaline Arteriolosclerosis in 30 Strains of Aged Inbred Mice},
  url = {https://doi.org/10.1177/0300985819844822},
  year = {2019}
}

Ontology based text mining of gene-phenotype associations: application to candidate gene prediction

Kafkas, Şenay and Hoehndorf, Robert

Database, vol. 2019, pp. baz019 (2019)

Biomedical informatics, Rare disease

Gene–phenotype associations play an important role in understanding the disease mechanisms which is a requirement for treatment development. A portion of gene–phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, there is still a vast amount of gene–phenotype associations buried in the biomedical literature. Given the large amount of literature data, we need automated text mining tools to alleviate the burden in manual curation of gene–phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene–phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene–phenotype associations from HPO and MGI respectively. We posit that candidate genes and their relevant diseases should be expressed with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology based approach to extract gene–phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene–disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene–disease associations respectively. Our manual analysis on the text mined data revealed that our method can accurately extract gene–phenotype associations which are not currently covered by the existing public gene–phenotype resources. Overall, results indicate that our method can precisely extract known as well as new gene–phenotype associations from literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.

@article{database2019,
  abstract = {{Gene–phenotype associations play an important role in understanding the disease mechanisms which is a requirement for treatment development. A portion of gene–phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, there is still a vast amount of gene–phenotype associations buried in the biomedical literature. Given the large amount of literature data, we need automated text mining tools to alleviate the burden in manual curation of gene–phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene–phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene–phenotype associations from HPO and MGI respectively. We posit that candidate genes and their relevant diseases should be expressed with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology based approach to extract gene–phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene–disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene–disease associations respectively. Our manual analysis on the text mined data revealed that our method can accurately extract gene–phenotype associations which are not currently covered by the existing public gene–phenotype resources. Overall, results indicate that our method can precisely extract known as well as new gene–phenotype associations from literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.}},
  addendum = {IF: 3.66},
  author = {Kafkas, Şenay and Hoehndorf$^*$, Robert},
  issn = {1758-0463},
  journal = {Database},
  month = {February},
  pages = {baz019},
  title = {{Ontology based text mining of gene-phenotype associations: application to candidate gene prediction}},
  volume = {2019},
  year = {2019}
}

DeepPVP: phenotype-based prioritization of causative variants using deep learning

Boudellioua, Imane, Kulmanov, Maxat, Schofield, Paul N, Gkoutos, Georgios V and Hoehndorf, Robert

BMC Bioinformatics, vol. 20, pp. 65 (2019)

Rare disease, Neuro-symbolic AI

@article{deeppvp,
  abstract = {Background: Prioritization of variants in personal genomic data is a major challenge. Recently, computational methods that rely on comparing phenotype similarity have shown to be useful to identify causative variants. In these methods, pathogenicity prediction is combined with a semantic similarity measure to prioritize not only variants that are likely to be dysfunctional but those that are likely involved in the pathogenesis of a patient{\textquoteright}s phenotype. Results: We have developed DeepPVP, a variant prioritization method that combined automated inference with deep neural networks to identify the likely causative variants in whole exome or whole genome sequence data. We demonstrate that DeepPVP performs significantly better than existing methods, including phenotype-based methods that use similar features. DeepPVP is freely available at https://github.com/bio-ontology-research-group/phenomenet-vp Conclusions: DeepPVP further improves on existing variant prioritization methods both in terms of speed as well as accuracy.},
  addendum = {IF: 3.17},
  author = {Boudellioua, Imane and Kulmanov, Maxat and Schofield, Paul N and Gkoutos, Georgios V and Hoehndorf$^*$, Robert},
  journal = {BMC Bioinformatics},
  pages = {65},
  title = {DeepPVP: phenotype-based prioritization of causative variants using deep learning},
  url = {https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2633-8},
  volume = {20},
  year = {2019}
}

Nail abnormalities identified in an ageing study of 30 inbred mouse strains

Linn, Sarah C., Mustonen, Allison M., Silva, Kathleen A., Kennedy, Victoria E., Sundberg, Beth A., Bechtold, Lesley S., Alghamdi, Sarah M., Hoehndorf, Robert, Schofield, Paul N. and Sundberg, John P.

Experimental Dermatology, vol. 28(4), pp. 383-390 (2019)

Phenotype informatics

@article{Linn2018,
  abstract = {In a large-scale ageing study, 30 inbred mouse strains were systematically screened for histologic evidence of lesions in all organ systems. Ten strains were diagnosed with similar nail abnormalities. The highest frequency was noted in NON/ShiLtJ mice. Lesions identified fell into two main categories: acute to chronic penetration of the third phalangeal bone through the hyponychium with associated inflammation and bone remodelling or metaplasia of the nail matrix and nail bed associated with severe orthokeratotic hyperkeratosis replacing the nail plate. Penetration of the distal phalanx through the hyponychium appeared to be the initiating feature resulting in nail abnormalities. The accompanying acute to subacute inflammatory response was associated with osteolysis of the distal phalanx. Evaluation of young NON/ShiLtJ mice revealed that these lesions were not often found, or affected only one digit. The only other nail unit abnormality identified was sporadic subungual epidermoid inclusion cysts which closely resembled similar lesions in human patients. These abnormalities, being age-related developments, may have contributed to weight loss due to impacts upon feeding and should be a consideration for future research due to the potential to interact with other experimental factors in ageing studies using the affected strains of mice.},
  addendum = {IF: 3.37},
  author = {Linn, Sarah C. and Mustonen, Allison M. and Silva, Kathleen A. and Kennedy, Victoria E. and Sundberg, Beth A. and Bechtold, Lesley S. and Alghamdi, Sarah M. and Hoehndorf, Robert and Schofield, Paul N. and Sundberg$^*$, John P.},
  doi = {10.1111/exd.13759},
  journal = {Experimental Dermatology},
  keywords = {nail dystrophy, pododermatitis circumscripta, subungual intraosseous epidermoid inclusion cysts},
  month = {August},
  number = {4},
  pages = {383--390},
  title = {Nail abnormalities identified in an ageing study of 30 inbred mouse strains},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/exd.13759},
  volume = {28},
  year = {2019}
}

Ontology based mining of pathogen–disease associations from literature

Senay Kafkas and Robert Hoehndorf

Journal of Biomedical Semantics, vol. 10, pp. 15 (2019)

Biomedical informatics, Applied Ontology

@article{padimi2019,
  addendum = {IF: 1.99},
  author = {Senay Kafkas and Robert Hoehndorf$^*$},
  journal = {Journal of Biomedical Semantics},
  pages = {15},
  title = {Ontology based mining of pathogen--disease associations from literature},
  volume = {10},
  year = {2019}
}

PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research

Kafkas, Senay, Abdelhakim, Marwa, Hashish, Yasmeen, Kulmanov, Maxat, Abdellatif, Marwa, Schofield, Paul N and Hoehndorf, Robert

Scientific Data, vol. 6(1), pp. 79 (2019)

Biomedical informatics, Rare disease

@article{pathophenodb,
  abstract = {Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, on ontology-based text mining as well as manual curation to associate phenotypes with infectious disease. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data is freely available through a public SPARQL endpoint.},
  addendum = {IF: 5.54},
  author = {Kafkas, Senay and Abdelhakim, Marwa and Hashish, Yasmeen and Kulmanov, Maxat and Abdellatif, Marwa and Schofield, Paul N and Hoehndorf$^*$, Robert},
  doi = {10.1101/489971},
  journal = {Scientific Data},
  month = {June},
  number = {1},
  pages = {79},
  title = {PathoPhenoDB: linking human pathogens to their disease phenotypes in support of infectious disease research},
  url = {http://hdl.handle.net/10754/630280},
  volume = {6},
  year = {2019}
}

EL Embeddings: Geometric construction of models for the Description Logic EL++

Maxat Kulmanov, Wang Liu-Wei, Yuan Yan and Robert Hoehndorf

Proceedings of IJCAI 2019 (2019)

Neuro-symbolic AI

@inproceedings{kulmanov2019el,
  abstract = {An embedding is a function that maps entities from one algebraic structure into another while preserving certain characteristics. Embeddings are being used successfully for mapping relational data or text into vector spaces where they can be used for machine learning, similarity search, or similar tasks. We address the problem of finding vector space embeddings for theories in the Description Logic EL++ that are also models of the TBox. To find such embeddings, we define an optimization problem that characterizes the model-theoretic semantics of the operators in EL++ within $\Re^n$, thereby solving the problem of finding an interpretation function for an EL++ theory given a particular domain $\Delta$. Our approach is mainly relevant to large EL++ theories and knowledge bases such as the ontologies and knowledge graphs used in the life sciences. We demonstrate that our method can be used for improved prediction of protein--protein interactions when compared to semantic similarity measures or knowledge graph embedding },
  author = {Maxat Kulmanov and Wang Liu-Wei and Yuan Yan and Robert Hoehndorf$^*$},
  booktitle = {Proceedings of IJCAI 2019},
  month = {August},
  series = {IJCAI},
  title = {EL Embeddings: Geometric construction of models for the Description Logic EL++},
  year = {2019}
}

Semi-Supervised Entity Alignment via Knowledge Graph Embedding with Awareness of Degree Difference

Pei, Shichao, Yu, Lu, Hoehndorf, Robert and Zhang, Xiangliang

The World Wide Web Conference, pp. 3130-3136 (2019)

Neuro-symbolic AI

@inproceedings{Pei2019,
  abstract = {Entity alignment associates entities in different knowledge graphs if they are semantically same, and has been successfully used in the knowledge graph construction and connection. Most of the recent solutions for entity alignment are based on knowledge graph embedding, which maps knowledge entities in a low-dimension space where entities are connected with the guidance of prior aligned entity pairs. The study in this paper focuses on two important issues that limit the accuracy of current entity alignment solutions: 1) labeled data of priorly aligned entity pairs are difficult and expensive to acquire, whereas abundant of unlabeled data are not used; and 2) knowledge graph embedding is affected by entity's degree difference, which brings challenges to align high frequent and low frequent entities. We propose a semi-supervised entity alignment method (SEA) to leverage both labeled entities and the abundant unlabeled entity information for the alignment. Furthermore, we improve the knowledge graph embedding with awareness of the degree difference by performing the adversarial training. To evaluate our proposed model, we conduct extensive experiments on real-world datasets. The experimental results show that our model consistently outperforms the state-of-the-art methods with significant improvement on alignment accuracy.},
  acmid = {3313646},
  address = {New York, NY, USA},
  author = {Pei, Shichao and Yu, Lu and Hoehndorf, Robert and Zhang$^*$, Xiangliang},
  booktitle = {The World Wide Web Conference},
  doi = {10.1145/3308558.3313646},
  isbn = {978-1-4503-6674-8},
  keywords = {Entity Alignment, Knowledge Graph, Semi-supervised Learning},
  location = {San Francisco, CA, USA},
  month = {May},
  numpages = {7},
  pages = {3130--3136},
  publisher = {ACM},
  series = {WWW '19},
  title = {Semi-Supervised Entity Alignment via Knowledge Graph Embedding with Awareness of Degree Difference},
  url = {http://doi.acm.org/10.1145/3308558.3313646},
  year = {2019}
}

FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration

Damion M. Dooley, Emma J. Griffiths, Gurinder S. Gosal, Pier L. Buttigieg, Robert Hoehndorf, Matthew C. Lange, Lynn M. Schriml, Fiona S. L. Brinkman and William W. L. Hsiao

npj Science of Food, vol. 2, pp. 23 (2018)

Applied Ontology, Ontology engineering

@article{foodon,
  author = {Damion M. Dooley and Emma J. Griffiths and Gurinder S. Gosal and Pier L. Buttigieg and Robert Hoehndorf and Matthew C. Lange and Lynn M. Schriml and Fiona S. L. Brinkman and William W. L. Hsiao$^*$},
  journal = {npj Science of Food},
  pages = {23},
  title = {FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration},
  volume = {2},
  year = {2018}
}

Notions of similarity for systems biology models

Ron Henkel, Robert Hoehndorf, Tim Kacprowski, Christian Knüpfer, Wolfgang Liebermeister and Dagmar Waltemath

Briefings in Bioinformatics, vol. 19(1), pp. 77-88 (2018)

Semantic similarity, Applied Ontology

@article{Henkel2016,
  abstract = {Systems biology models are rapidly increasing in complexity, size and numbers. When building large models, researchers rely on software tools for the retrieval, comparison, combination and merging of models, as well as for version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of 'similarity' may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here we survey existing methods for the comparison of models, introduce quantitative measures for model similarity, and discuss potential applications of combined similarity measures. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on a combination of different model aspects. The six aspects that we define as potentially relevant for similarity are underlying encoding, references to biological entities, quantitative behaviour, qualitative behaviour, mathematical equations and parameters and network structure. We argue that future similarity measures will benefit from combining these model aspects in flexible, problem-specific ways to mimic users' intuition about model similarity, and to support complex model searches in databases.},
  addendum = {IF: 11.62},
  author = {Ron Henkel and Robert Hoehndorf and Tim Kacprowski and Christian Knüpfer and Wolfgang Liebermeister and Dagmar Waltemath$^*$},
  journal = {Briefings in Bioinformatics},
  month = {January},
  number = {1},
  pages = {77--88},
  title = {Notions of similarity for systems biology models},
  url = {http://bib.oxfordjournals.org/content/early/2016/10/11/bib.bbw090.full.pdf},
  volume = {19},
  year = {2018}
}

A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals

Keenan, Charlotte M, McKerlie, Colin, Gkoutos, Georgios V, Ward, Jerrold M, Sundberg, John P, Cesta, Mark F, Schofield, Paul N, Cardiff, Robert, Hoehndorf, Robert and Elmore, Susan A

ILAR Journal, pp. 1-11 (2018)

Applied Ontology, Ontology engineering

@article{histopathology,
  abstract = {The need for international collaboration in rodent pathology has evolved since the 1970s and was initially driven by the new field of toxicologic pathology. First initiated by the World Health Organization’s International Agency for Research on Cancer for rodents, it has evolved to include pathology of the major species (rats, mice, guinea pigs, nonhuman primates, pigs, dogs, fish, rabbits) used in medical research, safety assessment, and mouse pathology. The collaborative effort today is driven by the needs of the regulatory agencies in multiple countries, and by needs of research involving genetically engineered animals, for “basic” research and for more translational preclinical models of human disease. These efforts led to the establishment of an international rodent pathology nomenclature program. Since that time, multiple collaborations for standardization of laboratory animal pathology nomenclature and diagnostic criteria have been developed, and just a few are described herein. Recently, approaches to a nomenclature that is amenable to sophisticated computation have been made available and implemented for large-scale programs in functional genomics and aging. Most terminologies continue to evolve as the science of human and veterinary pathology continues to develop, but standardization and successful implementation remain critical for scientific communication now as ever in the history of veterinary nosology.},
  addendum = {IF: 2.25},
  author = {Keenan, Charlotte M and McKerlie, Colin and Gkoutos, Georgios V and Ward, Jerrold M and Sundberg, John P and Cesta, Mark F and Schofield, Paul N and Cardiff, Robert and Hoehndorf, Robert and Elmore$^*$, Susan A},
  journal = {ILAR Journal},
  month = {November},
  pages = {1--11},
  title = {A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals},
  year = {2018}
}

OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants

Boudellioua, Imane, Kulmanov, Maxat, Schofield, Paul N, Gkoutos, Georgios V and Hoehndorf, Robert

Scientific Reports, vol. 8, pp. 14681 (2018)

Rare disease, Genomics

@article{oligopvp,
  abstract = {Purpose: An increasing number of Mendelian disorders have been identified for which two or more variants in one or more genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of variants underlying oligogenic diseases in individual whole exome or whole genome sequences. Methods: Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical research can provide useful information and improve variant prioritization for Mendelian diseases. Additionally, background knowledge about interactions between genes can be utilized to guide and restrict the selection of candidate disease modules. Results: We developed OligoPVP, an algorithm that can be used to identify variants in oligogenic diseases and their interactions, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods. Conclusions: Our results show that OligoPVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.},
  addendum = {IF: 4.00},
  author = {Boudellioua, Imane and Kulmanov, Maxat and Schofield, Paul N and Gkoutos, Georgios V and Hoehndorf$^*$, Robert},
  journal = {Scientific Reports},
  pages = {14681},
  title = {OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants},
  volume = {8},
  year = {2018}
}

Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

Smaili, Fatima Zohra, Gao, Xin and Hoehndorf, Robert

Bioinformatics, vol. 34(13), pp. i52-i60 (2018)

Neuro-symbolic AI, Applied Ontology

Motivation: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications. Results: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the gene ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein–protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state-of-the-art in several predictive applications in which ontologies are involved.
Availability and implementation: https://github.com/bio-ontology-research-group/onto2vec

@article{onto2vec,
  abstract = {Motivation: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications. Results: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the gene ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein--protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state-of-the-art in several predictive applications in which ontologies are involved. 
Availability and implementation: https://github.com/bio-ontology-research-group/onto2vec},
  addendum = {IF: 6.94},
  author = {Smaili, Fatima Zohra and Gao$^*$, Xin and Hoehndorf$^*$, Robert},
  journal = {Bioinformatics},
  month = {July},
  number = {13},
  pages = {i52--i60},
  title = {Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations},
  url = {https://academic.oup.com/bioinformatics/article/34/13/i52/5045776},
  volume = {34},
  year = {2018}
}

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

Smaili, Fatima Zohra, Gao, Xin and Hoehndorf, Robert

Bioinformatics, pp. bty933 (2018)

Neuro-symbolic AI, Semantic similarity

@article{opa2vec,
  abstract = {Motivation:
Ontologies are widely used in biology for data annotation, integration and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such as semantic similarity measures.
Results:
We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on either a corpus or abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein–protein interaction on two different datasets. Second, we evaluate our method on predicting gene–disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene–disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene–disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology.
Availability and implementation: https://github.com/bio-ontology-research-group/opa2vec},
  addendum = {IF: 6.94},
  author = {Smaili, Fatima Zohra and Gao$^*$, Xin and Hoehndorf$^*$, Robert},
  doi = {10.1093/bioinformatics/bty933},
  journal = {Bioinformatics},
  pages = {bty933},
  title = {OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction},
  url = {http://dx.doi.org/10.1093/bioinformatics/bty933},
  year = {2018}
}

In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters

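Othoum, Ghofran, Bougouffa, Salim, Razali, Rozaimi, Bokhari, Ameerah, Alamoudi, Soha, Antunes, André, Gao, Xin, Hoehndorf, Robert, Arold, Stefan T., Gojobori, Takashi, Hirt, Heribert, Mijakovic, Ivan, Bajic, Vladimir B., Lafi, Feras F. and Essack, Magbubah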

BMC Genomics, vol. 19(1), pp. 382 (2018)

Microbial communities

@article{Othoum2018,
  abstract = {The increasing spectrum of multidrug-resistant bacteria is a major global public health concern, necessitating discovery of novel antimicrobial agents. Here, members of the genus Bacillus are investigated as a potentially attractive source of novel antibiotics due to their broad spectrum of antimicrobial activities. We specifically focus on a computational analysis of the distinctive biosynthetic potential of Bacillus paralicheniformis strains isolated from the Red Sea, an ecosystem exposed to adverse, highly saline and hot conditions.},
  addendum = {IF: 3.59},
  author = {Othoum, Ghofran
and Bougouffa, Salim
and Razali, Rozaimi
and Bokhari, Ameerah
and Alamoudi, Soha
and Antunes, Andr{\'e}
and Gao, Xin
and Hoehndorf, Robert
and Arold, Stefan T.
and Gojobori, Takashi
and Hirt, Heribert
and Mijakovic, Ivan
and Bajic, Vladimir B.
and Lafi, Feras F.
and Essack$^*$, Magbubah},
  day = {22},
  issn = {1471-2164},
  journal = {BMC Genomics},
  month = {May},
  number = {1},
  pages = {382},
  title = {In silico exploration of Red Sea Bacillus genomes for natural product biosynthetic gene clusters},
  volume = {19},
  year = {2018}
}

The anatomy of phenotype ontologies: principles, properties and applications

Georgios V. Gkoutos, Paul N. Schofield and Robert Hoehndorf

Briefings in Bioinformatics, vol. 19(5), pp. 1008-1021 (2018)

Applied Ontology, Phenotype informatics

@article{pato-paper,
  abstract = {The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.},
  addendum = {IF: 11.62},
  author = {Georgios V. Gkoutos and Paul N. Schofield and Robert Hoehndorf$^*$},
  doi = {10.1093/bib/bbx035},
  journal = {Briefings in Bioinformatics},
  month = {September},
  number = {5},
  pages = {1008--1021},
  title = {The anatomy of phenotype ontologies: principles, properties and applications},
  url = {https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbx035},
  volume = {19},
  year = {2018}
}

Ontology-based validation and identification of regulatory phenotypes

Kulmanov, Maxat, Schofield, Paul N, Gkoutos, Georgios V and Hoehndorf, Robert

Bioinformatics, vol. 34(17), pp. i857-i865 (2018)

Applied Ontology, Phenotype informatics

@article{rule-phenotype,
  abstract = {Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support to validate the mutual consistency of functional and phenotype annotations as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations. Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. We also apply our method to the rule-based prediction of regulatory phenotypes from functions and demonstrate that we can predict these phenotypes with Fmax of up to 0.647. Availability and implementation: https://github.com/bio-ontology-research-group/phenogocon},
  addendum = {IF: 6.94},
  author = {Kulmanov, Maxat and Schofield, Paul N and Gkoutos, Georgios V and Hoehndorf$^*$, Robert},
  journal = {Bioinformatics},
  month = {September},
  number = {17},
  pages = {i857--i865},
  title = {Ontology-based validation and identification of regulatory phenotypes},
  url = {https://academic.oup.com/bioinformatics/article/34/17/i857/5093216},
  volume = {34},
  year = {2018}
}

Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

Alshahrani, Mona and Hoehndorf, Robert

Bioinformatics, vol. 34(17), pp. i901-i907 (2018)

Neuro-symbolic AI, Rare disease

@article{smudge,
  abstract = {Motivation: In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprised of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network. Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE},
  addendum = {IF: 6.94},
  author = {Alshahrani, Mona and Hoehndorf$^*$, Robert},
  journal = {Bioinformatics},
  month = {September},
  number = {17},
  pages = {i901--i907},
  title = {Semantic Disease Gene Embeddings ({SmuDGE}): phenotype-based disease gene prioritization without phenotypes},
  url = {https://academic.oup.com/bioinformatics/article/34/17/i901/5093225},
  volume = {34},
  year = {2018}
}

Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks

Sohaib Younis, Claus Weiland, Robert Hoehndorf, Stefan Dressler, Thomas Hickler, Bernhard Seeger and Marco Schmidt

Botany Letters, vol. 165(3-4), pp. 377-383 (2018)

Phenotype informatics, Biomedical informatics

@article{Sohaib2018,
  abstract = {Herbaria worldwide are housing a treasure of hundreds of millions of herbarium specimens, which are increasingly being digitized and thereby more accessible to the scientific community. At the same time, deep-learning algorithms are rapidly improving pattern recognition from images and these techniques are more and more being applied to biological objects. In this study, we are using digital images of herbarium specimens in order to identify taxa and traits of these collection objects by applying convolutional neural networks (CNN). Images of the 1000 species most frequently documented by herbarium specimens on GBIF have been downloaded and combined with morphological trait data, preprocessed and divided into training and test datasets for species and trait recognition. Good performance in both domains suggests substantial potential of this approach for supporting taxonomy and natural history collection management. Trait recognition is also promising for applications in functional ecology.},
  addendum = {IF: 1.05},
  author = {Sohaib Younis$^*$ and Claus Weiland and Robert Hoehndorf and Stefan Dressler and Thomas Hickler and Bernhard Seeger and Marco Schmidt},
  journal = {Botany Letters},
  number = {3--4},
  pages = {377--383},
  publisher = {Taylor and Francis},
  title = {Taxon and trait recognition from digitized herbarium specimens using deep convolutional neural networks},
  url = {https://doi.org/10.1080/23818107.2018.1446357},
  volume = {165},
  year = {2018}
}

Ontology based mining of pathogen-disease associations from literature

Senay Kafkas and Robert Hoehndorf

Bio-Ontologies COSI (2018)

Biomedical informatics, Applied Ontology

@inproceedings{Kafkas2018-ismb,
  author = {Senay Kafkas and Robert Hoehndorf$^*$},
  booktitle = {Bio-Ontologies COSI},
  title = {Ontology based mining of pathogen-disease associations from literature},
  year = {2018}
}

Ontology-Based Concept Recognition by Using Word Embeddings

Sara Althubaiti, Senay Kafkas and Robert Hoehndorf

Bio-Ontologies COSI (2018)

Neuro-symbolic AI, Biomedical informatics

@inproceedings{Sara2018,
  author = {Sara Althubaiti and Senay Kafkas and Robert Hoehndorf$^*$},
  booktitle = {Bio-Ontologies COSI},
  title = {Ontology-Based Concept Recognition by Using Word Embeddings},
  year = {2018}
}

Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings

Maxat Kulmanov, Senay Kafkas, Andreas Karwath, Alexander Malic, Georgios V. Gkoutos, Michel Dumontier and Robert Hoehndorf

Proceedings of the 11th International Conference Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2018, Antwerp, Belgium, December 3-6, 2018. (2018)

Neuro-symbolic AI, Ontology engineering

@inproceedings{vec2sparql,
  author = {Maxat Kulmanov and
Senay Kafkas and
Andreas Karwath and
Alexander Malic and
Georgios V. Gkoutos and
Michel Dumontier and
Robert Hoehndorf$^*$},
  booktitle = {Proceedings of the 11th International Conference Semantic Web Applications
and Tools for Life Sciences, {SWAT4LS} 2018, Antwerp, Belgium, December
3-6, 2018.},
  title = {Vec2SPARQL: integrating {SPARQL} queries and knowledge graph embeddings},
  year = {2018}
}

Neuro-symbolic representation learning on biological knowledge graphs

Alshahrani, Mona, Khan, Mohammad Asif, Maddouri, Omar, Kinjo, Akira R., Queralt-Rosinach, Núria and Hoehndorf, Robert

Bioinformatics, vol. 33(17), pp. 2723-2730 (2017)

Neuro-symbolic AI, Ontology engineering

@article{alsharani17,
  abstract = {Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode for related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation: https://github.com/bio-ontology-research-group/walking-rdf-and-owl Contact: robert.hoehndorf@kaust.edu.sa},
  addendum = {IF: 6.94},
  author = {Alshahrani, Mona and Khan, Mohammad Asif and Maddouri, Omar and Kinjo, Akira R. and Queralt-Rosinach, Núria and Hoehndorf$^*$, Robert},
  journal = {Bioinformatics},
  number = {17},
  pages = {2723--2730},
  title = {Neuro-symbolic representation learning on biological knowledge graphs},
  url = {http://dx.doi.org/10.1093/bioinformatics/btx275},
  volume = {33},
  year = {2017}
}

Data science and symbolic AI: Synergies, challenges and opportunities

Hoehndorf, Robert and Queralt-Rosinach, Núria

Data Science, vol. 1(1-2), pp. 27-38 (2017)

Neuro-symbolic AI

@article{datascience,
  abstract = {Symbolic approaches to artificial intelligence represent things within a domain of knowledge through physical symbols, combine symbols into symbol expressions, and manipulate symbols and symbol expressions through inference processes. While a large part of Data Science relies on statistics and applies statistical approaches to artificial intelligence, there is an increasing potential for successfully applying symbolic approaches as well. Symbolic representations and symbolic inference are close to human cognitive representations and therefore comprehensible and interpretable; they are widely used to represent data and metadata, and their specific semantic content must be taken into account for analysis of such information; and human communication largely relies on symbols, making symbolic representations a crucial part in the analysis of natural language. Here we discuss the role symbolic representations and inference can play in Data Science, highlight the research challenges from the perspective of the data scientist, and argue that symbolic methods should become a crucial component of the data scientists’ toolbox.},
  author = {Hoehndorf$^*$, Robert and Queralt-Rosinach, Núria},
  doi = {10.3233/ds-170004},
  issn = {2451-8492},
  journal = {Data Science},
  number = {1--2},
  pages = {27--38},
  publisher = {IOS Press},
  title = {Data science and symbolic AI: Synergies, challenges and opportunities},
  url = {http://hdl.handle.net/10754/624879},
  volume = {1},
  year = {2017}
}

DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

Kulmanov, Maxat, Khan, Mohammed Asif and Hoehndorf, Robert

Bioinformatics, vol. 34(4), pp. 660-668 (2017)

Protein function, Neuro-symbolic AI

@article{deepgo,
  abstract = {Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Availability and implementation: Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo Contact: robert.hoehndorf@kaust.edu.sa},
  addendum = {IF: 6.94},
  author = {Kulmanov, Maxat and Khan, Mohammed Asif and Hoehndorf$^*$, Robert},
  journal = {Bioinformatics},
  number = {4},
  pages = {660--668},
  title = {DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier},
  url = {http://dx.doi.org/10.1093/bioinformatics/btx624},
  volume = {34},
  year = {2017}
}

DES-TOMATO: A Knowledge Exploration System Focused On Tomato Species

Salhi, Adil, Negrão, Sónia, Essack, Magbubah, Morton, Mitchell J. L., Bougouffa, Salim, Mohamad Razali, Rozaimi, Radovanovic, Aleksandar, Marchand, Benoit, Kulmanov, Maxat, Hoehndorf, Robert, Tester, Mark A. and Bajic, Vladimir B.

Scientific Reports, vol. 7, pp. 5968 (2017)

Applied Ontology, Ontology engineering

@article{destomato,
  abstract = {Tomato is the most economically important horticultural crop used as a model to study plant biology and particularly fruit development. Knowledge obtained from tomato research initiated improvements in tomato and, being transferrable to other such economically important crops, has led to a surge of tomato-related research and published literature. We developed DES-TOMATO knowledgebase (KB) for exploration of information related to tomato. Information exploration is enabled through terms from 26 dictionaries and combination of these terms. To illustrate the utility of DES-TOMATO, we provide several examples how one can efficiently use this KB to retrieve known or potentially novel information. DES-TOMATO is free for academic and nonprofit users and can be accessed at http://cbrc.kaust.edu.sa/des\_tomato/, using any of the mainstream web browsers, including Firefox, Safari and Chrome.},
  addendum = {IF: 4.00},
  author = {Salhi, Adil and Negrão, Sónia and Essack, Magbubah and Morton, Mitchell J. L. and Bougouffa, Salim and Mohamad Razali, Rozaimi and Radovanovic, Aleksandar and Marchand, Benoit and Kulmanov, Maxat and Hoehndorf, Robert and Tester, Mark A. and Bajic$^*$, Vladimir B.},
  issn = {2045-2322},
  journal = {Scientific Reports},
  pages = {5968},
  publisher = {Springer Nature},
  title = {DES-TOMATO: A Knowledge Exploration System Focused On Tomato Species},
  volume = {7},
  year = {2017}
}

Usage of cell nomenclature in biomedical literature

Kafkas, Şenay, Sarntivijai, Sirarat and Hoehndorf, Robert

BMC Bioinformatics, vol. 18(17), pp. 561 (2017)

Biomedical informatics, Ontology engineering

@article{Kafkas2017,
  abstract = {Cell lines and cell types are extensively studied in biomedical research yielding to a significant amount of publications each year. Identifying cell lines and cell types precisely in publications is crucial for science reproducibility and knowledge integration. There are efforts for standardisation of the cell nomenclature based on ontology development to support FAIR principles of the cell knowledge. However, it is important to analyse the usage of cell nomenclature in publications at a large scale for understanding the level of uptake of cell nomenclature in literature by scientists. In this study, we analyse the usage of cell nomenclature, both in Vivo, and in Vitro in biomedical literature by using text mining methods and present our results.},
  addendum = {IF: 3.17},
  author = {Kafkas$^*$, {\c{S}}enay
and Sarntivijai, Sirarat
and Hoehndorf, Robert},
  day = {21},
  doi = {10.1186/s12859-017-1978-0},
  issn = {1471-2105},
  journal = {BMC Bioinformatics},
  month = {December},
  number = {17},
  pages = {561},
  title = {Usage of cell nomenclature in biomedical literature},
  url = {https://doi.org/10.1186/s12859-017-1978-0},
  volume = {18},
  year = {2017}
}

Evaluating the effect of annotation size on measures of semantic similarity

Kulmanov, Maxat and Hoehndorf, Robert

Journal of Biomedical Semantics, vol. 8(1), pp. 7 (2017)

Semantic similarity, Applied Ontology

@article{Kulmanov2017,
  abstract = {Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products.},
  addendum = {IF: 1.99},
  author = {Kulmanov, Maxat and Hoehndorf$^*$, Robert},
  doi = {10.1186/s13326-017-0119-z},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {January},
  number = {1},
  pages = {7},
  title = {Evaluating the effect of annotation size on measures of semantic similarity},
  url = {http://dx.doi.org/10.1186/s13326-017-0119-z},
  volume = {8},
  year = {2017}
}

In silico screening for candidate chassis strains of free fatty acid-producing cyanobacteria

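Motwalli, Olaa, Essack, Magbubah, Jankovic, Boris R., Ji, Boyang, Liu, Xinyao, Ansari, Hifzur Rahman, Hoehndorf, Robert, Gao, Xin, Arold, Stefan T., Mineta, Katsuhiko, Archer, John A. C., Gojobori, Takashi, Mijakovic, Ivan and Bajic, Vladimir B.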

BMC Genomics, vol. 18(1), pp. 33 (2017)

Microbial communities

@article{Motwalli2017,
  abstract = {Finding a source from which high-energy-density biofuels can be derived at an industrial scale has become an urgent challenge for renewable energy production. Some microorganisms can produce free fatty acids (FFA) as precursors towards such high-energy-density biofuels. In particular, photosynthetic cyanobacteria are capable of directly converting carbon dioxide into FFA. However, current engineered strains need several rounds of engineering to reach the level of production of FFA to be commercially viable; thus new chassis strains that require less engineering are needed. Although more than 120 cyanobacterial genomes are sequenced, the natural potential of these strains for FFA production and excretion has not been systematically estimated.},
  addendum = {IF: 3.59},
  author = {Motwalli, Olaa
and Essack, Magbubah
and Jankovic, Boris R.
and Ji, Boyang
and Liu, Xinyao
and Ansari, Hifzur Rahman
and Hoehndorf, Robert
and Gao, Xin
and Arold, Stefan T.
and Mineta, Katsuhiko
and Archer, John A. C.
and Gojobori, Takashi
and Mijakovic, Ivan
and Bajic$^*$, Vladimir B.},
  doi = {10.1186/s12864-016-3389-4},
  issn = {1471-2164},
  journal = {BMC Genomics},
  number = {1},
  pages = {33},
  title = {In silico screening for candidate chassis strains of free fatty acid-producing cyanobacteria},
  url = {http://dx.doi.org/10.1186/s12864-016-3389-4},
  volume = {18},
  year = {2017}
}

Semantic prioritization of novel causative genomic variants

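Boudellioua, Imane, Mahamad Razali, Rozaimi B., Kulmanov, Maxat, Hashish, Yasmeen, Bajic, Vladimir B., Goncalves-Serra, Eva, Schoenmakers, Nadia, Gkoutos, Georgios V., Schofield, Paul N. and Hoehndorf, Robert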

PLOS Computational Biology, vol. 13(4), pp. e1005500 (2017)

Rare disease, Semantic similarity

@article{pvp-main,
  abstract = {Author summary: We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, but also can be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help with the problem of determining the causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) which augments existing strategies by using the similarity of the patient's phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.},
  addendum = {IF: 4.43},
  author = {Boudellioua, Imane AND Mahamad Razali, Rozaimi B. AND Kulmanov, Maxat AND Hashish, Yasmeen AND Bajic, Vladimir B. AND Goncalves-Serra, Eva AND Schoenmakers, Nadia AND Gkoutos, Georgios V. AND Schofield, Paul N. AND Hoehndorf$^*$, Robert},
  doi = {10.1371/journal.pcbi.1005500},
  journal = {PLOS Computational Biology},
  month = {April},
  number = {4},
  pages = {e1005500},
  publisher = {Public Library of Science},
  title = {Semantic prioritization of novel causative genomic variants},
  url = {https://doi.org/10.1371/journal.pcbi.1005500},
  volume = {13},
  year = {2017}
}

Phenotype-driven discovery of digenic variants in personal genome sequences

Imane Boudellioua, Maxat Kulmanov, Paul N Schofield, Georgios V Gkoutos and Robert Hoehndorf

Proceedings of VarI-SIG (2017)

Rare disease, Genomics, Phenotype informatics

@inproceedings{varisig2017,
  abstract = {Identification of variants associated with inherited diseases is a major challenge, in particular in the analysis of clinical sequence data from individual patients.  An increasing number of Mendelian diseases have been identified in which two or more variants in multiple genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. Information that links patient phenotypes to databases of gene--phenotype associations observed in clinical and basic research can provide useful information and improve variant prioritization for Mendelian diseases.  PhenomeNET is a computational framework that utilized pan-phenomic data from human and non-human model organisms to prioritize candidate genes in genetically based diseases, and we have recently combined PhenomeNET with genome-wide pathogenicity prediction methods into the PhenomeNET Variant Predictor (PVP) that can be used to prioritize variants in inherited diseases.  Here, we illustrate extensions to PVP that can be used to identify variants in oligogenic diseases and their interactions. We inserted multiple variants known to be associated with digenic disease into synthetic genomes and find that PVP can identify sets of causative variants in a hypothesis-neutral manner. Our results show that PVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.},
  author = {Imane Boudellioua and Maxat Kulmanov and Paul N Schofield and Georgios V Gkoutos and Robert Hoehndorf$^*$},
  booktitle = {Proceedings of VarI-SIG},
  month = {July},
  title = {Phenotype-driven discovery of digenic variants in personal genome sequences},
  year = {2017}
}

FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

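Bolleman, Jerven T., Mungall, Christopher J., Strozzi, Francesco, Baran, Joachim, Dumontier, Michel, Bonnal, Raoul P. J., Buels, Robert, Hoehndorf, Robert, Fujisawa, Takatomo, Katayama, Toshiaki and Cock, Peter A. J.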

Journal of Biomedical Semantics, vol. 7(1), pp. 39 (2016)

Applied Ontology, Ontology engineering

@article{Bolleman2016,
  abstract = {Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples.},
  addendum = {IF: 1.99},
  author = {Bolleman$^*$, Jerven T.
and Mungall, Christopher J.
and Strozzi, Francesco
and Baran, Joachim
and Dumontier, Michel
and Bonnal, Raoul P. J.
and Buels, Robert
and Hoehndorf, Robert
and Fujisawa, Takatomo
and Katayama, Toshiaki
and Cock, Peter A. J.},
  doi = {10.1186/s13326-016-0067-z},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {April},
  number = {1},
  pages = {39},
  title = {FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation},
  url = {http://dx.doi.org/10.1186/s13326-016-0067-z},
  volume = {7},
  year = {2016}
}

Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining

Boudellioua, Imane, Saidi, Rabie, Hoehndorf, Robert, Martin, Maria J. and Solovyev, Victor

PLoS ONE, vol. 11(7), pp. e0158896 (2016)

Protein function, Biomedical informatics

@article{Boudellioua2016,
  abstract = {The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We carried out an evaluation study of our system performance using cross-validation technique. We found that it achieved very promising results in pathway identification with an F₁-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, where 436,510 of them lacked any previous pathway annotations.},
  addendum = {IF: 2.74},
  author = {Boudellioua, Imane and Saidi, Rabie and Hoehndorf, Robert and Martin, Maria J. and Solovyev$^*$, Victor},
  doi = {10.1371/journal.pone.0158896},
  journal = {PLoS ONE},
  month = {July},
  number = {7},
  pages = {e0158896},
  publisher = {Public Library of Science},
  title = {Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining},
  url = {http://dx.doi.org/10.1371/journal.pone.0158896},
  volume = {11},
  year = {2016}
}

DermO; an ontology for the description of dermatologic disease

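Fisher, Hannah M., Hoehndorf, Robert, Bazelato, Bruno S., Dadras, Soheil S., King, Lloyd E., Gkoutos, Georgios V., Sundberg, John P. and Schofield, Paul N.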

Journal of Biomedical Semantics, vol. 7(1), pp. 38 (2016)

Applied Ontology, Phenotype informatics

@article{Fisher2016,
  abstract = {There have been repeated initiatives to produce standard nosologies and terminologies for cutaneous disease, some dedicated to the domain and some part of bigger terminologies such as ICD-10. Recently, formally structured terminologies, ontologies, have been widely developed in many areas of biomedical research. Primarily, these address the aim of providing comprehensive working terminologies for domains of knowledge, but because of the knowledge contained in the relationships between terms they can also be used computationally for many purposes.},
  addendum = {IF: 1.99},
  author = {Fisher, Hannah M.
and Hoehndorf, Robert
and Bazelato, Bruno S.
and Dadras, Soheil S.
and King, Lloyd E.
and Gkoutos, Georgios V.
and Sundberg, John P.
and Schofield$^*$, Paul N.},
  doi = {10.1186/s13326-016-0085-x},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {June},
  number = {1},
  pages = {38},
  title = {DermO; an ontology for the description of dermatologic disease},
  url = {http://dx.doi.org/10.1186/s13326-016-0085-x},
  volume = {7},
  year = {2016}
}

The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants

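Hoehndorf, Robert, Alshahrani, Mona, Gkoutos, Georgios V., Gosline, George, Groom, Quentin, Hamann, Thomas, Kattge, Jens, de Oliveira, Sylvia Mota, Schmidt, Marco, Sierra, Soraya, Smets, Erik, Vos, Rutger A. and Weiland, Claus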

Journal of Biomedical Semantics, vol. 7(1), pp. 65 (2016)

Applied Ontology, Phenotype informatics

@article{Hoehndorf2016,
  abstract = {The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text.},
  addendum = {IF: 1.99},
  author = {Hoehndorf$^*$, Robert
and Alshahrani, Mona
and Gkoutos, Georgios V.
and Gosline, George
and Groom, Quentin
and Hamann, Thomas
and Kattge, Jens
and de Oliveira, Sylvia Mota
and Schmidt, Marco
and Sierra, Soraya
and Smets, Erik
and Vos, Rutger A.
and Weiland, Claus},
  doi = {10.1186/s13326-016-0107-8},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {November},
  number = {1},
  pages = {65},
  title = {The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants},
  url = {http://dx.doi.org/10.1186/s13326-016-0107-8},
  volume = {7},
  year = {2016}
}

DESM: portal for microbial knowledge exploration systems

Salhi, Adil, Essack, Magbubah, Radovanovic, Aleksandar, Marchand, Benoit, Bougouffa, Salim, Antunes, Andre, Simoes, Marta Filipa, Lafi, Feras F., Motwalli, Olaa A., Bokhari, Ameerah, Malas, Tariq, Amoudi, Soha Al, Othum, Ghofran, Allam, Intikhab, Mineta, Katsuhiko, Gao, Xin, Hoehndorf, Robert, C. Archer, John A., Gojobori, Takashi and Bajic, Vladimir B.

Nucleic Acids Research, vol. 44(D1), pp. D624-D633 (2016)

Microbial communities, Ontology engineering

@article{Salhi2016,
  abstract = {Microorganisms produce an enormous variety of chemical compounds. It is of general interest for microbiology and biotechnology researchers to have means to explore information about molecular and genetic basis of functioning of different microorganisms and their ability for bioproduction. To enable such exploration, we compiled 45 topic-specific knowledgebases (KBs) accessible through DESM portal (www.cbrc.kaust.edu.sa/desm). The KBs contain information derived through text-mining of PubMed information and complemented by information data-mined from various other resources (e.g. ChEBI, Entrez Gene, GO, KOBAS, KEGG, UniPathways, BioGrid). All PubMed records were indexed using 4 538 278 concepts from 29 dictionaries, with 1 638 986 records utilized in KBs. Concepts used are normalized whenever possible. Most of the KBs focus on a particular type of microbial activity, such as production of biocatalysts or nutraceuticals. Others are focused on specific categories of microorganisms, e.g. streptomyces or cyanobacteria. KBs are all structured in a uniform manner and have a standardized user interface. Information exploration is enabled through various searches. Users can explore statistically most significant concepts or pairs of concepts, generate hypotheses, create interactive networks of associated concepts and export results. We believe DESM will be a useful complement to the existing resources to benefit microbiology and biotechnology research.},
  addendum = {IF: 16.97},
  author = {Salhi, Adil and Essack, Magbubah and Radovanovic, Aleksandar and Marchand, Benoit and Bougouffa, Salim and Antunes, Andre and Simoes, Marta Filipa and Lafi, Feras F. and Motwalli, Olaa A. and Bokhari, Ameerah and Malas, Tariq and Amoudi, Soha Al and Othum, Ghofran and Allam, Intikhab and Mineta, Katsuhiko and Gao, Xin and Hoehndorf, Robert and C. Archer, John A. and Gojobori, Takashi and Bajic$^*$, Vladimir B.},
  doi = {10.1093/nar/gkv1147},
  journal = {Nucleic Acids Research},
  number = {D1},
  pages = {D624--D633},
  title = {DESM: portal for microbial knowledge exploration systems},
  url = {http://nar.oxfordjournals.org/content/44/D1/D624.abstract},
  volume = {44},
  year = {2016}
}

Using AberOWL for fast and scalable reasoning over BioPortal ontologies

Slater, Luke, Gkoutos, Georgios V., Schofield, Paul N. and Hoehndorf, Robert

Journal of Biomedical Semantics, vol. 7(1), pp. 49 (2016)

Biomedical informatics

@article{Slater2016,
  abstract = {Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided.},
  addendum = {IF: 1.99},
  author = {Slater$^*$, Luke
and Gkoutos, Georgios V.
and Schofield, Paul N.
and Hoehndorf, Robert},
  doi = {10.1186/s13326-016-0090-0},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {August},
  number = {1},
  pages = {49},
  title = {Using AberOWL for fast and scalable reasoning over BioPortal ontologies},
  url = {http://dx.doi.org/10.1186/s13326-016-0090-0},
  volume = {7},
  year = {2016}
}

Large-Scale Reasoning over Functions in Biomedical Ontologies

Robert Hoehndorf, Liam Mencel, Georgios V. Gkoutos and Paul N. Schofield

Formal Ontology in Information Systems, vol. 283, pp. 299-312 (2016)

Applied Ontology, Ontology engineering

@inproceedings{Hoehndorf2016fois,
  abstract = {A large number of biomedical resources have been developed to represent the functions of biological entities, and these resources are widely used for data integration and analysis. Expressing functions in biomedical ontologies currently uses formal representation patterns that cause basic reasoning tasks to fall into complexity classes beyond polynomial time, thereby limiting the potential of using knowledge-based methods for data integration, querying or quality control. Here, we propose an alternative representation pattern for expressing knowledge about biological functions, together with a biological and ontological justification, which can be expressed using the description logic EL++ and implemented using the OWL 2 EL profile. To demonstrate the utility of our account of biological functions, we apply it to all proteins contained in the SwissProt database and evaluate its utility with respect to answering complex queries as well as with respect to the classification and query times.},
  author = {Robert Hoehndorf$^*$ and Liam Mencel and Georgios V. Gkoutos and Paul N. Schofield},
  booktitle = {Formal Ontology in Information Systems},
  month = {July},
  pages = {299--312},
  publisher = {IOS Press},
  series = {Frontiers in Artificial Intelligence and Applications},
  title = {Large-Scale Reasoning over Functions in Biomedical Ontologies},
  url = {http://ebooks.iospress.nl/publication/44256},
  volume = {283},
  year = {2016}
}

To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the Experimental Factor Ontology (EFO)

Luke Slater, Georgios V. Gkoutos, Paul N Schofield and Robert Hoehndorf

International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016) (2016)

Ontology engineering, Applied Ontology

@inproceedings{IT702,
  abstract = {MIREOT is a mechanism for the selective re-use of individual ontology classes in other ontologies. Designed to minimise effort and to support orthogonality, it is now in widespread use. The consequences for ontology integrity and automated reasoning of using the MIREOT mechanism have so far not been fully assessed. In this paper, we perform an analysis of the Experimental Factor Ontology (EFO), an ontology which uses the MIREOT process to gather classes from a large range of other ontologies. Our study examines the effect of combining EFO with the ontologies it references by actually importing them into the EFO. We then evaluate the consistency and status of the combined ontologies. Through our investigation, we reveal that EFO in combination with all its referenced ontologies is logically inconsistent. Furthermore, when EFO is individually combined with many of the ontologies it references, we find a large number of unsatisfiable classes. These results demonstrate a potential problem within a major ontological ecosystem, and reveal possible disadvantages to the use of the MIREOT system for developing ontologies.},
  author = {Luke Slater$^*$ and Georgios V. Gkoutos and Paul N Schofield and Robert Hoehndorf},
  booktitle = {International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)},
  month = {August},
  organization = {ICBO and BioCreative},
  publisher = {ICBO and BioCreative},
  series = {Proceedings of the Joint International Conference on Biological Ontology and BioCreative (2016)},
  title = {To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the Experimental Factor Ontology (EFO)},
  url = {http://icbo.cgrb.oregonstate.edu/},
  year = {2016}
}

Evaluating the effect of annotation size on measures of semantic similarity

Maxat Kulmanov and Robert Hoehndorf

Proceedings of Bio-Ontologies SIG (2016)

Semantic similarity, Applied Ontology

@inproceedings{Kulmanov2016,
  abstract = {Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities as well as to the difference in annotation size. We find that most similarity measures are sensitive to the number of annotations of entities as well as to the difference in annotation size; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation. Our findings have significant impact on the interpretation of results that rely on measures of semantic similarity.},
  author = {Maxat Kulmanov and Robert Hoehndorf$^*$},
  booktitle = {Proceedings of Bio-Ontologies SIG},
  month = {July},
  title = {Evaluating the effect of annotation size on measures of semantic similarity},
  year = {2016}
}

Integrating phenotype ontologies with PhenomeNET

Miguel Rodriguez-Garcia, Georgios V. Gkoutos, Paul N. Schofield and Robert Hoehndorf

Proceedings of Ontology Matching Workshop 2016 (2016)

Rare disease, Phenotype informatics

@inproceedings{Miguel2016,
  abstract = {PhenomeNET is a system for disease gene prioritization that includes as one of its components an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. Here, we apply the PhenomeNET to identify related classes from four phenotype and disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone.},
  author = {Miguel Rodriguez-Garcia and Georgios V. Gkoutos and Paul N. Schofield and Robert Hoehndorf$^*$},
  booktitle = {Proceedings of Ontology Matching Workshop 2016},
  month = {October},
  title = {Integrating phenotype ontologies with PhenomeNET},
  url = {http://www.dit.unitn.it/~pavel/om2016/papers/oaei16_paper12.pdf},
  year = {2016}
}

Experiences with Aber-OWL, an Ontology Repository with OWL EL Reasoning

Slater, Luke, Rodriguez-Garcia, Miguel, O'Shea, Keiron, Schofield, Paul N., Gkoutos, Georgios V. and Hoehndorf, Robert

Ontology Engineering: 12th International Experiences and Directions Workshop on OWL, OWLED 2015, co-located with ISWC 2015, Bethlehem, PA, USA, October 9-10, 2015, Revised Selected Papers, pp. 81-86, In: Tamma, Valentina, Dragoni, Mauro, Gonçalves, Rafael and Ławrynowicz, Agnieszka (Ed.) (2016)

Ontology engineering

@inbook{Slater2016owled,
  abstract = {{Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided.}},
  address = {Cham},
  author = {Slater, Luke
and Rodriguez-Garcia, Miguel
and O'Shea, Keiron
and Schofield, Paul N.
and Gkoutos, Georgios V.
and Hoehndorf, Robert},
  booktitle = {Ontology Engineering: 12th International Experiences and Directions Workshop on OWL, OWLED 2015, co-located with ISWC 2015, Bethlehem, PA, USA, October 9-10, 2015, Revised Selected Papers},
  doi = {10.1007/978-3-319-33245-1_8},
  editor = {Tamma, Valentina
and Dragoni, Mauro
and Gon{\c{c}}alves, Rafael
and {\L}awrynowicz, Agnieszka},
  isbn = {978-3-319-33245-1},
  pages = {81--86},
  publisher = {Springer International Publishing},
  title = {Experiences with Aber-OWL, an Ontology Repository with OWL EL Reasoning},
  url = {http://dx.doi.org/10.1007/978-3-319-33245-1_8},
  year = {2016}
}

SPARQL2OWL: Towards Bridging the Semantic Gap Between RDF and OWL

Mona Alshahrani, Hussein Almashouq and Robert Hoehndorf

Proceedings of the Joint International Conference on Biological Ontology and BioCreative, Corvallis, Oregon, United States, August 1-4, 2016. (2016)

Ontology engineering

@inproceedings{sparql2owl,
  abstract = {Several large databases in biology are now making their information available through the Resource Description Framework (RDF). RDF can be used for large datasets and provides a graph-based semantics. The Web Ontology Language (OWL), another Semantic Web standard, provides a more formal, model-theoretic semantics. While some approaches combine RDF and OWL, for example for querying, knowledge in RDF and OWL is often expressed differently. Here, we propose a method to generate OWL ontologies from SPARQL queries using n-ary relational patterns. Combined with background knowledge from ontologies, the generated OWL ontologies can be used for expressive queries and quality control of RDF data. We implement our method in a prototype tool available at https://github.com/bio-ontology-research-group/SPARQL2OWL},
  author = {Mona Alshahrani and
Hussein Almashouq and
Robert Hoehndorf$^*$},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/icbo/AlshahraniAH16},
  booktitle = {Proceedings of the Joint International Conference on Biological Ontology
and BioCreative, Corvallis, Oregon, United States, August 1-4, 2016.},
  month = {August},
  timestamp = {Tue, 07 Feb 2017 15:50:35 +0100},
  title = {{SPARQL2OWL:} Towards Bridging the Semantic Gap Between {RDF} and
{OWL}},
  url = {http://ceur-ws.org/Vol-1747/D102\_ICBO2016.pdf},
  year = {2016}
}

Datamining with Ontologies

Hoehndorf, Robert, Gkoutos, Georgios V. and Schofield, Paul N.

Data Mining Techniques for the Life Sciences, pp. 385-397, In: Carugo, Oliviero and Eisenhaber, Frank (Ed.) (2016)

Biomedical informatics, Applied Ontology

@inbook{Hoehndorf2016abc,
  abstract = {The use of ontologies has increased rapidly over the past decade and they now provide a key component of most major databases in biology and biomedicine. Consequently, datamining over these databases benefits from considering the specific structure and content of ontologies, and several methods have been developed to use ontologies in datamining applications. Here, we discuss the principles of ontology structure, and datamining methods that rely on ontologies. The impact of these methods in the biological and biomedical sciences has been profound and is likely to increase as more datasets are becoming available using common, shared ontologies.},
  address = {New York, NY},
  author = {Hoehndorf, Robert
and Gkoutos, Georgios V.
and Schofield, Paul N.},
  booktitle = {Data Mining Techniques for the Life Sciences},
  doi = {10.1007/978-1-4939-3572-7_19},
  editor = {Carugo, Oliviero
and Eisenhaber, Frank},
  isbn = {978-1-4939-3572-7},
  month = {May},
  pages = {385--397},
  publisher = {Springer New York},
  title = {Datamining with Ontologies},
  url = {http://dx.doi.org/10.1007/978-1-4939-3572-7_19},
  year = {2016}
}

Aber-OWL: a framework for ontology-based data access in biology

Robert Hoehndorf, Luke Slater, Paul N Schofield and Georgios V Gkoutos

BMC Bioinformatics, vol. 16, pp. 26 (2015)

Ontology engineering, Biomedical informatics

@article{aberowl,
  abstract = {Background: Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning. Results: We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net. Conclusions: Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.},
  addendum = {IF: 3.17},
  author = {Robert Hoehndorf$^*$ and Luke Slater and Paul N Schofield and Georgios V Gkoutos},
  journal = {BMC Bioinformatics},
  pages = {26},
  title = {Aber-OWL: a framework for ontology-based data access in biology},
  url = {http://www.biomedcentral.com/1471-2105/16/26/abstract},
  volume = {16},
  year = {2015}
}

GFVO: the Genomic Feature and Variation Ontology

Baran, Joachim, Durgahee, Bibi Sehnaaz Begum, Eilbeck, Karen, Antezana, Erick, Hoehndorf, Robert and Dumontier, Michel

PeerJ, vol. 3, pp. e933 (2015)

Applied Ontology, Genomics

@article{Baran2015,
  abstract = {Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. \textbf{Availability and implementation.} The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology’s GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation are provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore \textit{de facto} within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.},
  addendum = {IF: 2.38},
  author = {Baran, Joachim and Durgahee, Bibi Sehnaaz Begum and Eilbeck, Karen and Antezana, Erick and Hoehndorf, Robert and Dumontier$^*$, Michel},
  doi = {10.7717/peerj.933},
  issn = {2167-8359},
  journal = {PeerJ},
  keywords = {Bioinformatics, Genomics, Ontology},
  month = {May},
  pages = {e933},
  title = {GFVO: the Genomic Feature and Variation Ontology},
  url = {https://dx.doi.org/10.7717/peerj.933},
  volume = {3},
  year = {2015}
}

Best behaviour? Ontologies and the formal description of animal behaviour

Gkoutos, Georgios V, Hoehndorf, Robert, Tsaprouni, Loukia and Schofield, Paul N

Mammalian Genome, vol. 26(9-10), pp. 540-547 (2015)

Applied Ontology, Phenotype informatics

@article{bestbehavior,
  abstract = {The development of ontologies for describing animal behavior has proved to be one of the
most difficult of all scientific knowledge domains. Ranging from neurological processes to
human emotions the range and scope needed for such ontologies is highly challenging, but if
data integration and computational tools such as automated reasoning are to be fully applied
in this important area the underlying principles of these ontologies need to be better
established and development needs detailed coordination. Whilst the state of scientific
knowledge is always paramount in ontology and formal description framework design, this is
a particular problem with neurobehavioural ontologies where our understanding of the
relationship between behaviour and its underlying biophysical basis is currently in its infancy.
In this commentary we discuss some of the fundamental problems in designing and using
behavior ontologies, and present some of the best developed tools in this domain.},
  addendum = {IF: 2.08},
  author = {Gkoutos, Georgios V and Hoehndorf, Robert and Tsaprouni, Loukia and Schofield$^*$, Paul N},
  doi = {10.1007/s00335-015-9590-y},
  issn = {0938-8990},
  journal = {Mammalian Genome},
  language = {English},
  month = {July},
  number = {9--10},
  pages = {540--547},
  publisher = {Springer US},
  title = {Best behaviour? Ontologies and the formal description of animal behaviour},
  url = {http://dx.doi.org/10.1007/s00335-015-9590-y},
  volume = {26},
  year = {2015}
}

Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics

Martin Hrabě de Angelis, George Nicholson, Mohammed Selloum, Jacqueline K White, Hugh Morgan, Ramiro Ramirez-Solis, Tania Sorg, Sara Wells, Helmut Fuchs, Martin Fray, David J Adams, Niels C Adams, Thure Adler, Antonio Aguilar-Pimentel, Dalila Ali-Hadji, Gregory Amann, Philippe André, Sarah Atkins, Aurelie Auburtin, Abdel Ayadi, Julien Becker, Lore Becker, Elodie Bedu, Raffi Bekeredjian, Marie-Christine Birling, Andrew Blake, Joanna Bottomley, Michael R Bowl, Véronique Brault, Dirk H Busch, James N Bussell, Julia Calzada-Wack, Heather Cater, Marie-France Champy, Philippe Charles, Claire Chevalier, Francesco Chiani, Gemma F Codner, Roy Combe, Roger Cox, Emilie Dalloneau, André Dierich, Armida Di Fenza, Brendan Doe, Arnaud Duchon, Oliver Eickelberg, Chris T Esapa, Lahcen El Fertak, Tanja Feigel, Irina Emelyanova, Jeanne Estabel, Jack Favor, Ann Flenniken, Alessia Gambadoro, Lilian Garrett, Hilary Gates, Anna-Karin Gerdin, George Gkoutos, Simon Greenaway, Lisa Glasl, Patrice Goetz, Isabelle Goncalves Da Cruz, Alexander Götz, Jochen Graw, Alain Guimond, Wolfgang Hans, Geoff Hicks, Sabine M Hölter, Heinz Höfler, John M Hancock, Robert Hoehndorf, Tertius Hough, Richard Houghton, Anja Hurt, Boris Ivandic, Hughes Jacobs, Sylvie Jacquot, Nora Jones, Natasha A Karp, Hugo A Katus, Sharon Kitchen, Tanja Klein-Rodewald, Martin Klingenspor, Thomas Klopstock, Valerie Lalanne, Sophie Leblanc, Christoph Lengger, Elise le Marchand, Tonia Ludwig, Aline Lux, Colin McKerlie, Holger Maier, Jean-Louis Mandel, Susan Marschall, Manuel Mark, David G Melvin, Hamid Meziane, Kateryna Micklich, Christophe Mittelhauser, Laurent Monassier, David Moulaert, Stéphanie Muller, Beatrix Naton, Frauke Neff, Patrick M Nolan, Lauryl M J Nutter, Markus Ollert, Guillaume Pavlovic, Natalia S Pellegata, Emilie Peter, Benoit Petit-Demoulière, Amanda Pickard, Christine Podrini, Paul Potter, Laurent Pouilly, Oliver Puk, David Richardson, Stephane Rousseau, Leticia Quintanilla-Fend, Mohamed M Quwailid, Ildiko 
Racz, Birgit Rathkolb, Fabrice Riet, Janet Rossant, Michel Roux, Jan Rozman, Edward Ryder, Jennifer Salisbury, Luis Santos, Karl-Heinz Schäble, Evelyn Schiller, Anja Schrewe, Holger Schulz, Ralf Steinkamp, Michelle Simon, Michelle Stewart, Claudia Stöger, Tobias Stöger, Minxuan Sun, David Sunter, Lydia Teboul, Isabelle Tilly, Glauco P Tocchini-Valentini, Monica Tost, Irina Treise, Laurent Vasseur, Emilie Velot, Daniela Vogt-Weisenhorn, Christelle Wagner, Alison Walling, Marie Wattenhofer-Donze, Bruno Weber, Olivia Wendling, Henrik Westerberg, Monja Willershäuser, Eckhard Wolf, Anne Wolter, Joe Wood, Wolfgang Wurst, Ali Önder Yildirim, Ramona Zeh, Andreas Zimmer, Annemarie Zimprich, Chris Holmes, Karen P Steel, Yann Herault, Valérie Gailus-Durner, Ann-Marie Mallon and Steve D M Brown

Nature Genetics, vol. 47, pp. 969-978 (2015)

Phenotype informatics, Biomedical informatics

@article{deAngelis2015,
  abstract = {The function of the majority of genes in the mouse and human genomes remains unknown. The mouse embryonic stem cell knockout resource provides a basis for the characterization of relationships between genes and phenotypes. The EUMODIC consortium developed and validated robust methodologies for the broad-based phenotyping of knockouts through a pipeline comprising 20 disease-oriented platforms. We developed new statistical methods for pipeline design and data analysis aimed at detecting reproducible phenotypes with high power. We acquired phenotype data from 449 mutant alleles, representing 320 unique genes, of which half had no previous functional annotation. We captured data from over 27,000 mice, finding that 83\% of the mutant lines are phenodeviant, with 65\% demonstrating pleiotropy. Surprisingly, we found significant differences in phenotype annotation according to zygosity. New phenotypes were uncovered for many genes with previously unknown function, providing a powerful basis for hypothesis generation and further investigation in diverse systems.},
  addendum = {IF: 27.60},
  author = {Martin Hrab{\v{e}} de Angelis and George Nicholson and Mohammed Selloum and Jacqueline K White and Hugh Morgan and Ramiro Ramirez-Solis and Tania Sorg and Sara Wells and Helmut Fuchs and Martin Fray and David J Adams and Niels C Adams and Thure Adler and Antonio Aguilar-Pimentel and Dalila Ali-Hadji and Gregory Amann and Philippe Andr{\'{e}} and Sarah Atkins and Aurelie Auburtin and Abdel Ayadi and Julien Becker and Lore Becker and Elodie Bedu and Raffi Bekeredjian and Marie-Christine Birling and Andrew Blake and Joanna Bottomley and Michael R Bowl and V{\'{e}}ronique Brault and Dirk H Busch and James N Bussell and Julia Calzada-Wack and Heather Cater and Marie-France Champy and Philippe Charles and Claire Chevalier and Francesco Chiani and Gemma F Codner and Roy Combe and Roger Cox and Emilie Dalloneau and Andr{\'{e}} Dierich and Armida Di Fenza and Brendan Doe and Arnaud Duchon and Oliver Eickelberg and Chris T Esapa and Lahcen El Fertak and Tanja Feigel and Irina Emelyanova and Jeanne Estabel and Jack Favor and Ann Flenniken and Alessia Gambadoro and Lilian Garrett and Hilary Gates and Anna-Karin Gerdin and George Gkoutos and Simon Greenaway and Lisa Glasl and Patrice Goetz and Isabelle Goncalves Da Cruz and Alexander G\"{o}tz and Jochen Graw and Alain Guimond and Wolfgang Hans and Geoff Hicks and Sabine M H\"{o}lter and Heinz H\"{o}fler and John M Hancock and Robert Hoehndorf and Tertius Hough and Richard Houghton and Anja Hurt and Boris Ivandic and Hughes Jacobs and Sylvie Jacquot and Nora Jones and Natasha A Karp and Hugo A Katus and Sharon Kitchen and Tanja Klein-Rodewald and Martin Klingenspor and Thomas Klopstock and Valerie Lalanne and Sophie Leblanc and Christoph Lengger and Elise le Marchand and Tonia Ludwig and Aline Lux and Colin McKerlie and Holger Maier and Jean-Louis Mandel and Susan Marschall and Manuel Mark and David G Melvin and Hamid Meziane and Kateryna Micklich and Christophe Mittelhauser and Laurent Monassier and David Moulaert 
and St{\'{e}}phanie Muller and Beatrix Naton and Frauke Neff and Patrick M Nolan and Lauryl M J Nutter and Markus Ollert and Guillaume Pavlovic and Natalia S Pellegata and Emilie Peter and Benoit Petit-Demouli{\`{e}}re and Amanda Pickard and Christine Podrini and Paul Potter and Laurent Pouilly and Oliver Puk and David Richardson and Stephane Rousseau and Leticia Quintanilla-Fend and Mohamed M Quwailid and Ildiko Racz and Birgit Rathkolb and Fabrice Riet and Janet Rossant and Michel Roux and Jan Rozman and Edward Ryder and Jennifer Salisbury and Luis Santos and Karl-Heinz Sch\"{a}ble and Evelyn Schiller and Anja Schrewe and Holger Schulz and Ralf Steinkamp and Michelle Simon and Michelle Stewart and Claudia St\"{o}ger and Tobias St\"{o}ger and Minxuan Sun and David Sunter and Lydia Teboul and Isabelle Tilly and Glauco P Tocchini-Valentini and Monica Tost and Irina Treise and Laurent Vasseur and Emilie Velot and Daniela Vogt-Weisenhorn and Christelle Wagner and Alison Walling and Marie Wattenhofer-Donze and Bruno Weber and Olivia Wendling and Henrik Westerberg and Monja Willersh\"{a}user and Eckhard Wolf and Anne Wolter and Joe Wood and Wolfgang Wurst and Ali \"{O}nder Yildirim and Ramona Zeh and Andreas Zimmer and Annemarie Zimprich and Chris Holmes and Karen P Steel and Yann Herault and Val{\'{e}}rie Gailus-Durner and Ann-Marie Mallon and Steve D M Brown$^*$},
  journal = {Nature Genetics},
  month = {July},
  pages = {969--978},
  publisher = {Nature Publishing Group},
  title = {Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics},
  url = {http://dx.doi.org/10.1038/ng.3360},
  volume = {47},
  year = {2015}
}

Ranking Adverse Drug Reactions With Crowdsourcing

Gottlieb, Assaf, Hoehndorf, Robert, Dumontier, Michel and Altman, Russ B.

J Med Internet Res, vol. 17(3), pp. e80 (2015)

Drug mechanisms, Biomedical informatics

@article{Gottlieb2015,
  abstract = {Background: There is no publicly available resource that provides the relative severity of adverse drug reactions (ADRs). Such a resource would be useful for several applications, including assessment of the risks and benefits of drugs and improvement of patient-centered care. It could also be used to triage predictions of drug adverse events. Objective: The intent of the study was to rank ADRs according to severity. Methods: We used Internet-based crowdsourcing to rank ADRs according to severity. We assigned 126,512 pairwise comparisons of ADRs to 2589 Amazon Mechanical Turk workers and used these comparisons to rank order 2929 ADRs. Results: There is good correlation (rho=.53) between the mortality rates associated with ADRs and their rank. Our ranking highlights severe drug-ADR predictions, such as cardiovascular ADRs for raloxifene and celecoxib. It also triages genes associated with severe ADRs such as epidermal growth-factor receptor (EGFR), associated with glioblastoma multiforme, and SCN1A, associated with epilepsy. Conclusions: ADR ranking lays a first stepping stone in personalized drug risk assessment. Ranking of ADRs using crowdsourcing may have useful clinical and financial implications, and should be further investigated in the context of health care decision making. },
  addendum = {IF: 5.03},
  author = {Gottlieb, Assaf
and Hoehndorf, Robert
and Dumontier, Michel
and Altman$^*$, Russ B.},
  day = {23},
  doi = {10.2196/jmir.3962},
  journal = {J Med Internet Res},
  keywords = {pharmacovigilance; adverse drug reactions; drug side effects; crowdsourcing; patient-centered care; alert fatigue},
  month = {March},
  number = {3},
  pages = {e80},
  title = {Ranking Adverse Drug Reactions With Crowdsourcing},
  url = {http://www.jmir.org/2015/3/e80/},
  volume = {17},
  year = {2015}
}

Similarity-based search of model organism, disease and drug effect phenotypes

Hoehndorf, Robert, Gruenberger, Michael, Gkoutos, Georgios and Schofield, Paul

Journal of Biomedical Semantics, vol. 6(1), pp. 6 (2015)

Semantic similarity, Rare disease, Drug mechanisms

@article{Hoehndorf2015phenome2,
  abstract = {BACKGROUND: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. RESULTS: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. CONCLUSIONS: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.},
  addendum = {IF: 1.99},
  author = {Hoehndorf$^*$, Robert and Gruenberger, Michael and Gkoutos, Georgios and Schofield, Paul},
  doi = {10.1186/s13326-015-0001-9},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {1},
  pages = {6},
  pubmedid = {25763178},
  title = {Similarity-based search of model organism, disease and drug effect phenotypes},
  url = {http://www.jbiomedsem.com/content/6/1/6},
  volume = {6},
  year = {2015}
}

The role of ontologies in biological and biomedical research: a functional perspective

Hoehndorf, Robert, Schofield, Paul N. and Gkoutos, Georgios V.

Briefings in Bioinformatics, vol. 16(6), pp. 1069-1080 (2015)

Applied Ontology, Biomedical informatics

@article{Hoehndorf2015role,
  abstract = {Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.},
  addendum = {IF: 11.62},
  author = {Hoehndorf$^*$, Robert and Schofield, Paul N. and Gkoutos, Georgios V.},
  journal = {Briefings in Bioinformatics},
  month = {March},
  number = {6},
  pages = {1069--1080},
  title = {The role of ontologies in biological and biomedical research: a functional perspective},
  url = {https://academic.oup.com/bib/article/16/6/1069/226068},
  volume = {16},
  year = {2015}
}

Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases

Robert Hoehndorf, Paul N Schofield and Georgios V Gkoutos

Scientific Reports, vol. 5, pp. 10888 (2015)

Rare disease, Semantic similarity

@article{Hoehndorf2015srep,
  abstract = {Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.},
  addendum = {IF: 4.00},
  author = {Robert Hoehndorf$^*$ and Paul N Schofield and Georgios V Gkoutos},
  journal = {Scientific Reports},
  month = {June},
  pages = {10888},
  title = {Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases},
  url = {http://www.nature.com/srep/2015/150608/srep10888/full/srep10888.html},
  volume = {5},
  year = {2015}
}

An ontology approach to comparative phenomics in plants

Oellrich, Anika, Walls, Ramona, Cannon, Ethalinda, Cannon, Steven, Cooper, Laurel, Gardiner, Jack, Gkoutos, Georgios, Harper, Lisa, He, Mingze, Hoehndorf, Robert, Jaiswal, Pankaj, Kalberer, Scott, Lloyd, John, Meinke, David, Menda, Naama, Moore, Laura, Nelson, Rex, Pujar, Anuradha, Lawrence, Carolyn and Huala, Eva

Plant Methods, vol. 11(1), pp. 10 (2015)

Applied Ontology, Phenotype informatics

BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology.
We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.

@article{Oellrich2015,
  abstract = {BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology.
We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.},
  addendum = {IF: 4.27},
  author = {Oellrich, Anika and Walls, Ramona and Cannon, Ethalinda and Cannon, Steven and Cooper, Laurel and Gardiner, Jack and Gkoutos, Georgios and Harper, Lisa and He, Mingze and Hoehndorf, Robert and Jaiswal, Pankaj and Kalberer, Scott and Lloyd, John and Meinke, David and Menda, Naama and Moore, Laura and Nelson, Rex and Pujar, Anuradha and Lawrence, Carolyn and Huala$^*$, Eva},
  doi = {10.1186/s13007-015-0053-y},
  issn = {1746-4811},
  journal = {Plant Methods},
  month = {February},
  note = {Anika Oellrich and Ramona L Walls contributed equally to this work.},
  number = {1},
  pages = {10},
  pubmedid = {25774204},
  title = {An ontology approach to comparative phenomics in plants},
  url = {http://www.plantmethods.com/content/11/1/10},
  volume = {11},
  year = {2015}
}

Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies

Luke Slater, Georgios Gkoutos, Paul N. Schofield and Robert Hoehndorf

Proceedings of International Conference on Biomedical Ontologies (ICBO), pp. 72-76 (2015)

Ontology engineering

@inproceedings{Slater2015,
  abstract = {Reasoning over biomedical ontologies using their OWL semantics
has traditionally been a challenging task due to the high theoretical
complexity of OWL-based automated reasoning. As a consequence,
ontology repositories, as well as most other tools utilizing ontologies,
either provide access to ontologies without use of automated
reasoning, or limit the number of ontologies for which automated
reasoning-based access is provided. We apply the Aber-OWL
infrastructure to provide automated reasoning-based access to all
accessible and consistent ontologies in BioPortal (368 ontologies).
We perform an extensive performance evaluation to determine query
times, both for queries of different complexity as well as for queries
that are performed in parallel over the ontologies. We demonstrate
that, with the exception of a few ontologies, even complex and parallel
queries can now be answered in milliseconds, therefore allowing
automated reasoning to be used on a large scale, to run in parallel,
and with rapid response times.},
  author = {Luke Slater and Georgios Gkoutos and Paul N. Schofield and Robert Hoehndorf},
  booktitle = {Proceedings of International Conference on Biomedical Ontologies (ICBO)},
  month = {July},
  pages = {72--76},
  title = {Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies},
  year = {2015}
}

AberOWL: an ontology portal with OWL EL reasoning

Luke Slater, Georgios Gkoutos, Paul N. Schofield and Robert Hoehndorf

Proceedings of International Conference on Biomedical Ontologies (ICBO), pp. 127-128 (2015)

Ontology engineering

@inproceedings{Slater2015b,
  abstract = {The fields of biological and biomedical science quickly generate
large quantities of data and knowledge; often, domain knowledge
is formalised using ontologies expressed in the Web Ontology
Language (OWL). Ontology repositories such as BioPortal and
Ontobee have been an important infrastructural component for
managing ontologies, specifically to search, browse and download
ontologies over the Web. We present the AberOWL system, a novel
ontology repository that allows access to multiple ontologies through
automated reasoning, utilizing parts of the OWL of the ontologies
alongside a web interface and web services. AberOWL contains over
300 ontologies and integrates reasoning over ontologies with access
to literature and SPARQL endpoints.},
  author = {Luke Slater and Georgios Gkoutos and Paul N. Schofield and Robert Hoehndorf},
  booktitle = {Proceedings of International Conference on Biomedical Ontologies (ICBO)},
  month = {July},
  pages = {127--128},
  title = {AberOWL: an ontology portal with OWL EL reasoning},
  year = {2015}
}

The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery

Dumontier, Michel, Baker, Christopher, Baran, Joachim, Callahan, Alison, Chepelev, Leonid, Cruz-Toledo, Jose, Del Rio, Nicholas, Duck, Geraint, Furlong, Laura, Keath, Nichealla, Klassen, Dana, McCusker, James, Queralt-Rosinach, Nuria, Samwald, Matthias, Villanueva-Rosales, Natalia, Wilkinson, Mark and Hoehndorf, Robert

Journal of Biomedical Semantics, vol. 5(1), pp. 14 (2014)

Applied Ontology, Ontology engineering

@article{sio,
  abstract = {The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a creative commons by attribution license. See website for further information: http://sio.semanticscience.org.},
  addendum = {IF: 1.99},
  author = {Dumontier$^*$, Michel and Baker, Christopher and Baran, Joachim and Callahan, Alison and Chepelev, Leonid and Cruz-Toledo, Jose and Del Rio, Nicholas and Duck, Geraint and Furlong, Laura and Keath, Nichealla and Klassen, Dana and McCusker, James and Queralt-Rosinach, Nuria and Samwald, Matthias and Villanueva-Rosales, Natalia and Wilkinson, Mark and Hoehndorf, Robert},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {1},
  pages = {14},
  pubmedid = {24602174},
  title = {The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery},
  url = {http://www.jbiomedsem.com/content/5/1/14},
  volume = {5},
  year = {2014}
}

BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

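Katayama, Toshiaki, Wilkinson, Mark D., Aoki-Kinoshita, Kiyoko F., Kawashima, Shuichi, Yamamoto, Yasunori, Yamaguchi, Atsuko, Okamoto, Shinobu, Kawano, Shin, Kim, Jin-Dong, Wang, Yue, Wu, Hongyan, Kano, Yoshinobu, Ono, Hiromasa, Bono, Hidemasa, Kocbek, Simon, Aerts, Jan, Akune, Yukie, Antezana, Erick, Arakawa, Kazuharu, Aranda, Bruno, Baran, Joachim, Bolleman, Jerven, Bonnal, Raoul Jp, Buttigieg, Pier Luigi, Campbell, Matthew P., Chen, Yi-An, Chiba, Hirokazu, Cock, Peter Ja, Cohen, Kevin B., Constantin, Alexandru, Duck, Geraint, Dumontier, Michel, Fujisawa, Takatomo, Fujiwara, Toyofumi, Goto, Naohisa, Hoehndorf, Robert, Igarashi, Yoshinobu, Itaya, Hidetoshi, Ito, Maori, Iwasaki, Wataru, Kalaš, Matúš, Katoda, Takeo, Kim, Taehong, Kokubu, Anna, Komiyama, Yusuke, Kotera, Masaaki, Laibe, Camille, Lapp, Hilmar, Lütteke, Thomas, Marshall, M. Scott, Mori, Takaaki, Mori, Hiroshi, Morita, Mizuki, Murakami, Katsuhiko, Nakao, Mitsuteru, Narimatsu, Hisashi, Nishide, Hiroyo, Nishimura, Yosuke, Nystrom-Persson, Johan, Ogishima, Soichi, Okamura, Yasunobu, Okuda, Shujiro, Oshita, Kazuki, Packer, Nicki H., Prins, Pjotr, Ranzinger, Rene, Rocca-Serra, Philippe, Sansone, Susanna, Sawaki, Hiromichi, Shin, Sung-Ho, Splendiani, Andrea, Strozzi, Francesco, Tadaka, Shu, Toukach, Philip, Uchiyama, Ikuo, Umezaki, Masahito, Vos, Rutger, Whetzel, Patricia L., Yamada, Issaku, Yamasaki, Chisato, Yamashita, Riu, York, William S., Zmasek, Christian M., Kawamoto, Shoko and Takagi, Toshihisa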

Journal of Biomedical Semantics, vol. 5(1), pp. 5 (2014)

Biomedical informatics

@article{Katayama2014,
  abstract = {The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.},
  addendum = {IF: 1.99},
  author = {Katayama, Toshiaki
and Wilkinson, Mark D.
and Aoki-Kinoshita, Kiyoko F.
and Kawa\-shima, Shuichi
and Yamamoto, Yasunori
and Yamaguchi, Atsuko
and Okamoto, Shinobu
and Kawano, Shin
and Kim, Jin-Dong
and Wang, Yue
and Wu, Hongyan
and Kano, Yoshinobu
and Ono, Hiromasa
and Bono, Hidemasa
and Kocbek, Simon
and Aerts, Jan
and Akune, Yukie
and Antezana, Erick
and Arakawa, Kazuharu
and Aranda, Bruno
and Baran, Joachim
and Bolleman, Jerven
and Bonnal, Raoul Jp
and Buttigieg, Pier Luigi
and Campbell, Matthew P.
and Chen, Yi-An
and Chiba, Hirokazu
and Cock, Peter Ja
and Cohen, Kevin B.
and Constantin, Alexandru
and Duck, Geraint
and Dumontier, Michel
and Fujisawa, Takatomo
and Fujiwara, Toyofumi
and Goto, Naohisa
and Hoehndorf, Robert
and Igarashi, Yoshinobu
and Itaya, Hidetoshi
and Ito, Maori
and Iwasaki, Wataru
and Kala{\v{s}}, Mat{\'u}{\v{s}}
and Katoda, Takeo
and Kim, Taehong
and Kokubu, Anna
and Komiyama, Yusuke
and Kotera, Masaaki
and Laibe, Camille
and Lapp, Hilmar
and L{\"u}tteke, Thomas
and Marshall, M. Scott
and Mori, Takaaki
and Mori, Hiroshi
and Morita, Mizuki
and Murakami, Katsuhiko
and Nakao, Mitsuteru
and Narimatsu, Hisashi
and Nishide, Hiroyo
and Nishimura, Yosuke
and Nystrom-Persson, Johan
and Ogishima, Soichi
and Okamura, Yasunobu
and Okuda, Shujiro
and Oshita, Kazuki
and Packer, Nicki H.
and Prins, Pjotr
and Ranzinger, Rene
and Rocca-Serra, Philippe
and Sansone, Susanna
and Sawaki, Hiromichi
and Shin, Sung-Ho
and Splendiani, Andrea
and Strozzi, Francesco
and Tadaka, Shu
and Toukach, Philip
and Uchiyama, Ikuo
and Umezaki, Masahito
and Vos, Rutger
and Whetzel, Patricia L.
and Yamada, Issaku
and Yamasaki, Chisato
and Yamashita, Riu
and York, William S.
and Zmasek, Christian M.
and Kawamoto, Shoko
and Takagi$^*$, Toshihisa},
  doi = {10.1186/2041-1480-5-5},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {1},
  pages = {5},
  title = {BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains},
  url = {http://www.jbiomedsem.com/content/5/1/5/abstract},
  volume = {5},
  year = {2014}
}

Thematic series on biomedical ontologies in JBMS: challenges and new directions

Hoehndorf, Robert, Haendel, Melissa, Stevens, Robert and Rebholz-Schuhmann, Dietrich

Journal of Biomedical Semantics, vol. 5(1), pp. 15 (2014)

Applied Ontology

@article{Hoehndorf2014thematicseries,
  abstract = {Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.},
  addendum = {IF: 1.99},
  author = {Hoehndorf$^*$, Robert and Haendel, Melissa and Stevens, Robert and Rebholz-Schuhmann, Dietrich},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {1},
  pages = {15},
  pubmedid = {24602198},
  title = {Thematic series on biomedical ontologies in JBMS: challenges and new directions},
  url = {http://www.jbiomedsem.com/content/5/1/15},
  volume = {5},
  year = {2014}
}

Analyzing gene expression data in mice with the Neuro Behavior Ontology

Hoehndorf, R., Hancock, J. M., Hardy, N. W., Mallon, A. M., Schofield, P. N. and Gkoutos, G. V.

Mamm Genome, vol. 25(1-2), pp. 32-40 (2014)

Applied Ontology, Biomedical informatics

@article{Hoehndorf2013nbo,
  abstract = {We have applied the Neuro Behavior Ontology (NBO), an ontology for the annotation of behavioral gene functions and behavioral phenotypes, to the annotation of more than 1,000 genes in the mouse that are known to play a role in behavior. These annotations can be explored by researchers interested in genes involved in particular behaviors and used computationally to provide insights into the behavioral phenotypes resulting from differences in gene expression. We developed the OntoFUNC tool and have applied it to enrichment analyses over the NBO to provide high-level behavioral interpretations of gene expression datasets. The resulting increase in the number of gene annotations facilitates the identification of behavioral or neurologic processes by assisting the formulation of hypotheses about the relationships between gene, processes, and phenotypic manifestations resulting from behavioral observations.},
  addendum = {IF: 2.08},
  author = {Hoehndorf$^*$, R. and Hancock, J. M. and Hardy, N. W. and Mallon, A. M. and Schofield, P. N. and Gkoutos$^*$, G. V.},
  journal = {Mamm Genome},
  number = {1-2},
  pages = {32--40},
  title = {Analyzing gene expression data in mice with the Neuro Behavior Ontology},
  volume = {25},
  year = {2014}
}

Enriched biodiversity data as a resource and service

Rutger Vos, Jordan Biserkov, Bachir Balech, Niall Beard, Matthew Blissett, Christian Brenninkmeijer, Tom van Dooren, David Eades, George Gosline, Quentin Groom, Thomas Hamann, Hannes Hettling, Robert Hoehndorf, Ayco Holleman, Peter Hovenkamp, Patricia Kelbert, David King, Don Kirkup, Youri Lammers, Thibaut DeMeulemeester, Daniel Mietchen, Jeremy Miller, Ross Mounce, Nicola Nicolson, Rod Page, Aleksandra Pawlik, Serrano Pereira, Lyubomir Penev, Kevin Richards, Guido Sautter, David Shorthouse, Marko Tähtinen, Claus Weiland, Alan Williams and Soraya Sierra

Biodiversity Data Journal, vol. 2, pp. e1125 (2014)

Applied Ontology, Biomedical informatics

@article{Biosphere,
  abstract = {Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon.

Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals that gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists.

One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain.

Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from their silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.},
  addendum = {IF: 1.33},
  author = {Rutger Vos$^*$ and Jordan Biserkov and Bachir Balech and Niall Beard and Matthew Blissett and Christian Brenninkmeijer and Tom van Dooren and David Eades and George Gosline and Quentin Groom and Thomas Hamann and Hannes Hettling and Robert Hoehndorf and Ayco Holleman and Peter Hovenkamp and Patricia Kelbert and David King and Don Kirkup and Youri Lammers and Thibaut DeMeulemeester and Daniel Mietchen and Jeremy Miller and Ross Mounce and Nicola Nicolson and Rod Page and Aleksandra Pawlik and Serrano Pereira and Lyubomir Penev and Kevin Richards and Guido Sautter and David Shorthouse and Marko Tähtinen and Claus Weiland and Alan Williams and Soraya Sierra},
  doi = {10.3897/BDJ.2.e1125},
  journal = {Biodiversity Data Journal},
  month = {jun},
  pages = {e1125},
  publisher = {Pensoft Publishers},
  title = {Enriched biodiversity data as a resource and service},
  url = {http://dx.doi.org/10.3897/BDJ.2.e1125},
  volume = {2},
  year = {2014}
}

An integrative, translational approach to understanding rare and orphan genetically based diseases

Hoehndorf, Robert, Schofield, Paul N. and Gkoutos, Georgios V.

Interface Focus, vol. 3(2) (2013)

Rare disease, Ontology engineering

@article{Hoehndorf2013orphanet,
  abstract = {PhenomeNet is an approach for integrating phenotypes across species and identifying candidate genes for genetic diseases based on the similarity between a disease and animal model phenotypes. In contrast to 'guilt-by-association' approaches, PhenomeNet relies exclusively on the comparison of phenotypes to suggest candidate genes, and can, therefore, be applied to study the molecular basis of rare and orphan diseases for which the molecular basis is unknown. In addition to disease phenotypes from the Online Mendelian Inheritance in Man (OMIM) database, we have now integrated the clinical signs from Orphanet into PhenomeNet. We demonstrate that our approach can efficiently identify known candidate genes for genetic diseases in Orphanet and OMIM. Furthermore, we find evidence that mutations in the HIP1 gene might cause Bassoe syndrome, a rare disorder with unknown genetic aetiology. Our results demonstrate that integration and computational analysis of human disease and animal model phenotypes using PhenomeNet has the potential to reveal novel insights into the pathobiology underlying genetic diseases.},
  addendum = {IF: 3.09},
  author = {Hoehndorf$^*$, Robert and Schofield, Paul N. and Gkoutos, Georgios V.},
  doi = {10.1098/rsfs.2012.0055},
  journal = {Interface Focus},
  number = {2},
  title = {An integrative, translational approach to understanding rare and orphan genetically based diseases},
  url = {http://rsfs.royalsocietypublishing.org/content/3/2/20120055.abstract},
  volume = {3},
  year = {2013}
}

Representing physiological processes and their participants with PhysioMaps

Cook, Daniel, Neal, Maxwell, Hoehndorf, Robert, Gkoutos, Georgios and Gennari, John

Journal of Biomedical Semantics, vol. 4(Suppl 1), pp. S2 (2013)

Applied Ontology

@article{Cook2013,
  abstract = {BACKGROUND: As the number and size of biological knowledge resources for physiology grows, researchers need improved tools for searching and integrating knowledge and physiological models. Unfortunately, current resources--databases, simulation models, and knowledge bases, for example--are only occasionally and idiosyncratically explicit about the semantics of the biological entities and processes that they describe. RESULTS: We present a formal approach, based on the semantics of biophysics as represented in the Ontology of Physics for Biology, that divides physiological knowledge into three partitions: structural knowledge, process knowledge and biophysical knowledge. We then computationally integrate these partitions across multiple structural and biophysical domains as computable ontologies by which such knowledge can be archived, reused, and displayed. Our key result is the semi-automatic parsing of biosimulation model code into PhysioMaps that can be displayed and interrogated for qualitative responses to hypothetical perturbations. CONCLUSIONS: Strong, explicit semantics of biophysics can provide a formal, computational basis for integrating physiological knowledge in a manner that supports visualization of the physiological content of biosimulation models across spatial scales and biophysical domains.},
  addendum = {IF: 1.99},
  author = {Cook$^*$, Daniel and Neal, Maxwell and Hoehndorf, Robert and Gkoutos, Georgios and Gennari, John},
  doi = {10.1186/2041-1480-4-S1-S2},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {Suppl 1},
  pages = {S2},
  title = {Representing physiological processes and their participants with PhysioMaps},
  url = {http://www.jbiomedsem.com/content/4/S1/S2},
  volume = {4},
  year = {2013}
}

Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources

Rebholz-Schuhmann, Dietrich, Kafkas, Senay, Kim, Jee-Hyub, Li, Chen, Jimeno Yepes, Antonio, Hoehndorf, Robert, Backofen, Rolf and Lewin, Ian

Journal of Biomedical Semantics, vol. 4(1), pp. 28 (2013)

Biomedical informatics

@article{Rebholz2013,
  abstract = {MOTIVATION: The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: Terminological and lexical resources deliver the term candidates into PGN tagging solutions and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e.~corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource in their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs. RESULTS: In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions certainly perform best if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora and - on the other hand - the profiles of the false positive mistakes characterize the tagging solutions. LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results, which, in addition, provide concept identifiers from a knowledge source in contrast to ML-Tag solutions. CONCLUSION: The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard.},
  addendum = {IF: 1.99},
  author = {Rebholz-Schuhmann$^*$, Dietrich and Kafkas, Senay and Kim, Jee-Hyub and Li, Chen and Jimeno Yepes, Antonio and Hoehndorf, Robert and Backofen, Rolf and Lewin, Ian},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {oct},
  number = {1},
  pages = {28},
  title = {Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources},
  url = {http://www.jbiomedsem.com/content/4/1/28},
  volume = {4},
  year = {2013}
}

Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)

Dietrich Rebholz-Schuhmann, Jee-Hyub Kim, Ying Yan, Abhishek Dixit, Caroline Friteyre, Robert Hoehndorf, Rolf Backofen and Ian Lewin

PLoS ONE, vol. 8(10), pp. e75185 (2013)

Biomedical informatics, Applied Ontology

@article{lexebi,
  abstract = {Motivation: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical "term space" (the "Lexeome"), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). Result: This study compiles a resource for lexical terms of biomedical interest in a standard format (called "LexEBI"), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. Conclusion: LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). 
The resource provides the disease terms as open source content, and fully interlinks terms across resources.},
  addendum = {IF: 2.74},
  author = {Dietrich Rebholz-Schuhmann$^*$ and Jee-Hyub Kim and Ying Yan and Abhishek Dixit and Caroline Friteyre and Robert Hoehndorf and Rolf Backofen and Ian Lewin},
  journal = {PLoS ONE},
  month = {oct},
  number = {10},
  pages = {e75185},
  publisher = {Public Library of Science},
  title = {Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)},
  url = {http://dx.doi.org/10.1371/journal.pone.0075185},
  volume = {8},
  year = {2013}
}

Semantic Systems Biology: Formal Knowledge Representation in Systems Biology for Model Construction, Retrieval, Validation and Discovery

Dumontier, Michel, Chepelev, Leonid L and Hoehndorf, Robert

Systems Biology, pp. 355-373 (2013)

Applied Ontology, Drug mechanisms

@incollection{dumontier2013semantic,
  abstract = {With the publication of the human genome, scientists worldwide opened champagne and let out a collective cheer for progress in biology. After all, the untold number of interactions of tens of thousands of genes, a greater number of their products and product derivatives, and tens of thousands of chemicals came much closer to complete characterization. Paradoxically however, while individual efforts produced important biological results, an integrated view of biology from systems perspective seemed ever more distant due to the complexity of data integration from multiple knowledge representation forms, formalisms, modeling paradigms, and conflicting scientific statements. To address this, semantic technologies have risen over the past decade with the promise of truly unifying biological knowledge and allowing cross-domain queries and model integration. In this chapter, we shall examine Semantic Web technologies and their applications to build, publish, query, discover, compare, validate, reason about, and evaluate models and knowledge in Systems Biology. We shall specifically address biological ontologies, open data repositories, modeling and annotation tools, and selected promising applications of Semantic Systems Biology. We firmly believe that it shall soon be possible to completely close the gap between facts, models, and results, and to fully apply the accrued models and facts to evaluate biological hypotheses on a system level, discovering meaning within the vast collection of biological knowledge and taking Systems Biology research to a new, unprecedented level.},
  author = {Dumontier$^*$, Michel and Chepelev, Leonid L and Hoehndorf, Robert},
  booktitle = {Systems Biology},
  pages = {355--373},
  publisher = {Springer Netherlands},
  title = {Semantic Systems Biology: Formal Knowledge Representation in Systems Biology for Model Construction, Retrieval, Validation and Discovery},
  year = {2013}
}

Mouse model phenotypes provide information about human drug targets

Hoehndorf, Robert, Hiebert, Tanya, Hardy, Nigel W., Schofield, Paul N., Gkoutos, Georgios V. and Dumontier, Michel

Bioinformatics (2013)

Drug mechanisms, Rare disease

@article{Hoehndorf2013drugs,
  abstract = {Motivation: Methods for computational drug target identification utilize information from diverse information sources to predict or prioritize drug targets for known drugs. One set of resources that has been relatively neglected for drug repurposing are animal model phenotypes. Results: We investigate the use of mouse model phenotypes for drug target identification. To achieve this goal, we first integrate mouse model phenotypes and drug effects, and then systematically compare the phenotypic similarity between mouse models and drug effect profiles. We find a high similarity between phenotypes resulting from loss-of-function mutations and drug effects resulting from the inhibition of a protein through a drug action, and demonstrate how this approach can be used to suggest candidate drug targets. Availability and implementation: Analysis code and supplementary data files are available on the project website at https://drugeffects.googlecode.com. Contact: roh25@aber.ac.uk},
  addendum = {IF: 6.94},
  author = {Hoehndorf$^*$, Robert and Hiebert, Tanya and Hardy, Nigel W. and Schofield, Paul N. and Gkoutos, Georgios V. and Dumontier$^*$, Michel},
  journal = {Bioinformatics},
  month = {oct},
  title = {Mouse model phenotypes provide information about human drug targets},
  url = {http://bioinformatics.oxfordjournals.org/content/early/2013/10/23/bioinformatics.btt613.abstract},
  year = {2013}
}

Linking PharmGKB to phenotype studies and animal models of disease for drug repurposing

Robert Hoehndorf, Anika Oellrich, Dietrich Rebholz-Schuhmann, Paul N. Schofield and Georgios V. Gkoutos

Pacific Symposium on Biocomputing (PSB), pp. 388-399 (2012)

Drug mechanisms, Rare disease, Applied Ontology

@article{Hoehndorf2012psb,
  abstract = {  The investigation of phenotypes in model organisms has the potential
to reveal the molecular mechanisms underlying disease. The
large-scale comparative analysis of phenotypes across species can
reveal novel associations between genotypes and diseases. We use the
PhenomeNET network of phenotypic similarity to suggest
genotype--disease association, combine them with drug--gene
associations available from the PharmGKB database, and infer novel
associations between drugs and diseases.  We evaluate and quantify
our results based on our method's capability to reproduce known
drug--disease associations. We find and discuss evidence that
levonorgestrel, tretinoin and estradiol are associated with cystic
fibrosis ($p<2.65\cdot 10^{-6}$, $p<0.002$ and $p<0.031$, Wilcoxon
signed-rank test, Bonferroni correction) and that ibuprofen may be
active in chronic lymphocytic leukemia ($p<2.63\cdot 10^{-23}$,
Wilcoxon signed-rank test, Bonferroni correction). To enable access
to our results, we implement a web server and make our raw data
freely available.  Our results are the first steps in implementing
an integrated system for the analysis and prediction of
drug--disease associations for rare and orphan diseases for which
the molecular basis is not known.},
  author = {Robert Hoehndorf and Anika Oellrich and Dietrich Rebholz-Schuhmann and Paul N. Schofield and Georgios V. Gkoutos},
  date = {2012},
  journal = {Pacific Symposium on Biocomputing (PSB)},
  pages = {388--399},
  title = {Linking PharmGKB to phenotype studies and animal models of disease for drug repurposing},
  year = {2012}
}

Mouse genetic and phenotypic resources for human genetics

Schofield, Paul N., Hoehndorf, Robert and Gkoutos, Georgios V.

Human Mutation (2012)

Rare disease, Phenotype informatics

@article{Schofield2012,
  abstract = {The use of model organisms to provide information on gene function has proved to be a powerful approach to our understanding of both human disease and fundamental mammalian biology. Large-scale community projects using mice, based on forward and reverse genetics, and now the pan-genomic phenotyping efforts of the International Mouse Phenotyping Consortium (IMPC), are generating resources on an unprecedented scale which will be extremely valuable to human genetics and medicine. We discuss the nature and availability of data, mice and ES cells from these large-scale programmes, the use of these resources to help prioritise and validate candidate genes in human genetic association studies, and how they can improve our understanding of the underlying pathobiology of human disease.},
  addendum = {IF: 4.12},
  author = {Schofield$^*$, Paul N. and Hoehndorf, Robert and Gkoutos, Georgios V.},
  date = {2012},
  journal = {Human Mutation},
  publisher = {Wiley Subscription Services, Inc., A Wiley Company},
  title = {Mouse genetic and phenotypic resources for human genetics},
  url = {http://onlinelibrary.wiley.com/doi/10.1002/humu.22077/abstract},
  year = {2012}
}

Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL

Jupp, Simon, Stevens, Robert and Hoehndorf, Robert

Journal of Biomedical Semantics, vol. 3(Suppl 1), pp. S3 (2012)

Applied Ontology, Protein function, Ontology engineering

@article{Jupp2012,
  abstract = {MOTIVATION: Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should be able to, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. RESULTS: To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed. CONCLUSION: This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of the gene products in the GOAL ontology. OWL in combination with automated reasoning can be effectively used to query across ontologies to ask biologically rich questions. We have demonstrated that automated reasoning can be used to deliver practical on-line querying support for the ontology annotations available for the mouse. AVAILABILITY: The GOAL Web page is to be found at http://owl.cs.manchester.ac.uk/goal.},
  addendum = {IF: 1.99},
  author = {Jupp$^*$, Simon and Stevens, Robert and Hoehndorf, Robert},
  date = {2012},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {Suppl 1},
  pages = {S3},
  pubmedid = {22541594},
  title = {Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL},
  url = {http://www.jbiomedsem.com/supplements/3/S1/S3},
  volume = {3},
  year = {2012}
}

Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology

Hoehndorf, Robert, Harris, Midori A., Herre, Heinrich, Rustici, Gabriella and Gkoutos, Georgios V.

Bioinformatics, vol. 28(13), pp. 1783-1789 (2012)

Applied Ontology, Phenotype informatics, Ontology engineering

@article{Hoehndorf2012cpo,
  abstract = {Motivation: The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain. Results: In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created. Availability and implementation: The CPO and the source code we generated to create the CPO are freely available on http://cell-phenotype.googlecode.com. Contact: rh497@cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.},
  addendum = {IF: 6.94},
  author = {Hoehndorf$^*$, Robert and Harris, Midori A. and Herre, Heinrich and Rustici, Gabriella and Gkoutos, Georgios V.},
  doi = {10.1093/bioinformatics/bts250},
  journal = {Bioinformatics},
  number = {13},
  pages = {1783-1789},
  title = {Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology},
  url = {http://bioinformatics.oxfordjournals.org/content/28/13/1783.abstract},
  volume = {28},
  year = {2012}
}

An infrastructure for ontology-based information systems in biomedicine: RICORDO case study

Wimalaratne, Sarala M., Grenon, Pierre, Hoehndorf, Robert, Gkoutos, Georgios V. and de Bono, Bernard

Bioinformatics, vol. 28(3), pp. 448-450 (2012)

Ontology engineering, Biomedical informatics

@article{Wimalaratne2012,
  abstract = {Summary: The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. Availability and implementation: The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. Contact: sarala@ebi.ac.uk},
  addendum = {IF: 6.94},
  author = {Wimalaratne$^*$, Sarala M. and Grenon, Pierre and Hoehndorf, Robert and Gkoutos, Georgios V. and de Bono, Bernard},
  date = {2012},
  journal = {Bioinformatics},
  number = {3},
  pages = {448-450},
  title = {An infrastructure for ontology-based information systems in biomedicine: RICORDO case study},
  url = {http://bioinformatics.oxfordjournals.org/content/28/3/448.abstract},
  volume = {28},
  year = {2012}
}

Identifying aberrant pathways through integrated analysis of knowledge in pharmacogenomics

Hoehndorf, Robert, Dumontier, Michel and Gkoutos, Georgios V.

Bioinformatics, vol. 28(16), pp. 2169-2175 (2012)

Drug mechanisms, Biomedical informatics

@article{Hoehndorf2012pharmgkb,
  abstract = {Motivation: Many complex diseases are the result of
abnormal pathway functions instead of single
abnormalities. Disease diagnosis and intervention
strategies must target these pathways while
minimizing the interference with normal
physiological processes. Large scale identification
of disease pathways and chemicals that may be used
to perturb them requires the integration of
information about drugs, genes, diseases and
pathways. This information is currently distributed
over several pharmacogenomics databases. An
integrated analysis of the information in these
databases can reveal disease pathways and facilitate
novel biomedical analyses. Results: We demonstrate
how to integrate pharmacogenomics databases through
integration of the biomedical ontologies that are
used as meta-data in these databases. The additional
background knowledge in these ontologies can then be
used to enable novel analyses. We identify disease
pathways using a novel multi-ontology enrichment
analysis over the Human Disease Ontology, and we
identify significant associations between chemicals
and pathways using an enrichment analysis over a
chemical ontology. The drug-pathway and
disease-pathway associations are a valuable resource
for research in disease and drug mechanisms and can
be used to improve computational drug
repurposing. Contact: rh497@cam.ac.uk},
  addendum = {IF: 6.94},
  author = {Hoehndorf$^*$, Robert and Dumontier, Michel and Gkoutos,
Georgios V.},
  day = {15},
  journal = {Bioinformatics},
  month = {August},
  number = {16},
  pages = {2169--2175},
  title = {Identifying aberrant pathways through integrated
analysis of knowledge in pharmacogenomics},
  url = {http://bioinformatics.oxfordjournals.org/content/28/16/2169.long?ijkey=nM7YSnmyflyuDx5&keytype=ref},
  volume = {28},
  year = {2012}
}

Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology

Oellrich, Anika, Gkoutos, Georgios V., Hoehndorf, Robert and Rebholz-Schuhmann, Dietrich

Journal of Biomedical Semantics, vol. 3(Suppl 2), pp. S1 (2012)

Phenotype informatics, Semantic similarity

@article{Oellrich2012,
  abstract = {Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report about all findings and to automatically infer results relevant to human diseases. The integration of the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research. Here, we investigate the quality and completeness of two different methods to align the Human Phenotype Ontology and the Mammalian Phenotype Ontology. The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second method uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method does, it does not outperform the lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.},
  addendum = {IF: 1.99},
  author = {Oellrich$^*$, Anika and Gkoutos, Georgios V. and Hoehndorf, Robert and Rebholz-Schuhmann, Dietrich},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {Suppl 2},
  pages = {S1},
  title = {Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology},
  url = {http://www.jbiomedsem.com/content/3/S2/S1},
  volume = {3},
  year = {2012}
}

Towards improving phenotype representation in OWL

Loebe, Frank, Stumpf, Frank, Hoehndorf, Robert and Herre, Heinrich

Journal of Biomedical Semantics, vol. 3(Suppl 2), pp. S5 (2012)

Applied Ontology, Phenotype informatics

@article{Loebe2012,
  abstract = {BACKGROUND: Phenotype ontologies are used in species-specific databases for the annotation of mutagenesis experiments and to characterize human diseases. The Entity-Quality (EQ) formalism is a means to describe complex phenotypes based on one or more affected entities and a quality. EQ-based definitions have been developed for many phenotype ontologies, including the Human and Mammalian Phenotype ontologies. METHODS: We analyze formalizations of complex phenotype descriptions in the Web Ontology Language (OWL) that are based on the EQ model, identify several representational challenges and analyze potential solutions to address these challenges. RESULTS: In particular, we suggest a novel, role-based approach to represent relational qualities such as concentration of iron in spleen, discuss its ontological foundation in the General Formal Ontology (GFO) and evaluate its representation in OWL and the benefits it can bring to the representation of phenotype annotations. CONCLUSION: Our analysis of OWL-based representations of phenotypes can contribute to improving consistency and expressiveness of formal phenotype descriptions.},
  addendum = {IF: 1.99},
  author = {Loebe$^*$, Frank and Stumpf, Frank and Hoehndorf, Robert and Herre, Heinrich},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {Suppl 2},
  pages = {S5},
  title = {Towards improving phenotype representation in OWL},
  url = {http://www.jbiomedsem.com/content/3/S2/S5},
  volume = {3},
  year = {2012}
}

Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes

Gkoutos, Georgios V. and Hoehndorf, Robert

Journal of Biomedical Semantics, vol. 3(Suppl 2), pp. S6 (2012)

Applied Ontology, Phenotype informatics

@article{Gkoutos2012yeast,
  abstract = {Ontologies are widely used in the biomedical community for annotation and integration of databases. Formal definitions can relate classes from different ontologies and thereby integrate data across different levels of granularity, domains and species. We have applied this methodology to the Ascomycete Phenotype Ontology (APO), enabling the reuse of various orthogonal ontologies and we have converted the phenotype associated data found in the SGD following our proposed patterns. We have integrated the resulting data in the cross-species phenotype network PhenomeNET, and we make both the cross-species integration of yeast phenotypes and a similarity-based comparison of yeast phenotypes across species available in the PhenomeBrowser. Furthermore, we utilize our definitions and the yeast phenotype annotations to suggest novel functional annotations of gene products in yeast.},
  addendum = {IF: 1.99},
  author = {Gkoutos$^*$, Georgios V. and Hoehndorf, Robert},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {Suppl 2},
  pages = {S6},
  title = {Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes},
  url = {http://www.jbiomedsem.com/content/3/S2/S6},
  volume = {3},
  year = {2012}
}

Evaluation of research in biomedical ontologies

Hoehndorf, Robert, Dumontier, Michel and Gkoutos, Georgios V.

Briefings in Bioinformatics (2012)

Applied Ontology, Biomedical informatics

@article{Hoehndorf2012eval,
  abstract = {Ontologies are now pervasive in biomedicine, where they serve as a means to standardize terminology, to enable access to domain knowledge, to verify data consistency and to facilitate integrative analyses over heterogeneous biomedical data. For this purpose, research on biomedical ontologies applies theories and methods from diverse disciplines such as information management, knowledge representation, cognitive science, linguistics and philosophy. Depending on the desired applications in which ontologies are being applied, the evaluation of research in biomedical ontologies must follow different strategies. Here, we provide a classification of research problems in which ontologies are being applied, focusing on the use of ontologies in basic and translational research, and we demonstrate how research results in biomedical ontologies can be evaluated. The evaluation strategies depend on the desired application and measure the success of using an ontology for a particular biomedical problem. For many applications, the success can be quantified, thereby facilitating the objective evaluation and comparison of research in biomedical ontology. The objective, quantifiable comparison of research results based on scientific applications opens up the possibility for systematically improving the utility of ontologies in biomedical research.},
  addendum = {IF: 11.62},
  author = {Hoehndorf$^*$, Robert and Dumontier, Michel and Gkoutos, Georgios V.},
  journal = {Briefings in Bioinformatics},
  month = {September},
  title = {Evaluation of research in biomedical ontologies},
  url = {http://bib.oxfordjournals.org/content/early/2012/09/07/bib.bbs053.abstract},
  year = {2012}
}

The Units Ontology: a tool for integrating units of measurement in science

Gkoutos, Georgios V., Schofield, Paul N. and Hoehndorf, Robert

Database, vol. 2012 (2012)

Applied Ontology, Ontology engineering

@article{Gkoutos2012units,
  abstract = {Units are basic scientific tools that render meaning to numerical data. Their standardization and formalization caters for the report, exchange, process, reproducibility and integration of quantitative measurements. Ontologies are means that facilitate the integration of data and knowledge allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurements.},
  addendum = {IF: 3.66},
  author = {Gkoutos$^*$, Georgios V. and Schofield, Paul N. and Hoehndorf, Robert},
  journal = {Database},
  title = {The Units Ontology: a tool for integrating units of measurement in science},
  url = {http://database.oxfordjournals.org/content/2012/bas033.abstract},
  volume = {2012},
  year = {2012}
}

Text-mining solutions for biomedical research: enabling integrative biology

Dietrich Rebholz-Schuhmann, Anika Oellrich and Robert Hoehndorf

Nature Reviews Genetics, vol. 13(12), pp. 829-839 (2012)

Biomedical informatics

@article{Hoehndorf2012nrg,
  abstract = {In response to the unbridled growth of information in literature and biomedical databases, researchers require efficient means of handling and extracting information. As well as providing background information for research, scientific publications can be processed to transform textual information into database content or complex networks and can be integrated with existing knowledge resources to suggest novel hypotheses. Information extraction and text data analysis can be particularly relevant and helpful in genetics and biomedical research, in which up-to-date information about complex processes involving genes, proteins and phenotypes is crucial. Here we explore the latest advancements in automated literature analysis and its contribution to innovative research approaches.},
  addendum = {IF: 33.13},
  author = {Dietrich Rebholz-Schuhmann$^*$ and Anika Oellrich and Robert Hoehndorf},
  journal = {Nature Reviews Genetics},
  month = {December},
  number = {12},
  pages = {829--839},
  title = {Text-mining solutions for biomedical research: enabling integrative biology},
  url = {http://www.nature.com/nrg/journal/v13/n12/full/nrg3337.html},
  volume = {13},
  year = {2012}
}

Chapter Four - The Neurobehavior Ontology: An Ontology for Annotation and Integration of Behavior and Behavioral Phenotypes

Georgios V. Gkoutos, Paul N. Schofield and Robert Hoehndorf

Bioinformatics of Behavior: Part 1, vol. 103, pp. 69 - 87, In: Elissa J. Chesler and Melissa A. Haendel (Eds.) (2012)

Applied Ontology, Phenotype informatics, Ontology engineering

@incollection{Gkoutos2012behavior,
  abstract = {In recent years, considerable advances have been made toward our understanding of the genetic architecture of behavior and the physical, mental, and environmental influences that underpin behavioral processes. The provision of a method for recording behavior-related phenomena is necessary to enable integrative and comparative analyses of data and knowledge about behavior. The neurobehavior ontology facilitates the systematic representation of behavior and behavioral phenotypes, thereby improving the unification and integration of behavioral data in neuroscience research.},
  author = {Georgios V. Gkoutos$^*$ and Paul N. Schofield and Robert Hoehndorf$^*$},
  booktitle = {Bioinformatics of Behavior: Part 1},
  doi = {10.1016/B978-0-12-388408-4.00004-6},
  editor = {Elissa J. Chesler and Melissa A. Haendel},
  issn = {0074-7742},
  keywords = {Behavior},
  pages = {69 - 87},
  publisher = {Academic Press},
  series = {International Review of Neurobiology},
  title = {Chapter Four - The Neurobehavior Ontology: An Ontology for Annotation and Integration of Behavior and Behavioral Phenotypes},
  url = {http://www.sciencedirect.com/science/article/pii/B9780123884084000046},
  volume = {103},
  year = {2012}
}

Argumentation to Represent and Reason over Biological Systems

Adam Wyner, Luke Riley, Robert Hoehndorf and Samuel Croset

Proceedings of the 3rd International Conference on Information Technology in Bio- and Medical Informatics (ITBAM 2012) (2012)

Applied Ontology

@inproceedings{Wyner2012,
  author = {Adam Wyner and Luke Riley and Robert Hoehndorf and Samuel Croset},
  booktitle = {Proceedings of the 3rd International Conference on Information Technology in Bio- and Medical Informatics (ITBAM 2012)},
  title = {Argumentation to Represent and Reason over Biological Systems},
  year = {2012}
}

A translational medicine approach to orphan diseases

Robert Hoehndorf and Georgios V. Gkoutos

Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012) (2012)

Rare disease, Biomedical informatics

@inproceedings{Hoehndorf2012vph1,
  author = {Robert Hoehndorf and Georgios V. Gkoutos},
  booktitle = {Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012)},
  title = {A translational medicine approach to orphan diseases},
  year = {2012}
}

Integration of knowledge for personalized medicine: a pharmacogenomics case-study

Robert Hoehndorf, Michel Dumontier and Georgios V. Gkoutos

Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012) (2012)

Drug mechanisms, Ontology engineering

@inproceedings{Hoehndorf2012vph2,
  author = {Robert Hoehndorf and Michel Dumontier and Georgios V. Gkoutos},
  booktitle = {Proceedings of the Virtual Physiological Human Conference 2012 (VPH2012)},
  title = {Integration of knowledge for personalized medicine: a pharmacogenomics case-study},
  year = {2012}
}

A common layer of interoperability for biomedical ontologies based on OWL EL

Robert Hoehndorf, Michel Dumontier, Anika Oellrich, Sarala Wimalaratne, Dietrich Rebholz-Schuhmann, Paul N. Schofield and Georgios V. Gkoutos

Bioinformatics, vol. 27(7), pp. 1001-1008 (2011)

Ontology engineering, Applied Ontology

@article{elvira,
  abstract = {Motivation: Ontologies are essential in biomedical research due to their ability to semantically integrate content from different scientific databases and resources. Their application improves capabilities for querying and mining biological knowledge. An increasing number of ontologies is being developed for this purpose, and considerable effort is invested into formally defining them in order to represent their semantics explicitly. However, current biomedical ontologies do not facilitate data integration and interoperability yet, since reasoning over these ontologies is very complex and cannot be performed efficiently or is even impossible. We propose the use of less expressive subsets of ontology representation languages to enable efficient reasoning and achieve the goal of genuine interoperability between ontologies. Results: We present and evaluate EL Vira, a framework that transforms OWL ontologies into the OWL EL subset, thereby enabling the use of tractable reasoning. We illustrate which OWL constructs and inferences are kept and lost following the conversion and demonstrate the performance gain of reasoning indicated by the significant reduction of processing time. We applied EL Vira to the open biomedical ontologies and provide a repository of ontologies resulting from this conversion. EL Vira creates a common layer of ontological interoperability that, for the first time, enables the creation of software solutions that can employ biomedical ontologies to perform inferences and answer complex queries to support scientific analyses. Availability and implementation: The EL Vira software is available from http://el-vira.googlecode.com and converted OBO ontologies and their mappings are available from http://bioonto.gen.cam.ac.uk/el-ont. Contact: rh497@cam.ac.uk},
  addendum = {IF: 6.94},
  author = {Robert Hoehndorf$^*$ and Michel Dumontier and Anika Oellrich and Sarala Wimalaratne and Dietrich Rebholz-Schuhmann and Paul N. Schofield and Georgios V. Gkoutos},
  date = {2011},
  day = {1},
  journal = {Bioinformatics},
  month = {April},
  number = {7},
  pages = {1001--1008},
  title = {A common layer of interoperability for biomedical ontologies based on OWL EL},
  url = {http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btr058?ijkey=JQwZvdFzmKzubQe&keytype=ref},
  volume = {27},
  year = {2011}
}

The RNA Ontology (RNAO): An Ontology for Integrating RNA Sequence and Structure Data

Robert Hoehndorf, Colin Batchelor, Thomas Bittner, Michel Dumontier, Karen Eilbeck, Rob Knight, Chris J. Mungall, Jane S. Richardson, Jesse Stombaugh, Eric Westhof, Craig L. Zirbel and Neocles B. Leontis

Applied Ontology, vol. 6(1), pp. 53-89 (2011)

Applied Ontology, Biomedical informatics

@article{rnao,
  abstract = {Biomedical ontologies integrate diverse biomedical data, enable intelligent data-mining and help translate basic research into useful clinical knowledge. We present the RNA Ontology (RNAO), an ontology for integrating diverse RNA data, including RNA sequences and sequence alignments, three-dimensional structures, and biochemical and functional data. For example, individual atomic resolution RNA structures have broader significance as representatives of classes of homologous molecules, which can differ significantly in sequence while sharing core structural features and common roles or functions. Thus, structural data gain value by being linked to homologous sequences in genomic data and databases of sequence alignments. Likewise, the value of genomic data is enhanced by annotation of shared structural features, especially when these can be linked to specific functions. Moreover, the significance of biochemical, functional and mutational analyses of RNA molecules is most fully understood when linked to molecular structures and phylogenies. To achieve these goals, RNAO provides logically rigorous definitions of the components of RNA primary, secondary and tertiary structure and the relations between these entities. RNAO is being developed to comply with the developing standards of the Open Biomedical Ontologies (OBO) Consortium. The RNAO can be accessed at http://code.google.com/p/rnao/.},
  addendum = {IF: 0.85},
  author = {Robert Hoehndorf and Colin Batchelor and Thomas Bittner and Michel
Dumontier and Karen Eilbeck and Rob Knight and Chris J. Mungall and Jane
S. Richardson and Jesse Stombaugh and Eric Westhof and Craig L. Zirbel and
Neocles B. Leontis$^*$},
  date = {2011},
  journal = {Applied Ontology},
  month = {January},
  number = {1},
  pages = {53--89},
  title = {The RNA Ontology (RNAO): An Ontology for Integrating RNA Sequence and Structure Data},
  url = {http://iospress.metapress.com/content/p301p5vx3p687105/},
  volume = {6},
  year = {2011}
}

PhenomeNET: a whole-phenome approach to disease gene discovery

Hoehndorf, Robert, Schofield, Paul N. and Gkoutos, Georgios V.

Nucleic Acids Research, vol. 39(18), pp. e119 (2011)

Rare disease, Semantic similarity, Phenotype informatics

@article{Hoehndorf2011phenome,
  abstract = {Phenotypes are investigated in model organisms to understand and reveal
the molecular mechanisms underlying disease. Phenotype ontologies
were developed to capture and compare phenotypes within the context
of a single species. Recently, these ontologies were augmented with
formal class definitions that may be utilized to integrate phenotypic
data and enable the direct comparison of phenotypes between different
species. We have developed a method to transform phenotype ontologies
into a formal representation, combine phenotype ontologies with anatomy
ontologies, and apply a measure of semantic similarity to construct
the PhenomeNET cross-species phenotype network. We demonstrate that
PhenomeNET can identify orthologous genes, genes involved in the
same pathway and gene–disease associations through the comparison
of mutant phenotypes. We provide evidence that the Adam19 and Fgf15
genes in mice are involved in the tetralogy of Fallot, and, using
zebrafish phenotypes, propose the hypothesis that the mammalian homologs
of Cx36.7 and Nkx2.5 lie in a pathway controlling cardiac morphogenesis
and electrical conductivity which, when defective, cause the tetralogy
of Fallot phenotype. Our method implements a whole-phenome approach
toward disease gene discovery and can be applied to prioritize genes
for rare and orphan diseases for which the molecular basis is unknown.},
  addendum = {IF: 11.50},
  author = {Hoehndorf$^*$, Robert and Schofield, Paul N. and Gkoutos, Georgios V.},
  date = {2011},
  journal = {Nucleic Acids Research},
  month = {July},
  number = {18},
  pages = {e119},
  title = {PhenomeNET: a whole-phenome approach to disease gene discovery},
  url = {http://nar.oxfordjournals.org/content/39/18/e119},
  volume = {39},
  year = {2011}
}

Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning

Robert Hoehndorf, Michel Dumontier, Anika Oellrich, Dietrich Rebholz-Schuhmann, Paul N. Schofield and Georgios V. Gkoutos

PLOS ONE, vol. 6(7), pp. e22006 (2011)

Ontology engineering, Applied Ontology

@article{Hoehndorf2011incon,
  abstract = {Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.},
  addendum = {IF: 2.74},
  author = {Robert Hoehndorf$^*$ and Michel Dumontier and Anika Oellrich and
Dietrich Rebholz-Schuhmann and Paul N. Schofield and
Georgios V. Gkoutos},
  date = {2011},
  journal = {PLOS ONE},
  month = {July},
  number = {7},
  pages = {e22006},
  title = {Interoperability between biomedical ontologies
through relation expansion, upper-level ontologies
and automatic reasoning},
  url = {http://dx.doi.org/10.1371/journal.pone.0022006},
  volume = {6},
  year = {2011}
}

Integrating systems biology models and biomedical ontologies

Hoehndorf, Robert, Dumontier, Michel, Gennari, John H., Wimalaratne, Sarala, de Bono, Bernard, Cook, Daniel L. and Gkoutos, Georgios V.

BMC Systems Biology, vol. 5(1), pp. 124 (2011)

Biomedical informatics

@article{Hoehndorf2011models,
  abstract = {BACKGROUND: Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such an integration of data and are often used to annotate biosimulation models in systems biology. RESULTS: We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies, and demonstrate this framework using the Systems Biology Markup Language. We developed the SBML Harvester software that automatically converts annotated SBML models into OWL, and we apply our software to those biosimulation models that are contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomenon they represent and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. CONCLUSIONS: We establish an information flow between biomedical ontologies and biosimulation models, and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms.},
  addendum = {IF: 2.05},
  author = {Hoehndorf$^*$, Robert and Dumontier, Michel and Gennari, John H. and Wimalaratne, Sarala and de Bono, Bernard and Cook, Daniel L. and Gkoutos, Georgios V.},
  citeulike-article-id = {9645286},
  citeulike-linkout-0 = {http://dx.doi.org/10.1186/1752-0509-5-124},
  date = {2011},
  doi = {10.1186/1752-0509-5-124},
  issn = {1752-0509},
  journal = {BMC Systems Biology},
  keywords = {biology, flow, information, model, ontology, sbml, systems},
  month = {August},
  number = {1},
  pages = {124+},
  posted-at = {2011-08-11 14:40:30},
  priority = {0},
  title = {Integrating systems biology models and biomedical ontologies},
  url = {http://www.biomedcentral.com/1752-0509/5/124},
  volume = {5},
  year = {2011}
}

The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions.

de Bono, Bernard, Hoehndorf, Robert, Wimalaratne, Sarala, Gkoutos, Georgios V. and Grenon, Pierre

BMC Research Notes, vol. 4(1), pp. 313 (2011)

Ontology engineering, Biomedical informatics

@article{ricordovision,
  abstract = {BACKGROUND: The practice and research of medicine generates considerable quantities of data and model resources (DMRs). Although in principle biomedical resources are re-usable, in practice few can currently be shared. In particular, the clinical communities in physiology and pharmacology research, as well as medical education, (i.e. PPME communities) are facing considerable operational and technical obstacles in sharing data and models. FINDINGS: We outline the efforts of the PPME communities to achieve automated semantic interoperability for clinical resource documentation in collaboration with the RICORDO project. Current community practices in resource documentation and knowledge management are overviewed. Furthermore, requirements and improvements sought by the PPME communities to current documentation practices are discussed. The RICORDO plan and effort in creating a representational framework and associated open software toolkit for the automated management of PPME metadata resources is also described. CONCLUSIONS: RICORDO is providing the PPME community with tools to effect, share and reason over clinical resource annotations. This work is contributing to the semantic interoperability of DMRs through ontology-based annotation by (i) supporting more effective navigation and re-use of clinical DMRs, as well as (ii) sustaining interoperability operations based on the criterion of biological similarity. Operations facilitated by RICORDO will range from automated dataset matching to model merging and managing complex simulation workflows. In effect, RICORDO is contributing to community standards for resource sharing and interoperability.},
  author = {de Bono$^*$, Bernard and Hoehndorf, Robert and Wimalaratne, Sarala and Gkoutos, Georgios V. and Grenon, Pierre},
  date = {2011},
  issn = {1756-0500},
  journal = {BMC Research Notes},
  number = {1},
  pages = {313},
  pubmedid = {21878109},
  title = {The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions.},
  url = {http://www.biomedcentral.com/1756-0500/4/313/},
  volume = {4},
  year = {2011}
}

PIDO: The Primary Immunodeficiency Disease Ontology

Adams, Nico, Hoehndorf, Robert, Gkoutos, Georgios V., Hansen, Gesine and Hennig, Christian

Bioinformatics (2011)

Applied Ontology, Rare disease

Motivation: Primary Immunodeficiency Diseases (PIDs) are Mendelian conditions of high phenotypic complexity and low incidence. They usually manifest in toddlers and infants, although they can also occur much later in life. Information about PIDs is often widely scattered throughout the clinical as well as the research literature and hard to find for both generalists as well as experienced clinicians. Semantic Web technologies coupled to clinical information systems can go some way towards addressing this problem. Ontologies are a central component of such a system, containing and centralizing knowledge about primary immunodeficiencies in both a human- and computer-comprehensible form. The development of an ontology of PIDs is therefore a central step towards developing informatics tools, which can support the clinician in the diagnosis and treatment of these diseases. Results: We present PIDO, the Primary Immunodeficiency Disease Ontology. PIDO characterises PIDs in terms of the phenotypes commonly observed by clinicians during a diagnosis process. Phenotype terms in PIDO are formally defined using complex definitions based on qualities, functions, processes, and structures. We provide mappings to biomedical reference ontologies to ensure interoperability with ontologies in other domains. Based on PIDO, we developed the PIDFinder, an ontology-driven software prototype that can facilitate clinical decision support. PIDO connects immunological knowledge across resources within a common framework and thereby enables translational research and the development of medical applications for the domain of immunology and primary immunodeficiency diseases. Availability: The Primary Immunodeficiency Disease Ontology is available under a Creative Commons Attribution 3.0 (CC-BY 3.0) licence at http://code.google.com/p/pido/ The most recent public release of the ontology can always be found at http://purl.org/scimantica/pido/owl/pid.owl.
An instance of the PIDFinder software can be found at http://pidfinder.appspot.com Contact: nico.adams@csiro.au

@article{Adams2011,
  abstract = {Motivation: Primary Immunodeficiency Diseases (PIDs) are Mendelian conditions of high phenotypic complexity and low incidence. They usually manifest in toddlers and infants, although they can also occur much later in life. Information about PIDs is often widely scattered throughout the clinical as well as the research literature and hard to find for both generalists as well as experienced clinicians. Semantic Web technologies coupled to clinical information systems can go some way towards addressing this problem. Ontologies are a central component of such a system, containing and centralizing knowledge about primary immunodeficiencies in both a human- and computer-comprehensible form. The development of an ontology of PIDs is therefore a central step towards developing informatics tools, which can support the clinician in the diagnosis and treatment of these diseases. Results: We present PIDO, the Primary Immunodeficiency Disease Ontology. PIDO characterises PIDs in terms of the phenotypes commonly observed by clinicians during a diagnosis process. Phenotype terms in PIDO are formally defined using complex definitions based on qualities, functions, processes, and structures. We provide mappings to biomedical reference ontologies to ensure interoperability with ontologies in other domains. Based on PIDO, we developed the PIDFinder, an ontology-driven software prototype that can facilitate clinical decision support. PIDO connects immunological knowledge across resources within a common framework and thereby enables translational research and the development of medical applications for the domain of immunology and primary immunodeficiency diseases. Availability: The Primary Immunodeficiency Disease Ontology is available under a Creative Commons Attribution 3.0 (CC-BY 3.0) licence at http://code.google.com/p/pido/ The most recent public release of the ontology can always be found at http://purl.org/scimantica/pido/owl/pid.owl.
An instance of the PIDFinder software can be found at http://pidfinder.appspot.com Contact: nico.adams@csiro.au},
  addendum = {IF: 6.94},
  author = {Adams$^*$, Nico and Hoehndorf, Robert and Gkoutos, Georgios V. and Hansen, Gesine and Hennig, Christian},
  date = {2011},
  journal = {Bioinformatics},
  month = {September},
  title = {PIDO: The Primary Immunodeficiency Disease Ontology},
  url = {http://bioinformatics.oxfordjournals.org/content/early/2011/09/22/bioinformatics.btr531.abstract},
  year = {2011}
}

Ontology design patterns to disambiguate relations between genes and gene products in GENIA

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille, Pyysalo, Sampo, Ohta, Tomoko, Oellrich, Anika and Rebholz-Schuhmann, Dietrich

Journal of Biomedical Semantics, vol. 2(Suppl 5), pp. S1 (2011)

Applied Ontology, Ontology engineering, Biomedical informatics

@article{Hoehndorf2011genia,
  abstract = {MOTIVATION: Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. RESULTS: We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. AVAILABILITY: Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.},
  addendum = {IF: 1.99},
  author = {Hoehndorf$^*$, Robert and Ngonga Ngomo, Axel-Cyrille and Pyysalo, Sampo and Ohta, Tomoko and Oellrich, Anika and Rebholz-Schuhmann, Dietrich},
  date = {2011},
  doi = {10.1186/2041-1480-2-S5-S1},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  number = {Suppl 5},
  pages = {S1},
  title = {Ontology design patterns to disambiguate relations between genes and gene products in GENIA},
  url = {http://www.jbiomedsem.com/content/2/S5/S1},
  volume = {2},
  year = {2011}
}

New approaches to the representation and analysis of phenotype knowledge in human diseases and their animal models

Schofield, Paul N., Sundberg, John P., Hoehndorf, Robert and Gkoutos, Georgios V.

Briefings in Functional Genomics, vol. 10(5), pp. 258-265 (2011)

Applied Ontology, Phenotype informatics, Rare disease

@article{Schofield2011,
  abstract = {The systematic investigation of the phenotypes associated with genotypes in model organisms holds the promise of revealing genotype–phenotype relations directly and without additional, intermediate inferences. Large-scale projects are now underway to catalog the complete phenome of a species, notably the mouse. With the increasing amount of phenotype information becoming available, a major challenge that biology faces today is the systematic analysis of this information and the translation of research results across species and into an improved understanding of human disease. The challenge is to integrate and combine phenotype descriptions within a species and to systematically relate them to phenotype descriptions in other species, in order to form a comprehensive understanding of the relations between those phenotypes and the genotypes involved in human disease. We distinguish between two major approaches for comparative phenotype analyses: the first relies on evolutionary relations to bridge the species gap, while the other approach compares phenotypes directly. In particular, the direct comparison of phenotypes relies heavily on the quality and coherence of phenotype and disease databases. We discuss major achievements and future challenges for these databases in light of their potential to contribute to the understanding of the molecular mechanisms underlying human disease. In particular, we discuss how the use of ontologies and automated reasoning can significantly contribute to the analysis of phenotypes and demonstrate their potential for enabling translational research.},
  addendum = {IF: 2.94},
  author = {Schofield, Paul N. and Sundberg, John P. and Hoehndorf, Robert and Gkoutos, Georgios V.},
  date = {2011},
  journal = {Briefings in Functional Genomics},
  number = {5},
  pages = {258-265},
  title = {New approaches to the representation and analysis of phenotype knowledge in human diseases and their animal models},
  url = {http://bfg.oxfordjournals.org/content/10/5/258.abstract},
  volume = {10},
  year = {2011}
}

OBML - Ontologies in Biomedicine and Life Sciences

Herre, Heinrich, Hoehndorf, Robert, Kelso, Janet, Loebe, Frank and Schulz, Stefan

Journal of Biomedical Semantics, vol. 2(Suppl 4), pp. I1 (2011)

Applied Ontology, Biomedical informatics

The OBML 2010 workshop, held at the University of Mannheim on September 9-10, 2010, is the 2nd in a series of meetings organized by the Working Group "Ontologies in Biomedicine and Life Sciences" of the German Society of Computer Science (GI) and the German Society of Medical Informatics, Biometry and Epidemiology (GMDS). Integrating, processing and applying the rapidly expanding information generated in the life sciences -- from public health to clinical care and molecular biology -- is one of the most challenging problems that research in these fields is facing today. As the amounts of experimental data, clinical information and scientific knowledge increase, there is a growing need to promote interoperability of these resources, support formal analyses, and to pre-process knowledge for further use in problem solving and hypothesis formulation. The OBML workshop series pursues the aim of gathering scientists who research topics related to life science ontologies, to exchange ideas, discuss new results and establish relationships. The OBML group promotes the collaboration between ontologists, computer scientists, bio-informaticians and applied logicians, as well as the cooperation with physicians, biologists, biochemists and biometricians, and supports the establishment of this new discipline in research and teaching. Research topics of OBML 2010 included medical informatics, Semantic Web applications, formal ontology, bio-ontologies, knowledge representation as well as the wide range of applications of biomedical ontologies to science and medicine. A total of 14 papers were presented, and from these we selected four manuscripts for inclusion in this special issue. An interdisciplinary audience from all areas related to biomedical ontologies attended OBML 2010. In the future, OBML will continue as an annual meeting that aims to bridge the gap between theory and application of ontologies in the life sciences.
The next event emphasizes the special topic of the ontology of phenotypes, in Berlin, Germany on October 6-7, 2011.

@article{Hoehndorf2010obml,
  abstract = {The OBML 2010 workshop, held at the University of Mannheim on September 9-10, 2010, is the 2nd in a series of meetings organized by the Working Group "Ontologies in Biomedicine and Life Sciences" of the German Society of Computer Science (GI) and the German Society of Medical Informatics, Biometry and Epidemiology (GMDS). Integrating, processing and applying the rapidly expanding information generated in the life sciences -- from public health to clinical care and molecular biology -- is one of the most challenging problems that research in these fields is facing today. As the amounts of experimental data, clinical information and scientific knowledge increase, there is a growing need to promote interoperability of these resources, support formal analyses, and to pre-process knowledge for further use in problem solving and hypothesis formulation. The OBML workshop series pursues the aim of gathering scientists who research topics related to life science ontologies, to exchange ideas, discuss new results and establish relationships. The OBML group promotes the collaboration between ontologists, computer scientists, bio-informaticians and applied logicians, as well as the cooperation with physicians, biologists, biochemists and biometricians, and supports the establishment of this new discipline in research and teaching. Research topics of OBML 2010 included medical informatics, Semantic Web applications, formal ontology, bio-ontologies, knowledge representation as well as the wide range of applications of biomedical ontologies to science and medicine. A total of 14 papers were presented, and from these we selected four manuscripts for inclusion in this special issue. An interdisciplinary audience from all areas related to biomedical ontologies attended OBML 2010. In the future, OBML will continue as an annual meeting that aims to bridge the gap between theory and application of ontologies in the life sciences.
The next event emphasizes the special topic of the ontology of phenotypes, in Berlin, Germany on October 6-7, 2011.},
  addendum = {IF: 1.99},
  author = {Herre$^*$, Heinrich and Hoehndorf, Robert and Kelso, Janet and Loebe, Frank and Schulz, Stefan},
  date = {2011},
  doi = {10.1186/2041-1480-2-S4-I1},
  issn = {2041-1480},
  journal = {Journal of Biomedical Semantics},
  month = {August},
  number = {Suppl 4},
  pages = {I1},
  pubmedid = {21996496},
  title = {OBML - Ontologies in Biomedicine and Life Sciences},
  url = {http://www.jbiomedsem.com/content/2/S4/I1},
  volume = {2},
  year = {2011}
}

Higgs bosons, mars missions, and unicorn delusions: How to deal with terms of dubious reference in scientific ontologies

Stefan Schulz, Mathias Brochhausen and Robert Hoehndorf

Proceedings of the Second International Conference on Biomedical Ontology (2011)

Applied Ontology

@inproceedings{unicorn1,
  abstract = {Realist ontologies claim to represent what exists. Scientific discourse, however, often contains non-referring terms when describing hypotheses, plans, or ideas. We present a framework in which a realist ontology is embedded in a description logics theory, which is indifferent regarding the existence of class members, and which may include representational units for representing various kinds of non-referring terms. Using a taxonomy of terminological units we are able to distinguish between different kinds of classes in the description logics theory and to identify classes as unsatisfiable, which are put as the extensions of non-referring terms. We also demonstrate how discourse using non-referring terms can be represented without departing from the principle of realist ontologies. An example OWL file can be downloaded from: http://purl.org/steschu/misc/ICBO2011.},
  author = {Stefan Schulz and Mathias Brochhausen and Robert Hoehndorf},
  booktitle = {Proceedings of the Second International Conference on Biomedical Ontology},
  day = {30},
  month = {July},
  title = {Higgs bosons, mars missions, and unicorn delusions: How to deal with terms of dubious reference in scientific ontologies},
  year = {2011}
}

Investigation of the fundamental strategy for interoperability of description of biological measurements

Hiroshi Masuya, Georgios V. Gkoutos, Nobuhiko Tanaka, Kazunori Waki, Yoshihiro Okuda, Tatsuya Kushida, Norio Kobayashi, Koji Doi, Kouji Kozaki, Robert Hoehndorf, Shigeharu Wakana, Tetsuro Toyoda and Riichiro Mizoguchi

Proceedings of the Second International Conference on Biomedical Ontology (2011)

Applied Ontology, Ontology engineering

@inproceedings{icbo2,
  abstract = {Aiming to facilitate the advanced integration of measurement data across various biological experiments, we have investigated the fundamental methodology to expand the Phenotypic Quality Ontology (PATO) commonly used for descriptions of biological phenotypes with the framework of the Yet Another More Advanced Top-level Ontology (YAMATO). The mapping of ontology terms of PATO to YAMATO’s framework represents several advanced aspects such as the introduction of the classification of quality values to represent scales of measurements, the distinction of the different contexts of the comparison of ordinal values, and the establishment of the interoperability of quality description formalisms between different top-level ontologies. In this study, we propose a logical base to integrate cross-species and cross-experimental annotations of biological measurements.},
  author = {Hiroshi Masuya and Georgios V. Gkoutos and Nobuhiko Tanaka and Kazunori Waki and Yoshihiro Okuda and Tatsuya Kushida and Norio Kobayashi and Koji Doi and Kouji Kozaki and Robert Hoehndorf and Shigeharu Wakana and Tetsuro Toyoda and Riichiro Mizoguchi},
  booktitle = {Proceedings of the Second International Conference on Biomedical Ontology},
  day = {29},
  month = {July},
  title = {Investigation of the fundamental strategy for interoperability of description of biological measurements},
  year = {2011}
}

Exploring Gene Ontology Annotations with OWL

Simon Jupp, Robert Stevens and Robert Hoehndorf

Proceedings of the 13th Bio-Ontology Meeting (2011)

Applied Ontology, Protein function

@inproceedings{goannos,
  abstract = {Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other activities. Tools, such as AmiGO, allow exploration of genes based on their GO annotations. This human driven exploration and querying of GO is obviously useful, but by taking advantage of the ontological representation we can use these annotations to create a rich polyhierarchy of gene products for enhanced querying. This also opens up possibilities for exploring GO annotations (GOA) for redundancies and defects in annotations.
To do this we have created a set of OWL classes for mouse genes and their GOA. Each gene is represented as a class, with appropriate relationships to the GO aspects with which it has been annotated. We then use defined classes to query these gene product classes and to build a complex hierarchy. This standard use of OWL affords a rich interaction with GO annotations to give a fine partitioning of the gene products in the ontology.},
  author = {Simon Jupp and Robert Stevens and Robert Hoehndorf},
  booktitle = {Proceedings of the 13th Bio-Ontology Meeting},
  day = {15},
  month = {July},
  title = {Exploring Gene Ontology Annotations with OWL},
  url = {https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxU-_gdp879tYTdiODFjY2EtNjQ2My00ZWQwLWFkMjUtMjAyNGY1YzcyNGZj&hl=en_US},
  year = {2011}
}

Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes

Georgios V. Gkoutos and Robert Hoehndorf

Proceedings of the 3rd Workshop for Ontologies in Biomedicine and Life sciences (OBML) (2011)

Applied Ontology, Phenotype informatics

@inproceedings{obml2011h1,
  abstract = {Ontologies are widely used in the biomedical community for
annotation and integration of databases. Formal definitions can relate
classes from different ontologies and thereby integrate data across
different levels of granularity, domains and species. We have applied
this methodology to the Ascomycete Phenotype Ontology (APO),
enabling the reuse of various orthogonal ontologies and we have
converted the phenotype-associated data found in the SGD following
our proposed patterns. We have integrated the resulting data into
a cross-species phenotype network termed PhenomeNET and we
make both the cross-species integration of yeast phenotypes and
a similarity-based comparison of yeast phenotypes across species
available in the PhenomeBrowser.},
  author = {Georgios V. Gkoutos and Robert Hoehndorf},
  booktitle = {Proceedings of the 3rd Workshop for Ontologies in Biomedicine and Life sciences (OBML)},
  month = {October},
  title = {Ontology-based cross-species integration and analysis of Saccharomyces cerevisiae phenotypes},
  year = {2011}
}

Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology

Anika Oellrich, Robert Hoehndorf, Georgios V. Gkoutos and Dietrich Rebholz-Schuhmann

Proceedings of the 3rd Workshop for Ontologies in Biomedicine and Life sciences (OBML) (2011)

Phenotype informatics, Semantic similarity

@inproceedings{obml2011h2,
  abstract = {Researchers use animal studies to better understand human diseases.
In recent years, large-scale phenotype studies such as Phenoscape and
EuroPhenome have been initiated to identify genetic causes of a species'
phenome. Species-specific phenotype ontologies are required to capture
and report all findings and to automatically infer results relevant to
human diseases. The integration of the different phenotype ontologies into a
coherent framework is necessary to achieve interoperability for cross-species
research.
Here, we investigate the quality and completeness of two different methods
to align the Human Phenotype Ontology and the Mammalian Phenotype
Ontology. The first method combines lexical matching with inference over the
ontologies' taxonomic structures, while the second method uses a mapping
algorithm based on the formal definitions from the ontologies. Neither method
could map all concepts. Although the formal definitions method provides
mappings for more concepts than does the lexical matching method, it does
not outperform the lexical matching in a biological use case. Our results
suggest that combining both approaches will yield better mappings in terms
of completeness, specificity and application purposes.},
  author = {Anika Oellrich and Robert Hoehndorf and Georgios V. Gkoutos and Dietrich Rebholz-Schuhmann},
  booktitle = {Proceedings of the 3rd Workshop for Ontologies in Biomedicine and Life sciences (OBML)},
  month = {October},
  title = {Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology},
  year = {2011}
}

Towards Improving Phenotype Representation in OWL

Frank Loebe, Frank Stumpf, Robert Hoehndorf and Heinrich Herre

Proceedings of the 3rd Workshop for Ontologies in Biomedicine and Life sciences (OBML) (2011)

Applied Ontology, Phenotype informatics

@inproceedings{obml2011h3,
  abstract = {Phenotype ontologies are used in species-specific databases for
the annotation of mutagenesis experiments and to characterize human
diseases. The Entity-Quality (EQ) formalism is a means to
describe complex phenotypes based on one or more affected entities
and a quality. EQ-based definitions have been developed for
many phenotype ontologies, including the Human and Mammalian
Phenotype ontologies. We analyze the OWL-based formalizations of
complex phenotype descriptions based on the EQ model, identify
several representational challenges and analyze potential solutions
to address these challenges. In particular, we suggest a novel,
role-based approach to represent relational qualities such as
Concentration of calcium in blood, discuss its ontological foundation in
the General Formal Ontology (GFO) and evaluate its representation in
OWL and the benefits it can bring to the representation of phenotype
annotations. Our analysis of OWL-based representation of phenotypes
can contribute to improving consistency and expressiveness of
formal phenotype descriptions.},
  author = {Frank Loebe and Frank Stumpf and Robert Hoehndorf and Heinrich Herre},
  booktitle = {Proceedings of the 3rd Workshop for Ontologies in Biomedicine and Life sciences (OBML)},
  month = {October},
  title = {Towards Improving Phenotype Representation in OWL},
  year = {2011}
}

Statistical tests for associations between two directed acyclic graphs.

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille, Dannemann, Michael and Kelso, Janet

PLoS ONE, vol. 5(6), pp. e10996+ (2010)

Biomedical informatics

@article{h3,
  abstract = {Biological data, and particularly annotation data, are increasingly being represented in directed acyclic graphs (DAGs). However, while relevant biological information is implicit in the links between multiple domains, annotations from these different domains are usually represented in distinct, unconnected DAGs, making links between the domains represented difficult to determine. We develop a novel family of general statistical tests for the discovery of strong associations between two directed acyclic graphs. Our method takes the topology of the input graphs and the specificity and relevance of associations between nodes into consideration. We apply our method to the extraction of associations between biomedical ontologies in an extensive use-case. Through a manual and an automatic evaluation, we show that our tests discover biologically relevant relations. The suite of statistical tests we develop for this purpose is implemented and freely available for download.},
  addendum = {IF: 2.74},
  author = {Hoehndorf$^*$, Robert and Ngonga Ngomo, Axel-Cyrille and Dannemann, Michael and Kelso, Janet},
  citeulike-article-id = {7333223},
  citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/20585388},
  citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=20585388},
  date = {2010},
  day = {16},
  issn = {1932-6203},
  journal = {PLoS ONE},
  keywords = {association, biomedical, gene, graph, mining, ontology, statistical, test, text},
  month = {June},
  number = {6},
  pages = {e10996+},
  posted-at = {2010-10-13 11:58:41},
  priority = {2},
  publisher = {Public Library of Science},
  title = {Statistical tests for associations between two directed acyclic graphs.},
  url = {http://dx.doi.org/10.1371/journal.pone.0010996},
  volume = {5},
  year = {2010}
}

Applying the functional abnormality ontology pattern to anatomical functions.

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille and Kelso, Janet

Journal of Biomedical Semantics, vol. 1(1), pp. 4+ (2010)

Biomedical informatics

@article{h7,
  abstract = {BACKGROUND: Several biomedical ontologies cover the domain of biological functions, including molecular and cellular functions. However, there is currently no publicly available ontology of anatomical functions. Consequently, no explicit relation between anatomical structures and their functions is expressed in the anatomy ontologies that are available for various species. Such an explicit relation between anatomical structures and their functions would be useful both for defining the classes of the anatomy and the phenotype ontologies accurately. RESULTS: We provide an ontological analysis of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies which uses a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Additionally, we introduce a new relation to link material objects to processes that realize the function of these objects. This relation is introduced to avoid a needless duplication of processes already covered by the Gene Ontology in a new ontology of anatomical functions. CONCLUSIONS: Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can extract a skeleton for an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies automatically. We identify several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and their abnormalities. AVAILABILITY: The source code and results of our analysis are available at http://bioonto.de.},
  addendum = {IF: 1.99},
  author = {Hoehndorf$^*$, Robert and Ngonga Ngomo, Axel-Cyrille and Kelso, Janet},
  citeulike-article-id = {6936564},
  citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/20618982},
  citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=20618982},
  date = {2010},
  issn = {2041-1480},
  journal = {Journal of biomedical semantics},
  keywords = {abnormality, design, function, hpo, human, mammalian, mpo, ontology, pattern, phenotype},
  month = {March},
  number = {1},
  pages = {4+},
  posted-at = {2010-10-13 11:50:31},
  priority = {2},
  title = {Applying the functional abnormality ontology pattern to anatomical functions.},
  url = {http://dx.doi.org/10.1186/2041-1480-1-4},
  volume = {1},
  year = {2010}
}

Relations as patterns: bridging the gap between OBO and OWL.

Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Rebholz-Schuhmann, Dietrich and Herre, Heinrich

BMC Bioinformatics, vol. 11(1), pp. 441+ (2010)

Biomedical informatics

BACKGROUND: most biomedical ontologies are represented in the OBO Flatfile Format, which is an easy-to-use graph-based ontology language. The semantics of the OBO Flatfile Format 1.2 enforces a strict predetermined interpretation of relationship statements between classes. It does not allow flexible specifications that provide better approximations of the intuitive understanding of the considered relations. If relations cannot be accurately expressed then ontologies built upon them may contain false assertions and hence lead to false inferences. Ontologies in the OBO Foundry must formalize the semantics of relations according to the OBO Relationship Ontology (RO). Therefore, being able to accurately express the intended meaning of relations is of crucial importance. Since the Web Ontology Language (OWL) is an expressive language with a formal semantics, it is suitable to define the meaning of relations accurately. RESULTS: we developed a method to provide definition patterns for relations between classes using OWL and describe a novel implementation of the RO based on this method. We implemented our extension in software that converts ontologies in the OBO Flatfile Format to OWL, and also provide a prototype to extract relational patterns from OWL ontologies using automated reasoning. The conversion software is freely available at http://bioonto.de/obo2owl, and can be accessed via a web interface. CONCLUSIONS: explicitly defining relations permits their use in reasoning software and leads to a more flexible and powerful way of representing biomedical ontologies. Using the extended language and semantics avoids several mistakes commonly made in formalizing biomedical ontologies, and can be used to automatically detect inconsistencies. The use of our method enables the use of graph-based ontologies in OWL, and makes complex OWL ontologies accessible in a graph-based form.
Thereby, our method provides the means to gradually move the representation of biomedical ontologies into formal knowledge representation languages that incorporate an explicit semantics. Our method facilitates the use of OWL-based software in the back-end while ontology curators may continue to develop ontologies with an OBO-style front-end.

@article{h20,
  abstract = {BACKGROUND: most biomedical ontologies are represented in the OBO Flatfile Format, which is an easy-to-use graph-based ontology language. The semantics of the OBO Flatfile Format 1.2 enforces a strict predetermined interpretation of relationship statements between classes. It does not allow flexible specifications that provide better approximations of the intuitive understanding of the considered relations. If relations cannot be accurately expressed then ontologies built upon them may contain false assertions and hence lead to false inferences. Ontologies in the OBO Foundry must formalize the semantics of relations according to the OBO Relationship Ontology (RO). Therefore, being able to accurately express the intended meaning of relations is of crucial importance. Since the Web Ontology Language (OWL) is an expressive language with a formal semantics, it is suitable to define the meaning of relations accurately. RESULTS: we developed a method to provide definition patterns for relations between classes using OWL and describe a novel implementation of the RO based on this method. We implemented our extension in software that converts ontologies in the OBO Flatfile Format to OWL, and also provide a prototype to extract relational patterns from OWL ontologies using automated reasoning. The conversion software is freely available at http://bioonto.de/obo2owl, and can be accessed via a web interface. CONCLUSIONS: explicitly defining relations permits their use in reasoning software and leads to a more flexible and powerful way of representing biomedical ontologies. Using the extended language and semantics avoids several mistakes commonly made in formalizing biomedical ontologies, and can be used to automatically detect inconsistencies. The use of our method enables the use of graph-based ontologies in OWL, and makes complex OWL ontologies accessible in a graph-based form.
Thereby, our method provides the means to gradually move the representation of biomedical ontologies into formal knowledge representation languages that incorporate an explicit semantics. Our method facilitates the use of OWL-based software in the back-end while ontology curators may continue to develop ontologies with an OBO-style front-end.},
  addendum = {IF: 3.24},
  author = {Hoehndorf$^*$, Robert and Oellrich, Anika and Dumontier, Michel and Kelso, Janet and Rebholz-Schuhmann, Dietrich and Herre, Heinrich},
  citeulike-article-id = {7751138},
  citeulike-linkout-0 = {http://dx.doi.org/10.1186/1471-2105-11-441},
  citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/20807438},
  citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=20807438},
  date = {2010},
  issn = {1471-2105},
  journal = {BMC Bioinformatics},
  keywords = {obo, ontology, owl, pattern, relations},
  month = {August},
  number = {1},
  pages = {441+},
  posted-at = {2010-09-01 14:58:09},
  priority = {2},
  title = {Relations as patterns: bridging the gap between OBO and OWL.},
  url = {http://dx.doi.org/10.1186/1471-2105-11-441},
  volume = {11},
  year = {2010}
}

Interoperability between phenotype and anatomy ontologies

Hoehndorf, Robert, Oellrich, Anika and Rebholz-Schuhmann, Dietrich

Bioinformatics, vol. 26(24), pp. 3112-3118 (2010)

Applied Ontology, Ontology engineering, Phenotype informatics

@article{h29,
  abstract = {Motivation: Phenotypic information is important for the analysis of the molecular mechanisms underlying disease. A formal ontological representation of phenotypic information can help to identify, interpret and infer phenotypic traits based on experimental findings. The methods that are currently used to represent data and information about phenotypes fail to make the semantics of the phenotypic trait explicit and do not interoperate with ontologies of anatomy and other domains. Therefore, valuable resources for the analysis of phenotype studies remain unconnected and inaccessible to automated analysis and reasoning. Results: We provide a framework to formalize phenotypic descriptions and make their semantics explicit. Based on this formalization, we provide the means to integrate phenotypic descriptions with ontologies of other domains, in particular anatomy and physiology. We demonstrate how our framework leads to the capability to represent disease phenotypes, perform powerful queries that were not possible before and infer additional knowledge. Availability: http://bioonto.de/pmwiki.php/Main/PheneOntology Contact: rh497@cam.ac.uk},
  addendum = {IF: 6.94},
  author = {Hoehndorf$^*$, Robert and Oellrich, Anika and Rebholz-Schuhmann, Dietrich},
  date = {2010},
  day = {22},
  journal = {Bioinformatics},
  month = {October},
  number = {24},
  pages = {3112--3118},
  title = {Interoperability between phenotype and anatomy ontologies},
  url = {http://bioinformatics.oxfordjournals.org/content/early/2010/10/22/bioinformatics.btq578.abstract},
  volume = {26},
  year = {2010}
}

Applying ontology design patterns to the implementation of relations in GENIA

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille, Pyysalo, Sampo, Ohta, Tomoko, Oellrich, Anika and Rebholz-Schuhmann, Dietrich

Proceedings of the Fourth Symposium on Semantic Mining in Biomedicine (SMBM 2010) (2010)

Biomedical informatics

@inproceedings{h1,
  abstract = {Motivation: Annotated reference corpora such as the GENIA corpus play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies and logic is challenging due to the ambiguous use of natural language and natural language semantics. Providing formal definitions and axioms for these relations would offer the means for developing consistent and verifiable annotation guidelines and allow for the automatic verification of annotations as well as enabling the discovery of new information through deductive inferences. Results: We developed a formal ontology of relations based on the relations used in the recent GENIA corpus annotations. For this purpose, we selected existing axiom systems based on the desired properties of the relations within the domain and provided new axioms for several relations. To apply this ontology of relations to the semantic annotation of natural language texts, we developed and implemented two ontology design patterns. We provide an implementation of the ontology of relations in the Web Ontology Language (OWL). By combining the implementation of the design patterns and that of the relation ontology, we also provide a software application to convert annotated GENIA abstracts into OWL ontologies. In this way, we make these ontologies amenable for automated verification, deductive inferences and other knowledge-based applications. Availability: Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/. Contact: rh497@cam.ac.uk},
  author = {Hoehndorf, Robert and Ngonga Ngomo, Axel-Cyrille and Pyysalo, Sampo and Ohta, Tomoko and Oellrich, Anika and Rebholz-Schuhmann, Dietrich},
  booktitle = {Proceedings of the Fourth Symposium on Semantic Mining in Biomedicine (SMBM 2010)},
  citeulike-article-id = {8006326},
  keywords = {annotation, corpus, design, genia, mining, ontology, patterns, relation, text},
  month = {October},
  posted-at = {2010-10-13 12:04:17},
  priority = {2},
  title = {Applying ontology design patterns to the implementation of relations in GENIA},
  year = {2010}
}

The Ontology of Primary Immunodeficiency Diseases (PIDs): Using PIDs to Rethink the Ontology of Phenotypes

Adams, Nico, Hennig, Christian, Hoehndorf, Robert, Oellrich, Anika, Rebholz-Schuhmann, Dietrich and Hansen, Gesine

Proceedings of the 2nd Workshop for Ontologies in Biomedicine and Life sciences (OBML) (2010)

Biomedical informatics

@inproceedings{h2,
  abstract = {Primary immunodeficiency diseases (PIDs) are the consequence of
genetic disorders and usually manifest themselves in very young patients.
Because of their rarity, they are notoriously difficult to diagnose both for
general practitioners and clinicians. In this paper, we present the foundations
of an ontology of PIDs, which will be at the heart of an expert system
designed to assist the clinician in the diagnosis of these diseases. To achieve
this, the PIDOntology characterises Primary Immunodeficiencies in terms
of Phenotypes. While there are a number of different ontologies already
available that allow the description of phenotypes and phenotypic qualities,
these have a number of associated ontological problems, which we will also
address as part of this paper. We use the subtype of Hyper-IgE Syndrome
caused by STAT3 defects as an example of a primary immunodeficiency
and show how the clinical phenotype of the disease can be modeled in terms
of other phenotypes by introducing the notion of the "phene". Furthermore, we
develop patterns for different types of phenes and show that these patterns
can be mapped onto more traditional entity-quality statements, which are the
current state of the art in phenotypic modeling.},
  author = {Adams, Nico and Hennig, Christian and Hoehndorf, Robert and Oellrich, Anika and Rebholz-Schuhmann, Dietrich and Hansen, Gesine},
  booktitle = {Proceedings of the 2nd Workshop for Ontologies in Biomedicine and Life sciences (OBML)},
  citeulike-article-id = {8006322},
  keywords = {disease, immunodeficiency, ontology, pato, phene, phenotype, pid},
  month = {September},
  posted-at = {2010-10-13 12:00:37},
  priority = {2},
  title = {The Ontology of Primary Immunodeficiency Diseases (PIDs): Using PIDs to Rethink the Ontology of Phenotypes},
  year = {2010}
}

Relational patterns in OWL and their application to OBO

Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich and Rebholz-Schuhmann, Dietrich

OWL: Experiences and Directions (OWLED) (2010)

Biomedical informatics

@inproceedings{h4,
  abstract = {Directed acyclic graphs are commonly used to represent ontologies in the biomedical domain. They provide an intuitive means to formalize relations that hold between ontological categories. However, their semantics is usually not explicit. We provide a semantics for a part of the OBO Flatfile Format by extending OWL with a method to express relational patterns. These patterns are OWL axioms with variables for classes. The variables can only be filled with named classes. Additionally, we provide a semantics for open patterns in OWL. Our method is applicable to the OBO Flatfile Format, and provides a means to design OWL ontologies using complex ontology design patterns. Therefore, it leads not only to an integration of the OBO Flatfile Format and OWL, but extends OWL with an intuitive interface for designing ontologies using complex definition patterns. A prototypic implementation and test results are available at http://bioonto.de/obo2owl.},
  author = {Hoehndorf, Robert and Oellrich, Anika and Dumontier, Michel and Kelso, Janet and Herre, Heinrich and Rebholz-Schuhmann, Dietrich},
  booktitle = {OWL: Experiences and Directions (OWLED)},
  citeulike-article-id = {8006318},
  keywords = {conversion, obo2owl, owl2obo, patterns, relations, semantic, web},
  month = {June},
  posted-at = {2010-10-13 11:57:47},
  priority = {2},
  title = {Relational patterns in OWL and their application to OBO},
  year = {2010}
}

OWLDEF: Integrating OBO and OWL

Hoehndorf, Robert, Oellrich, Anika, Dumontier, Michel, Kelso, Janet, Herre, Heinrich and Rebholz-Schuhmann, Dietrich

Proceedings of the 13th Annual Bio-Ontologies Meeting (2010)

Biomedical informatics

@inproceedings{h5,
  abstract = {An integration of the OBO Flatfile Format and the Web Ontology Language
(OWL) would enable automated reasoning, inferences and consistency
checking of biomedical ontologies and support the development and
maintenance of ontologies developed in the OBO Flatfile Format. So far, the
translation of relations in the OBO language to OWL is performed according
to a single rigid pattern and in violation of the relation definitions of the
OBO Relationship Ontology. We extend both the OBO Flatfile Format and
the Manchester OWL Syntax to accommodate relation definitions. Based on
these extensions, we implemented and evaluated two software applications.
The first converts the OBO Flatfile Format to an OWL representation. The
second uses automated inferences to convert OWL ontologies back to a
representation in the OBO Flatfile Format. The OWLDEF method is generally
applicable whenever ontologies are developed primarily using patterns and
not a detailed knowledge representation language. The tools and libraries
we developed for the OWLDEF method are available from
http://bioonto.de/obo2owl.},
  author = {Hoehndorf, Robert and Oellrich, Anika and Dumontier, Michel and Kelso, Janet and Herre, Heinrich and Rebholz-Schuhmann, Dietrich},
  booktitle = {Proceedings of the 13th Annual Bio-Ontologies Meeting},
  citeulike-article-id = {8006312},
  keywords = {biomedical, conversion, obo, obo2owl, owl, patterns, relations},
  month = {July},
  posted-at = {2010-10-13 11:54:11},
  priority = {2},
  title = {OWLDEF: Integrating OBO and OWL},
  year = {2010}
}

Realism for scientific ontologies

Dumontier, Michel and Hoehndorf, Robert

Formal Ontology in Information Systems, Proceedings of the Sixth International Conference, FOIS 2010, vol. 209, pp. 387-399, In: Antony Galton and Riichiro Mizoguchi (Eds.) (2010)

Biomedical informatics

@inproceedings{h8,
  abstract = {Science aims to develop an accurate understanding of reality through a variety of rigorously empirical and formal methods. Ontologies are used to formalize the meaning of terms within a domain of discourse. The Basic Formal Ontology (BFO) is an ontology of particular importance in the biomedical domains, where it provides the top-level for numerous ontologies, including those admitted as part of the OBO Foundry collection. The BFO requires that all classes in an ontology are actually instantiated in reality. Despite the fact that it is hard to show whether entities of some kind exist or do not exist in reality (especially for unobservable entities like elementary particles), this criterion fails to satisfy the need of scientists to communicate their findings and theories unambiguously. We discuss the problems that arise due to the BFO's realism criterion and suggest viable alternatives.},
  address = {Amsterdam, The Netherlands},
  author = {Dumontier, Michel and Hoehndorf, Robert},
  booktitle = {Formal Ontology in Information Systems, Proceedings of the
Sixth International Conference, FOIS 2010},
  citeulike-article-id = {7802110},
  citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=1804755},
  editor = {Antony Galton and Riichiro Mizoguchi},
  isbn = {978-1-60750-534-1},
  keywords = {biomedical, of, ontology, philosophy, realism, science, wissenschaftstheorie},
  location = {Toronto, Canada},
  month = {May},
  pages = {387-399},
  posted-at = {2010-10-13 11:49:13},
  priority = {2},
  publisher = {IOS Press},
  series = {Frontiers in Artificial Intelligence and Applications},
  title = {Realism for scientific ontologies},
  volume = {209},
  year = {2010}
}

Ontologies in Biology

Kelso, Janet, Hoehndorf, Robert and Prüfer, Kay

Theory and Applications of Ontology: Computer Applications, pp. 347-371, In: Poli, Roberto, Healy, Michael and Kameas, Achilles (Eds.) (2010)

Biomedical informatics

@incollection{h9,
  abstract = {In recent years ontologies have come to play an increasingly important role in the biomedical domain. Primary applications have been the formalisation of community knowledge in molecular biology, and the provision of a shared vocabulary for the annotation of the growing amount of biological data being generated. Ontologies now play a key role in the analysis and reporting of biological data and act as the basis for new biological services being hosted by various GRID projects. More formal methods from ontology theory are gradually being adopted, and have made the existing ontologies more robust. These approaches will continue to extend the number of potential applications for ontologies in the biomedical domain.},
  author = {Kelso, Janet and Hoehndorf, Robert and Pr\"{u}fer, Kay},
  booktitle = {Theory and Applications of Ontology: Computer Applications},
  citeulike-article-id = {8006304},
  citeulike-linkout-0 = {http://dx.doi.org/10.1007/978-90-481-8847-5\_15},
  comment = {10.1007/978-90-481-8847-5\_15},
  editor = {Poli, Roberto and Healy, Michael and Kameas, Achilles},
  keywords = {biology, ontology, review},
  month = {July},
  pages = {347-371},
  posted-at = {2010-10-13 11:47:57},
  priority = {2},
  publisher = {Springer Netherlands},
  title = {Ontologies in Biology},
  url = {http://dx.doi.org/10.1007/978-90-481-8847-5\_15},
  year = {2010}
}

The ontology of biological sequences.

Hoehndorf, Robert, Kelso, Janet and Herre, Heinrich

BMC Bioinformatics, vol. 10(1), pp. 377+ (2009)

Biomedical informatics

@article{h10,
  abstract = {BACKGROUND: Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most basic question about sequences remains unanswered: what kind of entity is a biological sequence? An answer to this question benefits formal ontologies that use the notion of biological sequences and analyses in computational biology alike. RESULTS: We provide both an ontological analysis of biological sequences and a formal representation that can be used in knowledge-based applications and other ontologies. We distinguish three distinct kinds of entities that can be referred to as "biological sequence": chains of molecules, syntactic representations such as those in biological databases, and the abstract information-bearing entities. For use in knowledge-based applications and inclusion in biomedical ontologies, we implemented the developed axiom system for use in automated theorem proving. CONCLUSION: Axioms are necessary to achieve the main goal of ontologies: to formally specify the meaning of terms used within a domain. The axiom system for the ontology of biological sequences is the first elaborate axiom system for an OBO Foundry ontology and can serve as starting point for the development of more formal ontologies and ultimately of knowledge-based applications.},
  addendum = {IF: 3.24},
  author = {Hoehndorf$^*$, Robert and Kelso, Janet and Herre, Heinrich},
  citeulike-article-id = {6132013},
  citeulike-linkout-0 = {http://dx.doi.org/10.1186/1471-2105-10-377},
  citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/19919720},
  citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=19919720},
  date = {2009},
  day = {18},
  issn = {1471-2105},
  journal = {BMC Bioinformatics},
  keywords = {axiom, biological, connectedness, first-order, logic, molecule, ontology, second-order, sequence, system},
  month = {November},
  number = {1},
  pages = {377+},
  posted-at = {2010-10-13 11:43:34},
  priority = {2},
  title = {The ontology of biological sequences.},
  url = {http://dx.doi.org/10.1186/1471-2105-10-377},
  volume = {10},
  year = {2009}
}

BOWiki: an ontology-based wiki for annotation of data and integration of knowledge in biology.

Hoehndorf, Robert, Bacher, Joshua, Backhaus, Michael, Gregorio, Sergio E., Loebe, Frank, Prüfer, Kay, Uciteli, Alexandr, Visagie, Johann, Herre, Heinrich and Kelso, Janet

BMC Bioinformatics, vol. 10 Suppl 5(Suppl 5), pp. S5+ (2009)

Biomedical informatics

@article{h15,
  abstract = {MOTIVATION: Ontology development and the annotation of biological data using ontologies are time-consuming exercises that currently require input from expert curators. Open, collaborative platforms for biological data annotation enable the wider scientific community to become involved in developing and maintaining such resources. However, this openness raises concerns regarding the quality and correctness of the information added to these knowledge bases. The combination of a collaborative web-based platform with logic-based approaches and Semantic Web technology can be used to address some of these challenges and concerns. RESULTS: We have developed the BOWiki, a web-based system that includes a biological core ontology. The core ontology provides background knowledge about biological types and relations. Against this background, an automated reasoner assesses the consistency of new information added to the knowledge base. The system provides a platform for research communities to integrate information and annotate data collaboratively. AVAILABILITY: The BOWiki and supplementary material is available at http://www.bowiki.net/. The source code is available under the GNU GPL from http://onto.eva.mpg.de/trac/BoWiki.},
  addendum = {IF: 3.24},
  author = {Hoehndorf$^*$, Robert and Bacher, Joshua and Backhaus, Michael and Gregorio, Sergio E. and Loebe, Frank and Pr\"{u}fer, Kay and Uciteli, Alexandr and Visagie, Johann and Herre, Heinrich and Kelso$^*$, Janet},
  citeulike-article-id = {4525735},
  citeulike-linkout-0 = {http://dx.doi.org/10.1186/1471-2105-10-S5-S5},
  citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/19426462},
  citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=19426462},
  date = {2009},
  issn = {1471-2105},
  journal = {BMC Bioinformatics},
  keywords = {biomedicine, bowiki, data, database, integration, semantic, wiki},
  number = {Suppl 5},
  pages = {S5+},
  posted-at = {2010-10-13 11:32:22},
  priority = {2},
  title = {BOWiki: an ontology-based wiki for annotation of data and integration of knowledge in biology.},
  url = {http://dx.doi.org/10.1186/1471-2105-10-S5-S5},
  volume = {10 Suppl 5},
  year = {2009}
}

The application of an ontology design pattern for functional abnormalities to phenotype ontologies and the extraction of an ontology of anatomical functions

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille and Kelso, Janet

Proceedings of the 3rd International Symposium on Languages in Biology and Medicine (2009)

Biomedical informatics

@inproceedings{h11,
  abstract = {Functions play an important role throughout biology. Although molecular functions are covered in the Gene Ontology, there is currently no publicly available ontology of anatomical functions. Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can automatically extract a skeleton for such an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies. We provide an ontological analysis of the nature of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies using a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Alternatively, we introduce a new relation to relate material objects to processes that realize the function of the object to avoid a needless duplication of processes already present in the Gene Ontology in a new ontology of anatomical functions. We discuss several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and functional abnormalities.},
  author = {Hoehndorf, Robert and Ngonga Ngomo, Axel-Cyrille and Kelso, Janet},
  booktitle = {Proceedings of the 3rd International Symposium on Languages in Biology and Medicine},
  citeulike-article-id = {8006292},
  keywords = {abnormality, disease, function, hpo, human, mouse, mpo, ontology, phenotype},
  month = {October},
  posted-at = {2010-10-13 11:42:23},
  priority = {2},
  title = {The application of an ontology design pattern for functional abnormalities to phenotype ontologies and the extraction of an ontology of anatomical functions},
  year = {2009}
}

Developing Consistent and Modular Software Models with Ontologies

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille and Herre, Heinrich

SoMeT, pp. 399-412 (2009)

Biomedical informatics

@inproceedings{h12,
  abstract = {The development and verification of software models that are applicable
across multiple domains remains a difficult problem. We propose a novel approach
to model-driven software development based on ontologies and Semantic
Web technology. Our approach uses three ontologies to define software models: a
task ontology, a domain ontology and a top-level ontology. The task ontology serves
as the conceptual model for the software, the domain ontology provides domain-specific
knowledge and the top-level ontology integrates the task and domain ontologies.
Our method allows the verification of these models both for consistency
and ontological adequacy. This verification can be performed both at development
and runtime. Domain ontologies are replaceable modules, which enables the comparison
and application of the models built using our method across multiple domains.
We demonstrate the viability of our approach through the design and implementation
of a semantic wiki and a social tagging system, and compare it with
model-driven software development to illustrate its benefits.},
  author = {Hoehndorf, Robert and Ngonga Ngomo, Axel-Cyrille and Herre, Heinrich},
  booktitle = {SoMeT},
  citeulike-article-id = {8006283},
  keywords = {based, conceptual, development, modelling, ontology, programming, software},
  month = {September},
  pages = {399-412},
  posted-at = {2010-10-13 11:38:47},
  priority = {2},
  title = {Developing Consistent and Modular Software Models with Ontologies},
  year = {2009}
}

Contributions to the formal ontology of functions and dispositions: an application of non-monotonic reasoning

Hoehndorf, Robert, Kelso, Janet and Herre, Heinrich

Proceedings of the 12th Annual Bio-Ontologies Meeting (2009)

Biomedical informatics

@inproceedings{h13,
  abstract = {We introduce a basic ontology of functions and dispositions. The
theory we suggest is compatible both with major philosophical
theories of biological functions and with most top-level
ontologies. The particular focus of the suggested formalism is on
the inference of causal relationships from functionality and the
explicit formalization of the normative character of functions using
non-monotonic forms of knowledge representation.},
  author = {Hoehndorf, Robert and Kelso, Janet and Herre, Heinrich},
  booktitle = {Proceedings of the 12th Annual Bio-Ontologies Meeting},
  citeulike-article-id = {8006280},
  day = {28},
  keywords = {disposition, functions, malfunctionings, non-monotonic, reasoning},
  month = {June},
  posted-at = {2010-10-13 11:37:30},
  priority = {2},
  title = {Contributions to the formal ontology of functions and dispositions: an application of non-monotonic reasoning},
  year = {2009}
}

A Formal Ontology of Sequences

Hoehndorf, Robert, Kelso, Janet and Herre, Heinrich

Nature Precedings, no. 713 (2009)

Biomedical informatics

@inproceedings{h14,
  abstract = {The Sequence Ontology is an OBO Foundry ontology that provides categories of sequences and sequence features that are applied to the annotation of genomes.  To facilitate interoperability with other domain ontologies and to provide a foundation for automated inference, we provide here an axiom system for the Sequence and Junction categories in first- and second-order predicate logics.},
  author = {Hoehndorf, Robert and Kelso, Janet and Herre, Heinrich},
  booktitle = {Proceedings of the First International Conference on Biomedical Ontologies (ICBO)},
  citeulike-article-id = {6416228},
  citeulike-linkout-0 = {http://dx.doi.org/10.1038/npre.2009.3537.1},
  citeulike-linkout-1 = {http://precedings.nature.com/documents/3537/version/1},
  journal = {Nature Precedings},
  keywords = {axiom, first-order, icbo, logic, ontology, second-order, sequence, system},
  month = {July},
  number = {713},
  posted-at = {2010-10-13 11:33:22},
  priority = {2},
  publisher = {Nature Publishing Group},
  title = {A Formal Ontology of Sequences},
  url = {http://dx.doi.org/10.1038/npre.2009.3537.1},
  year = {2009}
}

GFO-Bio: A biomedical core ontology

Hoehndorf, Robert, Loebe, Frank, Poli, Roberto, Kelso, Janet and Herre, Heinrich

Applied Ontology, vol. 3(4), pp. 219-227 (2008)

Biomedical informatics

@article{h23,
  abstract = {The rapid increase in the number and use of biological ontologies necessitates developing systems for their integration. In this paper we present a core ontology for biology, and outline its application for integrating biological domain ontologies. Our ontology rests on a foundational ontology, which offers higher-order categories and a theory of levels of reality. The core ontology is implemented in two separate components, each of which adheres to OWL-DL. These can be used independently with efficient DL reasoners, but they will be most effective when used together, which necessitates working with an OWL-Full ontology. The ontology is freely available from our website at: http://bioonto.de/pmwiki.php/Main/GFO-Bio.},
  addendum = {IF: 0.85},
  author = {Hoehndorf, Robert and Loebe, Frank and Poli, Roberto and Kelso, Janet and Herre, Heinrich},
  citeulike-article-id = {3307557},
  date = {2008},
  journal = {Applied Ontology},
  keywords = {biological, core, integration, obo, ontology, top-level, upper-level},
  month = {December},
  number = {4},
  pages = {219-227},
  posted-at = {2008-09-21 10:11:21},
  priority = {0},
  title = {GFO-Bio: A biomedical core ontology},
  volume = {3},
  year = {2008}
}

Towards Ontological Interpretations for Improved Text Mining

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille and Dannemann, Michael

Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland, pp. 165-166, In: Salakoski, Tapio, Rebholz-Schuhmann, Dietrich and Pyysalo, Sampo (Eds.) (2008)

Biomedical informatics

@inproceedings{h16,
  author = {Hoehndorf, Robert and Ngonga Ngomo, Axel-Cyrille and Dannemann, Michael},
  booktitle = {Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland},
  citeulike-article-id = {8006267},
  citeulike-linkout-0 = {http://mars.cs.utu.fi/smbm2008/files/smbm2008proceedings/smbmpaper\_wopp1.pdf},
  editor = {Salakoski, Tapio and Rebholz-Schuhmann, Dietrich and Pyysalo, Sampo},
  keywords = {abductive, biomedical, inference, mining, ontology, reasoning, text},
  month = {September},
  pages = {165-166},
  posted-at = {2010-10-13 11:30:12},
  priority = {2},
  publisher = {Turku Centre for Computer Science (TUCS)},
  title = {Towards Ontological Interpretations for Improved Text Mining},
  url = {http://mars.cs.utu.fi/smbm2008/files/smbm2008proceedings/smbmpaper\_wopp1.pdf},
  year = {2008}
}

From Terms to Categories: Testing the Significance of Co-occurrences between Ontological Categories

Hoehndorf, Robert, Ngonga Ngomo, Axel-Cyrille, Dannemann, Michael and Kelso, Janet

Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland, pp. 53-60, In: Salakoski, Tapio, Rebholz-Schuhmann, Dietrich and Pyysalo, Sampo (Eds.) (2008)

Biomedical informatics

@inproceedings{h17,
  abstract = {The co-occurrence of terms in a text corpus may indicate the presence of a relation between the referents of these terms. We expect co-occurrence-based methods to identify association relations that cannot be found using static patterns. We developed a new method to identify associations between ontological categories in text using the co-occurrence of terms that designate these categories. We use the taxonomic structure of the ontologies to cumulate the number of co-occurrences of terms designating categories. Based on these cumulated values, we designed a novel family of statistical tests to identify associated categories. These tests take both co-occurrence specificity and relevance into consideration. We applied our method to a 2.2 GB text corpus containing fulltext articles and used Gene Ontology's biological process ontology and the Celltype Ontology. The software and results can be found at http://bioonto.de/pmwiki.php/Main/ExtractingBiologicalRelations.},
  author = {Hoehndorf, Robert and Ngonga Ngomo, Axel-Cyrille and Dannemann, Michael and Kelso, Janet},
  booktitle = {Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland},
  citeulike-article-id = {8006264},
  citeulike-linkout-0 = {http://mars.cs.utu.fi/smbm2008/files/smbm2008proceedings/smbmpaper\_2.pdf},
  editor = {Salakoski, Tapio and Rebholz-Schuhmann, Dietrich and Pyysalo, Sampo},
  keywords = {mining, ontology, significance, statistics, testing, text},
  month = {September},
  pages = {53-60},
  posted-at = {2010-10-13 11:28:18},
  priority = {2},
  publisher = {Turku Centre for Computer Science (TUCS)},
  title = {From Terms to Categories: Testing the Significance of Co-occurrences between Ontological Categories},
  url = {http://mars.cs.utu.fi/smbm2008/files/smbm2008proceedings/smbmpaper\_2.pdf},
  year = {2008}
}

BOWiki: An ontology-based wiki for annotation of data and integration of knowledge in biology

Hoehndorf, Robert, Bacher, Joshua, Backhaus, Michael, Gregorio, Sergio E., Loebe, Frank, Prüfer, Kay, Uciteli, Alexandr, Visagie, Johann, Herre, Heinrich and Kelso, Janet

Proceedings of the 11th Annual Bio-Ontologies Meeting, In: Lord, Phillip, Shah, Nigam, Sansone, Susanna-Assunta and Cockerill, Matthew (Eds.) (2008)

Biomedical informatics

Ontology development and the annotation of biological data using ontologies are time-consuming exercises that currently require input from expert curators. Open, collaborative platforms for biological data annotation enable the wider scientific community to become involved in developing and maintaining such resources. However, this openness raises concerns regarding the quality and correctness of the information added to these knowledge bases. The combination of a collaborative web-based platform with logic-based approaches and Semantic Web technology can be used to address some of these challenges and concerns. We have developed the BOWiki, a web-based system that includes a biological core ontology. The core ontology provides background knowledge about biological types and relations. Against this background, an automated reasoner assesses the consistency of new information added to the knowledge base. The system provides a platform for research communities such as wikis for the description, discussion and annotation of the functions of genes and gene products [Wang, 2006, Hoehndorf et al., 2006, Giles, 2007]. However, an open approach like wikis frequently raises concerns regarding the quality of the information captured. The information represented in the wiki should adhere to particular quality criteria such as internal consistency (the wiki content does not contain contradictory information) and consistency with biological background knowledge (the wiki content should be semantically correct). To address some of these concerns, logic-based tools can be employed. We have developed the BOWiki, a wiki system that uses a core ontology together with an automated reasoner to maintain a consistent knowledge base. It is specifically targeted at small- to medium-sized communities to collaboratively integrate information and annotate data. The BOWiki and supplementary material are available at http://www.bowiki.net/.
The source code is available under the GNU GPL from http://onto.eva.mpg.de/trac/BoWiki. Contact: bowiki-users@lists.informatik.uni-leipzig.de

@inproceedings{h18,
  abstract = {Ontology development and the annotation of biological data using
ontologies are time-consuming exercises that currently require input from
expert curators. Open, collaborative platforms for biological data annotation
enable the wider scientific community to become involved in developing
and maintaining such resources. However, this openness raises concerns
regarding the quality and correctness of the information added to these
knowledge bases. The combination of a collaborative web-based platform
with logic-based approaches and Semantic Web technology can be used to
address some of these challenges and concerns.
We have developed the BOWiki, a web-based system that includes a
biological core ontology. The core ontology provides background knowledge
about biological types and relations. Against this background, an automated
reasoner assesses the consistency of new information added to the
knowledge base. The system provides a platform for research communities
such as wikis for the description, discussion and annotation of the
functions of genes and gene products [Wang, 2006, Hoehndorf et al.,
2006, Giles, 2007].
However, an open approach like wikis frequently raises concerns
regarding the quality of the information captured. The information
represented in the wiki should adhere to particular quality
criteria such as internal consistency (the wiki content does not
contain contradictory information) and consistency with biological
background knowledge (the wiki content should be semantically
correct). To address some of these concerns, logic-based tools can
be employed.
We have developed the BOWiki, a wiki system that uses a
core ontology together with an automated reasoner to maintain a
consistent knowledge base. It is specifically targeted at small- to
medium-sized communities
to collaboratively integrate information and annotate data.
The BOWiki and supplementary material are available at
http://www.bowiki.net/. The source code is available under the GNU GPL from
http://onto.eva.mpg.de/trac/BoWiki.
Contact: bowiki-users@lists.informatik.uni-leipzig.de},
  author = {Hoehndorf, Robert and Bacher, Joshua and Backhaus, Michael and Gregorio, Sergio E. and Loebe, Frank and Pr\"{u}fer, Kay and Uciteli, Alexandr and Visagie, Johann and Herre, Heinrich and Kelso, Janet},
  booktitle = {Proceedings of the 11th Annual Bio-Ontologies Meeting},
  citeulike-article-id = {8006257},
  editor = {Lord, Phillip and Shah, Nigam and Sansone, Susanna-Assunta and Cockerill, Matthew},
  keywords = {annotation, bowiki, ontology, semantic, wiki},
  month = {June},
  posted-at = {2010-10-13 11:25:24},
  priority = {2},
  title = {BOWiki: An ontology-based wiki for annotation of data and integration of knowledge in biology},
  year = {2008}
}

Representing default knowledge in biomedical ontologies: Application to the integration of anatomy and phenotype ontologies

Hoehndorf, Robert, Loebe, Frank, Kelso, Janet and Herre, Heinrich

BMC Bioinformatics, vol. 8(1) (2007)

Biomedical informatics

@article{h25,
  abstract = {BACKGROUND: Current efforts within the biomedical ontology community focus on achieving interoperability between various biomedical ontologies that cover a range of diverse domains. Achieving this interoperability will contribute to the creation of a rich knowledge base that can be used for querying, as well as generating and testing novel hypotheses. The OBO Foundry principles, as applied to a number of biomedical ontologies, are designed to facilitate this interoperability. However, semantic extensions are required to meet the OBO Foundry interoperability goals. Inconsistencies may arise when ontologies of properties - mostly phenotype ontologies - are combined with ontologies taking a canonical view of a domain - such as many anatomical ontologies. Currently, there is no support for a correct and consistent integration of such ontologies. RESULTS: We have developed a methodology for accurately representing canonical domain ontologies within the OBO Foundry. This is achieved by adding an extension to the semantics for relationships in the biomedical ontologies that allows for treating canonical information as default. Conclusions drawn from default knowledge may be revoked when additional information becomes available. We show how this extension can be used to achieve interoperability between ontologies, and further allows for the inclusion of more knowledge within them. We apply the formalism to ontologies of mouse anatomy and mammalian phenotypes in order to demonstrate the approach. CONCLUSION: Biomedical ontologies require a new class of relations that can be used in conjunction with default knowledge, thereby extending those currently in use. The inclusion of default knowledge is necessary in order to ensure interoperability between ontologies.},
  addendum = {IF: 3.24},
  author = {Hoehndorf$^*$, Robert and Loebe, Frank and Kelso, Janet and Herre, Heinrich},
  citeulike-article-id = {3307395},
  citeulike-linkout-0 = {http://dx.doi.org/10.1186/1471-2105-8-377},
  date = {2007},
  journal = {BMC Bioinformatics},
  keywords = {anatomy, integration, logic, mouse, nonmonotonic, ontology, pato, phenotype},
  month = {October},
  number = {1},
  posted-at = {2008-09-21 10:04:40},
  priority = {0},
  title = {Representing default knowledge in biomedical ontologies: Application to the integration of anatomy and phenotype ontologies},
  url = {http://dx.doi.org/10.1186/1471-2105-8-377},
  volume = {8},
  year = {2007}
}

BOWiki - a collaborative annotation and ontology curation framework

Backhaus, Michael, Kelso, Janet, Bacher, Joshua, Herre, Heinrich, Hoehndorf, Robert, Loebe, Frank and Visagie, Johann

Proceedings of Workshop on Social and Collaborative Construction of Structured Knowledge (2007)

Biomedical informatics

@inproceedings{h27,
  abstract = {As the amount of data being generated in biology has
increased, a major challenge has been how to store
and represent this data in a way that makes it
easily accessible to researchers from diverse
domains. Understanding the relationship between
genotype and phenotype is a major focus of
biological research. Various approaches to providing
the link between genes and their functions have been
undertaken - most require significant and dedicated
manual curation. Advances in web technologies make
possible an alternative route for the construction
of such knowledge bases - large-scale community
collaboration. We describe here a system, the
BOWiki, for the collaborative annotation of gene
information. We argue that a semantic wiki
provides the functionality required for this
project since this can capitalize on the existing
representations in biological ontologies. We
describe our implementation and show how formal
ontologies could be used to increase the usability
of the software through type-checking and automatic
reasoning.},
  author = {Backhaus, Michael and Kelso, Janet and Bacher,
Joshua and Herre, Heinrich and Hoehndorf, Robert and
Loebe, Frank and Visagie, Johann},
  booktitle = {Proceedings of Workshop on Social and Collaborative
Construction of Structured Knowledge},
  citeulike-article-id = {2349127},
  keywords = {ontology, semantic, wiki},
  month = {May},
  posted-at = {2008-02-07 13:40:08},
  priority = {2},
  title = {BOWiki - a collaborative annotation and ontology
curation framework},
  year = {2007}
}

A top-level ontology of functions and its application in the Open Biomedical Ontologies.

Burek, Patryk, Hoehndorf, Robert, Loebe, Frank, Visagie, Johann, Herre, Heinrich and Kelso, Janet

Bioinformatics, vol. 22(14), pp. e66-e73 (2006)

Biomedical informatics

@article{h26,
  abstract = {MOTIVATION: A clear understanding of functions in biology is a key component in accurate modelling of molecular, cellular and organismal biology. Using the existing biomedical ontologies it has been impossible to capture the complexity of the community's knowledge about biological functions. RESULTS: We present here a top-level ontological framework for representing knowledge about biological functions. This framework lends greater accuracy, power and expressiveness to biomedical ontologies by providing a means to capture existing functional knowledge in a more formal manner. An initial major application of the ontology of functions is the provision of a principled way in which to curate functional knowledge and annotations in biomedical ontologies. Further potential applications include the facilitation of ontology interoperability and automated reasoning. A major advantage of the proposed implementation is that it is an extension to existing biomedical ontologies, and can be applied without substantial changes to these domain ontologies. AVAILABILITY: The Ontology of Functions (OF) can be downloaded in OWL format from http://onto.eva.mpg.de/. Additionally, a UML profile and supplementary information and guides for using the OF can be accessed from the same website. CONTACT: bioonto@lists.informatik.uni-leipzig.de.},
  addendum = {IF: 6.94},
  address = {Department of Computer Science, Faculty of Mathematics and Computer Science, University of Leipzig Augustusplatz 10-11, 04109 Leipzig.},
  author = {Burek, Patryk and Hoehndorf, Robert and Loebe, Frank and Visagie, Johann and Herre, Heinrich and Kelso$^*$, Janet},
  citeulike-article-id = {3307367},
  date = {2006},
  journal = {Bioinformatics},
  keywords = {anatomy-phenotype, classification, function, oi, ontology, protein, searle},
  month = {July},
  number = {14},
  pages = {e66-e73},
  posted-at = {2008-09-21 10:04:39},
  priority = {0},
  title = {A top-level ontology of functions and its application in the Open Biomedical Ontologies.},
  url = {http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/14/e66},
  volume = {22},
  year = {2006}
}

The design of a wiki-based curation system for the Ontology of Functions

Hoehndorf, Robert, Prüfer, Kay, Backhaus, Michael, Visagie, Johann and Kelso, Janet

Proceedings of the Joint BioLINK and 9th Bio-Ontologies Meeting (2006)

Biomedical informatics

@inproceedings{h19,
  abstract = {Recently, studies argued that statistical and linguistic methods can be applied to extract information from biomedical ontologies, and represent the identified relations in the top-level ontology Ontology of Functions (OF). However, human intervention is required in order to clear noise from the generated data. A simple platform for collaborative work is needed. We propose here the use of a semantic wiki to represent relations between terms. We provide a relationship model for this semantic wiki, and add a core ontology as top-level type system to this wiki. We then discuss a design for the implementation of a platform for the curation of the OF, thereby enabling the community to curate the results of automatic extraction methods, and to add and correct ontology and annotation information. The aim of this collaborative effort is to provide a means to extend and correct the numerous ontologies which are used to describe biological functions in the OF.},
  author = {Hoehndorf, Robert and Pr\"{u}fer, Kay and Backhaus, Michael and Visagie, Johann and Kelso, Janet},
  booktitle = {Proceedings of the Joint BioLINK and 9th Bio-Ontologies Meeting},
  citeulike-article-id = {8006249},
  citeulike-linkout-0 = {http://onto.eva.mpg.de/publication/2006/HPBVK06/},
  keywords = {bowiki, curation, functions, ontology},
  month = {July},
  posted-at = {2010-10-13 11:20:19},
  priority = {2},
  title = {The design of a wiki-based curation system for the Ontology of Functions},
  url = {http://onto.eva.mpg.de/publication/2006/HPBVK06/},
  year = {2006}
}

A proposal for a gene functions wiki

Hoehndorf, Robert, Prüfer, Kay, Backhaus, Michael, Herre, Heinrich, Kelso, Janet, Loebe, Frank and Visagie, Johann

OTM Workshops 2006, no. 4277, pp. 669-678, In: Meersman, R., Tari, Z. and Herrero, P. (Eds.) (2006)

Biomedical informatics

@inproceedings{h21,
  abstract = {Large knowledge bases integrating different domains can provide a foundation for new applications in biology such as data mining or automated reasoning. The traditional approach to the construction of such knowledge bases is manual and therefore extremely time consuming. The ubiquity of the internet now makes large-scale community collaboration for the construction of knowledge bases, such as the successful online encyclopedia "Wikipedia", possible. We propose an extension of this model to the collaborative annotation of molecular data. We argue that a semantic wiki provides the functionality required for this project since this can capitalize on the existing representations in biological ontologies. We discuss the use of a different relationship model than the one provided by RDF and OWL to represent the semantic data. We argue that this leads to a more intuitive and correct way to enter semantic content in the wiki. Furthermore, we show how formal ontologies could be used to increase the usability of the software through type-checking and automatic reasoning.},
  author = {Hoehndorf, Robert and Pr\"{u}fer, Kay and Backhaus, Michael and Herre, Heinrich and Kelso, Janet and Loebe, Frank and Visagie, Johann},
  booktitle = {OTM Workshops 2006},
  citeulike-article-id = {3307617},
  citeulike-linkout-1 = {http://onto.eva.mpg.de/publication/2006/HPBHKLV06a},
  comment = {in print},
  editor = {Meersman, R. and Tari, Z. and Herrero, P.},
  keywords = {biology, bowiki, ontology, semantic, wiki},
  month = {November},
  number = {4277},
  pages = {669-678},
  posted-at = {2008-09-21 10:13:30},
  priority = {0},
  publisher = {Springer-Verlag},
  series = {LNCS},
  title = {A proposal for a gene functions wiki},
  year = {2006}
}

Interactively Exploring Graph Coloring Algorithms in a Bilingual Web Platform with Gamification

Maha Alrashed, Lujain Alharbi, Omamah Talal Al-Muhammadi, Salha Bahadiq, Robert Hoehndorf and Liam Mencel

Proceedings of EdMedia: World Conference on Educational Media and Technology 2017, pp. 298-302, In: Joyce P. Johnston (Ed.) (2017)

Biomedical informatics

@inproceedings{AlraAlhaTala2017ll,
  abstract = { Graph coloring is a concept in graph theory that has many real world applications, such as scheduling and map coloring, thus making it an essential part of a computer science curriculum. Most graph theory courses are taught using standard methods such as with textbooks or a blackboard. Such methods introduce graph theory without providing the student with an adequate understanding of the process and the computational complexity of this NP-complete problem, which in return, hinders their ability to understand, implement, and develop graph coloring algorithms. In this paper, we describe the conceptual design of a bilingual web platform developed to provide students with interactive exploration of different graph coloring algorithms. },
  address = { Washington, DC },
  author = { Maha Alrashed and Lujain Alharbi and Omamah Talal Al-Muhammadi and Salha Bahadiq and Robert Hoehndorf and Liam Mencel },
  booktitle = { Proceedings of EdMedia: World Conference on Educational Media and Technology 2017 },
  editor = { Joyce P. Johnston },
  month = { June },
  pages = { 298--302 },
  publisher = { Association for the Advancement of Computing in Education (AACE) },
  title = { Interactively Exploring Graph Coloring Algorithms in a Bilingual Web Platform with Gamification },
  url = { https://www.learntechlib.org/p/178329 },
  year = { 2017 }
}