Bio-IT competencies and where to find them

November 4, 2021

Recently, Bio-IT took some steps towards standardising the way it describes its activities. As part of the Data Sciences strategy at EMBL, and in particular of WorkStream 1 (“Internal and external training & support”), we compiled a glossary of competencies that we will use for several purposes.

Bio-IT competencies will be used in training materials and consulting

Bio-IT competencies

The glossary includes two types of terms: skills, which represent expertise and the ability to use tools, and topics, which represent areas of knowledge. All kinds of Bio-IT activities (from courses to gatherings to platforms and tools) can help you learn or apply some of them. The glossary was developed with two main aims:

Classify past and upcoming training events, as well as the related training materials. This will allow us to build an interface to consult, filter and navigate our training offer. In addition, by collecting requests from the community, we will be able to compare training demand with the training offer consistently, and adjust the offer accordingly.
Classify the expertise of the community members listed in the grassroots project. The grassroots project brings together experts who have volunteered to provide assistance and consulting on a wide range of topics. Here too, a consistent and structured presentation of their expertise will help users navigate this information and, ultimately, find the consultation they need. We are re-engineering the grassroots project as a whole; for more details, please check the related repository. We welcome contributions!
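To make the first aim more concrete, here is a minimal Python (3.9+) sketch of how glossary terms could act as a controlled vocabulary: each training event is tagged only with known terms, events can be filtered by term, and community requests can be compared with the events on offer. This is not Bio-IT's implementation; the term subset, the `TrainingEvent` class, the `filter_events` and `demand_gap` helpers, and the example events and requests are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical subset of the glossary (skills and topics listed below)
SKILLS = {"Image analysis", "Cluster computing", "Command-line computing"}
TOPICS = {"Bioimaging", "Metabolomics", "Transcriptomics"}
GLOSSARY = SKILLS | TOPICS


@dataclass
class TrainingEvent:
    """Hypothetical training event tagged with glossary terms."""
    title: str
    terms: set = field(default_factory=set)

    def __post_init__(self):
        # Reject tags that are not in the controlled vocabulary
        unknown = self.terms - GLOSSARY
        if unknown:
            raise ValueError(f"Terms not in glossary: {sorted(unknown)}")


def filter_events(events, term):
    """Return the events tagged with the given glossary term."""
    return [event for event in events if term in event.terms]


def demand_gap(requests, events):
    """Map each requested term to how many requests exceed the events covering it."""
    offer = Counter(term for event in events for term in event.terms)
    demand = Counter(requests)
    # Counter returns 0 for terms that no event covers
    return {term: demand[term] - offer[term]
            for term in demand if demand[term] > offer[term]}


# Hypothetical example data
events = [
    TrainingEvent("Image analysis course", {"Image analysis", "Bioimaging"}),
    TrainingEvent("Shell basics", {"Command-line computing"}),
    TrainingEvent("Intro to the cluster", {"Cluster computing", "Command-line computing"}),
]
requests = ["Image analysis", "Image analysis", "Metabolomics", "Command-line computing"]

print([event.title for event in filter_events(events, "Command-line computing")])
print(demand_gap(requests, events))
```

Validating tags against a fixed set is what makes the comparison consistent: a request and an event can only be matched if both use the same glossary term. The same approach could, in principle, be applied to the grassroots experts instead of events, so that users can filter experts by skill or topic.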

Below is the list of skills and topics, each with a short description.

Skills

Skill	Description
Using specialised research software	Using specialised research software through graphical or programmatic user interfaces
Exploratory data analysis and visualisation	Experimental design, hypothesis testing, clustering and exploration using PCA, etc.
Programming languages	Python, R, Julia, Matlab, software engineering concepts, data structures and algorithms
Statistics and machine learning	Training and evaluating various regression and classification models, including deep learning
Image analysis	Recognizing and analysing objects in 2D, 3D, time-lapse image data acquired with various techniques, such as EM, LM, etc.
Cluster computing	Parallelising and scaling your scripts and programs across the many cores available on EMBL clusters
Command-line computing	Writing UNIX shell scripts, performing simple file manipulations and text searches using e.g. regular expressions
Computational workflow management	Automating complex analyses and making them reproducible using e.g. Galaxy, Snakemake, CWL or Nextflow
Cloud computing	Running your software and scripts in the cloud using Docker, Singularity, OpenStack or Kubernetes
GPU computing	Massively parallel computing for machine learning, structural analyses, etc.
Data management and curation	Using spreadsheets, designing and querying databases, R-tidyverse, FAIR principles, etc.
Software project management	Versioning your code using git, organising software development with multiple contributors, agile development, etc.
Biological networks analysis	Using GO/pathway/drug/disease enrichment analysis, multi-omics integration, visualization, graph theory, Cytoscape, etc.
Biological modelling	E.g. physics-inspired models of enzyme or drug kinetics, metabolism, developmental processes, ecosystems, etc.
Text mining	Automatic information retrieval from large bodies of literature using natural language processing, controlled vocabularies, ontologies, etc.
Benchmarking bioinformatics tools	Setting up meaningful simulations and evaluations
Web technologies	Developing websites using HTML/CSS, content management systems, Apache, GitLab Pages, and others
System administration	Setting up server operating systems, virtualization, clusters and schedulers

Topics

Topic	Description
Transcriptomics	Methods to determine and analyse the complete set of RNA transcripts that are produced by the genome
Genomics and comparative genomics	Tools to assess genome assembly and quality, and to compare genomes across species
Proteomics and protein analysis	Systematic identification, quantification and description of the proteome of a biological system
Bioimaging	Methods to visualise biological processes in real time
Cancer genomics and personalised medicine	Comparison of DNA sequences and gene expression between tumour cells and host cells; methods to study individual responses to drugs
Metagenomics and other meta-omics	Tools to study genetic material (and its derivatives) recovered from environmental samples
Structural biology	Study of the molecular structure and dynamics of biological macromolecules, and their relationship to their function
Metabolomics	Large-scale study of small molecules, i.e. metabolites, within cells, biofluids, tissues or organisms
Neurobiology and neuroinformatics	Computational models and tools for sharing, integration, and analysis of experimental data related to the nervous system

Follow the Bio-IT blog to get updates on how our small glossary will be used!

Photo by Markus Spiske on Unsplash