Bio-IT competencies and where to find them
Recently, Bio-IT took some steps towards the standardisation of its activities description. In the context of the Data Sciences strategy at EMBL and in particular in WorkStream 1 - “Internal and external training & support”, we collected a glossary of competencies that we are going to use for different purposes.
Bio-IT competencies
This glossary includes two types of terms: the skills, representing expertise, abilities to use tools, and the topics, matters of knowledge. All kinds of Bio-IT activities (from courses to gatherings to platforms and tools) can support you in learning/applying some of them. The glossary was developed for two main aims:
- Classify the past and upcoming training events, as well as the related training materials. This will allow us to build an interface to consult, filter and navigate our training offer. In addition, by collecting the community requests, we will also be able to consistently compare the training demand to the training offer, and adjust it accordingly.
- Classify the expertise of the community members listed in the grassroots project. Grassroots groups experts that volunteered to provide assistance and consulting on a wide range of topics. Also in this case, a consistent and structured presentation of the expertise will support users navigating this data and ultimately looking for consultation. We are working on re-engineering the grassroots project as a whole, to have more details about this please check the related repository. We welcome contributions!
Following, the list of skills and topics, including a short descriptions.
Skill | Description |
---|---|
Using specialised research software | Using specialised research software through graphical or programmatic user interfaces |
Exploratory data analysis and visualisation | Experimental design, hypothesis testing, clustering and exploration using PCA, etc. |
Programming languages | Python, R, Julia, Matlab, software engineering concepts, data structures and algorithms |
Statistics and machine learning | Training and evaluating various regression and classification models, including deep learning |
Image analysis | Recognizing and analysing objects in 2D, 3D, time-lapse image data acquired with various techniques, such as EM, LM, etc. |
Cluster computing | To parallelize and scale your scripts and programs to many cores available through EMBL clusters |
Command-line computing | Writing UNIX shell scripts, performing simple file manipulations and text searches using e.g. regular expressions |
Computational workflow management | To automate complex analyses and make them reproducible using e.g., Galaxy, Snakemake, CWL, or NextFlow |
Cloud computing | To run your software and scripts in the cloud using docker, singularity, openstack, or kubernetes |
GPU computing | Massively parallel computing for machine learning, structural analyses, etc. |
Data management and curation | Using spreadsheets, designing and querying databases, R-tidyverse, FAIR principles, etc. |
Software project management | Versioning your code using git, organising software development with multiple contributors, agile development, etc. |
Biological networks analysis | Using GO/pathway/drug/disease enrichment analysis, multi-omics integration, visualization, graph theory, Cytoscape, etc. |
Biological modelling | E.g. physics-inspired models of enzyme or drug kinetics, metabolism, developmental processes, ecosystems, etc. |
Text mining | Text mining for automatic information retrieval from large bodies of literature using natural language processing, controlled vocabularies, ontologies, etc. |
Benchmarking bioinformatics tools | Setting up meaningful simulations and evaluations |
Web technologies | Developing websites using HTML/CSS, content management systems, Apache, Gitlab Pages, and others |
System administration | Setting up server operating systems, virtualization, clusters and schedulers |
Topic | Description |
---|---|
Transcriptomics | Methods to determine and analyse the complete set of RNA transcripts that are produced by the genome |
Genomics and comparative genomics | Tools to assess genome assembly and quality, to compare genome across species |
Proteomics and protein analysis | Systematic identification, quantification and description of the proteome of a biological system |
Bioimaging | Methods to visualise biological processes in real time |
Cancer genomics and personalised medicine | Comparison of DNA sequence and gene expression between tumour cells and host cells, methods to study individual responses to drugs |
Metagenomics and other meta-omics | Tools to study genetic material (and its derivatives) recovered from environmental sample |
Structural biology | Study of the molecular structure and dynamics of biological macromolecules, and their relationship to their function |
Metabolomics | Large-scale study of small molecules, i.e. metabolites, within cells, biofluids, tissues or organisms |
Neurobiology and neuroinformatics | Computational models and tools for sharing, integration, and analysis of experimental data related to the nervous system |
Follow the Bio-IT blog to get updates on how our small glossary will be used!
Photo by Markus Spiske on Unsplash