3C-CoDash EMBL Event 2026
Connect, Collaborate, Commit
**28-30 January 2026, EMBL Heidelberg and online**
Hello World! The Data Science Community would like to invite you to the Data Science 3C-CoDash (28.01.2026 to 30.01.2026) – a chance to help each other, bring people together and build cool things.
Any EMBL biological data science projects are welcome – whether it’s a small project or a large idea, we are here to provide the space and time to work together with great people.
During the project proposal phase, we collected proposals in the following tracks:
- Training and support
- Scientific workflows
- Scientific services
- Standards and community
We have received 10 proposals, listed below, which are now open for registration until 12/12/2025!
Event Details and Timeline
📅 Registration opens and Projects are published: 14/11/2025
📅 3C-CoDash from 28/01/2026 to 30/01/2026
📍 Location: EMBL Heidelberg + Virtual
🕑 Format: Hybrid (most projects will run both on-site and online)
💡 Cost: participation is free of charge
🍽️ Refreshments will be provided, but please be aware that we will likely not be able to cover lunches and dinners.
Schedule
All times are in the CET time zone.
Wednesday, 28 January
| Time | Event | Location |
|---|---|---|
| 10:30 | Arrival and Coffee | Operon foyer |
| 11:00 | Opening session - Opening remarks: Laurent Thomas - Presentation of the EMBL Data Science Center: Lisanna Paladin | Large operon |
| 11:30 | Project presentations | Large operon |
| 13:00 | Lunch | EMBL canteen |
| 14:00 | Workshop sessions | |
| 15:30 | Coffee break | Operon foyer |
| 16:00-17:30 | Workshop sessions | |
Thursday, 29 January
| Time | Event | Location |
|---|---|---|
| 9:30 | Workshop sessions | |
| 11:00 | Coffee break | Operon foyer |
| 11:30 | Workshop sessions | |
| 13:00 | Lunch | EMBL canteen |
| 14:00 | Workshop sessions | |
| 15:30 | Coffee break | Operon foyer |
| 16:00 | Workshop sessions | |
| 17:30-20:00 | Get together / Pizza, beer and antipasti | Operon foyer |
Friday, 30 January
| Time | Event | Location |
|---|---|---|
| 9:30 | Wrap-up session: Project presentations (Chair: Sarah) | Large operon |
| 11:00 | Coffee break | Operon foyer |
| 11:30 | Wrap-up session: Project presentations + Closing remarks (Sarah) | Large operon |
| 13:00 | Lunch | EMBL canteen |
Projects
Project Title
Online genome browsers for non-model organisms
Project lead
Cyril Cros
Format
Hybrid
Project Description
Model organisms have critical online databases (FlyBase, WormBase, etc.) that are now being consolidated within the Alliance of Genome Resources. They require manpower that is not available for non-model organisms. In turn, non-model organisms become more annoying to work with because they lack such resources. Some efforts exist, like MolluscDB or the Hymenoptera Genome Database, but they are limited or use older tools. Projects like Ensembl (https://beta.ensembl.org/) help address this, but they work on more established species. They also don’t allow you to easily add a lot of genomic tracks. Coming from WormBase, I was frustrated with this state of things, and as a side project I have started looking into making my own database for my marine worms. What I propose is to use recent technologies to make it easier to deploy and self-host a fixed set of resources for any set of species of interest (or taxon): an online genome browser (JBrowse2), an annotation server (Apollo2), and BLAST databases (SequenceServer / DIAMOND). I have played around so far with Nextflow to help process the kinds of resources needed and upload them to EMBL S3. I then use Kubernetes to deploy some of those (JBrowse2 and SequenceServer so far). A similar project is https://www.sanger.ac.uk/tool/genomehubs/, but they try to cover a broad set of species. I would rather cover a few species but make an effort to integrate existing genomic resources (RNAseq, ATAC, HiC…).
Key Goals / Expected Outcomes
- Build a Nextflow workflow that takes in an assembly, annotation, and a set of already processed genomic tracks (RNAseq, ATAC, HiC, SNPs). I was using an assembly CSV sample sheet and a genomic tracks sample sheet. Crucially, we should be able to handle multiple assemblies for the same species and multiple related species. They are uploaded to an S3 server.
- Build a Helm chart to deploy the relevant set of genomic resources with JBrowse2, SequenceServer (with search links back to JBrowse2), Apollo2, and any other resource of interest. Maybe a landing page like this, showing the list of species and assembly statistics.
- Look into pipelines that build on top of that and do actual data processing: functional annotation, annotation transfer (TOGA / LiftOff), whole-genome alignments.
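As a sketch of what such an assembly sample sheet could look like (the column names, species and S3 paths below are purely illustrative, not the ones used in the existing prototype):

```csv
species,assembly_id,assembly_fasta,annotation_gff3
Platynereis_dumerilii,pdum_v2,s3://genomes/pdum_v2/assembly.fa.gz,s3://genomes/pdum_v2/annotation.gff3.gz
Platynereis_dumerilii,pdum_v1,s3://genomes/pdum_v1/assembly.fa.gz,s3://genomes/pdum_v1/annotation.gff3.gz
Capitella_teleta,ctel_v1,s3://genomes/ctel_v1/assembly.fa.gz,s3://genomes/ctel_v1/annotation.gff3.gz
```

Two rows sharing a species but with different `assembly_id` values would cover the multiple-assemblies-per-species requirement.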
Background / Resources
- Started something at https://git.embl.org/grp-arendt/jbrowse2-annelids
- Running demo at https://genomes.arendt.embl.de/ and https://blast-arendt.embl.de/
- https://jbrowse.org/jb2/
- https://sequenceserver.com/
- https://blobtoolkit.genomehubs.org/
Skills Needed
- Nextflow (Groovy) / Kubernetes
- Some interest in genome assembly / functional annotation
Project Title
Modernizing Python Data Science: Pandas + Seaborn Upgrade for Software Carpentry
Project lead
Francesco Tabaro
Format
hybrid
Project Description
This project will update the widely-used Software Carpentry “Python for Data Analysis” curriculum to reflect current best practices in the Python data science ecosystem. The existing lesson relies heavily on numpy arrays and basic matplotlib for data manipulation and visualization, but modern data science workflows have evolved toward more intuitive, powerful tools.
Key Goals / Expected Outcomes
Technical Modernization
- To replace numpy array manipulation with pandas DataFrame operations for more intuitive data handling.
- To upgrade basic matplotlib plots to seaborn’s statistical visualizations for better insights and aesthetics
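To illustrate the kind of change proposed, here is a minimal sketch (toy data standing in for the lesson's dataset): the same per-column summary reads more naturally with labelled pandas operations than with numpy axis numbers, and seaborn can then plot straight from the labelled DataFrame.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the lesson's data: rows = patients, columns = days
raw = np.array([[1, 2, 3],
                [3, 4, 5]])

# numpy style (current lesson): the axis number carries the meaning
per_day_mean_np = raw.mean(axis=0)   # -> array([2., 3., 4.])

# pandas style (proposed): labels carry the meaning
df = pd.DataFrame(raw, columns=["day_1", "day_2", "day_3"])
per_day_mean_pd = df.mean()          # indexed by day name

assert list(per_day_mean_np) == list(per_day_mean_pd)

# seaborn then plots directly from the labelled, long-form DataFrame, e.g.:
# import seaborn as sns
# sns.lineplot(data=df.melt(var_name="day", value_name="value"),
#              x="day", y="value")
```

The seaborn call is left commented so the snippet runs without a plotting backend; the point is that no axis bookkeeping survives into the plotting step.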
Educational Enhancement
- Reduce learning curve through pandas’ readable, English-like syntax
- Teach data cleaning, merging, and preprocessing skills essential for real-world projects
- Align the curriculum with tools students will encounter in real-life environments
Practical Application
- Enable immediate application of learned skills to students’ own datasets
- Create more engaging, visually compelling examples that maintain student interest
Background / Resources
We will start from the current version of the lessons.
Skills Needed
Knowledge of Pandas and Seaborn Python packages is required.
Project Title
Easy EESSI at EMBL
Project lead
Renato Alves
Format
hybrid
Project Description
Bioinformatic software, and in particular poorly packaged software, remains a pain point for all users inside and outside EMBL, and especially so for bioinformatics novices. The once-convenient “conda” approach is now a minefield of restrictions and licensing. On the other end of the spectrum, containers are convenient but bulky, requiring increasingly large storage and being no easier to manage when versions change. To make this landscape more complicated, EMBL offers two main types of computing environments based on different Linux distributions (Rocky and Ubuntu) that, while similar, are sufficiently different to make switching between them a non-trivial step.
The EESSI project, a worldwide scientific packaging project, promises to address all the above while simultaneously working towards more reproducible biocomputational science. Its adoption at EMBL is however still limited, mainly by the currently available software. EESSI will be the primary and default source of software in the next iteration of the EMBL HPC cluster. With this project proposal we plan to package some commonly used software at EMBL and simultaneously use this opportunity to increase familiarity with the tools and steps to make software available globally through EESSI. As EMBL’s setup is particular, we will also document our progress along the way.
Key Goals / Expected Outcomes
- Increase local knowledge of EESSI and its ecosystem of tools.
- Package at least one (painful) bio-computing software package and all required dependencies.
- Document the particularities of using EESSI at EMBL, with a focus on onboarding novices.
Background / Resources
An environment with Enterprise Linux 10 (Rocky 10) will be made available before and during the hackathon to serve as a sandbox for experimentation and build tests. We encourage anyone interested in this project to get familiarised with https://www.eessi.io/ and/or connect with the project’s Slack.
Skills Needed
Basic knowledge of software installation in UNIX/Linux is recommended. Familiarity with at least one of EMBL’s computing environments (HPC cluster, Jupyterhub, BARD, RStudio, Seneca, …) is beneficial. If you know or regularly use building toolkits (make, ebuild, cmake, portage, ninja, gradle, rake, nix, …) we need you :) If you are a heavy workflows user and are around, we also want to talk to you. Knowledge of different generations of CPUs and GPUs may prove useful.
Project Title
A shinyApp for user-friendly analysis of biological count data
Project lead
Felix Schneider
Format
hybrid
Project Description
Biologists at EMBL often produce count data, where events fall into certain categories and are counted. Usually, this kind of data is analyzed with a Fisher test. However, biological data is often collected in replicates, for which the Fisher test may not be ideal. Alternative methods such as mixed models exist, but are not appreciated by biologists due to their perceived complexity. In this project, we aim to offer a user-friendly tool that allows biologists to analyze their data using advanced methods, without having to code up GLMMs manually. We have started work on a shinyApp, which reads tabular data and metadata, visualizes them, runs a model and returns easy-to-interpret results, as well as publication-ready statements. For the CoDash event, we have several tasks for participants that address different areas of expertise and levels of difficulty, spanning:
- Implementation of features in the app, such as improved data reading and checking, data visualization, data analysis, and report generation
- Optimizing the theoretical framework: a decision tree that maps experimental designs to suitable statistical models
- Documentation of the app
- Identifying relevant biological data sets and designing test cases
- Visual design and naming of the app
- Deployment (containerization (Docker/Podman), K8s, standalone)
Some of these tasks are also suitable for people who are relatively new to the respective technologies.
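For context on the classical analysis mentioned above, Fisher's exact test on a 2x2 contingency table can be sketched in plain Python via the hypergeometric distribution (one-sided version shown; the counts below are made up, and the app itself is written in R):

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability of a count of at least `a` in the top-left cell, with all
    margins fixed (upper hypergeometric tail)."""
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = comb(n, c1)
    return sum(comb(r1, k) * comb(n - r1, c1 - k)
               for k in range(a, min(r1, c1) + 1)) / denom

# Made-up counts: e.g. condition A vs B, with/without a phenotype
p = fisher_exact_one_sided(8, 2, 1, 5)
print(round(p, 4))  # prints 0.0245
```

Replicates break the single-table assumption this test relies on, which is exactly why the project reaches for GLMMs instead.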
Key Goals / Expected Outcomes
A functional shinyApp that will be deployed at EMBL and be available to everyone at EMBL, possibly beyond.
Background / Resources
We have a repository with a prototype: https://git.embl.org/kaspar/count-data-project
Skills Needed
R coding, shiny app development, git, statistics, containerization, visual design, writing skills. Different skills are needed for different tasks.
Project Title
DocBot: An LLM-powered natural language interface to pan-EMBL documentation
Project lead
Vijay Venkatesh Subramoniam
Format
Hybrid
Project Description
DocBot is an EBI-wide chatbot designed to provide streamlined access to user documentation across projects. Powered by large language models (LLMs), DocBot assists users by answering queries based on the specific documentation available for a given project (wwwint.ebi.ac.uk/docbot, EBI intranet only at the moment).
DocBot already works as a prototype at EMBL-EBI, with support for:
- multiple LLM backends (Google Gemini and AWS)
- Project-specific or global documentation search
- Conversation history stored locally in the browser (no user data stored server-side)
- Transparent citation of the documentation sections used to generate the answer
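The retrieval step behind such a documentation chatbot can be illustrated with a deliberately minimal sketch. The real DocBot uses LLM embeddings and a vector database; the core idea (score documentation chunks against the query, pass the best ones to the model as context) looks roughly like this bag-of-words stand-in, with invented chunk text:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "documentation chunks" (invented text, not real DocBot content)
chunks = [
    "To submit sequences to ENA, create a study and register samples first.",
    "UniProt entries can be retrieved in FASTA format via the REST API.",
    "PRIDE accepts mass spectrometry proteomics data submissions.",
]
vectors = [Counter(c.lower().split()) for c in chunks]

query = "how do I submit my sequences to ENA?"
qvec = Counter(query.lower().split())

# Rank chunks by similarity; the top chunk(s) plus the query go to the LLM
best = max(range(len(chunks)), key=lambda i: cosine(qvec, vectors[i]))
print(chunks[best])  # prints the ENA chunk
```

Swapping the `Counter` vectors for embedding vectors and the list for a vector DB gives the production-shaped version of the same loop.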
Key Goals / Expected Outcomes
- Onboard at least one EMBL Heidelberg resource (preferably a Core Facility) into DocBot.
- Improve the onboarding workflow (provision documentation + test queries + sample answers) so any project can add documentation using a simple template.
- Produce a small set of deliverables for the community:
  - A working chatbot instance with Heidelberg documentation included
  - A reusable onboarding guide (“How to add your project documentation to DocBot”)
  - A short tutorial / blog post or demo video
Stretch goals:
- Optional integration into partner websites (e.g., Core Facility pages)
Success criterion: by the end of CoDash, Core Facilities (or other Heidelberg projects) can independently onboard their documentation.
Background / Resources
- Working prototype: wwwint.ebi.ac.uk/docbot (EBI intranet only at the moment).
- Documentation already ingested for PRIDE, ENA, UniProt, Ensembl and Europe PMC
- Codebase: https://gitlab.ebi.ac.uk/vsubramoniam/docbot_backend, https://gitlab.ebi.ac.uk/ebi-wp/docbot-scrapy, https://gitlab.ebi.ac.uk/ebi-wp/docbot_frontend
Skills Needed
Note: The following are possible areas of activity, not required skills:
- React/Typescript frontend: UI improvements (project/LLM switching, feedback)
- Python/Node backend: Documentation ingestion workflow, vector DB automation
- RAG / embeddings / LLM tuning: Improve retrieval quality
- Documentation / tech writing: Creating onboarding guide and tutorial
- Data resource / service maintainers (from EMBL sites): Provide documentation + test queries
Project Title
Protocol Hub for Life Sciences: Central Resource for Standardized Protocols for Cell and Developmental Biology
Project lead
Mirsana Ebrahim Kutty
Format
hybrid
Project Description
The life sciences face a reproducibility crisis. Insufficient description of methods leads to a huge loss of time and effort when people try to reproduce data or methods. To partially mitigate this, I propose to create a protocol hub for the life sciences, which will host templates for common and established protocols used in cell and developmental biology. Deviating from the classical approach, the protocol templates can be created by data mining the method sections of open-access articles using large language models. This will compile and integrate the key details required for a specific protocol by identifying key and recurring terms, and then propose a template, which will serve as an outline or minimum requirement / descriptor for the protocol. This basic outline will serve as the central node, and variations of the specific protocol (such as those from various labs) can be visualised as additional nodes. Furthermore, the basic structure will be downloadable as a template and can later be used for scientific publishing of articles, which can eventually lead both to standardisation of protocol formats and to capturing the key parameters required for reproducibility. In the long term, this could also be used as a template for protocol descriptions in the method sections of articles, either by linking to the protocol hub or simply by reducing the time and effort needed to write the method section. Additionally, it could serve as a historical record of how these methods evolve and replace others. Novelty: to the best of my knowledge, this kind of platform doesn’t yet exist. What will this achieve?
- One place to find all the protocols
- Enhanced reproducibility of research
- Decreased optimization time for a new researcher
- A historical record of a method’s development
Once this is established, journals could make it standard practice to link method sections to the maintained protocol hub. This would reduce the time required to write an article and make better use of time in general.
Key Goals / Expected Outcomes
- Prepare a unified descriptor / minimum requirements for basic techniques / protocols using data mining approaches
- Visualise variations of these existing protocols – various labs, reagents…
- Sortable by year, method type, etc.
- A downloadable template for method descriptions in research articles, to increase reproducibility
Background / Resources
Proposed / possible method: access open-access journal method sections to find all the protocols related to a method. Then combine them to find the recurring themes / concepts and create a structure. Once this is mined, one can create a basic protocol template with the essential details for the description as the first level. This can then be followed by a second layer that groups protocols from one or two predominant labs as multiple versions of the record. This will give the researcher an idea of the method and its minimum requirements. Once established, this can be used as a community standard for one particular protocol.
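A first, deliberately crude version of the "recurring themes" step described above can be sketched without any LLM at all, just by counting which terms recur across method sections. The fragments below are invented; the real project would mine open-access articles and use language models for the heavy lifting:

```python
from collections import Counter

# Invented fragments standing in for mined method sections
methods = [
    "Cells were fixed in 4% paraformaldehyde for 20 min at room temperature.",
    "Samples were fixed with 4% paraformaldehyde and washed in PBS.",
    "Embryos were fixed in paraformaldehyde, then permeabilized in PBS-Triton.",
]

stopwords = {"were", "in", "at", "with", "and", "for", "then"}
counts = Counter(
    word.strip(".,%").lower()
    for text in methods
    for word in text.split()
    if word.strip(".,%").lower() not in stopwords
)

# Terms appearing in most protocols become candidate fields of the template
recurring = [term for term, n in counts.most_common() if n >= 2]
print(recurring)  # 'fixed' and 'paraformaldehyde' lead the list
```

Terms shared by most variants ("fixed", "paraformaldehyde") would seed the minimum-requirement template; lab-specific terms would populate the variant nodes.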
Skills Needed
- Experience with LLM
- Experience with lifescience method / text
- Data visualisation methods
- Biology expertise
- Statistics
Project Title
Brillouin hyperspectral images storage, visualisation and analysis
Project lead
Carlo Bevilacqua and Sebastian Hambura
Format
hybrid
Project Description
We aim to establish a standardized file format and accompanying software for the visualization and analysis of Brillouin microscopy images. Brillouin microscopy is a hyperspectral imaging technique, where each voxel in the image is associated with a full spectrum. To support this, we have proposed an initial version of a dedicated file format, named .brim (Brillouin imaging), built on top of Zarr. We have also developed a web application, BrimView, for visualizing and analyzing these data. BrimView is implemented using the Panel (HoloViz) library, which provides powerful tools for data visualization, and is written in Python, a widely used language in the scientific community. The current software is already a working solution, but could be improved in three directions:
- Brillouin data processing: GPU function fitting, ROI selection and analysis, custom python processing scripts
- User experience: handling datasets containing multiple imaging modalities, nicely handling data editing of local files from the browser, writing a nice user guide / introduction tutorial
- Interaction of Brimview and .brim file format with existing tools: ImageJ, REMBI, OME-Zarr
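To make "each voxel is associated with a full spectrum" concrete, here is a toy numpy sketch: a 4D stack with a spectral axis per voxel, and a naive per-voxel peak search. Everything here is assumed for illustration (the dimension order, the synthetic peak, and the argmax stand-in for the Lorentzian fit the real pipeline would perform, possibly on the GPU):

```python
import numpy as np

# Toy hyperspectral stack: (z, y, x, spectral channel); a real .brim file
# stores something like this in Zarr chunks (dimension order is assumed)
rng = np.random.default_rng(0)
n_chan = 64
data = rng.normal(0.0, 0.01, size=(2, 4, 4, n_chan))

# Add a synthetic Brillouin-like peak at channel 40 in every voxel
peak = np.exp(-0.5 * ((np.arange(n_chan) - 40) / 2.0) ** 2)
data += peak

# Per-voxel "analysis": locate the peak channel (a real pipeline would fit
# a Lorentzian to get a sub-channel Brillouin shift)
shift_map = data.argmax(axis=-1)
print(shift_map.shape)  # one value per voxel: (2, 4, 4)
assert (shift_map == 40).all()
```

The GPU-fitting question in the goals below is essentially about doing the per-voxel fit in this last step fast, for millions of voxels, from both Python and WASM.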
Key Goals / Expected Outcomes
It would be helpful to get input on any of the topics listed above. Specifically:
- how to implement data fitting on the GPU, so that it is accessible both in Python and WASM? Is WebGPU a good option?
- which tools already exist in Python to do simple ROI selection and analysis?
- how to allow users to input their own analysis script (or fitting function) without risking code injection?
- how to store multiple modalities with corresponding registration?
- are there tools to help writing documentation for the GUI?
- how to write an ImageJ plugin using the existing Python code?
Lastly, expert feedback on the structure of the .brim file and/or general user feedback on the usability of BrimView would also benefit us.
Background / Resources
More details about the project are in our recent preprint. The web app is online at brimview.embl.org and its source code is on GitHub.
Skills Needed
Python, WebDev, Panel (Holoviz) UI/UX, Microscopy data visualization, WASM, User guide writing, Brillouin Microscopy/Hyperspectral microscopy
Project Title
A dream Electronic Lab Notebook
Project lead
Matthias Monfort
Format
in person
Project Description
Develop a user-centered specification of the dream Electronic Lab Notebook (ELN)
Key Goals / Expected Outcomes
Define the essential functionalities an ELN would need to have to satisfy the daily note-taking workflow of EMBL scientists.
Tasks:
- Explore how different scientists have established their own note-taking workflow to document their work.
- Formalize the different entities involved in a scientific note-taking task (projects, experiments, etc.)
- Formalize the different processes involved in such a task. For example:
  a. Writing in an office-like document
  b. Taking pictures and uploading images
- Rank entities and processes (essential, good-to-have, non-essential).
- Create a mock-up of this ideal ELN.
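To give a feel for what "formalizing the entities" could produce, here is a purely hypothetical sketch of such a data model; the actual entity names, fields and relationships are exactly what the workshop is meant to decide:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Attachment:
    filename: str
    kind: str  # hypothetical categories, e.g. "image", "table", "raw-data"

@dataclass
class Experiment:
    title: str
    performed_on: date
    notes: str = ""  # free-form, office-document-like text
    attachments: list[Attachment] = field(default_factory=list)

@dataclass
class Project:
    name: str
    experiments: list[Experiment] = field(default_factory=list)

# One note-taking session in the hypothetical model
proj = Project("worm-imaging")
exp = Experiment("staining test", date(2026, 1, 28), notes="fixed at 4% PFA")
exp.attachments.append(Attachment("gel_photo.png", "image"))
proj.experiments.append(exp)
print(len(proj.experiments))  # 1
```

Ranking each entity and field as essential / good-to-have / non-essential, as in the task list above, would then directly shape the mock-up.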
Background / Resources
This workshop aims at taking a broader perspective on how researchers take notes to document their work. We open the floor to explore diverse possibilities and gather insights that can shape more effective, user-centered solutions for note-taking in science.
The document generated as part of this workshop is of general interest, but also specifically is of interest to LabID developers. LabID (labid.embl.org/docs) is a FAIR research data-management web platform that embeds an ELN. In order to guide the future development of its ELN and ensure it meets the evolving needs of researchers, it is essential to first understand what scientists truly require—and value—when documenting their experiments.
Skills Needed
- No specific skill is required for the brainstorming part.
- UI/UX knowledge welcome
Project Title
FAIR imaging data management with OME-Zarr, REMBI and LabID - from image conversion to visualization, processing and publication
Project lead
Laurent Thomas
Format
hybrid
Project Description
OME-Zarr is gaining popularity in microscopy as a format to store large images or collections of images, in the cloud or on distributed storage systems. Besides its efficient data access, it supports extended metadata, similar to its predecessor, OME-TIFF. Key applications include not only large datasets (~terabyte scale) but also highly multidimensional data.
This workshop has two complementary components to be run in parallel:
- training participants how to convert to and use OME-Zarr format
- building a simple user interface for an existing conversion tool.
In part 1, participants will have the opportunity to experiment with this new format and see how they can integrate it into their data management practices. Workshop participants are encouraged to bring a small image dataset of their own that they would like to convert to OME-Zarr. The workshop will be performed on two cloud-based desktop environments for image analysis developed at EMBL: BARD (for internal users) and BAND (for external users). We will start by providing a brief introduction to these platforms and the OME-Zarr specification. Then we will introduce conversion tools such as BatchConvert or EuBI-Bridge to convert image datasets to OME-Zarr. Participants will be assisted with converting their datasets using these tools. We will also show how to visualize images saved in this format, and demonstrate a small example of an analysis workflow / programmatic access to the data. We will discuss how to make the imaging data FAIR (Findable, Accessible, Interoperable, Reusable) by providing REMBI (REcommended Metadata for Biological Imaging) compliant metadata. Finally, we will see how the conversion workflow and the associated data can be registered in LabID, the on-site data management system available at EMBL, and how to make it ready for submission to the BioImage Archive. The following BioImage Archive submission will be used as a showcase: https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD2258.
In part 2, we will build a simple point-and-click user interface for the EuBI-Bridge conversion tool. Options for this include a terminal-based user interface (TUI) or a plug-in for the Python-based napari image viewer. Such a UI should increase the accessibility of the conversion tool to non-expert users and facilitate adoption of the OME-Zarr format.
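For orientation, an OME-Zarr image is essentially a directory tree of chunked arrays plus JSON metadata. A minimal (v0.4-style) store produced by such a conversion looks roughly like this (simplified; exact file names and axis order depend on the writer and the data):

```text
my_image.ome.zarr/
├── .zattrs          # "multiscales" metadata: axes, datasets, transforms
├── .zgroup
├── 0/               # full-resolution array, chunked (e.g. t,c,z,y,x)
│   ├── .zarray
│   └── 0.0.0.0.0    # one compressed chunk per file
└── 1/               # 2x downsampled level of the resolution pyramid
    └── .zarray
```

This chunk-per-file layout is what makes the format efficient on cloud and distributed storage: viewers fetch only the chunks and resolution level they need.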
Key Goals / Expected Outcomes
- Learn about OME-Zarr and associated metadata specification
- Convert your own data to OME-Zarr
- Learn how to visualize and use OME-Zarr in a real-case scenario
- Learn how LabID can be used to register the conversion workflow and associated data
- Learn how to prepare the data for submission to BioImage Archive
- Build a simple UI for an OME-Zarr conversion tool
Background / Resources
Participants are expected to have a background in microscopy or image analysis, and are encouraged to bring a small example dataset they want to convert.
Participants can experiment with the conversion tools on their own computer, but to avoid spending time on setup, we will favor using the online environments (BAND/BARD).
Prior to the event, the participants will be encouraged to watch this recent webinar on the OME-Zarr topic: https://www.ebi.ac.uk/training/events/towards-open-and-standardised-imaging-data-introduction-bio-formats-ome-tiff-and-ome-zarr/.
Finally, discussion around other examples of BioImage Archive submissions is also welcome.
Skills Needed
No programming skills or prior experience with OME-Zarr is required. Experienced OME-Zarr users are also welcome to share their expertise.
Project Title
Image Component Integration for Depictio
Project lead
Thomas Weber
Format
hybrid
Project Description
Create an interactive image viewer component for Depictio dashboards that handles images efficiently. Users should be able to pan and zoom views while interacting with quantitative data visualizations. The project will explore modern web-based image visualization approaches and determine the optimal technical stack during implementation.
Key Goals / Expected Outcomes
- Create working image viewer prototype using modern web technologies
- Establish data conversion pipeline for pyramidal/tiled formats
- Integrate reusable Dash component within Depictio
- Document implementation and provide usage examples
Background / Resources
Potential approaches to explore:
- Viv/Vizarr (optimized for scientific imaging with multi-channel support)
- OpenSeadragon (widely-used deep zoom viewer)
- Leaflet/OpenLayers (map libraries adapted for images)
Key concepts:
- Pyramidal/multi-resolution tiling strategies
- Image formats: PNG, JPG, TIFF, OME-TIFF, Zarr, Deep Zoom Images (DZI)
- Bio-Formats and VIPS for format conversion
- Depictio repository
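The pyramidal tiling idea in the list above can be sketched in a few lines: repeatedly downsample the full-resolution image by 2 until it fits in a single tile, so the viewer only ever fetches data at the resolution it needs. This is a toy numpy illustration; real converters such as VIPS or Bio-Formats do this far more efficiently and also split each level into tiles:

```python
import numpy as np

def build_pyramid(img: np.ndarray, tile: int = 256) -> list[np.ndarray]:
    """Return resolution levels from full size down to <= one tile,
    halving each time by averaging 2x2 blocks."""
    levels = [img]
    while max(levels[-1].shape) > tile:
        h, w = levels[-1].shape
        # Crop to even dimensions, then take the mean over 2x2 blocks
        a = levels[-1][: h - h % 2, : w - w % 2]
        levels.append(a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return levels

img = np.random.default_rng(0).random((1024, 768))
pyramid = build_pyramid(img)
print([lvl.shape for lvl in pyramid])
# [(1024, 768), (512, 384), (256, 192)]
```

A deep-zoom viewer (OpenSeadragon, Viv, Leaflet) then maps the current zoom level and viewport to one pyramid level and the tiles that intersect it.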
Skills Needed
- Essential: Python (intermediate), JavaScript (intermediate), web development basics
- Helpful: Plotly Dash, image processing concepts, React components
- Bonus: Experience with tiling systems, web-based viewers, or large-scale data handling
Registration
Thank you for your interest in joining the Data Science 3C-CoDash! 🎉
👉 Registration is now open here until 12/12/2025.
You can choose up to 3 workshops, in order of preference, and we will try our best to assign people to their favorite choice.
Participants will be notified of their assignment on 19/12/2025.
Contact
If you have any questions about the Hackathon, or anything else Bio-IT-related, please contact bio-it@embl.de.