3C-CoDash EMBL Event 2026

Connect, Collaborate, Commit

**28-30 January 2026, EMBL Heidelberg and online**

Hello World! The Data Science Community invites you to the Data Science 3C-CoDash (28.01.2026 to 30.01.2026): three days to help each other, bring people together, and build cool things.
Any EMBL biological data science project is welcome – whether it’s a small project or a large idea, we are here to provide the space and time to work together with great people.

During the project proposal phase, we collected proposals in the following tracks:

  1. Training and support
  2. Scientific workflows
  3. Scientific services
  4. Standards and community

We have received 10 proposals, listed below, which are now open for registration until 12/12/2025!

Event Details and Timeline

📅 Registration opens and Projects are published: 14/11/2025

📅 3C-CoDash from 28/01/2026 to 30/01/2026

📍 Location: EMBL Heidelberg + Virtual

🕑 Format: Hybrid (most projects will run both on-site and online)

💡 Cost: participation is free of charge

🍽️ Refreshments will be provided, but please be aware that we will likely not be able to cover lunches and dinners.

Schedule

All times are in CET.

Wednesday, 28 January

| Time | Event | Location |
| --- | --- | --- |
| 10:30 | Arrival and Coffee | Operon foyer |
| 11:00 | Opening session: opening remarks (Laurent Thomas); presentation of the EMBL Data Science Center (Lisanna Paladin) | Large operon |
| 11:30 | Project presentations | Large operon |
| 13:00 | Lunch | EMBL canteen |
| 14:00 | Workshop sessions | |
| 15:30 | Coffee break | Operon foyer |
| 16:00-17:30 | Workshop sessions | |

Thursday, 29 January

| Time | Event | Location |
| --- | --- | --- |
| 9:30 | Workshop sessions | |
| 11:00 | Coffee break | Operon foyer |
| 11:30 | Workshop sessions | |
| 13:00 | Lunch | EMBL canteen |
| 14:00 | Workshop sessions | |
| 15:30 | Coffee break | Operon foyer |
| 16:00 | Workshop sessions | |
| 17:30-20:00 | Get together / pizza, beer and antipasti | Operon foyer |

Friday, 30 January

| Time | Event | Location |
| --- | --- | --- |
| 9:30 | Wrap-up session: project presentations (Chair: Sarah) | Large operon |
| 11:00 | Coffee break | Operon foyer |
| 11:30 | Wrap-up session: project presentations + closing remarks (Sarah) | Large operon |
| 13:00 | Lunch | |

Projects

Project Title

Online genome browsers for non-model organisms

Project lead

Cyril Cros

Format

Hybrid

Project Description

Model organisms have critical online databases (FlyBase, WormBase, etc.) that are now being consolidated within the Alliance of Genome Resources. These require manpower that is not available for non-model organisms; in turn, non-model organisms become more annoying to work with because they lack such resources. Some efforts exist, like MolluscDB or the Hymenoptera Genome Database, but they are limited or use older tools. Projects like Ensembl (https://beta.ensembl.org/) help address this, but they focus on more established species, and they don’t allow you to easily add many genomic tracks.

Coming from WormBase, I was frustrated with this state of things, and as a side project I have started looking into building my own database for my marine worms. What I propose is to use recent technologies to make it easier to deploy and self-host a fixed set of resources for any set of species (or taxon) of interest: an online genome browser (JBrowse2), an annotation server (Apollo2), and BLAST databases (SequenceServer / DIAMOND). So far I have played with Nextflow to process the kind of resources needed and upload them to EMBL S3, and I use Kubernetes to deploy some of them (JBrowse2 and SequenceServer so far). A similar project is https://www.sanger.ac.uk/tool/genomehubs/, but they try to cover a broad set of species; I would rather cover a few species and make an effort to integrate existing genomic resources (RNAseq, ATAC, HiC…).

Key Goals / Expected Outcomes

  1. Build a Nextflow workflow that takes in an assembly, an annotation, and a set of already processed genomic tracks (RNAseq, ATAC, HiC, SNPs).
    So far I have been using an assembly CSV sample sheet and a genomic tracks sample sheet.
    Crucially, we should be able to handle multiple assemblies for the same species and multiple related species. The outputs are uploaded to an S3 server.

  2. Build a Helm Chart to deploy the relevant set of genomic resources with JBrowse2, Sequence Server (with search links back to JBrowse2), Apollo2, and any other resource of interest.
    Maybe a landing page like this, showing the list of species and assembly statistics.

  3. Look into pipelines that build on top of that and do actual data processing. That could be functional annotation, annotation transfer (TOGA / LiftOff), or whole-genome alignments.
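As a rough illustration of the sample-sheet idea (the Nextflow workflow itself would be written in Groovy; this Python sketch only shows the kind of input validation the workflow could perform, and the column names and paths are hypothetical, not the project's actual schema):

```python
import csv
import io

# Hypothetical assembly sample sheet; multiple assemblies per species are allowed.
SHEET = """species,assembly,fasta,gff
Platynereis dumerilii,pdum_v2,s3://bucket/pdum_v2.fa,s3://bucket/pdum_v2.gff3
Platynereis dumerilii,pdum_v1,s3://bucket/pdum_v1.fa,s3://bucket/pdum_v1.gff3
"""

def load_sheet(text):
    """Parse the sheet and check that every row has all required fields filled."""
    rows = list(csv.DictReader(io.StringIO(text)))
    required = {"species", "assembly", "fasta", "gff"}
    for row in rows:
        missing = required - {k for k, v in row.items() if v}
        if missing:
            raise ValueError(f"missing fields: {missing}")
    return rows

rows = load_sheet(SHEET)
```

A real workflow would add checks such as unique assembly names and reachable S3 paths before launching any processing.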

Background / Resources

Started something at https://git.embl.org/grp-arendt/jbrowse2-annelids
Running demo at https://genomes.arendt.embl.de/ and https://blast-arendt.embl.de/

  • https://jbrowse.org/jb2/
  • https://sequenceserver.com/
  • https://blobtoolkit.genomehubs.org/

Skills Needed

  • Nextflow (Groovy) / Kubernetes
  • Some interest in genome assembly / functional annotation

Project Title

Modernizing Python Data Science: Pandas + Seaborn Upgrade for Software Carpentry

Project lead

Francesco Tabaro

Format

hybrid

Project Description

This project will update the widely-used Software Carpentry “Python for Data Analysis” curriculum to reflect current best practices in the Python data science ecosystem. The existing lesson relies heavily on numpy arrays and basic matplotlib for data manipulation and visualization, but modern data science workflows have evolved toward more intuitive, powerful tools.

Key Goals / Expected Outcomes

Technical Modernization

  1. To replace numpy array manipulation with pandas DataFrame operations for more intuitive data handling.
  2. To upgrade basic matplotlib plots to seaborn’s statistical visualizations for better insights and aesthetics.
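To illustrate the direction of the change (the data here are invented, not the actual lesson content), the same daily summary reads more naturally with labelled pandas operations than with positional numpy indexing:

```python
import numpy as np
import pandas as pd

# Invented inflammation-style data: rows are patients, columns are days.
data = np.array([[0.0, 1.0, 3.0],
                 [0.0, 2.0, 4.0],
                 [1.0, 1.0, 5.0]])

# Current lesson style: positional numpy operations.
daily_mean_np = data.mean(axis=0)

# Modernized style: labelled, self-documenting pandas operations.
df = pd.DataFrame(data, columns=["day_1", "day_2", "day_3"])
daily_mean_pd = df.mean()

# Seaborn can then plot directly from the DataFrame, e.g.:
#   import seaborn as sns
#   sns.lineplot(data=df.melt(var_name="day", value_name="inflammation"),
#                x="day", y="inflammation")
```

The numbers are identical either way; what changes is that columns carry names, so the code documents itself.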

Educational Enhancement

  • Reduce learning curve through pandas’ readable, English-like syntax
  • Teach data cleaning, merging, and preprocessing skills essential for real-world projects
  • Align the curriculum with tools students will encounter in real-life environments

Practical Application

  • Enable immediate application of learned skills to students’ own datasets
  • Create more engaging, visually compelling examples that maintain student interest

Background / Resources

We will start from the current version of the lessons.

Skills Needed

Knowledge of Pandas and Seaborn Python packages is required.

Project Title

Easy EESSI at EMBL

Project lead

Renato Alves

Format

hybrid

Project Description

Bioinformatics software, and in particular poorly packaged software, remains a pain point for all users inside and outside EMBL, and especially for bioinformatics novices. The once convenient “conda” approach is now a minefield of restrictions and licensing. On the other end of the spectrum, containers are convenient but bulky, requiring increasingly large storage and being no easier to manage when versions change. To make this landscape more complicated, EMBL offers two main types of computing environments based on different Linux distributions (Rocky and Ubuntu) that, while similar, are sufficiently different to make switching between them a non-trivial step.

The EESSI project, a worldwide scientific packaging project, promises to address all of the above while simultaneously working towards more reproducible biocomputational science. Its adoption at EMBL is, however, still limited, mainly by the currently available software. EESSI will be the primary and default source of software in the next iteration of the EMBL HPC cluster. With this project proposal we plan to package some commonly used software at EMBL and simultaneously use this opportunity to increase familiarity with the tools and steps needed to make software available globally through EESSI. As EMBL’s setup is particular, we will also document our progress along the way.

Key Goals / Expected Outcomes

  • Increase local knowledge of EESSI and its ecosystem of tools.
  • Package at least one (painful) bio-computing software package and all required dependencies.
  • Document the particularities of using EESSI at EMBL, with a focus on onboarding novices.

Background / Resources

An environment with Enterprise Linux 10 (Rocky 10) will be made available before and during the hackathon to serve as a sandbox for experimentation and build tests. We encourage anyone interested in this project to get familiarised with https://www.eessi.io/ and/or connect with the project’s Slack.

Skills Needed

Basic knowledge of software installation on UNIX/Linux is recommended. Familiarity with at least one of EMBL’s computing environments (HPC cluster, Jupyterhub, BARD, RStudio, Seneca, …) is beneficial. If you know or regularly use build toolkits (make, ebuild, cmake, portage, ninja, gradle, rake, nix, …), we need you :) If you are a heavy workflows user and are around, we also want to talk to you. Knowledge of different generations of CPUs and GPUs may prove useful.

Project Title

A shinyApp for user-friendly analysis of biological count data

Project lead

Felix Schneider

Format

hybrid

Project Description

Biologists at EMBL often produce count data, where events fall into certain categories and are counted. Usually, this kind of data is analyzed with a Fisher test. However, biological data are often collected in replicates, for which the Fisher test may not be ideal. Alternative methods such as mixed models exist, but are not appreciated by biologists due to their perceived complexity. In this project, we aim to offer a user-friendly tool that allows biologists to analyze their data using advanced methods, without having to code up GLMMs manually. We have started work on a shinyApp, which reads tabular data and metadata, visualizes them, runs a model, and returns easy-to-interpret results as well as publication-ready statements. For the CoDash event, we have several tasks for participants that address different areas of expertise and levels of difficulty, spanning:

  • Implementation of features in the app, such as improved data reading and checking, data visualization, data analysis, and report generation
  • Optimizing the theoretical framework: a decision tree that maps experimental designs to suitable statistical models
  • Documentation of the app
  • Identifying relevant biological data sets and designing test cases
  • Visual design and naming of the app
  • Deployment (containerization (Docker/Podman), K8s, standalone)

Some of these tasks are also suitable for people who are relatively new to the respective technologies.
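The app itself is built in R/shiny, but the statistical point is language-independent. As a minimal illustration (written in Python with invented counts, not taken from the project), here is Fisher's exact test applied to a pooled 2x2 table; pooling replicates into one table is precisely what discards the per-replicate structure that a GLMM would model:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    lo = max(0, col1 - (c + d))
    hi = min(row1, col1)
    # Probability of every table with the same margins (hypergeometric distribution)
    probs = [comb(row1, k) * comb(n - row1, col1 - k) / denom for k in range(lo, hi + 1)]
    p_obs = comb(row1, a) * comb(n - row1, col1 - a) / denom
    # Two-sided: sum over all tables no more likely than the observed one
    return sum(p for p in probs if p <= p_obs + 1e-12)

# Replicates pooled into a single table:
# [[treated hits, treated misses], [control hits, control misses]]
p = fisher_exact_2x2(3, 1, 1, 3)
```

With real replicated data, two experiments with identical pooled tables but very different between-replicate variability would get the same p-value here, which is the motivation for the mixed-model approach.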

Key Goals / Expected Outcomes

A functional shinyApp that will be deployed at EMBL and made available to all of EMBL, and possibly beyond.

Background / Resources

We have a repository with a prototype: https://git.embl.org/kaspar/count-data-project

Skills Needed

R coding, shiny app development, git, statistics, containerization, visual design, writing skills. Different skills are needed for different tasks.

Project Title

DocBot: An LLM-powered natural language interface to pan-EMBL documentation

Project lead

Vijay Venkatesh Subramoniam

Format

Hybrid

Project Description

DocBot is an EBI-wide chatbot designed to provide streamlined access to user documentation across projects. Powered by large language models (LLMs), DocBot assists users by answering queries based on the specific documentation available for a given project (wwwint.ebi.ac.uk/docbot, EBI intranet only at the moment).

DocBot already works as a prototype at EMBL-EBI, with support for:

  1. Multiple LLM backends (Google Gemini and AWS)
  2. Project-specific or global documentation search
  3. Conversation history stored locally in the browser (no user data stored server-side)
  4. Transparent citation of the documentation sections used to generate the answer
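Conceptually, the retrieval step of such a retrieval-augmented (RAG) setup can be sketched in a few lines. This is a toy bag-of-words ranker over invented documentation snippets, purely for intuition; DocBot's actual pipeline uses embeddings and a vector database:

```python
import math
from collections import Counter

# Toy documentation sections standing in for ingested project docs (invented text).
SECTIONS = {
    "ena/submission": "how to submit reads to ENA using webin",
    "pride/upload": "upload proteomics data to PRIDE via aspera or ftp",
}

def tokens(text):
    return text.lower().split()

def retrieve(query, sections):
    """Return the section key whose text is most similar to the query
    (cosine similarity of bag-of-words vectors)."""
    q = Counter(tokens(query))
    def score(text):
        d = Counter(tokens(text))
        overlap = sum(q[t] * d[t] for t in q)
        norm = (math.sqrt(sum(v * v for v in q.values()))
                * math.sqrt(sum(v * v for v in d.values())))
        return overlap / norm if norm else 0.0
    return max(sections, key=lambda k: score(sections[k]))

best = retrieve("how do I upload proteomics data", SECTIONS)
```

The retrieved section key then doubles as the citation shown alongside the generated answer, which is the transparency property listed above.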

Key Goals / Expected Outcomes

  1. Onboard at least one EMBL Heidelberg resource (preferably a Core Facility) into DocBot. Improve the onboarding workflow (provision of documentation + test queries + sample answers) so any project can add documentation using a simple template.

  2. Produce a small set of deliverables for the community:
    • A working chatbot instance with Heidelberg documentation included
    • A reusable onboarding guide (“How to add your project documentation to DocBot”)
    • A short tutorial / blog post or demo video

Stretch goals:

Optional integration into partner websites (e.g., Core Facility pages)

Success criterion: by the end of CoDash, Core Facilities (or other Heidelberg projects) can independently onboard their documentation.

Background / Resources

  1. Working prototype: wwwint.ebi.ac.uk/docbot (EBI intranet only at the moment).
  2. Documentation already ingested for Pride, ENA, UniProt, Ensembl and EuropePMC
  3. Codebase: https://gitlab.ebi.ac.uk/vsubramoniam/docbot_backend, https://gitlab.ebi.ac.uk/ebi-wp/docbot-scrapy, https://gitlab.ebi.ac.uk/ebi-wp/docbot_frontend

Skills Needed

Note: The following are possible areas of activity, not required skills:

  1. React/Typescript frontend: UI improvements (project/LLM switching, feedback)
  2. Python/Node backend: Documentation ingestion workflow, vector DB automation
  3. RAG / embeddings / LLM tuning: Improve retrieval quality
  4. Documentation / tech writing: Creating onboarding guide and tutorial
  5. Data resource / service maintainers (from EMBL sites): Provide documentation + test queries

Project Title

Protocol Hub for Life Sciences: Central Resource for Standardized Protocols for Cell and Developmental Biology

Project lead

Mirsana Ebrahim Kutty

Format

hybrid

Project Description

The life sciences face a reproducibility crisis. Insufficient description of methods leads to a huge loss of time and effort when people try to reproduce data or methods. To partially mitigate this, I propose to create a protocol hub for the life sciences, which will host templates for common and established protocols used in cell and developmental biology.

Deviating from the classical approach, the protocol templates can be created by mining the method sections of open-access articles using large language models. This will compile and integrate the key details required for a specific protocol by identifying key and recurring terms, and then propose a template that serves as an outline or minimum requirement/descriptor for the protocol. This basic outline will serve as the central node, and variations of the specific protocol (such as from various labs) can be visualised as additional nodes. Furthermore, the basic structure will be downloadable as a template and can later be used in scientific publishing, which can eventually lead both to standardisation of protocol formats and to capturing the key parameters required for reproducibility.

In the long term, this could also be used as a template for protocol descriptions in the method sections of articles, either by linking to the protocol hub or simply by reducing the time and effort needed to write the method section. Additionally, it could serve as a historical record of these methods as they evolve and replace others.

Novelty: to the best of my knowledge, this kind of platform doesn’t yet exist. What will this achieve?

  1. One place to find all the protocols
  2. Enhanced reproducibility of research
  3. Decreased optimization time for new researchers
  4. A historical record of method development

Once this is established, journals could adopt the practice of linking method sections to this maintained protocol hub. This would reduce the time required to write an article and make better use of time in general.

Key Goals / Expected Outcomes

  1. Prepare a unified descriptor / minimum requirements for basic techniques and protocols using data-mining approaches
  2. Visualise variations of these existing protocols – various labs, reagents…
  3. Make them sortable by year, method type, etc.
  4. Provide a downloadable template for the method description of a research article, to increase reproducibility

Background / Resources

Proposed/possible method: access open-access journal method sections to find all the protocols related to a method. Then combine them into a structure to find the recurring themes and concepts. Once this is mined, one can create a basic protocol template with the essential details of the description as the first level. This can be followed by a second layer that groups protocols by one or two predominant labs, as multiple versions of records. This will give the researcher an idea of the method and its minimum requirements. Once established, this can be used as a community standard for one particular protocol.
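A toy sketch of the very first mining step (the method snippets are invented, and a real implementation would use an LLM rather than word counts): terms that recur across method sections become candidate fields of the protocol template.

```python
import re
from collections import Counter

# Hypothetical method-section snippets from open-access articles (invented text).
methods = [
    "Cells were fixed in 4% paraformaldehyde for 15 min and washed in PBS.",
    "Samples were fixed with paraformaldehyde, washed three times in PBS, and blocked.",
    "After fixation in paraformaldehyde, embryos were washed in PBS before staining.",
]

def recurring_terms(texts, min_docs=2):
    """Terms appearing in at least `min_docs` texts -> candidate template fields."""
    doc_counts = Counter()
    for t in texts:
        # count each term once per document, not per occurrence
        doc_counts.update(set(re.findall(r"[a-z]+", t.lower())))
    return {w for w, n in doc_counts.items() if n >= min_docs and len(w) > 3}

fields = recurring_terms(methods)
```

Here the recurring terms ("paraformaldehyde", "washed", …) would seed template slots such as fixative, wash buffer, and incubation time; an LLM pass would then fill in the structure around them.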

Skills Needed

  • Experience with LLMs
  • Experience with life-science methods / text
  • Data visualization methods
  • Biology expertise
  • Statistics

Project Title

Brillouin hyperspectral images storage, visualisation and analysis

Project lead

Carlo Bevilacqua and Sebastian Hambura

Format

hybrid

Project Description

We aim to establish a standardized file format and accompanying software for the visualization and analysis of Brillouin microscopy images. Brillouin microscopy is a hyperspectral imaging technique, where each voxel in the image is associated with a full spectrum. To support this, we have proposed an initial version of a dedicated file format, named .brim (Brillouin imaging), built on top of Zarr. We have also developed a web application, BrimView, for visualizing and analyzing these data. BrimView is implemented using the Panel (HoloViz) library, which provides powerful tools for data visualization, and is written in Python, a widely used language in the scientific community. The current software is already a working solution, but could be improved in three directions:

  1. Brillouin data processing: GPU function fitting, ROI selection and analysis, custom python processing scripts
  2. User experience: handling datasets containing multiple imaging modalities, nicely handling data editing of local files from the browser, writing a nice user guide / introduction tutorial
  3. Interaction of BrimView and the .brim file format with existing tools: ImageJ, REMBI, OME-Zarr
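To make the data model concrete, here is a synthetic numpy sketch (not the actual .brim layout): each voxel stores a full spectrum, and "fitting" reduces the spectral axis to a per-voxel Brillouin-shift map. A real analysis would fit a Lorentzian line shape instead of taking the argmax:

```python
import numpy as np

# Synthetic hyperspectral stack: each (y, x) voxel carries a full spectrum.
ny, nx, nspec = 4, 5, 64
freq = np.linspace(-10.0, 10.0, nspec)  # frequency axis (arbitrary units)
true_shift = np.random.default_rng(0).uniform(-5, 5, size=(ny, nx))

# Gaussian-shaped peak centred at each voxel's true shift.
spectra = np.exp(-((freq - true_shift[..., None]) ** 2))

# Simplest possible "fit": per-voxel peak position -> a 2D shift map.
shift_map = freq[np.argmax(spectra, axis=-1)]
```

In .brim terms, `spectra` would live as a chunked Zarr array, so BrimView can stream only the spectra of the voxels currently in view.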

Key Goals / Expected Outcomes

It would be helpful to get input on any of the topics listed above. Specifically:

  1. How to implement data fitting on the GPU, so that it is accessible both from Python and WASM? Is WebGPU a good option?
  2. Which tools already exist in Python for simple ROI selection and analysis?
  3. How to allow users to input their own analysis script (or fitting function) without risking code injection?
  4. How to store multiple modalities with the corresponding registration?
  5. Are there tools to help write documentation for the GUI?
  6. How to write an ImageJ plugin using the existing Python code?

Lastly, expert feedback on the structure of the .brim file and/or general user feedback on the usability of BrimView would also benefit us.

Background / Resources

More details about the project are in our recent preprint. The web app is online at brimview.embl.org and its source code is on GitHub.

Skills Needed

Python, web development, Panel (HoloViz), UI/UX, microscopy data visualization, WASM, user-guide writing, Brillouin / hyperspectral microscopy

Project Title

A dream Electronic Lab Notebook

Project lead

Matthias Monfort

Format

in person

Project Description

Develop a user-centered specification of the dream Electronic Lab Notebook (ELN)

Key Goals / Expected Outcomes

Define the essential functionalities an ELN would need to have to satisfy the daily note-taking workflow of EMBL scientists.

Tasks:

  1. Explore how different scientists have established their own note-taking workflow to document their work.
  2. Formalize the different entities involved in a scientific note-taking task (projects, experiments, etc.)
  3. Formalize the different processes involved in such a task. For example:
    a. Writing in an office-like document
    b. Taking pictures and uploading images
  4. Rank entities and processes (essential, good-to-have, non-essential).
  5. Create a mock-up of this ideal ELN.

Background / Resources

This workshop aims at taking a broader perspective on how researchers take notes to document their work. We open the floor to explore diverse possibilities and gather insights that can shape more effective, user-centered solutions for note-taking in science.

The document generated as part of this workshop is of general interest, but is also of specific interest to LabID developers. LabID (labid.embl.org/docs) is a FAIR research data-management web platform that embeds an ELN. In order to guide the future development of its ELN and ensure it meets the evolving needs of researchers, it is essential to first understand what scientists truly require and value when documenting their experiments.

Skills Needed

  • No specific skill is required for the brainstorming part.
  • UI/UX knowledge welcome

Project Title

FAIR imaging data management with OME-Zarr, REMBI and LabID - from image conversion to visualization, processing and publication

Project lead

Laurent Thomas

Format

hybrid

Project Description

OME-Zarr is gaining popularity in microscopy as a format to store large images or collections of images, in the cloud or on distributed storage systems. Besides its efficient data access, it supports extended metadata, similar to its predecessor, OME-TIFF. Key applications include not only large datasets (~terabyte scale), but also highly multidimensional data.

This workshop has two complementary components to be run in parallel:

  1. training participants how to convert to and use OME-Zarr format
  2. building a simple user interface for an existing conversion tool.

In part 1, participants will have the opportunity to experiment with this new format and see how they can integrate it into their data-management practices. Workshop participants are encouraged to bring a small image dataset of their own that they would like to convert to OME-Zarr. The workshop will run on two cloud-based desktop environments for image analysis developed at EMBL: BARD (for internal users) and BAND (for external users).

We will start with a brief introduction to these platforms and the OME-Zarr specification. Then we will introduce conversion tools such as BatchConvert or EuBI-Bridge to convert image datasets to OME-Zarr; participants will be assisted with converting their datasets using these tools. We will also show how to visualize images saved in this format, and demonstrate a small example of an analysis workflow with programmatic access to the data.

We will discuss how to make imaging data FAIR (Findable, Accessible, Interoperable, Reusable) by providing REMBI (REcommended Metadata for Biological Imaging) compliant metadata. Finally, we will see how the conversion workflow and the associated data can be registered in LabID, the on-site data-management system available at EMBL, and how to make them ready for submission to the BioImage Archive. The following BioImage Archive submission will be used as a showcase: https://www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD2258.

In part 2, we will build a simple point-and-click user interface for the EuBI-Bridge conversion tool. Options for this include a terminal-based user interface (TUI) or a plug-in for the Python-based napari image viewer. Such a UI should make the conversion tool more accessible to non-expert users and ease adoption of the OME-Zarr format.

Key Goals / Expected Outcomes

  • Learn about OME-Zarr and associated metadata specification
  • Convert your own data to OME-Zarr
  • Learn how to visualize and use OME-Zarr in a real-case scenario
  • Learn how LabID can be used to register the conversion workflow and associated data
  • Learn how to prepare the data for submission to BioImage Archive
  • Build a simple UI for an OME-Zarr conversion tool

Background / Resources

Participants are expected to have a background in microscopy or image analysis, and are encouraged to bring a small example dataset they want to convert.

Participants can experiment with the conversion tools on their own computer, but to avoid spending time on setup, we will favor using the online environments (BAND/BARD).

Prior to the event, the participants will be encouraged to watch this recent webinar on the OME-Zarr topic: https://www.ebi.ac.uk/training/events/towards-open-and-standardised-imaging-data-introduction-bio-formats-ome-tiff-and-ome-zarr/.

Finally, discussion around other examples of BioImage Archive submissions is also welcome.

Skills Needed

No programming skills or prior experience with OME-Zarr are required. Experienced OME-Zarr users are also welcome to share their expertise.

Project Title

Image Component Integration for Depictio

Project lead

Thomas Weber

Format

hybrid

Project Description

Create an interactive image viewer component for Depictio dashboards that handles images efficiently. Users should be able to pan and zoom views while interacting with quantitative data visualizations. The project will explore modern web-based image visualization approaches and determine the optimal technical stack during implementation.

Key Goals / Expected Outcomes

  • Create working image viewer prototype using modern web technologies
  • Establish data conversion pipeline for pyramidal/tiled formats
  • Integrate reusable Dash component within Depictio
  • Document implementation and provide usage examples

Background / Resources

Potential approaches to explore:

  • Viv/Vizarr (optimized for scientific imaging with multi-channel support)
  • OpenSeadragon (widely-used deep zoom viewer)
  • Leaflet/OpenLayers (map libraries adapted for images)

Key concepts:

  • Pyramidal/multi-resolution tiling strategies
  • Image formats: PNG, JPG, TIFF, OME-TIFF, Zarr, Deep Zoom Images (DZI)
  • Bio-Formats and VIPS for format conversion
  • Depictio repository
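The pyramid idea behind all of these formats is easy to sketch (mean-pooled 2x downsampling with numpy; a real pipeline would use Bio-Formats, VIPS, or OME-Zarr multiscales rather than this toy):

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Return [full-res, half-res, quarter-res, ...] via 2x2 mean pooling,
    so a viewer can fetch only the resolution level the current zoom needs."""
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape
        # trim to even dimensions, then average non-overlapping 2x2 blocks
        a = pyramid[-1][: h - h % 2, : w - w % 2]
        pyramid.append(
            a.reshape(a.shape[0] // 2, 2, a.shape[1] // 2, 2).mean(axis=(1, 3))
        )
    return pyramid

pyramid = build_pyramid(np.arange(16, dtype=float).reshape(4, 4))
```

Each level is a quarter the size of the previous one, which is why panning a zoomed-out view of a terabyte image only touches kilobytes of tiles.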

Skills Needed

  • Essential: Python (intermediate), JavaScript (intermediate), web development basics
  • Helpful: Plotly Dash, image processing concepts, React components
  • Bonus: Experience with tiling systems, web-based viewers, or large-scale data handling

Registration

Thank you for your interest in joining the Data Science 3C-CoDash! 🎉

👉 Registration is now open here until 12/12/2025.

You can choose up to 3 workshops, in order of preference, and we will try our best to assign people to their favorite choice.
Participants will be notified of their assignment on 19/12/2025.

Contact

If you have any questions about the Hackathon, or anything else Bio-IT-related, please contact bio-it@embl.de.