Name: Streamlining Bioinformatics Data Pipelines with Omnipy
Start: 2024-12-12 13:00:00 UTC
End: 2024-12-12 16:00:00 UTC

Date: 12 December 2024 @ 13:00 - 16:00

Timezone: Brussels

Duration: 3 hours

Language of instruction: English

The time is ripe to challenge fundamental assumptions about managing bioinformatics data. With good reason, the notion of piping together command-line tools as workflows has long dominated bioinformatics. However, returning to flat files on the command line at each junction adds parsing and serialisation steps that are unnecessary and that often change the data in subtle (or less subtle) ways that are hard to fix. With Omnipy, we instead follow the lead of ETL (Extract, Transform and Load) solutions for Big Data, streamlined as an elegant and powerful Python library. Omnipy offers a systematic and scalable approach to building pipelines where the focus is on the data rather than the tools. Through the specification of data models/parsers, Omnipy allows researchers to import data in various formats and wrangle the data through stepwise transformations. For automation with large data, Omnipy seamlessly scales up for deployment on remote infrastructures.

The workshop will introduce the Omnipy library and its main concepts through a mix of presentations and hands-on exercises. While Omnipy is designed for cross-domain applicability, its primary use cases are bioinformatics-related. We will thus make use of real-world examples that should feel relevant to many of the attendees.

We have held similar workshops before, e.g. at Oslo Bioinformatics Week 2023 and at Digital Scholarship Days 2024 (part 1 and part 2) at UiO. Continuously improved over several years, not least through feedback from workshop attendees, Omnipy is finally getting ready for its v1.0 release!

Contact: [email protected]

Venue: Ole-Johan Dahl's House, 23B Gaustadalléen

City: Oslo

Region: Oslo kommune

Country: Norway

Postcode: 0373

Prerequisites:

The participants should have some experience with Python programming/scripting. We will not spend time explaining basic syntax and concepts, other than what is related to type hints. Experience with type hints in Python is useful, but not required.

Learning objectives:

Introduction to Python type hints and Pydantic models
How to use type hints to define models, datasets, tasks and flows in Omnipy
How to write a simple parser for a tabular file format
How to set up an executable mapping of data from one metadata schema to another
How to automate an Omnipy data pipeline by deploying it to the Prefect orchestrator on NIRD (National Infrastructure for Research Data)
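
As a rough illustration of the first three objectives, the sketch below uses only the Python standard library: type hints on a dataclass stand in for a data model, a small function parses a tab-separated table into typed records, and a dictionary drives a field-by-field mapping to another schema. It deliberately does not use Pydantic or the Omnipy API. The column names (sample_id, organism, read_count), the target field names and all function names are hypothetical and chosen only for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """One row of a hypothetical sample sheet, with type-hinted fields."""
    sample_id: str
    organism: str
    read_count: int


def parse_sample_table(text: str) -> list[SampleRecord]:
    """Parse tab-separated text (with a header row) into typed records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records: list[SampleRecord] = []
    # Data rows start on line 2, since line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            records.append(
                SampleRecord(
                    sample_id=row["sample_id"],
                    organism=row["organism"],
                    read_count=int(row["read_count"]),
                )
            )
        except (KeyError, TypeError, ValueError) as err:
            # Fail early, instead of passing malformed data downstream.
            raise ValueError(f"line {line_no}: invalid row {row!r}") from err
    return records


# Hypothetical mapping from source field names to target schema field names.
FIELD_MAP: dict[str, str] = {
    "sample_id": "id",
    "organism": "species",
    "read_count": "n_reads",
}


def map_record(record: SampleRecord) -> dict[str, object]:
    """Rename the fields of a record according to FIELD_MAP."""
    return {target: getattr(record, source) for source, target in FIELD_MAP.items()}


if __name__ == "__main__":
    table = (
        "sample_id\torganism\tread_count\n"
        "S1\tHomo sapiens\t1200\n"
        "S2\tMus musculus\t800\n"
    )
    for rec in parse_sample_table(table):
        print(map_record(rec))
```

Because read_count is converted to int at parse time, a malformed value is reported with its line number as soon as the table is read, rather than surfacing later in the pipeline. A model library such as Pydantic performs this kind of validation declaratively from the type hints.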

Organizer: The workshop is provided by the Oslo node of ELIXIR Norway as part of an extended event organised by the Student Committee of the Centre for Bioinformatics at the University of Oslo in collaboration with the ISCB Regional Student group in Norway

Host institutions: University of Oslo

Eligibility:

First come, first served

Target audience: PhD, Postdoctoral Fellows, Technical personnel

Capacity: 20

Tech requirements:

Laptop, with an account set up for Google Colab

Cost basis: Free to all

Sponsors: UiO:Life Science, the Group for temporary employees at the Division of Laboratory Medicine (KLM TempAware), NCMM - Norsk senter for molekylærmedisin

Scientific topics: Data curation and archival, Data identity and mapping, Data quality management, Data governance, Workflows

Operations: Data handling

External resources:

FAIRtracks

omnipy
