Ready for BioData management? - 02/07/2019 @ Instituto Gulbenkian de Ciência

Start date

Tuesday, July 2, 2019 - 09:45

End date

Tuesday, July 2, 2019 - 17:30

Organisers

BioData.pt

Instituto Gulbenkian de Ciência

Are you extracting all the possible value from your data?

In research activities, data management is an asset required for success.

Today, funding agencies, journals, and other stakeholders require data producers to share, archive, and plan data management.

IGC is organizing a workshop with speakers from R&I Institutions to share showcases, a group simulation of a data management plan and tools available to improve your skills.

You will learn what, how and why of data management and how it can add value to your research.

Making the data produced in Biological and Health research Findable, Accessible, Interoperable and Reusable improves the return of public and private investment in Science. Furthermore, these four criteria are becoming additional KPIs to measure the impact of your deliverables.

Keynote Speaker - Ulrike Wittig

July 2^nd at the session Creating value from Data Management will have Ulrike Wittig as the Key Note Speaker.

Ulrike Wittig has a background in biochemistry and received her Ph.D. in biology for the experimental work on mechanisms of apoptosis and oxidative stress in mammalian cells. She is a research associate in the Scientific Database and Visualisation group at the Heidelberg Institute for Theoretical Studies (HITS), Germany. For more than 20 years she is active in the development of biological databases, database curation and data management. She is member of the Systems Biology Service Center of de.NBI /ELIXIR-Germany and of the FAIRDOM initiative. Ulrike Wittig is getting involved with the Biocuration community and as a member of the STRENDA commission she actively supports the initiative for Standards for Reporting Enzymology Data.

Creating Value from Data Management (Panel)

For the first time in Portugal BioData.pt gathered for the event “Ready for Data Management - Creating value from Data Management” five of the top heads of Bioinformatics in Portugal: Arlindo Oliveira (IST), Cláudio Sunkel (replaced at the last minute by Jorge Vieira, both from IBMC), Élio Sucena (IGC), Pedro Cruz (IBET), Rogério Gaspar (FFUL), Mário Gaspar Silva (INESC ID /Elixir Portugal), José Pereira Leal (BioData.pt/Ophiomics).

From left to right: Élio Sucena (IGC), Jorge Vieira (IBMC), Arlindo Oliveira (IST), Isabel Rocha (UNL), Pedro Cruz (IBET), Rogério Gaspar (FFUL), Ulrike Wittig (Heidelberg Univ.), Mário Gaspar Silva (INESC ID)From left to right: Élio Sucena (IGC), Jorge Vieira (IBMC), Arlindo Oliveira (IST), Isabel Rocha (UNL), Pedro Cruz (IBET), Rogério Gaspar (FFUL), Ulrike Wittig (Heidelberg Univ.), Mário Gaspar Silva (INESC ID) — From left to right: Élio Sucena (IGC), Jorge Vieira (IBMC), Arlindo Oliveira (IST), Isabel Rocha (NOVA), Pedro Cruz (IBET), Rogério Gaspar (FFUL), Ulrike Wittig (Heidelberg Univ.), Mário Gaspar Silva (INESC-ID)

This event took place on the 2nd July afternoon, in IGC Ionians Auditorium, and it was an innovative session about BioData Management that brought together students, researchers, funders and bioinformatics experts to discuss and bring new ideas about this subject. This session chaired by Isabel Rocha (Universidade Nova de Lisboa) was brilliant started with Ulrike Wittig from the Institute of Theoretical Sciences of Heidelberg University that contibuted with a lot of relevant information about the best practices and FAIRDOM framework in Data Management.

Ulrike Wittig (Heidelberg University) speaking about FAIR principles

The audience had the chance to engage and contribute to a discussion about how data management should be done and which are the main needs about data management.

The session was wrapped up by José Leal from BioData and Ophiomics that left the audience with thought: “Creating more and more data is not necessarily good, we need to know how to add value through Data Management and Data Curation. Data anyone can collect... but in the future, the challenge, which seems to be already here, is to make sense of all this data and transform data into knowledge...”

José Pereira Leal closing up the session

Invited Speakers

Some biographic notes about our invited speakers :

	Élio Sucena, IGC, got his PhD in 2001 from the University of Cambridge (UK) on Genetics, Evolution and Development under the supervision of David L. Stern. He then had post-doctoral experience at the Universities of Princeton (US) and Western Ontario (Canada). Here, he pursued studies on the Evolution of Development using the comparative method on distinct arthropod models, namely Drosophilaspecies, polyembryonic wasps and spider mites. In 2004, he established an independent research team at Instituto Gulbenkian de Ciência (Portugal) where he has been ever since.
	Arlindo Oliveira, IST, is a professor of the Department of Computer Science and Engineering of Instituto Superior Técnico (IST). He is presently the President of IST and member of its executive board. Obtained a PhD from UC Berkeley in 1994, under a Fulbright fellowship, after a BSc and a MSc from IST, in 1986 and 1989, respectively. His major areas of interest are Algorithms and Complexity, Machine Learning, Bioinformatics and Digital Circuit Design. Arlindo Oliveira also worked at CERN, Cadence Laboratories and INESC-ID. And he’s a member of the Portuguese Academy of Engineering and a senior member of IEEE.
	Cláudio Sunkel, IBMC, is a Full Professor of Molecular Biology at the Biomedical Institute of University of Porto, Director of the Institute of Molecular Cellular Biology and head of the Molecular Genetics Group at the same institute. He is a Member of the European Molecular Biology Organization since 2000, was Vice-President of the European Molecular Biology Conference (2007-2010), Vice-President of the European Molecular Biology Laboratory Council (2010-2012), elected Chair of the EMBL Council (2013). Currently is a member of the Welcome Trust-India Alliance fellowship selection committee (2009-2014). During 2007-2008 was the National Coordinator for the Evaluation of Research Units by the Foundation for Science and Technology of Portugal. His laboratory is mostly devoted to the study cell division and of the mechanisms involved in maintaining genomic stability in higher eukaryotes. Started these studies during his posdoctoral work at Imperial College, UK, and later continued in Porto. Has published over 100 original peer review articles. Supervised 24 PhD students and many postdocs. Teaches Molecular Genetics for Biochemistry and Bioengineering BsC Major at ICBAS, University of Porto. First Degree Honors in Biology (1979) and a Ph.D. in Genetics (1983) from Sussex University, UK. Was born in Santiago, Chile, 1958.
	Pedro Cruz, IBET has a degree in Chemical Engineering, branch of Biotechnology from IST – University of Lisbon, an MBA in International Businesses from the Catholic University – Lisbon, and a PhD in Biochemical Engineering from ITQB – UNL. Pedro was involved in the start-up of iBET’s pilot plant and worked as a researcher at Novartis-Switzerland. Has large scientific experience gained through the participation in several international research projects in the fields of viral vaccines and gene therapy as a Senior Researcher at the Animal Cell Technology Lab of iBET, including various periods abroad (Innogenetics-Belgium, GBF (now HZI)–Germany, Penn State U.–USA). He is also lecturing on bachelor and masters management courses at Universidade Atlântica – Oeiras, in the areas of entrepreneurship, strategy and project valuation. Pedro is founder and the current CSO of ECBio.
	Rogério Gaspar, FFUL, has more than 20 year’s experience in the design and evaluation of nanoparticles and liposomes for drug (e.g. Leishmaniasis and cancer) and gene (cytosolic) delivery. More recently, these interests have broadened to other aspects of the nanomedicine field including the design of vectors for MRI imaging and also understanding the cellular mechanisms that govern the action of nanomedicines. He has a long career in regulatory affairs and regulatory science both in academic, regulatory and industrial positions, at national and international levels. He frequently participates and coordinates international activities dealing with regulatory aspects of nanomedicines (including European Science Foundation, European Commission, European Medicines Agency, European Federation of Pharmaceutical Sciences – EUFEPS, TI Pharma – NL).

Hands on DMP Coaching team

	Ana Portugal Melo is the BioData Executive Director and Deputy Head of Node of ELIXIR PT. Ana is a Biologist with a PhD in Biomedical Sciences and a Post-Graduation in Management.
	Daniel Faria is Postdoctoral Researcher at Instituto Gulbenkian de Ciencia learn more about Daniel Faria here
	João Cardoso is a junior researcher at the INESC-ID, IDSS - Information and Decision Support Laboratory. He holds a MsC in Telecommunications and Informatics Engineering from the Instituto Superior Técnico, and is currently pursueing a PhD in Computer Science and Engineering at the same institute. His research domains are conceptual modeling and management, business process management modeling, semantic technologies and data management.
	Ricardo Leite has a PhD in Molecular and Cellular Biology, an undergraduate degree in Biochemistry, and a master in Marine Sciences. Ricardo leads the Genomics Unit of Instituto Gulbenkian de Ciências as well the Bioinformatic Unit as acting head. He's also the Coordinator of the User Support of Biodata.pt and has broad scientific experience in bioinformatics, evolution, industrial biotechnology, NGS sequencing, biodefense and marine biotechnology
	Yulia Karimova has a degree in Mathematics and Informatics, a Masters degree in Information and Science, and currently, she's attending a Ph.D. in Digital Media at the University of Porto. She's also a researcher at the INESC TEC and involved in project “TAIL" related to research data management. Yulia collaborates with researchers from different scientific domains to analyze the difficulties they face during deposit, description, and publish data. With the overall goal to contribute to the development of tools that help researchers facilitate all RDM activities.

Hands on DMP

On 2nd July BioData.pt organized a hands-on session focused on demistifying the concept of Data Management Plan (DMP). The session took place at the recently renovated training room of Instituto Gulbenkian de Ciência (IGC).

The organizers developed a participatory methodology based on the DMP Common Standard Model, where groups of 6 to 8 participants would create their own DMP (based on a fictitious research project, proposed by the organizing team), by filling in a “DMP Canvas”. The process, introduced by an overall presentation framing the value and the steps to create the DMP, was supported continuously by a team of facilitators from BioData.pt.

Panoramic view of the room during the workshop

Training room overview

Fifty-eight researchers and project managers, from 19 institutions, from Minho to Algarve, namely BioData.pt consortium (21), IGC (15), IST-ULisboa (4) and Champalimaud Foundation (3) were involved in this pilot initiative.

Participants were very committed and enthusiastic in response to this challenge. Each group had the opportunity to present and discuss the resulting DMP with a committee of BioData.pt members, chaired by the invited guest Ulrike Wittig, from the Institute of Theoretical Sciences of Heidelberg University.

A bottle of sparkling wine was awarded to the best DMP.

Overall, participants and organizers considered this an innovative and successful method to introduce research professionals to data management, particularly to the currently mandatory DMPs.

BioData.pt is available to extend this session to other institutions and geographies and is working in making the used materials available for the community. Furthermore, a next level DMP course is being prepared where institution teams will have the opportunity to familiarize themselves with the Data Stewardship Wizard framework, that allows for the interactive creation of DMPs, and bring their own projects, to create their respective DMPs.

Interviews

At the event, we took the opportunity to interview some participants and ask them about their experience with the workshop. We ended up talking with people from differente fields and institutions that occupy very different positions in their groups, and gathered some opinions about the workshop and their take away message at the end of the morning session.

On July 2nd we got together for a hands-on experience in Data Management Plans (DMP). The event started with some presentations and then participants had the chance of putting their heads together to create a DMP for a project. During lunch we went around to ask participants their opinions about data management and the working session they had just participated in.

Laura Ward, João Bauto and Hugo Cachitas

We talked to Laura Ward, João Bauto and Hugo Cachitas from the Champalimaud Foundation, about the perspective from software platforms, long-term usability and the difference between human and animal data.

Bruno Costa: So what did you think of this experience?

Laura Ward: I think it's useful because you had the research, you had software platform stuff, and Pedro to kind of coordinate everything. It's kind of interesting to see different people's opinions. And it was very confusing. We still have no solutions now but we realized-

João Bauto: You have way too many options.

LW: A much broader overview of -

JB: The kind of detail that you probably have to get in. It's something that we, at least us from the software platform, are not aware. So it's something that, when in connection with the people that do the management, it might help us do it. But on the long term I think it will be difficult to apply this to most of the grants I think.

BC: But you can go through a protocol to see which kind of areas you should try to focus on.

LW: It's good to identify the bare minimum, for sure. Because this level was really detailed and I think this will be really difficult to implement. Unless there's somebody whose job that is, right? Because getting a P.I. to go through that level of detail is nigh on impossible.

BC: Technically it should be like a one or two day experience where you go through this with all of the people involved in the project. And we go through something like an exercise like this, we go through all of the steps and try to get the bare minimum for the project and plan ahead what would happen.

LW: If it is for a project than I think it could be part of the annual meetings. It just has to be done at the application stage, before then it's quite a big commitment. And obviously as they get further, you do them more often then it will become more routine.

BC: I guess it would work maybe not as you’re applying for the grant but once the grant is is given. I think this could be the first step.

LW: Yeah, maybe even before it started. So once you get the funding that's kind of a little bit of dead time in between, isn't it? That would be a good time to do this kind of exercise. Because people won't be necessarily totally ingrained in the experiments. And they're also supposed to guide you on how to do your experiments! So you don't do it too late because everything will already been set up, probably incorrectly.

BC: So was this workshop useful?

LW: Yes, it was really useful to me. I already tried to write a DMP, really basic, because I didn't have much idea of what I was doing, so yes really useful.

BC: What is your name? I didn't catch your name.

JB: I'm João, I work on software platforms and I'm basically data curating / data storage so I'm more focused on that. For me, I'm more interested on trying to get some kind of standards on how to save data or how to share it, than on how to focus on the management of the project itself. So we have centralized data storage on Champalimaud and we're still trying to figure out how to process everything. What should we save. How it should be saved. And in terms of long term solutions when we have to save data, how it should be done and implemented.

BC: For software there should be a different kind of requirements.

Hugo Cachitas: Yeah. So for human data, it's pretty different than the one that we are most used to working with, which is animals. So we don't have to care about anonymizing the data.

LW: Because at Champalimaud it is sort of slightly separated, isn't it? But they still don't have anybody specifically alocated to the management.

HC: Yeah, but we have to provide the access to computational for both animal and human data. It's kind of important for us to do the separation of - making sure that the data that we are running cannot be addressed to a specific person. So it's important for us to make sure that we don't have any data that it's not anonymous.

BC: What is your name?

HC: I'm Hugo and I also work at a software platform, developing specific databases on a request basis. I was recently asked to check on a DMP for a researcher that was submitting it for a grant. I don't know how it went down but- So when I help them and they have a simple request sometimes, they want to sort their data in a s-

BC: But once you're preparing the structure of the database, you always have to think of how this should be implemented and how the data should be stored, what kind of tables. This is just another step to consider when making the management plan.

HC: Yeah. Yeah.

BC: How would this differ from the process of constructing the architecture of the data model and the start of data?

HC: I never looked into it from the DMP perspective. More like the usability of it. But more for the work they do, not to share it afterwards. Although that is important and I think we're kind of converging to that afterwards.

BC: Otherwise they're just going to be data silos.

HC: Which makes no sense.

LW: We talked a bit about whose responsibility it is of sharing, as well, right?

HC: But it's more about the data formats, I mean, because we are not following any convention for now.

BC: You can convert that on the fly, I guess. That it's not necessarily the biggest problem. Just ensuring that beforehand you know how data is going to be processed and that you know who can have access to it. What kind of data, how it’s planned in case something is lost, if it's replicable...

HC: For now, it's just a matter of having the fields they need to conduct their research, the axis is on a group level. So it's very confined at this stage. But then, yeah all of these steps that the DMP requires, we also need to think about them for this first level. Let's say we take care of the backups: where to host the database, if it's internal / private, if it has patient data, if it can go to Amazon, or something more public. Or that we have no control over the machines. And so all of that we discuss beforehand.

BC: So implicitly there's a kind of- that's a data management plan! But without a formal structure.

HC: Yes.

BC: Do you think this workshop is going to contribute to change a bit the process with which this is done? You see the need to refine some of the aspects of your current-

HC: The process of it I'm not sure. But at least it was really helpful to identify a lot of problems that arise in all of these, when you are trying to prepare a project like this. So we actually spent- while some were doing the exercise, others were discussing the points that were arising, and what they usually do, and what is done in terms of other countries’ policies. Because there's not really any conventions between the USA and Europe and etc.

LW: It's quite nice not to have the pressure of having to- like write this DMP now, it’s due next week. And actually just have the time to think about it instead. Which is kind of the whole point of the DMP in the first place, isn't it? To make people kind of consider-

BC: If you do it beforehand, you don't have the pressure of having to fit everything somewhere just to get it done. You can actually plan how it should accurately be done, and anticipate all of the issues that can arise.

HC: Yeah but, as we discussed, it is really hard to anticipate most of the points that the DMP requests because it changes. It can change so much.

BC: But as long as you think of some of these issues that can arise, I think you can plan for them. And when they actually happen, you can see-

HC: Yes, at least we have a plan, an initial plan. And then you can know how much you are deviating from it. It's good to keep track of it. But I think the value of the DMP is to replicate scientific results. Besides a publication you also have the (BC: usability) yeah, and the track of all the data processing. But I think the scientific community doesn't value that enough. So it's a lot easier to publish new results than the results that are revisiting old stuff. And so I don't know if, even if you have a common or a structured database with standards across countries and the research units, I don't know if people are really going to give it the use it deserves. Because then there's no outcome. Or at least in the short-term.

BC: I think in the long-term it will enable interoperability. It will enable usability of the data. Because if you have a detailed provenance of all of the data and the descriptive process of how the data was generated, people can reuse it. And it provides a broader usability of the data, because you can aggregate data from different sources.

HC: In that sense, yes. You can complement your datasets with datasets from other people and it can increase a lot your dataset, if it fits. It also needs to be-

LW: It's the quality assurance thing that is the most difficult, right? And that's part of the data management plan and we've talked about it a bit. But it's really difficult to know how to do that. How do you know that somebody else's data is-

HC: Jorge was saying that there is this repository where everyone can put things in there, but they're not curated. So when you go there and take something out, most of the time it's either incomplete or it has some errors in the middle. So if you take the time, you can select the data that really helps you. But most of the time you can just throw it out. So that's a lot of steps still needed to really have this.

BC: You need a checklist to ensure that the data that is being deposited meets the minimal amount of those criteria.

HC: We also talked about that. You cannot do that on the management side of it. Who hosts the data doesn't have the resources, usually, to have a team there to curate all of the data. So you need to enforce it when you upload. But then if you're a researcher and you go upload your data, if it is too hard or if it takes a lot of steps and the audit is never compliant with what it needs...Unless you are forced by the funding agency, you're not trained to do it.

BC: My only solution to that is to provide a score for the data you uploaded. Based on that score you can rank it. That could entice people to get a better score for their data.

HC: Depends on how you score it. In a movie database, sometimes a good score doesn't mean the movie is good so.

LW: Doesn't mean that the data is interesting!

Inês Chaves

Watch the video on our Youtube channel here.

Sandra Silva

We interviewed Sandra Silva from the Institute for Bioengineering and Biosciences (iBB) at Instituto Superior Técnico (IST), about why she signed-up for this experience and what they currently do in her lab for data management.

Bruno Costa: What motivated you to sign-up for this session?

Sandra Silva: I’m very interested in data management, specially because in Microbiology, in our field of Biology, we produce a lot of data, massive quantities of data. And very often we have difficulty in knowing how to organize them, a problem that we face in our day-to-day.. So that was the goal that brought me here.

BC: How do you usually plan how the data is going to be treated? Is there a plan? Is it as the data is processed you put it in a disk, is there a process?

SS: At the moment, I think it’s a bit as the projects develop you come up with different methods for each researcher. Each person self-organizes, sharing just the final data or the raw data with the project leader, but inside the same group there’s no sharing of data. There’s that difficulty because these are big volumes of data and you can’t do it easily.

BC: There’s no person in your team or institution that is responsible for creating policies for data submission to platforms, for example, when it’s necessary to publish the data?

SS: Not in an organized fashion, no. Each group ends up working with its own data. There isn’t that coordination between groups.

BC: You think that, after having this experience, you might ponder creating policies for the processing of data, a set of protocols?

SS: Yes, one of my goals of coming here was exactly that one. Of trying to establish contacts with BioData.pt to bring some policies to my investigation group. To start, in some way, to have data management strategies.

BC: Did you think the workshop was useful? Did it focus on the aspects you thought were going to be difficult? Was there some things missing?

SS: I think this workshop was very short, it’s normal that you can’t know everything with just a morning session. And it was very focused in the planning of a project in general, divided into several parts, so it’s not easy for someone to become informed about everything in just one morning session. But it gave us an idea of what is a Data Management Plan, that’s for sure.

BC: Did you have an idea of what were licenses? Of the need of assigning licenses, for example?

SS: We ended up not focusing on that because throughout the working session, we didn’t manage to get to the end. And the introduction was very short, so I didn’t exactly- (MS: Next time?) Yes, exactly, next time!

Filipe Lopes

Watch the video on our Youtube channel here.

João Dias and Nádia Duarte

We interviewed João Dias, a PhD student in Computational Biology and Nádia Duarte, a lab manager, both from Instituto Gulbenkian de Ciência, about their experience with the workshop and how useful it was.

Bruno Costa: What were your expectations when you signed up for this workshop?

João Dias: My expectations were to broaden my knowledge. I’d like to understand what can be done in terms of data management. What kind of tools exist and what kind of process you have to go through so that these datasets are usable.

BC: And doing this workshop, has it been useful? Did you like it and would you recommend it to someone else?

JD: I would recommend it to everyone that wants to understand a little of what this is. This first part was basically creating a Data Management Plan (DMP). Everyone that is considering applying for a grant with FCT or other, has to think about all of these questions that we worked on. And all of the process, which I was unfamiliar with. So, yes absolutely, I would recommend it.

BC: What were your expectations when you signed up for this course?

Nádia Duarte: I wanted to know more about the best way of doing data management.

BC: And how did this whole group process go? Did you think that it was useful to share the experience with other people from different fields?

ND: Yes, I think it was useful. Some parts are still a bit difficult because you have to integrate several concepts and understand, among all of that data, what is a dataset first. And there are very specific fields that were a bit complicated for us to match. But I think it is useful, it gives us an idea of what is needed, what information you have to gather. I think we still have- But maybe in the afternoon everything will be clearer.

BC: Listening to a talk is one thing, but in fact, people would rather experience it. Have that process of going down a list...

JD: Even the debates that we had… The question of what is a dataset and what isn’t.

ND: But there’s something else, as well. There’s the question of the organization and then there’s each associated file, the associated informatic part… There were some parts where we couldn’t keep up. But I didn’t know what to expect, and now I do!