
Vavilov Journal of Genetics and Breeding


Laboratory information systems for research management in biology

https://doi.org/10.18699/VJGB-23-104

Abstract

Modern investigations in biology often require the combined efforts of one or more research groups, frequently made up of specialists from different scientific fields who generate and share data of varying formats and sizes. Without modern approaches to work automation and data versioning (where data from different collaborators are stored at different points in time), teamwork quickly devolves into unmanageable confusion. In this review, we present a number of information systems designed to solve these problems. Applied to the organization of scientific activity, they help manage the flow of actions and data, allow all participants to work with relevant information, and address the reproducibility of both experimental and computational results. The article describes methods for organizing data flows within a team and principles for organizing metadata and ontologies. The information systems Trello, Git, Redmine, SEEK, OpenBIS and Galaxy are considered, and their functionality and scope of use are described. Before adopting any tool, it is important to understand the purpose of its implementation, to define the set of tasks it should solve, to formulate requirements on that basis, and finally to monitor how the recommendations are applied in practice. Creating a framework of ontologies, metadata, data warehousing schemas and software systems is a key task for a team that has decided to automate data circulation. It is not always possible to implement such systems in their entirety, but one should still strive to do so through a step-by-step introduction of principles for organizing data and tasks, mastering individual software tools along the way. Trello, Git, and Redmine are easier for small research groups to use, customize, and support. SEEK, OpenBIS, and Galaxy are more specialized, and their use is advisable once the capabilities of the simpler systems are no longer sufficient.

About the Authors

A. M. Mukhin
Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS; Novosibirsk State University
Novosibirsk, Russian Federation

F. V. Kazantsev
Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS; Novosibirsk State University
Novosibirsk, Russian Federation

S. A. Lashin
Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Kurchatov Genomic Center of ICG SB RAS; Novosibirsk State University
Novosibirsk, Russian Federation


Review



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)