A pipeline for processing hyperspectral images, with a case of melanin-containing barley grains as an example

https://doi.org/10.18699/vjgb-24-50
Abstract
Hyperspectral image analysis is of great interest in plant research. As its use becomes more widespread, developing methods for processing hyperspectral images has become a pressing task. This paper presents a hyperspectral image processing pipeline that includes preprocessing, basic statistical analysis, visualization of multichannel hyperspectral images, and classification and clustering using machine learning methods. The current version of the package implements the following methods: construction of a confidence interval of arbitrary level for the difference of sample means; testing the similarity of the intensity distributions of spectral lines for two sets of hyperspectral images using the Mann–Whitney U test and Pearson’s chi-squared goodness-of-fit test; visualization in two-dimensional space using the dimensionality reduction methods PCA, ISOMAP and UMAP; classification using linear or ridge regression, random forest and CatBoost; and clustering of samples using the EM algorithm. The software pipeline is implemented in Python using the Pandas, NumPy, OpenCV, SciPy, Sklearn, Umap, CatBoost and Plotly libraries. The source code is available at: https://github.com/igor2704/Hyperspectral_images. The pipeline was applied to identify melanin pigment in the shell of barley grains from hyperspectral data. Visualization with PCA, UMAP and ISOMAP, together with clustering, showed that grain samples with and without pigmentation can be linearly separated with high accuracy on the basis of hyperspectral data. The analysis revealed statistically significant differences in the distributions of median intensities between images of grains with and without pigmentation. Thus, hyperspectral images can be used to determine the presence or absence of melanin in barley grains with high accuracy.
The flexible and convenient tool created in this work will significantly increase the efficiency of hyperspectral image analysis.
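The statistical comparison, two-dimensional visualization and EM clustering listed in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the authors' repository: the array shapes, the intensity values, the chosen band index and the helper `mean_diff_ci` are all assumptions made for the example, and a Welch-type interval is used here as one possible way to build the confidence interval.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for per-grain median spectra (assumed values):
# rows are grains, columns are spectral bands.
n_grains, n_bands = 40, 50
pigmented = rng.normal(loc=0.30, scale=0.03, size=(n_grains, n_bands))
unpigmented = rng.normal(loc=0.60, scale=0.03, size=(n_grains, n_bands))


def mean_diff_ci(a, b, level=0.95):
    """Welch confidence interval for mean(a) - mean(b) (hypothetical helper)."""
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    half_width = stats.t.ppf(0.5 + level / 2, df) * se
    diff = a.mean() - b.mean()
    return diff - half_width, diff + half_width


# Compare the two groups on one spectral band (index chosen arbitrarily).
band = 10
ci_low, ci_high = mean_diff_ci(pigmented[:, band], unpigmented[:, band])
u_stat, p_value = stats.mannwhitneyu(
    pigmented[:, band], unpigmented[:, band], alternative="two-sided"
)

# Project all spectra to two dimensions with PCA for plotting.
X = np.vstack([pigmented, unpigmented])
labels = np.array([1] * n_grains + [0] * n_grains)
embedding = PCA(n_components=2).fit_transform(X)

# Cluster the embedded points with a Gaussian mixture fitted by EM.
clusters = GaussianMixture(n_components=2, random_state=0).fit_predict(embedding)

# Cluster ids are arbitrary, so score both possible label assignments.
agreement = max(np.mean(clusters == labels), np.mean(clusters != labels))

print("95% CI for the mean difference:", (ci_low, ci_high))
print("Mann-Whitney U p-value:", p_value)
print("Cluster/label agreement:", agreement)
```

With real data, `pigmented` and `unpigmented` would be replaced by intensity summaries extracted from the segmented grain images, and the same tests would be repeated for each spectral band of interest.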
About the Authors
I. D. Busov
Russian Federation
Novosibirsk
M. A. Genaev
Russian Federation
Novosibirsk
E. G. Komyshev
Russian Federation
Novosibirsk
V. S. Koval
Russian Federation
Novosibirsk
T. E. Zykova
Russian Federation
Novosibirsk
A. Y. Glagoleva
Russian Federation
Novosibirsk
D. A. Afonnikov
Russian Federation
Novosibirsk