- Authors
- Bürstmayr, Hermann
- Gronauer, Andreas
- Holzinger, Andreas
- Roth, Peter M.
- Stampfer, Karl
- Title: Proceedings of the OAGM Workshop 2022
- Digitalization for Smart Farming and Forestry; October 18, November 7–8, 2022; University of Natural Resources and Life Sciences, Vienna, Tulln, Austria
- DOI: 10.3217/978-3-85125-954-4
- Licence: CC BY
- ISBN: 978-3-85125-954-4
Chapters
Preface
Bürstmayr, Hermann; Gronauer, Andreas; Holzinger, Andreas; Roth, Peter M.; Stampfer, Karl
DOI: 10.3217/978-3-85125-954-4-00

Redundant 1-cells in Multi-labeled 2-Gmap Irregular Pyramids
Banaeyan, Majid; Kropatsch, Walter G.; Hladuvka, Jiri
DOI: 10.3217/978-3-85125-954-4-01
Nowadays the amount of generated digital data is growing ever faster across a broad spectrum of application domains such as biomedical and biological imaging, document processing, remote sensing, and video surveillance. Processing such big data calls for efficient data structures and powerful processing algorithms. The n-dimensional generalized map is a useful structure that completely represents the topological structure of an image. Its advantages have been widely demonstrated in the literature. Nevertheless, the main disadvantage of this structure is its high memory requirement. This paper first proposes an efficient method that implicitly encodes two of the three involutions in the 2-Gmap, which dramatically reduces the amount of required memory. Second, it introduces a new formalism to define and detect redundant 1-cells (edges) in the 2-Gmap. Removing such redundant information further decreases the already reduced memory by approximately half. Finally, experiments show the advantage of the proposed method on a real database of high-resolution X-ray microtomography (μCT) and fluorescence microscopy.

On the Regularising Levenberg-Marquardt Method for Blinn-Phong Photometric Stereo
Radow, Georg; Breuß, Michael
DOI: 10.3217/978-3-85125-954-4-02
Photometric stereo refers to the process of computing the 3D shape of an object using information on illumination and reflectance from several input images taken from the same point of view. The most often used reflectance model is Lambertian reflectance; however, it does not account for specular highlights in input images. In this paper we consider the non-linear optimisation problem that arises when employing Blinn-Phong reflectance for modeling specular effects.
To this end we focus on the regularising Levenberg-Marquardt scheme. We show how to derive an explicit bound that gives information on the convergence reliability of the method depending on the given data, and we show how to gain experimental evidence of the numerical correctness of the iteration by making use of the Scherzer condition. The theoretical investigations at the heart of this paper are supplemented by some tests with real-world imagery.

Image Forgery Detection and Localization Using a Fully Convolutional Network
Fischinger, David; Schreiber, David; Boyer, Martin
DOI: 10.3217/978-3-85125-954-4-03
To fight the growing problem of fake news, and specifically image manipulation, we propose a simple yet efficient neural network architecture for detecting and localizing various image forgeries at the pixel level. Robust features for forgery detection and localization were learned, and the trained model performs well even on heavily downscaled images, without the excessive processing time of competing approaches based on image decomposition and merging of the fragmentary results. We provide detailed explanations regarding the creation of our training dataset comprising 1.9 million images. Finally, we compare the proposed solution against several state-of-the-art methods on four public benchmark datasets in order to demonstrate its superior performance.

A Modular Model Combining Visual and Textual Features for Document Image Classification
Duhan, Amer; Sablatnig, Robert
DOI: 10.3217/978-3-85125-954-4-04
Document image classification is the classification of digitized documents. Typically, these documents are either scanned or photographed. One page of such a document is referred to as a document image. Classifying document images is a crucial task since it is an initial step in downstream applications.
Most state-of-the-art document image classification models are based on transformer networks, which are pretrained on millions of scanned document images and thus require a huge amount of training resources. Additionally, these and other state-of-the-art document image classification models have well beyond 100 million parameters. In this work, we address both challenges. First, we create a model capable of competing with the current state-of-the-art models without pretraining on millions of scanned document images. Second, we create a model several times smaller than current state-of-the-art models in terms of parameters. The results show that the developed approach achieves an accuracy of 93.70% on the RVL-CDIP dataset, and a new state-of-the-art accuracy of 96.25% on Tobacco3482.

Statistical shape modeling and analysis of the vestibular organ based on CT-images
Brito, Claudia Companioni; Willenbrink, Matthias; Fritscher, Karl; Schubert, Rainer
DOI: 10.3217/978-3-85125-954-4-05
The human body's stable posture and movement are dictated by the precise functioning of the vestibular organ, mainly the ampulla organs in the semicircular canals. The development of electronic devices such as vestibular implants aims to improve the vestibular system's capacity by stimulating the involved vestibular nerves. We aim to describe and analyze anatomical variations of the inner ear using computationally derived statistical shape models. The models should support the design process of vestibular implants. Based on a dataset of 81 cone-beam computed tomography scans, this work covers constructing a statistical shape model of the semicircular canals using a recently developed Particle-Based Modeling approach. The method optimally places correspondence points on each surface using a gradient descent energy function. Principal Component Analysis is then used to describe anatomical variation.
The model was evaluated in terms of reconstruction accuracy, compactness, generalization, and specificity. Results obtained by the workflow based on human datasets and the average shape of a statistical model revealed a high qualitative understanding and a quantitatively comparable range. The first three principal components captured 57.7% of the cumulative variation. The analysis showed that 26 principal components are needed to account for 95% of the total shape variation captured. The shape model can be used for virtual product development and testing and to estimate the detailed inner ear shape from a clinical patient computed tomography scan. For the first time, we could describe the geometry of the human semicircular canals based on a sample of data from living humans that is large compared with other studies.

Lena Kernstock
Strebl, Julia; Stumpe, Eric; Baumhauer, Thomas; Kernstock, Lena; Seidl, Markus; Zeppelzauer, Matthias
DOI: 10.3217/978-3-85125-954-4-06
The segmentation of plant leaves is an essential prerequisite for vision-based automated plant phenotyping applications such as stress detection, measuring plant growth, and detecting pests. Segmenting plant leaves is challenging due to occlusions, self-shadows, varying leaf shapes, poses, and sizes, and the presence of particularly fine structures. We present a novel leaf segmentation approach that takes single pixels as input to initialize the segmentation of leaves. Additionally, we introduce a new strategy for transfer learning that we call "tandem learning", which enables the integration of previously learned network representations into a structurally different network. We evaluate different configurations of our approach on publicly available data sets and show that it yields competitive segmentation results compared to more complex segmentation approaches.

Towards Uncertainty Detection in Automated Leaf Tissue Segmentation
Grexova, Rachel; Voggeneder, Klara; Tholen, Danny; Theroux-Rancourt, Guillaume; Kropatsch, Walter; Hladuvka, Jiri
DOI: 10.3217/978-3-85125-954-4-07
In order to use segmented volumetric data for subsequent analyses, it is important to detect and understand where the segmentation is reliable and where it is uncertain. This is especially critical in deep learning segmentation, which relies on manually annotated ground truth. In applications using medical and biological data in particular, ground truth annotations are often sparse, imbalanced, and imprecise. We propose to utilize 2.5D orthogonal ensembles not only to arrive at dense segmentation but, more importantly, to indicate areas of high prediction fidelity and areas of uncertainty. Our ensemble achieved accuracy above 95% in the high-fidelity areas of a volume of a poplar leaf segment. This accuracy was achieved not only for a fresh leaf sample similar to the training data, but also for a severely dehydrated sample. Well-represented classes contained large areas of high prediction fidelity and exhibited high validation metrics. By contrast, under-represented classes tend to contain large areas of uncertainty. Indication of uncertainty could be used as a basis for domain experts to revise the predictions. This is in turn expected to improve and/or enlarge the ground truth and allow for training of higher-quality segmentation models.

An unsupervised, shape-based 3d cell instance segmentation method for plant tissues
Palmrich, Alexander; Voggeneder, Klara; Tholen, Danny; Theroux-Rancourt, Guillaume; Hladuvka, Jiri; Kropatsch, Walter
DOI: 10.3217/978-3-85125-954-4-08
We present a segmentation method for tissue images that uses the shape of the image foreground to infer the location of individual cells. The method works in arbitrary dimension and is suited for volumetric scans.
It is unsupervised, but allows a user to specify parameters to correct for the presence of noise and to steer the segmentation behavior. After describing the algorithm and its limitations, we analyze its complexity (linear in voxel count) and evaluate the quality of the segmentation result by applying it to a leaf x-ray micro-tomography scan.

Exploring Learning-Based Approaches for Bomb Crater Detection in Historical Aerial Images
Burges, Marvin; Zambanini, Sebastian; Sablatnig, Robert
DOI: 10.3217/978-3-85125-954-4-09
Many countries were the target of air strikes during World War II. The heritage of these attacks is still present today, as numerous unexploded bombs are uncovered yearly in Central Europe. While these bombs pose a significant explosion hazard, their presence can be inferred from the existence of craters. Therefore, analyzing aerial images from World War II surveillance flights allows for preliminary risk estimation. In this paper, we train and evaluate 12 different object detector architectures and compare them to a crater detection algorithm on our custom historical aerial dataset. We show that modern detectors, in combination with a large enough historical aerial crater dataset, can outperform a current method for crater detection, achieve a precision of 0.6 and a recall of 0.6 on our dataset, and can process large remotely sensed images within seconds rather than minutes. Additionally, pretraining and different dataset extensions are evaluated and discussed.

Automated nuclear morphometry as a prognostic marker in canine cutaneous mast cell tumors
Parlak, Eda; Haghofer, Andreas; Donovan, Taryn A.; Klopfleisch, Robert; Winkler, Stephan; Kiupel, Matti; Aubreville, Marc; Bertram, Christof A.
DOI: 10.3217/978-3-85125-954-4-10
The prognosis of canine cutaneous mast cell tumors (ccMCT) is evaluated by various histologic parameters, including the variability in size and shape of tumor nuclei (nuclear pleomorphism).
Traditionally, nuclear pleomorphism is estimated by pathologists. However, a more precise measurement could be achieved by automated morphometry, which was investigated in this study. Eighty-six annotated images from ccMCT were used to develop a nuclear segmentation model, which yields an IoU of 0.79 on the test set. The prognostic value was determined on 96 ccMCT cases with known patient outcomes by two-fold cross-validation. Several features of nuclear size and shape were extracted from the segmentation mask, and the ideal combination and thresholds of these features were determined by an XGBoost model independently for the two dataset splits. Tumor-related death was predicted on the left-out part of the data set with an AUC of 0.82 and 0.86, respectively. This study shows a high prognostic value of algorithmic nuclear morphometry in ccMCT. Future studies should compare the algorithm with estimates by pathologists.

Modeling the diffusion of CO2 inside leaves
Sauzeau, Yannis; Kropatsch, Walter; Hladuvka, Jiri
DOI: 10.3217/978-3-85125-954-4-11
Propagation of fluids or gases in closed compartments, like CO2 in green plants, is described by the diffusion equation. This partial differential equation is usually solved iteratively and, especially in higher dimensions, tends to be computationally intensive. In this work, we propose to cast the n-dimensional problem to 1D diffusion. First, we apply a constrained distance transform to compute, for every voxel, its distance to the closest stoma. Second, we cast the iterative computation of CO2 concentration to the evaluation of closed-form polynomial functions. This in turn allows us to restrict the computation of CO2 concentration to places of interest, e.g., to the close vicinity of the epidermis or cell walls where photosynthesis takes place.
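
The first step of the CO2 abstract above, a constrained distance transform to the closest stoma, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 2D grid, 4-connectivity, unit step length, and a multi-source breadth-first search, and the names `constrained_distance`, `airspace`, and `stomata` are hypothetical.

```python
from collections import deque


def constrained_distance(airspace, stomata):
    """Geodesic distance (in grid steps) from each airspace cell to its closest stoma.

    airspace: 2D list of bools, True where gas can travel (assumed layout).
    stomata:  list of (row, col) seed cells, assumed to lie inside the airspace.
    Cells that are blocked or unreachable keep the value None.
    """
    rows, cols = len(airspace), len(airspace[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # All stomata start at distance 0, so each cell ends up with the
    # distance to whichever stoma is closest along permitted paths.
    for r, c in stomata:
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and airspace[nr][nc] and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


# A wall (False cells) forces the path from the stoma at (2, 0) around the top.
grid = [
    [True, True, True],
    [True, False, True],
    [True, False, True],
]
d = constrained_distance(grid, [(2, 0)])
```

In this toy grid the cell at (2, 2) lies two cells from the stoma in a straight line, but the constrained distance is six steps because the path has to go around the wall. That difference is what separates a constrained transform from an ordinary one.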

Novel contactless fingerprint scanner for Legal Enforcement Agencies
Weissenfeld, Axel; Voko, Erich; Strobl, Bernhard; Kohn, Bernhard; Dominguez, Gustavo Fernandez
DOI: 10.3217/978-3-85125-954-4-12
Biometric recognition systems integrated into mobile devices have gained acceptance in recent years. Authorities are particularly interested in mobile contactless solutions for many reasons: officers can acquire data wherever they are, the solutions are generally easy to use and hygienic, and no latent data is left behind. This paper presents a new mobile contactless fingerprint sensor which uses a liquid lens integrated with a TOF sensor. The device was used by the national police to acquire data of refugees. Matching shows promising results, and police officers expressed their satisfaction with the developed prototype.

Crop row detection utilizing spatial CNN modules
Riegler-Nurscher, Peter; Rupp, Leopold
DOI: 10.3217/978-3-85125-954-4-13
Mechanical weed control is becoming increasingly important compared to conventional methods, not least because of environmental challenges. Precise guidance of the hoeing machine along the crop rows is necessary to be able to work efficiently. In this work, the use of deep learning methods for crop row detection is presented and evaluated on a custom data set. Recent advances in vision-based lane detection, like Spatial CNN (SCNN) and Recurrent Feature-Shift Aggregator (RESA), can potentially be applied to crop row detection as well. These methods are expected to improve the detection of crop rows compared to the state of the art, especially in the case of strong weed growth and challenging environmental conditions.

A Computer Vision System for Evaluation of Field Robot Operations
Kitzler, Florian; Gronauer, Andreas; Motsch, Viktoria
DOI: 10.3217/978-3-85125-954-4-14
The usage of field robots is increasing as more commercial products become available on the market.
Among other measures, they can be used for seeding and mechanical weed control, using the geolocation of each individual seedling. The weed control process is performed without visual recognition of the plants. The precision of such weed control robots depends on the quality of the localisation, plant emergence, and soil properties. In order to evaluate the field robot operation accuracy, we developed a cost-effective computer vision evaluation system that works autonomously over long periods and is based on two RGB cameras for pre- and post-weed control image capture. Our system was successfully tested to collect image data of the hoeing precision of a FarmDroid FD20 field robot.

Vision-Language Models for Filtering and Clustering Forensic Data
Weissenfeld, Axel; Strobl, Bernhard
DOI: 10.3217/978-3-85125-954-4-15
With image- and video-capable devices in the hands of a majority of the population worldwide, the amount of media data keeps growing. Hence, searching for specific images and clustering datasets are of great importance for extracting the relevant information, e.g. the search for a specific person by legal enforcement agencies (LEAs). This paper presents a new tool which uses vision-language models to filter and cluster forensic data. The tool provides a GUI, which enables a flexible search by accepting textual as well as image input, to search large amounts of data in near real-time.

Estimation of nitrogen yield in wheat using radiative transfer model inversion based on an artificial neural network
Koppensteiner, L. J.; Neugschwandtner, R. W.
DOI: 10.3217/978-3-85125-954-4-16
The objective of this study was to estimate nitrogen yield in wheat based on hyperspectral reflectance measurements with a handheld spectroradiometer. To do so, the radiative transfer model PROSAIL was inverted and an artificial neural network applied. The model was trained and tested using a simulated dataset and field experimental data.
Results of the simulated dataset show that the inversion of PROSAIL based on an artificial neural network was successful. Furthermore, estimations of nitrogen yield compared to experimentally collected data feature high R² and low RRMSE. The technique proposed in this study is a promising tool to collect information on nitrogen yield of wheat canopy in a quick and non-destructive way with low calibration requirements. This can be utilized by farmers for field monitoring and site-specific nitrogen fertilization as well as by scientists and breeders for quick and non-destructive data collection in field experiments. Additionally, this approach can be adapted for different crops and varying sensors, e.g., multi- and hyperspectral UAV-mounted sensors as well as satellite data.

Selection of YOLOX Backbone for Monitoring Sows' Activity in Farrowing Pens with a Possibility of Temporary Crating
Oczak, Maciej
DOI: 10.3217/978-3-85125-954-4-17
Activity monitoring of sows in farrowing pens is an important application of computer vision in Precision Livestock Farming. One example with a benefit for the welfare of sows is farrowing prediction in pens with a possibility of temporary crating. In two experiments we tested various YOLOX backbones to estimate the generalization ability of the models on seen and unseen farrowing pens and animals. Models performed better on known pens and animals (~0.9 mAP) than on unknown ones (~0.8 mAP). Results suggest that it is better to include in the training set some images of sows from the environment where the algorithm will be deployed. However, an mAP as high as 0.8 suggests that on many farms it might not be necessary to re-train the model. Inference speed of the YOLOX models ranged from 21 fps (YOLOX-x) to 42 fps (YOLOX-nano) on recorded videos. This should be sufficient to monitor the activity level of sows in the farrowing compartment of the production unit of VetFarm Medau (20 pens).

Influence of Data Processing on Hyperspectral-Based Classification of Managed Permanent Grassland
Motsch, Viktoria; Britz, Roland; Gronauer, Andreas
DOI: 10.3217/978-3-85125-954-4-18
The botanical composition of grassland stands can be determined using a combination of hyperspectral imaging and machine learning. Data processing before machine learning can significantly improve overall model performance. Specific preprocessing variants, such as smoothing and taking derivatives of the spectrum, were found to be beneficial for classifying grassland species groups in detached models using hyperspectral data from permanent grassland obtained under laboratory conditions. Compared to extensively preprocessed data, raw spectral data showed no statistically significant decrease in performance in most cases.

In Defense of Information Plane Analysis
Basirat, Mina; Geiger, Bernhard C.; Roth, Peter M.
DOI: 10.3217/978-3-85125-954-4-19
In this paper, we tackle the problem of analyzing neural network training via information plane analysis. The key idea is to describe, over time, the mutual information between the input and a hidden layer and between a hidden layer and the target. Even though this is a reasonable approach, previous works showed inconsistent or even contradictory interpretations. Since the mutual information cannot be computed analytically, the authors applied different kinds of estimators, which often do not describe the mutual information very well. Taking these findings into account, we want to show that despite this theoretical limitation, information planes allow at least for a geometric interpretation, thus enabling us to analyze different aspects of neural network learning for real-world problems.
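
To make the estimator issue in the information plane abstract above concrete, the following sketch shows one simple plug-in estimator based on equal-width binning. The abstract does not say which estimators were used, so this choice is an assumption. The sketch also simplifies a hidden layer to a single scalar activation per sample, and the function name `binned_mutual_information` and its parameters are hypothetical.

```python
import math
from collections import Counter


def binned_mutual_information(xs, ts, n_bins=4):
    """Plug-in estimate of I(X; T) in bits after equal-width binning of T.

    xs: discrete values (e.g. input or target labels), one per sample.
    ts: scalar activations, one per sample (a simplifying assumption).
    """
    lo, hi = min(ts), max(ts)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all ts are equal
    binned = [min(int((t - lo) / width), n_bins - 1) for t in ts]

    n = len(xs)
    joint = Counter(zip(xs, binned))
    count_x = Counter(xs)
    count_t = Counter(binned)

    mi = 0.0
    for (x, b), count in joint.items():
        p_joint = count / n
        # p(x, t) / (p(x) * p(t)) written in terms of raw counts
        ratio = count * n / (count_x[x] * count_t[b])
        mi += p_joint * math.log2(ratio)
    return mi


activations = [0.1, 0.2, 0.8, 0.9]
mi_dependent = binned_mutual_information([0, 0, 1, 1], activations, n_bins=2)
mi_independent = binned_mutual_information([0, 1, 0, 1], activations, n_bins=2)
```

With these four samples the first call yields one bit, since the label fully determines the bin, and the second yields zero bits. The value depends on `n_bins`, which is one reason binned estimates can disagree with each other.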

Scientific Spotlight: Computer-assisted mitotic count using a deep learning-based algorithm improves interobserver reproducibility and accuracy
Bertram, Christof A.; Aubreville, Marc; Klopfleisch, Robert
DOI: 10.3217/978-3-85125-954-4-20

A Modern Approach for Early Wildfire Detection
Winter, Kurt; Roth, Peter M.
DOI: 10.3217/978-3-85125-954-4-21
Wildfire has been a constant threat to wildlife, vegetation, and society throughout history. Detecting such fires at an early stage is therefore of high relevance, raising the need for automatic approaches that build on visual object detection, namely to detect smoke. Feature-based approaches have typically worked well for this task in the past. However, the goal of this work was to evaluate whether modern approaches building on neural networks would be beneficial in this context. To this end, we generated a new dataset, allowing us to train and evaluate neural-network-based smoke detectors. We demonstrate that each of the approaches has benefits and shortcomings, but also that a carefully designed fusion strategy can improve the detection results in practice.