- Pichler, Andreas
- Roth, Peter M.
- Sablatnig, Robert
- Stübl, Gernot
- Vincze, Markus
- Title: Proceedings of the Joint ARW & OAGM Workshop 2019
- May 9-10, 2019, University of Applied Sciences Upper Austria, Campus Steyr, Steyr, Austria
- Licence: CC BY

Preface
DOI: 10.3217/978-3-85125-663-5-00

Action Selection for Interactive Object Segmentation in Clutter
DOI: 10.3217/978-3-85125-663-5-01
Robots operating in human environments are often required to recognise, grasp and manipulate objects. Identifying the locations of objects amongst their complex surroundings is therefore an important capability. However, when environments are unstructured and cluttered, as is typical for indoor human environments, reliable and accurate object segmentation is not always possible because the scene representation is often incomplete or ambiguous. We overcome the limitations of static object segmentation by enabling a robot to directly interact with the scene with non-prehensile actions. Our method does not rely on object models to infer object existence. Rather, interaction induces scene motion and this provides an additional clue for associating observed parts to the same object. We use a probabilistic segmentation framework in order to identify segmentation uncertainty. This uncertainty is then used to guide a robot while it manipulates the scene. Our probabilistic segmentation approach recursively updates the segmentation given the motion cues, and the segmentation is monitored during interaction, thus providing online feedback. Experiments performed with RGB-D data show that the additional source of information from motion enables more certain object segmentation in cases that were otherwise ambiguous. We then show that our interaction approach based on segmentation uncertainty maintains higher quality segmentation than competing methods with increasing clutter.

Intuitive Human Machine Interaction based on Multimodal Command Fusion and Interpretation
DOI: 10.3217/978-3-85125-663-5-02
The drastic transition from mass production to mass customization and small lot sizes in the production industry requires intuitive interaction, programming and setup approaches for machinery and robotics in order to reduce setting-up time or adaptation effort.
Multimodal data fusion and analysis is considered a potential enabling technology to achieve intuitive human machine interaction. Our work focuses on robust interpretation of commands, issued by a human actor, which are composed of individual attributes created from different multimodal channels. The presented approach is demonstrated using an example of human robot interaction, where the user interacts with the robot to set up a robotic process sequence.

Using High-Level Features and Neglecting Low-Level Features: Application Self-Localization
DOI: 10.3217/978-3-85125-663-5-03
Common self-localization algorithms as well as trajectory algorithms for autonomous vehicles rely on low-level features such as laser readings. Identifying higher-level features or objects increases the system quality, but often contradicts the sensor noise model used, especially in the case of dynamic features such as doors, humans or other vehicles. A laser scan of an open door, for example, can look like the end of a corridor, leading to false data association between a map and features. The novelty in this work is that features which belong to the category of dynamic objects are only used as high-level features and are removed from the low-level feature pool. In the work presented here, RGB cameras as well as laser scanner readings are used to detect doors and to estimate their opening angle. These dynamic features are used as landmarks for self-localization, and the corresponding laser scan readings are ignored by particle weighting. The resulting method is currently a work in progress, and preliminary results are shown. The software developed for this paper is publicly available and integrated into the open-source Mobile Robot Programming Toolkit (MRPT).
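The particle weighting described in the self-localization abstract above, where laser readings that fall on dynamic objects are ignored, can be sketched roughly as follows. This is only an illustration under stated assumptions: the Gaussian beam model, the function name `particle_weight`, the `is_dynamic` mask and the value of `sigma` are hypothetical and are not taken from the paper or from the MRPT API.

```python
import math


def particle_weight(measured, expected, is_dynamic, sigma=0.1):
    """Weight of one particle from a laser scan.

    measured   -- measured beam ranges in metres
    expected   -- ranges predicted from the map at the particle's pose
    is_dynamic -- True for beams assumed to hit a dynamic object (e.g. a door)
    sigma      -- assumed standard deviation of the range noise in metres
    """
    log_w = 0.0
    for z, z_hat, dyn in zip(measured, expected, is_dynamic):
        if dyn:
            # Beams on dynamic objects are left out of the low-level model.
            continue
        log_w += -0.5 * ((z - z_hat) / sigma) ** 2
    return math.exp(log_w)


# The second beam hits an open door and disagrees strongly with the map.
masked = particle_weight([1.0, 5.0], [1.0, 2.0], [False, True])
unmasked = particle_weight([1.0, 5.0], [1.0, 2.0], [False, False])
print(masked, unmasked)
```

With the mask, the mismatching door beam no longer drives the particle weight towards zero, which is the effect the abstract attributes to removing dynamic features from the low-level feature pool.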

Safety of Mobile Robot Systems in Industrial Applications
DOI: 10.3217/978-3-85125-663-5-04
The fourth industrial revolution is in full swing and, according to “BCC Research”, a compound annual growth rate of 23.1% is expected on the global market for the period of 2018 to 2023. Leading new technologies such as mobile robotics and manipulator systems will facilitate more flexible and efficient production processes. Unfortunately, as mentioned in the latest “Statista” report, the complexity of mobile robotic systems and missing standards are among the major obstacles for a broad rollout of mobile robot systems. This paper presents a selection of what is already possible in the field of mobile robots and mobile manipulation systems and gives an outlook on current and upcoming leading edge developments. We focus on the requirements of the industry and address the related barriers concerning the design and implementation of safe applications. As a result, we propose best practices, recommendations and first concepts to overcome the discussed challenges in implementation.

Independent Offset Wheel Steering: An Improved Approach to Wheeled Mobile Robot Locomotion
DOI: 10.3217/978-3-85125-663-5-05
In this paper, a new wheel configuration for mobile robot locomotion called IWOS (Independent Wheel Offset Steering) is presented. This approach offers quasi-omnidirectionality, collision detection and mitigation, and expressive navigation capabilities with a simple mechanical design. First, an overall study of popular wheel designs and configurations is provided, and then a detailed explanation of IWOS as well as its distinct advantages is given. A proof of concept is shown using a physics simulation (GazeboSim) of various scenarios.

An Autonomous Mobile Handling Robot Using Object Recognition
DOI: 10.3217/978-3-85125-663-5-06
Due to the trend away from mass production to highly customized goods, there is a great demand for versatile robots in the manufacturing industry.
Classic fixed-programmed industrial robots and rail-bound transport vehicles, which are restricted to transporting standardized boxes, do not offer enough flexibility for modern factories. Machine learning methods and 3D vision can give manipulators the ability to perceive and understand the environment and therefore enable them to perform object manipulation tasks. State-of-the-art grasp-detection methods rely on data with cumbersomely annotated grasp poses, while labelled data for object recognition alone is easier to gather. This work describes the development of an automatic transport robot using a sensitive manipulator and 3D vision for autonomous transport of objects. This mobile manipulator is able to drive flexible paths, localize predefined objects and grasp them using an out-of-the-box neural network for object detection and hand-crafted methods for extracting grasp points from depth images to avoid cumbersome grasp-point-annotated training data. Furthermore, this paper discusses problems occurring when a neural network trained on human-captured photos is applied to robot-view images.

Dynamic parameter identification of the Universal Robots UR5
DOI: 10.3217/978-3-85125-663-5-07
In this paper, a methodology for parameter identification of an industrial serial robot manipulator is shown. The presented methodology relies on the fact that any mechanical system can be written in a form that is linear with respect to some set of parameters. Based on experimental measurements done on the Universal Robots UR5, the presented methodology is applied and the dynamical parameters of the robot are determined in two ways: first by use of the Moore-Penrose pseudoinverse, and then by use of optimization. At the end, the ability of the determined parameters to predict measurements other than the ones used for the identification is shown.
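The linear-in-parameters idea from the UR5 abstract above can be illustrated with a toy example. The sketch below assumes a hypothetical single-joint model, tau = I*qdd + b*qd + k*sin(q), which is not the UR5 model from the paper. It stacks one regressor row per sample into a matrix Y and solves the normal equations (Y^T Y) p = Y^T tau, which gives the same result as applying the Moore-Penrose pseudoinverse of Y when Y has full column rank. All function names and numeric values are assumptions made for illustration.

```python
import math


def regressor_row(q, qd, qdd):
    # Hypothetical 1-DoF model: tau = I*qdd + b*qd + k*sin(q),
    # which is linear in the parameter vector p = [I, b, k].
    return [qdd, qd, math.sin(q)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def identify(samples, torques):
    """Least-squares estimate of p from (q, qd, qdd) samples and torques."""
    rows = [regressor_row(*s) for s in samples]
    n = len(rows[0])
    yty = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    ytt = [sum(r[i] * t for r, t in zip(rows, torques)) for i in range(n)]
    return solve_linear(yty, ytt)


# Synthetic, noise-free data generated from assumed "true" parameters.
true_params = [0.5, 0.1, 2.0]
samples = [(0.1 * k, math.cos(0.3 * k), math.sin(0.7 * k)) for k in range(20)]
torques = [
    sum(p * y for p, y in zip(true_params, regressor_row(*s))) for s in samples
]
estimate = identify(samples, torques)
print(estimate)
```

On real measurements the torques are noisy, so the estimate only approximates the parameters. Checking the identified parameters against measurements that were not used in the fit, as the abstract describes, is the usual way to judge them.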

Machine Vision for Embedded Devices: from Synthetic Object Detection to Pyramidal Stereo Matching
DOI: 10.3217/978-3-85125-663-5-08
In this work we present an embedded and all-in-one system for machine vision in industrial settings. This system enhances the capabilities of an industrial robot by providing vision and perception, e.g. deep-learning-based object detection and 3D reconstruction by means of efficient and highly scalable stereo matching. To this end, we implemented and tested innovative solutions for object detection based on synthetically trained deep networks and a novel approach for depth estimation that embeds traditional 3D stereo matching within a pyramidal framework in order to reduce the computation time. Both object detection and 3D stereo matching have been efficiently implemented on the embedded device. Results and performance of the implementations are given for publicly available datasets, in particular the T-LESS dataset for textureless object detection, and the KITTI Stereo and Middlebury Stereo datasets for depth estimation.

Towards a flexible industrial robot system architecture
DOI: 10.3217/978-3-85125-663-5-09
The present work deals with the recording, transmission and presentation of sensor data, which is transmitted by different sensors mounted on or in mobile robots. Complex, heterogeneous, modular robot systems require manufacturer- and user-independent standardized interfaces based on open communication standards and information models to enable interoperability and integration. Cross-system communication and data retrieval from different devices of different manufacturers is complicated by proprietary application programming interfaces (APIs). It is virtually impossible to exchange modules with devices from alternative manufacturers, which makes it difficult to integrate devices that meet the requirements.
The OPC-UA communication interface is a platform-independent standard and is widely used in robotics and automation technology to connect compatible devices with different interfaces. In this paper we present the concept and implementation of a standardized communication interface for data exchange and visualization with ROS-based robot systems.

Flexible industrial mobile manipulation: a software perspective
DOI: 10.3217/978-3-85125-663-5-10
With ongoing research in robotics, some specific architectural approaches of robotic systems attract more and more interest from all kinds of industries. Mobile manipulators, robots consisting of a mobile base and a serial manipulator, provide the ability to make robotic manipulation location-independent, which will be an essential feature in future production. Such robot platforms offer a high level of flexibility and efficiency of robot applications. Especially under the aspect of modularity, mobile manipulators would provide even more flexibility by offering the possibility to exchange or extend the robot hardware for specific applications. To achieve this, modularity also has to be considered in software. In this paper, we present a software architecture for modular mobile manipulation applications. It provides mechanisms for reconfigurability, easy programming, and an easy approach for adding external hardware components. Being targeted at industrial use, the architecture also considers security and software deployment aspects. These considerations will, in combination with all the other aspects, be presented by means of two modular mobile manipulation platforms and a set of representative scenarios.

HolisafeHRC: Holistic Safety Concepts in Human-Robot Collaboration
DOI: 10.3217/978-3-85125-663-5-11
The success of human-robot collaboration (HRC) systems is currently facing problems related to unsolved issues in terms of safety.
Standards have been established that provide a framework for implementation of such systems, but the actual safety assessment is still very difficult due to the overall complexity of HRC systems. This creates barriers for potential users and system integrators, which is a limiting factor in terms of industrial exploitation. The HolisafeMRK project addresses the safety issues in HRC and aims to develop a method for risk assessment analysis. This paper presents an overview of HolisafeMRK, a methodology for risk assessment analysis, and intermediate results.

RNN-based Human Pose Prediction for Human-Robot Interaction
DOI: 10.3217/978-3-85125-663-5-12
In human-robot collaborative scenarios, human workers operate alongside and with robots to jointly perform allocated tasks within a shared work environment. One of the basic requirements in these scenarios is to ensure safety. This can be significantly improved when the robot is able to predict and prevent potential hazards, like imminent collisions. In this paper, we apply a recurrent neural network (RNN) to model and learn human joint positions and movements in order to predict their future trajectories. Existing human motion prediction techniques have been explored in a pseudo scenario to predict human motions during task execution. Building upon previous work, we examined their applicability to our own recorded dataset, representing a more industrial-oriented scenario. We used one second of motion data to predict one second ahead. For better performance we modified the existing architecture by introducing a different output layer, as opposed to common structures in recurrent neural networks. Finally, we evaluated the artificial neural network performance by providing absolute positional errors. Using our method we were able to predict joint motion over a one second period with less than 10 cm mean error.
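The evaluation mentioned in the pose prediction abstract above reports absolute positional errors and a mean error below 10 cm. One common way to compute such a figure, assumed here purely for illustration because the abstract does not define the metric, is the mean Euclidean distance between predicted and recorded 3D joint positions over all joints and frames. The function name and the sample data are hypothetical.

```python
import math


def mean_joint_error(predicted, actual):
    """Mean Euclidean distance over all joints and frames.

    predicted, actual -- lists of frames; each frame is a list of
    (x, y, z) joint positions in metres.
    """
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(predicted, actual):
        for p, t in zip(pred_frame, true_frame):
            total += math.dist(p, t)
            count += 1
    return total / count


# Two frames with two joints each; every predicted joint is off by 5 cm.
actual = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
]
predicted = [
    [(x + 0.03, y + 0.04, z) for (x, y, z) in frame] for frame in actual
]
error = mean_joint_error(predicted, actual)
print(f"mean error: {error:.3f} m")
```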

Adaptive Loading Station for High-Mix Production Systems
DOI: 10.3217/978-3-85125-663-5-13
This paper presents a loading station for high-mix production systems in a production shift without factory workers. A UR10e is used to load and unload a pallet, fitted on an Autonomous Guided Vehicle (AGV). The robot is supported by the software XRob and cameras for visual detection of the raw parts as well as the raw part trays. Additionally, the cameras are used to correct the position of the AGV. The process itself is orchestrated by the workflow engine centurio.work.

Workflow-based programming of human-robot interaction for collaborative assembly stations
DOI: 10.3217/978-3-85125-663-5-14
In certain domains, manual assembly of products is still a key success factor considering quality and flexibility. Especially with regard to flexibility, traditional, fully automated assembly using specialized robot stations is mostly not feasible for small lot sizes due to high costs for programming and mechanical adaptations. In recent years, collaborative robots (cobots) entered the market to broaden the way for robot-assisted manual assembly. The idea was to use the robot for small repetitive tasks at the manual assembly station and keep the human factor for dealing with flexibility. Unfortunately, most of the new cobots came with the same programming system as their ancient relatives. With regard to human-robot collaboration, these traditional approaches do not consider the human factor at the assembly station. Therefore, this paper presents a new approach, called Human Robot Time and Motion (HRTM), which provides a modeling language with generic basic elements that can be performed by a human worker or a robot. Correspondingly, a workflow-oriented programming model and a prototypical development environment featuring BPMN and MQTT are presented.

A Dynamical System for Governing Continuous, Sequential and Reactive Behaviors
DOI: 10.3217/978-3-85125-663-5-15
In interaction with humans or movable objects, robots not only need to react to surprising information quickly, but they also need to synchronize their motions with the world, which can be done by introducing decision points (discrete state transitions), or by continuously adjusting the execution velocity. We present a novel dynamical system based on stable heteroclinic channel networks that can represent static, Markovian states as well as continuous transitions between states in a compact and consistent state vector. This so-called phase-state machine can implement regular state machine semantics, but it additionally has the built-in capability to provide and adjust phases and blend consecutive movement primitives for smooth operation. In this paper, we investigate the dynamic properties, present examples for programming specific state machine semantics, and demonstrate the sequencing and mixing of continuous movement primitives.

Multilingual Speech Control for ROS-driven Robots
DOI: 10.3217/978-3-85125-663-5-16
To improve the collaboration of humans and robots, a multilingual speech (MLS) control was created, which allows multiple ROS-based robots to be managed at any time.

Object Grasping in Non-metric Space Using Decoupled Direct Visual Servoing
DOI: 10.3217/978-3-85125-663-5-17
In this paper we present a robotic system for grasping novel objects. Using a low-cost camera mounted on the end-effector, our system utilizes visual servoing control to command the gripper to a grasp position that is prescribed during a teach-in phase when the object is presented to the system. By using decoupled direct visual servoing, an intensity-based approach, object grasping is done without any 3D input and requires no metric information about the object.
Although the robot moves in the 3D Euclidean space and is controlled in the joint space, the command signal is derived completely from pixel information from the input image in the 2D projective space. Furthermore, the control strategy is extended for trajectory following in the control error space to generate smoother and more stable trajectories. This enables more direct and accurate positioning of the end-effector. A set of experiments is performed with a 7 DoF KUKA LWR IV robotic arm and shows the capability of precisely grasping objects from cluttered scenes. The system also shows robustness to object movement during the grasping process as well as robustness against errors in the camera calibration.

Carl Friedrich TUK a Social Companion Robot
DOI: 10.3217/978-3-85125-663-5-18
With improvements in electronics and mechanics, robots have become more compact as well as more space and energy efficient. Hence, they are now a more integral part of our everyday lives. Thanks to Artificial Intelligence (AI) they are on the verge of entering our social lives too. Following this trend, Technische Universität Kiwi (TUK) is a family of social robots developed and under further development at the Institute of Computer Technology at TU Wien. The project deals with the design and creation of a companion robot. The main purpose of this work is to realize a relatable robot which can eventually serve in therapeutic applications, in particular for children on the autism spectrum. To this end, the companion robot should be able to interact with the user and express emotions. The goal of the companion robot is to create a safe environment by serving as a safety blanket, in particular where other aids such as therapeutic pets cannot be used. Ultimately, we hope that by collecting helpful data, the companion robot can contribute to the therapy procedures as well as improvement of daily life interactions with family and friends.
In this paper, we present Carl Friedrich, the first of the TUK family.

A case study on working with a commercial mobile robotics platform
DOI: 10.3217/978-3-85125-663-5-19
During a period of roughly one and a half years, the authors had the opportunity to work with a commercial mobile robotics platform, namely the “Apollo” platform by Slamtec, to implement real-world use cases in the domain of interactive entertainment installations. During the course of this development, the authors have gained insight into the strengths and limitations of the platform in realistic scenarios. This work will present the use cases, and discuss the experiences with and insights into working with the Apollo platform.

Traffic cone based self-localization on a 1:10 race car
DOI: 10.3217/978-3-85125-663-5-20
This document describes a feature-based self-localization running on-board on an automated 1:10 race car. Traffic cones are detected using an Intel RealSense depth camera and integrated into the self-localization as landmarks. The work presents a novel approach for how to detect traffic cones by fusing depth and RGB data. Motion commands and sensor readings are combined with an Extended Kalman Filter (EKF). The performance of the overall system was evaluated using an external motion capture system.

Computational Performance of the Forward and Inverse Kinematics of an Anthropomorphic Robot Arm
DOI: 10.3217/978-3-85125-663-5-21
Robot motions are described commonly in the task space and the joint space of the robot. The forward and inverse kinematics are the two mappings between these spaces and are extensively utilized for control and path planning algorithms. Thus, computational performance is a key property for real-time implementations. This work investigates the computational performance of the forward and inverse kinematics of the KUKA LWR IV+, a 7-degree-of-freedom anthropomorphic robot arm, which were formulated using homogeneous coordinates and dual quaternions.
All algorithms were evaluated with respect to the number of floating point operations and the computation time. Significant performance improvements were obtained using the code optimization of modern computer algebra software. The results show that the optimized forward kinematics implementations perform almost equally. For inverse kinematics, the formulation in homogeneous coordinates performs about 70% better.

Robbie – A tele-operated robot with autonomous capabilities for EnRicH-2019 robotics trial
DOI: 10.3217/978-3-85125-663-5-22
In public emergencies such as nuclear accidents or natural disasters, an immediate and accurate overview as well as an assessment of the area is the basis of all coordinated plans and actions for the rescue team. The persistent lack of such information leads to high risks and casualties for rescue workers. Mobile robots help to minimize risks and support the rescue teams with urgent information, as well as with debris clearing and search and rescue operations. This work discusses the necessities and requirements of mobile robots in search and rescue (S&R) applications such as a nuclear disaster. Furthermore, it describes the current hardware setup as well as the software architecture of the mobile robot Robbie of UAS Technikum Wien.

General Robot-Camera Synchronization Based on Reprojection Error Minimization
DOI: 10.3217/978-3-85125-663-5-23
This paper describes a synchronization method to estimate the time offset between a robot arm and a camera mounted on the robot (i.e., robot-camera synchronization) based on reprojection error minimization. In this method, we detect a calibration pattern (e.g., checkerboard) from camera images while projecting the pattern onto the image space with robot hand poses and forward kinematics. Then, we estimate the delay of the camera data by finding the robot-camera time offset which minimizes the reprojection error between the visually detected and the projected patterns.
Since the proposed method does not rely on any camera-specific algorithms, it can be easily applied to any new camera models, such as RGB, infrared, and X-ray cameras, by changing only the projection model. Through experiments on a real system, we confirmed that the proposed method shows a good synchronization accuracy and contributes to the accuracy of a continuous scan data mapping task.

Control of Autonomous Mobile Robot using Voice Command
DOI: 10.3217/978-3-85125-663-5-24
Controlling machines by voice or speech has always aroused the curiosity of humans. After much research and development, voice recognition has become an important and comfortable way to communicate with machines in day-to-day life. In this paper, a voice control software system is created and integrated with the mapping algorithm available in the Robot Operating System (ROS) and implemented with a mobile robot Volksbot [Figure 1]. This paper also describes the development of a Graphical User Interface (GUI) with different tabs to control the system in different ways, first by clicking on the voice command to navigate the robot to its destination on the available map, and second by typing the command in words or numbers. If a command is mistaken, the user can abort it by clicking the stop tabs. In order to test the accuracy of the voice system, the experiment is performed with different voices as well as with different pitches. This work also shows results on the accuracy of reaching the destination room on the map.

Kinematics of steering a car
DOI: 10.3217/978-3-85125-663-5-25
This paper presents an analysis of the kinematic manipulability of the human arm while steering a car. The human arm is modeled as a 7-axis robot and a specialized measure of manipulability for the problem at hand is introduced. The analysis of different steering scenarios shows that optimal manipulability yields handling scenarios that are intuitive to a human operator.
Furthermore, the shoulder joint position is optimized to find the optimal seat position.

Evaluation of Human-Robot Collaboration Using Gaze based Situation Awareness in Real-time
DOI: 10.3217/978-3-85125-663-5-26
Human attention processes play a major role in the optimization of human-robot collaboration (HRC) systems. This work describes a novel framework to assess the human factors state of the operator primarily by gaze and in real-time. The objective is to derive parameters that determine information about situation awareness, which represents a central concept in the evaluation of interaction strategies in collaboration. The control of attention provides measures of human executive functions that make it possible to characterize key features in the collaboration domain. Comprehensive experiments on HRC were conducted with typical tasks including collaborative pick-and-place in a lab-based prototypical manufacturing environment. The methodology measures executive functions and situation awareness (SART) in the HRC task in real-time for human factors based performance optimization in HRC applications.

3D Pose Estimation from Color Images without Manual Annotations
DOI: 10.3217/978-3-85125-663-5-27
3D pose estimation is an important problem with many potential applications. However, acquiring 3D annotations for color images is a difficult task. To create training data, the annotating is usually done with the help of markers or a robotic system, which in both cases is very cumbersome, expensive, or sometimes even impossible, especially from color images. Another option is to use synthetic images for training. However, synthetic images do not resemble real images exactly. To bridge this domain gap, Generative Adversarial Networks or transfer learning techniques can be used, but they require some annotated real images to learn the domain transfer. We propose a novel approach to overcome these problems.

On the Use of Artificially Degraded Manuscripts for Quality Assessment of Readability Enhancement Methods
DOI: 10.3217/978-3-85125-663-5-28
This paper reviews an approach to assess the quality of methods for readability enhancement in multispectral images of degraded manuscripts. The idea of comparing processed images of artificially degraded manuscript pages to images that were taken before their degradation in order to evaluate the quality of digital restoration is fairly recent and little researched. We put the approach into a theoretical framework and conduct experiments on an existing dataset, thereby reproducing and extending the results described in the first publications on the approach.

visIvis - Evaluation of Vision based Visibility Measurement
DOI: 10.3217/978-3-85125-663-5-29
Reliable and exact assessment of visibility is essential for safe air traffic or other critical infrastructure. In order to overcome the drawbacks of the currently subjective reports from human observers, we present “visIvis”, an innovative solution to automatically derive visibility measures from standard cameras by a vision-based approach.

Efficient Multi-Task Learning of Semantic Segmentation and Disparity Estimation
DOI: 10.3217/978-3-85125-663-5-30
We propose a jointly trainable model for semantic segmentation and disparity map estimation. In this work we utilize the fact that the two tasks have complementary strengths and weaknesses. Depth prediction is especially accurate at object edges and corners, while for semantic segmentation large, homogeneous regions are easier to capture. We propose a CNN-based architecture, where both tasks are tightly interconnected with each other. The model consists of an encoding stage which computes features for both tasks, semantic segmentation and disparity estimation. In the decoding stage we explicitly add the semantic predictions to the disparity decoding branch and we additionally allow information to be exchanged in the intermediate feature representations.
Furthermore, we focus on efficiency, which we achieve by using previously introduced ESP building blocks. We evaluate the model on the commonly used KITTI dataset.

6D Object Pose Verification via Confidence-based Monte Carlo Tree Search and Constrained Physics Simulation
DOI: 10.3217/978-3-85125-663-5-31
Precise object pose estimation is required for robots to manipulate objects in their environment. However, the quality of object pose estimation deteriorates in cluttered scenes due to occlusions and detection errors. The estimates only partially fit the observed scene, or are physically implausible. As a result, robotic grasps based on these poses may be unsuccessful and derived scene descriptions may be unintelligible for a human observer. We propose a hypotheses verification approach that detects such outliers while, at the same time, enforcing physical plausibility. On the one hand, this is achieved by a tight coupling of hypotheses generation with the verification stage to guide the search for a solution. On the other hand, we integrate a constrained physics simulation into the verification stage to constantly enforce physical plausibility. By constraining the simulated objects to the most confident point correspondences, we prevent the estimated poses from erroneously diverging from the initial predictions. We thereby generate a plausible description of the observed scene. We evaluate our method on the LINEMOD and YCB-VIDEO datasets, and achieve state-of-the-art performance.

Quantile Filters for Multivariate Images
DOI: 10.3217/978-3-85125-663-5-32
Median filtering is known as a simple and robust procedure for denoising and aggregation of data. Its generalisation to arbitrary quantiles is straightforward, yielding a class of robust (rank-order) filters for univariate data.
Motivated by earlier work from image processing on generalisations of median filtering to multivariate images, we study in this paper possible quantile filtering procedures for multivariate images. Discussions of multivariate quantile generalisations in the statistics literature suggest that the position parameter of a multivariate quantile should not be chosen from an interval as in the univariate case but from a unit ball in data space. This allows multivariate quantile definitions to be derived from multivariate median concepts. We investigate quantile counterparts of several multivariate medians and explore their properties under the aspect of possible use as robust image filters.

Motion Artefact Compensation for Multi-Line Scan Imaging
DOI: 10.3217/978-3-85125-663-5-33
This work focuses on the compensation of transport synchronization artefacts that may occur during multi-line scan acquisitions. We reduce these motion artefacts by a warping function that stretches/squeezes line frames in the scanning domain that were acquired too early/late. The estimation of the warping function is controlled by comparing light field views and enforcing uniform spacing between line acquisitions. This approach enables multi-line scan systems to perform multi-line scan light field imaging largely independent from the transport and trigger quality.

A Research Project on Forensic Footwear Impression Retrieval
DOI: 10.3217/978-3-85125-663-5-34
Footwear impressions are a valuable source of evidence for criminal investigations. By comparing them, forensic experts can show that a footwear impression was made by a specific shoe or that impressions at different crime scenes were made by the same suspect. However, this process is very cumbersome and the current software solution used by the Austrian Police uses an annotation-based search that is very subjective and therefore not accurate enough.
Therefore, the goal of this project is a system that helps to search through databases with thousands of footwear impression images by automatically computing image similarities. Machine Vision Solution for a Turnout Tamping Assistance System 10.3217/978-3-85125-663-5-35 Safe and comfortable train travel is only possible on tracks that are in the correct geometric position. For this reason, so-called tamping machines are used worldwide to perform this important task of track maintenance. Turnout-tamping is a complex procedure to improve and stabilize the track situation in turnout-areas, which is usually only carried out by experienced operators. This application paper describes the current state of development of a 3D laser line scanner-based sensor system for a new tamping assistance system, which should support and relieve the operator in complex tamping areas. In this context, semantic segmentation (based on deep learning algorithms) is used to fully automatically identify essential and critical areas in the generated 3D depth images and process them for subsequent machine control. Longitudinal Finger Rotation in Vein Recognition - Deformation Detection and Correction 10.3217/978-3-85125-663-5-36 Finger vein biometrics is becoming more and more popular. However, longitudinal finger rotation, which can easily occur in practical applications, causes severe problems as the resulting vein structure is deformed in a non-linear way. These problems will become even more important in the future, as finger vein scanners are evolving towards contact-less acquisition. This paper provides a systematic evaluation regarding the influence of longitudinal rotation on the performance of finger vein recognition systems and the degree to which the deformations can be corrected. It presents two novel approaches to correct the longitudinal rotation, one based on the known rotation angle.
The second one compensates for the rotational deformation by applying a rotation correction in both directions using a pre-defined angle combined with score level fusion and works without any knowledge of the actual rotation angle. During the experiments, the aforementioned approaches and two additional ones are applied: one correcting the deformations based on an analysis of the geometric shape of the finger and the second one applying an elliptic pattern normalization of the region of interest. The experimental results confirm the negative impact of longitudinal rotation on the recognition performance and prove that its correction noticeably improves the performance again. Learning from the Truth: Fully Automatic Ground Truth Generation for Training of Medical Deep Learning Networks 10.3217/978-3-85125-663-5-37 Automatic medical image analysis has become an invaluable tool in the different treatment stages of diseases. Medical image segmentation in particular plays a vital role, since segmentation is often the initial step in an image analysis pipeline. Convolutional neural networks (CNNs) have rapidly become a state-of-the-art method for many medical image analysis tasks, such as segmentation. However, in the medical domain, the use of CNNs is limited by a major bottleneck: the lack of training data sets for supervised learning. Although millions of medical images have been collected in clinical routine, relevant annotations for those images are hard to acquire. Generally, annotations are created (semi-)manually by experts on a slice-by-slice basis, which is time-consuming and tedious. Therefore, available annotated data sets are often too small for deep learning techniques. To overcome these problems, we propose a novel method to automatically generate ground truth annotations by exploiting positron emission tomography (PET) data acquired simultaneously with computed tomography (CT) scans in combined PET/CT systems.
PRNU-based Finger Vein Sensor Identification in the Presence of Presentation Attack Data 10.3217/978-3-85125-663-5-38 We examine the effectiveness of the Photo Response Non-Uniformity (PRNU) in the context of sensor identification for finger vein imagery. Experiments are conducted on eight publicly available finger vein datasets. We apply a Wiener Filter (WF) in the frequency domain to enhance the quality of PRNU estimation and noise residual, respectively, and we use two metrics to rank PRNU similarity, namely Peak-to-Correlation Energy (PCE) and Normalized Cross Correlation (NCC). In the experiments, we include a dataset consisting of both real finger vein data and captured artifacts produced to assess presentation attacks. We investigate the impact of this situation on sensor identification accuracy and also try to discriminate spoofed images from non-spoofed images by varying decision thresholds. The results of sensor identification for finger vein imagery are encouraging: the obtained classification accuracies are between 97% and 98% for different settings. Interestingly, by selecting particular decision thresholds, it is also possible to discriminate real data from artificial data as used in presentation attacks. GMM Interpolation for Blood Cell Cluster Alignment in Childhood Leukaemia 10.3217/978-3-85125-663-5-39 The accurate quantification of cancer (blast) and non-cancer cells in childhood leukaemia (blood cancer) is a key component in assessing the treatment response and guiding patient-specific therapy. For this classification task, cell-specific biomarker expression levels are estimated by using flow cytometry measurements of multiple features of single blood cells. For the automated distinction between blasts and non-blasts, a main challenge is posed by data shifts and variations in the high-dimensional data space caused by instrumental drifts, inter-patient variability, treatment response and different machines.
In this work we present a novel alignment scheme for stable (non-cancer) cell populations in flow cytometry using Gaussian Mixture Models (GMM) as data representation format for the cell clusters' probability density function and a Wasserstein interpolation scheme on the manifold of GMM. The evaluation is performed using a dataset of 116 patients with acute lymphoblastic leukaemia at treatment day 15. Classification results show an improved normalization performance using the Wasserstein metric compared to two other metrics, with a mean sensitivity of 0.97 and a mean F-score of 0.95. Detecting Out-of-Distribution Traffic Signs 10.3217/978-3-85125-663-5-40 This work addresses the problem of novel traffic sign detection, i.e. detecting new traffic sign classes during test-time, which were not seen by the classifier during training. This problem is especially relevant for the development of autonomous vehicles, as these vehicles operate in an open-ended environment. As a result, the vehicle will always come across a traffic sign that it has never seen before. These new traffic signs need to be immediately identified so that they can be used later for re-training the vehicle. However, detecting these novel traffic signs becomes an extremely difficult task, as there is no mechanism to identify from the output of the classifier whether it has seen a given test sample before or not. To address this issue, we pose the novel traffic-sign detection problem as an out-of-distribution detection problem. We apply several state-of-the-art out-of-distribution detection methods and also establish a benchmark on the novel traffic-sign detection problem using the German Traffic Sign Recognition dataset. In our evaluation, we show that state-of-the-art out-of-distribution detection methods achieve an AUROC score of 93.9% and a detection error of 9.2%.
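To make the AUROC figure concrete, the sketch below scores samples with the maximum softmax probability and computes AUROC as the probability that an in-distribution sample receives a higher score than an out-of-distribution one. The maximum softmax probability is a common baseline score and is used here only as an assumption; the abstract does not state which detection methods were evaluated. The function names and all numbers are hypothetical.

```python
import math
from itertools import product


def max_softmax(logits):
    """Confidence score: the largest softmax probability of one sample."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def auroc(in_scores, out_scores):
    """Probability that a random in-distribution score exceeds a random
    out-of-distribution score; ties count as one half."""
    wins = 0.0
    for s_in, s_out in product(in_scores, out_scores):
        if s_in > s_out:
            wins += 1.0
        elif s_in == s_out:
            wins += 0.5
    return wins / (len(in_scores) * len(out_scores))


# Hypothetical scores, for illustration only.
known_sign_scores = [0.9, 0.8, 0.7]
novel_sign_scores = [0.6, 0.85, 0.2]
score = auroc(known_sign_scores, novel_sign_scores)
```

An AUROC of 1.0 would mean that every known sign is scored above every novel sign, while 0.5 corresponds to chance level. The pairwise loop is quadratic in the number of samples and is chosen for clarity rather than speed.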
The Quest for the Golden Activation Function 10.3217/978-3-85125-663-5-41 Deep Neural Networks have been shown to be beneficial for a variety of tasks, in particular allowing for end-to-end learning and reducing the requirement for manual design decisions. However, many parameters still have to be chosen in advance, which raises the need to optimize them. Moreover, since increasingly more complex and deeper networks are of interest, strategies are required to make neural network training efficient and stable. While initialization and normalization techniques are well studied, a relevant and important factor is often neglected: the selection of a proper activation function (AF). We tackled this problem and learned task-specific activation functions. For that purpose, we take two main observations into account. First, the positive and negative parts of activation functions have a different influence on the information propagation. Second, the search space is huge and hard to explore. Thus, motivated by evolution theory, we introduced an approach to evolving piece-wise activation functions building on the ideas of Genetic Algorithms. Combining Deep Learning and Variational Level Sets for Segmentation of Buildings 10.3217/978-3-85125-663-5-42 The larger context behind this work is the automated visual assessment of building characteristics (e.g. building age and condition) for the estimation of real estate prices from outdoor pictures. A basic requirement to this end is the automated segmentation of buildings from photos, which is the focus of this work. We propose a combined deep-learned and variational segmentation method for the extraction of the building area from real estate images and demonstrate its capabilities on a novel dataset for building segmentation. Automatic Intrinsics and Extrinsics Projector Calibration with Embedded Light Sensors 10.3217/978-3-85125-663-5-43 We propose a novel projector calibration method based on embedded light sensors.
Our method can be used to determine intrinsics and extrinsics of one or multiple projectors without relying on an additional camera. We show that our method is highly accurate and more than 17 times faster than state-of-the-art methods. This renders our method suitable for spatial augmented reality applications in the industrial domain. The Coarse-to-Fine Contour-based Multimodal Image Registration 10.3217/978-3-85125-663-5-44 Image registration brings two images into alignment despite any initial misalignment. Several approaches to image registration make extensive use of local image information extracted at interest points, known as local image descriptors. State-of-the-art methods perform a statistical analysis of the gradient information around the interest points. However, one of the challenges in image registration using these local image descriptors arises for multimodal images taken from different imaging devices and/or modalities. In many applications such as medical image registration, the relation between the gray values of multimodal images is complex and a functional dependency is generally missing. This paper focuses on registering mass spectrometry images to microscopic images based on contour features. To achieve more accurate multimodal image registration performance, we propose a coarse-to-fine image registration framework. The pre-registration process is performed by using contour-based corners and curvature similarity between corners. Image blocking and DEPAC descriptors are used in the fine registration process. A local adaptive matching is performed for the final registration step. Evaluation Study on Semantic Object Labelling in Street Scenes 10.3217/978-3-85125-663-5-45 We present a processing pipeline for semantic scene labelling that was developed in view of autonomous driving applications.
Our study focuses on two different methods for feature selection - Texture-layout-filter (TLF) and Single Histogram Class Models (SHCM) - whose influence on the performance of a random forest classifier is investigated. In tests on the Cityscapes dataset, we assess the effects of parameter variation and observe an improvement of the average class performance score by 16 percent when substituting the TLF by the computationally more demanding SHCM feature. Towards Object Detection and Pose Estimation in Clutter using only Synthetic Depth Data for Training 10.3217/978-3-85125-663-5-46 Object pose estimation is an important problem in robotics because it supports scene understanding and enables subsequent grasping and manipulation. Many methods, including modern deep learning approaches, exploit known object models; however, in industry these are difficult and expensive to obtain. 3D CAD models, on the other hand, are often readily available. Consequently, training a deep architecture for pose estimation exclusively from CAD models leads to a considerable decrease of the data creation effort. While this has been shown to work well for feature- and template-based approaches, real-world data is still required for pose estimation in clutter. We use synthetically created depth data with domain-relevant background and randomized augmentation to train an end-to-end, multi-task network in order to detect and estimate poses of texture-less objects in cluttered real-world depth images. We present experiments and ablation studies on the architectural design choices and data representation with the LineMOD dataset. A Two-Stage Classifier for Collagen in Electron Tomography Images using a Convolutional Neural Network and TV Segmentation 10.3217/978-3-85125-663-5-47 We present an easily realizable practical strategy for the segmentation of tissue types in microscopy images of biological tissue.
The strategy is based on a convolutional neural network (CNN) classifier that requires a small amount of manually labeled data. Spatial regularity of the segmented images is enforced by a total variation (TV) regularization approach. The proposed strategy is applied to and tested on collagen segmentation in electron tomography image stacks. Semantic Image Segmentation using Convolutional Neural Nets for Lawn Mower Robots 10.3217/978-3-85125-663-5-48 Within a few years, semantic image segmentation has become a key task in image processing. This rapid progress already allows a paradigm shift in the way many problems are approached. It is therefore natural to use semantic image segmentation for autonomous lawn mowers. The resulting benefits are good orientation abilities in previously unseen environments, optimal path planning and the reduction of danger to humans and animals. This thesis deals with the comparison of different network architectures for semantic segmentation with respect to their suitability for use in autonomous lawn mowers. Sufficient segmentation accuracy and real-time capability are used as criteria for this.