Cognitive Robotics: Open-Ended Learning Approaches for 3D Object Recognition

 
Assignment overview
Three-dimensional (3D) object recognition is a technique for identifying objects in images or point clouds. The goal of such techniques is to teach a computer to gain a level of understanding of what an image contains. We can use a variety of machine learning or deep learning approaches for object recognition. In this assignment, you will work with two popular families of approaches: hand-crafted methods and deep transfer learning methods. Figure 1 shows the abstract architecture of these approaches.

Cognitive science has revealed that humans learn to recognize object categories ceaselessly over time. This ability allows them to adapt to new environments by enhancing their knowledge from the accumulation of experiences and the conceptualization of new object categories. Taking this theory as an inspiration, we seek to create an interactive object recognition system that can learn 3D object categories in an open-ended fashion. In this project, “open-ended” implies that the set of categories to be learned is not known in advance. The training instances are extracted from online experiences of a robot, and thus become gradually available over time, rather than being completely available at the beginning of the learning process.

Figure 1: Abstract architecture of (top) hand-crafted and (bottom) deep learning techniques for object recognition.

In this assignment, students have to optimize an open-ended learning approach for 3D object recognition and get familiar with the basic functionalities of ROS. We break this assignment down into two parts:

1.     The first part is about optimizing offline 3D object recognition systems, which take an object view as input and produce the category label as output (e.g., apple, mug, fork).

2.    The second part of this assignment is dedicated to testing object recognition approaches in an open-ended fashion. In this setting, the number of categories is not pre-defined in advance, and the knowledge of the agent/robot increases over time by interacting with a simulated teacher using three actions: teach, ask, and correct (see Fig. 2).

Figure 2: Abstract architecture for interaction between the simulated teacher and the learning agent.

Further details of these assignments are explained in the following sections. To make your life easier, we provide a virtual machine that has all the necessary programs, code, datasets, libraries, and packages. We also offer template code for each part of the assignment.

If you are not familiar with the concept of ROS, please follow the beginner level of the ROS Tutorials. For all students, going over all basic beginner-level tutorials is strongly recommended.

 Z We recommend installing MATLAB on your machine since the outputs of the experiments are automatically visualized in MATLAB. You can download it from the download portal or use the online version provided by the university. As an alternative, we also provide a Python script to visualize the generated MATLAB plots automatically.

Part I: Offline 3D object recognition setting (50%)
In this assignment, we assume that an object has already been segmented from the scene and we want to recognize its label. We intend to use an instance-based learning (IBL) approach to form new categories. From a general perspective, IBL approaches can be viewed as a combination of an object representation approach, a similarity measure, and a classification rule. Therefore, we represent an object category by storing the representations of the object views of that category. Furthermore, the choice of object representation and similarity measure has an impact on the recognition performance, as shown in Fig. 3.

 

Figure 3: The components used in a 3D object recognition system.
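To make the IBL formulation above concrete, the following is a minimal Python sketch (not the provided RACE/ROS implementation) of an instance-based category model: each category simply stores the histograms of its object views, and a query view is labelled by a K-NN vote over all stored instances under a chosen distance function. All class and function names are illustrative.

import numpy as np

class InstanceBasedLearner:
    """Minimal instance-based category model: store per-category view
    histograms and classify a query by a K-NN vote over stored instances."""

    def __init__(self, distance_fn):
        self.distance_fn = distance_fn   # dissimilarity between two histograms
        self.memory = {}                 # category label -> list of stored histograms

    def teach(self, label, histogram):
        """Store a (normalized) object-view histogram under the given category."""
        self.memory.setdefault(label, []).append(np.asarray(histogram, dtype=float))

    def classify(self, histogram, k=1):
        """Return the majority label among the k nearest stored instances."""
        query = np.asarray(histogram, dtype=float)
        scored = [(self.distance_fn(query, h), label)
                  for label, views in self.memory.items() for h in views]
        scored.sort(key=lambda pair: pair[0])
        top_k = [label for _, label in scored[:k]]
        return max(set(top_k), key=top_k.count)

Swapping the distance function or the value of k changes the behaviour of the whole pipeline, which is exactly what you will tune in the experiments below.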

In the case of the similarity measure, since the object representation module represents an object as a normalized histogram, the dissimilarity between two histograms can be computed by different distance functions. In this assignment, you need to select 5 out of 14 distance functions that are dissimilar from each other. This policy will increase the chance that different functions lead to different results. The following 14 functions have been implemented and exist in the RACE framework:

 Euclidean, Manhattan, χ2, Pearson, Neyman, Canberra, KL divergence, symmetric KL divergence, Motyka, Cosine, Dice, Bhattacharyya, Gower, and Sorensen.

Z For the mathematical equations of these functions, we refer the reader to a comprehensive survey on distance/similarity measures provided by S. Cha (1).
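Purely as an illustration of how some of the listed measures operate on two normalized histograms p and q, the sketch below implements a few of them in Python using their standard textbook formulas; a small epsilon guards against division by zero and log of zero, which the RACE implementation may handle differently.

import numpy as np

EPS = 1e-12  # numerical guard for divisions and logarithms

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    return np.sum(np.abs(p - q))

def chi_squared(p, q):
    return np.sum((p - q) ** 2 / (p + q + EPS))

def kl_divergence(p, q):
    return np.sum(p * np.log((p + EPS) / (q + EPS)))

def cosine_dissimilarity(p, q):
    return 1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + EPS)

# Example on two normalized 4-bin histograms
p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.3, 0.2])
print(chi_squared(p, q), kl_divergence(p, q), cosine_dissimilarity(p, q))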

The main intuition behind using instance-based learning in this study is that IBL serves as a baseline approach for evaluating the object representations used in object recognition. More advanced approaches, e.g., SVM-based and Bayesian learning, can be easily adapted.

To examine the performance of an object recognition approach, we provide a K-fold cross-validation procedure. K-fold cross-validation is one of the most widely used methods for estimating the generalization performance of a learning algorithm. In this evaluation protocol, K folds are randomly created by dividing the dataset into K equal-sized subsets, where each subset contains examples from all the categories. In each iteration, a single fold is used for testing, and the remaining K−1 folds are used as training data. We set K to 10, as is generally recommended in the literature. This type of evaluation is useful not only for parameter tuning but also for comparing the performance of your method with other approaches described in the literature.
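The cross-validation procedure itself is implemented in the provided ROS/C++ code; the Python sketch below is only a conceptual illustration of the same protocol, assuming a NumPy feature matrix X, a label vector y, and scikit-learn's StratifiedKFold so that every fold contains examples from all the categories.

import numpy as np
from sklearn.model_selection import StratifiedKFold

def kfold_accuracy(X, y, classifier_factory, k=10):
    """Estimate generalization accuracy with stratified K-fold cross-validation.

    classifier_factory: callable returning a fresh model with fit/predict,
    e.g. lambda: KNeighborsClassifier(n_neighbors=3)."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(X, y):
        model = classifier_factory()              # train from scratch on K-1 folds
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[test_idx])  # evaluate on the held-out fold
        accuracies.append(np.mean(predictions == y[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))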

L Your tasks for this part
For this assignment, students will work partly individually and partly in groups of two. Each student needs to optimize one hand-crafted and one deep learning based 3D object recognition algorithm. Therefore, each group will have four sets of results. The students will need to write up the report together by discussing the selected approaches and comparing the obtained results in terms of instance accuracy (acc_micro = #true predictions / #predictions), average class accuracy (acc_macro = (1/C) Σ_c #true predictions for category c / #instances of category c, i.e., the per-class accuracy averaged over the C categories), and computation time. Note that you need to report average class accuracy to address class imbalance, since instance accuracy is sensitive to class imbalance (a minimal sketch of how both metrics can be computed follows the configuration list below). You can think about the following groups:

•    (a) Hand-crafted object representation + IBL approach + K-NN:

–   list of available descriptors: [GOOD, ESF, VFH, GRSD]

–   distance functions as mentioned above

–   K ∈ [1, 3, 5, 7, 9 ]

•    (b) Deep transfer learning based object representation + IBL + K-NN

–   list of available network architectures: [mobileNet, mobileNetV2, vgg16_fc1, vgg16_fc2, vgg19_fc1, vgg19_fc2, xception, resnet50, denseNet121, denseNet169, densenet201, nasnetLarge, nasnetMobile, inception, inceptionResnet]

–   list of available element-wise pooling: [AVG, MAX, APP (append)]

–   distance functions as mentioned above,

–   K ∈ [1, 3, 5, 7, 9 ]
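As referenced above, the following is a minimal sketch of how instance (micro) accuracy and average class (macro) accuracy can be computed from the true and predicted labels; the provided framework reports these values in its summary files, so this is only meant to make the definitions explicit.

import numpy as np

def instance_accuracy(y_true, y_pred):
    """acc_micro = (# true predictions) / (# predictions)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def average_class_accuracy(y_true, y_pred):
    """acc_macro = mean over categories of the per-class accuracy;
    robust to class imbalance, unlike instance accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return np.mean(per_class)

# Toy example with an imbalanced test set
y_true = ["apple", "apple", "apple", "mug", "fork"]
y_pred = ["apple", "apple", "mug", "mug", "fork"]
print(instance_accuracy(y_true, y_pred))       # 0.8
print(average_class_accuracy(y_true, y_pred))  # (2/3 + 1 + 1) / 3 ≈ 0.89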

In this assignment, we use a small-scale RGB-D dataset to evaluate the performance of different configurations of each approach. In particular, we use the Restaurant RGB-D Object Dataset, which has a small number of classes with significant intra-class variation. Therefore, it is a suitable dataset for performing extensive sets of experiments to tune the parameters of each approach.

L What we offer for this part
•    Detailed instructions on how to run each of the experiments

•    A ROS-based C++ code for 10-fold cross-validation: we have implemented a set of object representation approaches and different distance functions for object recognition purposes. You need to study each approach in depth and optimize its parameters.

•    A ROS-based C++ code for K-fold cross-validation with various deep learning architectures as object representations and a set of distance functions for object recognition purposes. You need to study each approach in depth and optimize its parameters.

•    Sample bash scripts for running a bunch of experiments based on the GOOD descriptor (hand-crafted) and the MobileNetV2 architecture (deep transfer learning); find them in the rug_kfold_cross_validation/result folder.

•    A Python script to visualize the confusion matrix as the output. Run python3 matlab_plots_parser.py -p PATH_TO_EXP_DIR/ --offline to visualize the confusion matrix. You can use [-h] to see the instructions.

L How to run the experiments
We created a launch file for each of the mentioned object recognition algorithms. A launch file provides a convenient way to start up the roscore and multiple nodes, and to set the parameters' values (read more about launch files here). Before running an experiment, check the following:

• You have to update the value of the different parameters of the system in the launch file (e.g., rug_kfold_cross_validation/launch/kfold_cross_validation.launch).

Z You can also set the value of a parameter when you launch an experiment using the following command:

$ roslaunch package_name launch_file.launch parameter:=value

This option is useful for running a bunch of experiments using a bash/python script.
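As a sketch of such a batch script, the Python loop below calls roslaunch with different parameter values via subprocess and renames the result folder after each run so it is not overwritten. The parameter names distance_function and K_for_KNN used here for the hand-crafted launch file are assumptions; verify them against the actual launch file, and note that the sketch assumes the launch file terminates when the experiment finishes.

import os
import subprocess

# Assumed parameter names -- verify them in kfold_cross_validation.launch.
distance_functions = ["euclidean", "chiSquared", "KLDivergence"]
k_values = [1, 3, 5, 7, 9]

result_dir = os.path.expanduser("~/student_ws/rug_kfold_cross_validation/result")

for dist in distance_functions:
    for k in k_values:
        # Launch one 10-fold cross-validation experiment with this configuration.
        subprocess.run(
            ["roslaunch", "rug_kfold_cross_validation",
             "kfold_cross_validation_hand_crafted_descriptor.launch",
             f"distance_function:={dist}",   # hypothetical parameter name
             f"K_for_KNN:={k}",              # hypothetical for this launch file
             "name_of_approach:=GOOD_grid"],
            check=True,
        )
        # Rename experiment_1 so the next run does not overwrite its results.
        src = os.path.join(result_dir, "experiment_1")
        dst = os.path.join(result_dir, f"GOOD_{dist}_K{k}")
        if os.path.isdir(src):
            os.rename(src, dst)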

 Z The system configuration is reported at the beginning of the report file of the experiment. Therefore, you can use it as a way to debug/double-check the system’s parameters.

  For the hand-crafted based object recognition approaches:

After adjusting all necessary parameters in the launch file, you can run an experiment using the following command:

$ roslaunch rug_kfold_cross_validation kfold_cross_validation_hand_crafted_descriptor.launch

  For the deep transfer learning based object representation approaches:

After adjusting all necessary parameters in the launch file, you need to open three terminals and use the following commands to run a deep transfer learning based object recognition experiment:

í MobileNetV2 Architecture

$ roscore

$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py mobileNetV2

$ roslaunch rug_kfold_cross_validation kfold_cross_validation_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=mobileNetV2 K_for_KNN:=3 name_of_approach:=TEST

í VGG16 Architecture

$ roscore

$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py vgg16_fc1

$ roslaunch rug_kfold_cross_validation kfold_cross_validation_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=vgg16_fc1 K_for_KNN:=3 name_of_approach:=TEST

L What are the outputs of each experiment
• Results of an experiment, including a detailed summary and a confusion matrix (see Figs. 4 and 5), will be saved in:

$HOME/student_ws/rug_kfold_cross_validation/result/experiment_1/

 After each experiment, you need to either rename the experiment_1 folder or move it to another folder; otherwise its contents will be replaced by the results of a new experiment.

• We also report a summary of a bunch of experiments in a txt file in the following path (see Fig. 6):

rug_kfold_cross_validation/result/results_of_name_of_approach_experiments.txt

 

Figure 4: Confusion matrices showing how well each model performed in the object recognition task on the Restaurant RGB-D Object Dataset. In each cell of a confusion matrix, we present the percentage and the absolute number of predictions. A darker diagonal cell indicates better predictions by the model.

 

Figure 5: A detailed summary of an experiment: the system configuration is specified at the beginning of the file, and a summary of the experiment is subsequently reported. Objects that are incorrectly classified are highlighted by a double dashed line, e.g., No. 9.

 

Figure 6: A summary of a bunch of experiments for the GOOD descriptor with different K and various distance functions: in these experiments, we first trained the system on all the data. We then saved the perceptual memory to be used in other experiments.



Part II: Test your approaches in a systematic open-ended scenario (50%)
The offline evaluation methodologies are not well suited to evaluating open-ended learning systems, because they do not account for the simultaneous nature of learning and recognition, and they assume that the set of categories is predefined. We therefore adopted a teaching protocol designed for experimental evaluation in open-ended learning.

Algorithm 1  Teaching protocol for performance evaluation

 1: Introduce Category_1
 2: n ← 1
 3: repeat
 4:     n ← n + 1                                             ▷ Ready for the next category
 5:     Introduce Category_n
 6:     k ← 0
 7:     c ← 1
 8:     repeat                                                ▷ question / correction iteration
 9:         Present a previously unseen instance of Category_c
10:         Ask the category of this instance
11:         If needed, provide correct feedback
12:         k ← k + 1
13:         c ← (c == n) ? 1 : c + 1
14:         s ← success in last k question/correction iterations
15:     until (s > τ and k >= n)                              ▷ accuracy threshold crossed
16:           or (user sees no improvement in success)        ▷ breakpoint reached
17: until (user sees no improvement in success)               ▷ breakpoint reached
The idea is to emulate the interactions of a recognition system with the surrounding environment over long periods of time in a single context scenario (office, kitchen, etc.). The teacher follows a teaching protocol and interacts with the learning agent using three basic actions:

•    Teach: used for introducing a new object category to the agent;

•    Ask: used to ask the agent what is the category of a given object view;

•    Correct: used for providing corrective feedback in case of misclassification.
The teaching protocol determines which examples are used for training the algorithm and which are used for testing it (see Algorithm 1). The protocol can be followed by a human teacher. However, replacing a human teacher with a simulated one makes it possible to conduct systematic, consistent, and reproducible experiments for different approaches. It allows multiple experiments to be performed and different experimental conditions to be explored in a fraction of the time a human would take to carry out the same task. We therefore developed a simulated_teacher to follow the protocol and autonomously interact with the system. For this purpose, the simulated_teacher is connected to a large database of labeled object views. The complete process is summarized in Algorithm 1 and the overall system architecture is depicted in Fig. 7.

The idea is that the simulated_teacher repeatedly picks unseen object views from the currently known categories and presents them to the agent for testing. Inside the learning agent, the object view is recorded in the Perceptual Memory if it is marked as a training sample (i.e. whenever the teacher uses teach or correct instructions), otherwise it is sent to the Object Recognition module. The simulated_teacher continuously estimates the recognition performance of the agent using a sliding window of size 3n iterations, where n is the number of categories that have already been introduced. If k, the number of iterations since the last time a new category was introduced, is less than 3n, all results are used. In case this performance exceeds a given classification threshold (τ = 0.67, meaning accuracy is at least twice the error rate), the teacher introduces a new object category by presenting three randomly selected objects’ views. In this way, the agent begins with zero knowledge and the training instances become gradually available according to the teaching protocol.
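To make the sliding-window evaluation concrete, here is a minimal Python sketch (not the actual simulated_teacher code) of how the protocol accuracy over the last 3n question/correction iterations can be maintained and used to decide whether to introduce a new category, keep testing, or declare a breakpoint; the names and the 100-iteration limit follow the description in this section, but the exact bookkeeping in the provided code may differ.

from collections import deque

TAU = 0.67              # classification threshold: accuracy at least twice the error rate
BREAKPOINT_ITERS = 100  # give up if the threshold is not reached within this many iterations

def protocol_accuracy(results, n):
    """Accuracy over a sliding window of the last 3n iterations
    (all available results are used if there are fewer than 3n)."""
    window = list(results)[-3 * n:]
    return sum(window) / len(window) if window else 0.0

def teacher_decision(results, n, k):
    """results: sequence of 0/1 recognition outcomes (most recent last);
    n: number of categories introduced so far;
    k: iterations since the last category was introduced."""
    if protocol_accuracy(results, n) > TAU and k >= n:
        return "introduce_new_category"
    if k >= BREAKPOINT_ITERS:
        return "breakpoint"   # the agent can no longer reach the threshold
    return "continue"

# Example: 10 outcomes after 3 categories have been introduced
outcomes = deque([1, 1, 0, 1, 1, 1, 1, 1, 0, 1])
print(teacher_decision(outcomes, n=3, k=len(outcomes)))   # introduce_new_category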

  Breakpoint: In case the agent cannot reach the classification threshold after a certain number of iterations (i.e., 100 iterations), the simulated teacher infers that the agent is no longer able to learn more categories and therefore terminates the experiment. It is possible that the agent learns all existing categories before reaching the breakpoint. In such a case, it is not possible to continue the protocol, and the experiment is halted. In your report, this should be shown by the stopping condition “lack of data”.

Figure 7: Interaction between the simulated teacher and the learning agent; (left) the simulated teacher is connected to a large object dataset and interacts with the agent through the teach, ask, and correct actions, shown in blue; (right) in the case of an ask action, the agent is evaluated on a never-seen-before object: the agent recognizes the object and sends the result back to the simulated user; in the case of teach and correct actions, the agent creates a new category model or updates the model of the respective category.

   Dataset: In this experiment, one of the largest available 3D object datasets, namely the Washington RGB-D Object Dataset, is used. It consists of 250,000 views of 300 objects, and the objects are categorized into 51 categories. Figure 8 shows some example objects from the dataset. We have provided a short version of the dataset in which each category only has 200 instances. We have already included this short version of the dataset (3.3 GB) in the virtual machine ($HOME/datasets/). Both versions are available online: short-version (3 GB) and full-version (70 GB).

Figure 8: Sample point clouds of objects in the Washington RGB-D Object Dataset.

L Your tasks for this part
•    Based on the obtained results in the previous part (10-fold cross-validation experiments), select the best system configuration for both the hand-crafted and deep transfer learning approaches (i.e., object representation + distance function). For each of the selected approaches, update the parameters of the simulated teacher in the launch files accordingly.

•    Since the order of introducing categories may have an effect on the performance of the system, you have to perform 10 experiments and report all 10 experiments plus the average and standard deviation in a table (10 experiments for the hand-crafted approach and 10 for the deep learning approach).

•    Visualize the following plots showing the learning progress of the best hand-crafted and deep transfer learning approaches (as an example, see Fig. 9), and compare them with each other (for further details on the evaluation metrics and plots, please check out the OrthographicNet paper, which is available on Nestor):

–       protocol accuracy vs. #question/correction iterations (explain the first 200 iterations)

–       number of learned categories vs. #question/correction iterations

–       global classification accuracy vs. #question/correction iterations

–       number of stored instances per category

 Z It should be noted that, instead of having different plots for each approach, you can visualize all your results together using the provided Python script. Such a visualization is really useful for analysing and comparing the approaches. More information about the Python parser is available on Nestor under the “Practical assignments” tab.

•    The protocol_threshold parameter, τ, defines how well the agent should learn categories. For example, τ = 0.67 means the recognition accuracy must be at least twice the error rate. Therefore, it can influence all evaluation metrics. For each of the selected approaches, you need to perform only three experiments by setting τ ∈ [0.7, 0.8, 0.9], e.g., protocol_threshold:=0.7, and random_sequence_generator:=false to have a fair comparison. Finally, you need to analyse the effect of τ based on the obtained results.

L What we offer for this part
•    The simulated_teacher code to assess the performance of your approach in open-ended settings.

•    A set of MATLAB/Python scripts to visualize the progress of the agent (related to task #3):

$ python3 matlab_plots_parser.py -p PATH_TO_EXP_DIR/ --online

•    A bash script for running a bunch of experiments (find it in the rug_simulated_user/result folder).

L How to run the experiments
Similar to the offline evaluation, we created a launch file for hand-crafted and deep transfer learning based algorithms.

However, before running an experiment, check the following items:

• You have to update the values of the different parameters of the system in the respective launch file.

 Z The system configuration is reported at the beginning of the report file of the experiment. Therefore, you can use it as a way to debug/double-check the system’s parameters.

  For hand-crafted based object representation approaches:

After setting a proper value for each of the system's parameters, you can run an open-ended object recognition experiment using the following command:

$ roslaunch rug_simulated_user simulated_user_hand_crafted_descriptor.launch

  For deep learning based object representation approaches:

Similar to the offline evaluation for deep learning based approaches, you need to open three terminals and use the following commands to run an open-ended object recognition experiment for a specific network architecture:

í MobileNetV2 Architecture

$ roscore

$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py mobileNetV2

$ roslaunch rug_simulated_user simulated_user_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=mobileNetV2 K_for_KNN:=7 name_of_approach:=TEST

í VGG16 Architecture

$ roscore

$ rosrun rug_deep_feature_extraction multi_view_RGBD_object_representation.py vgg16_fc1

$ roslaunch rug_simulated_user simulated_user_RGBD_deep_learning_descriptor.launch orthographic_image_resolution:=150 base_network:=vgg16_fc1 K_for_KNN:=7 name_of_approach:=TEST

 To have a fair comparison, the order of introducing categories should be the same in both approaches. Therefore, we designed a Boolean parameter named random_sequence_generator that can be used for this purpose. Check out the script we have provided for more details.

L What are the outputs of each experiment
• Results of an experiment, including a detailed summary and a set of MATLAB files (see Fig. 9), will be saved in:

$HOME/student_ws/rug_simulated_user/result/experiment_1/

 After each experiment, you need to either rename the experiment_1 folder or move it to another folder; otherwise its contents will be replaced by the results of a new experiment.

• The system also reports a summary of a bunch of experiments as a txt file in the following path:

rug_simulated_user/result/results_of_name_of_approach_experiments.txt

 Z Each time you run an experiment, the experiment results will be automatically appended to the log file. After running a batch of 10 experiments, you have to report the content of the log file as a table in your report, compare the obtained results, and visualize the output of the best experiment for the hand-crafted and deep transfer learning approaches (as an example, see Fig. 9).
