CSE 573 – Computer Vision and Image Processing
Project #1
1. Edge Detection [35 points]
The goal of this task is to experiment with two commonly used edge detection operators, the Prewitt operator and the Sobel operator, and to familiarize you with techniques commonly used by computer vision practitioners, such as padding. Specifically, the task is to detect edges in a given image, which is named “proj1-task1.jpg” and is stored in “./data/”.
You are required to implement all the functions that are labelled with “# TODO” in “task1.py” and “utils.py”. Both files contain hints as well as utility functions that you can use as building blocks, so you only need to write about 40 lines of code. Your code should generate images that are identical to those stored in “./results/”. (Note that the images in “./results/” are provided for reference only. At test time, a different image will be used as input, and the correct output images will differ from those stored in “./results/”.)
When you implement the functions labelled with “# TODO”, comment out the lines “raise NotImplementedError” instead of deleting them.
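To illustrate the padding and filtering ideas mentioned above, here is a minimal sketch in plain Python lists. The helper names (zero_pad, flip_kernel, convolve2d, gradient_magnitude) and the kernel sign convention are assumptions made for this illustration; the functions provided in “utils.py” and the expected signatures in “task1.py” may differ, so treat this as a way to reason about the technique rather than as the reference solution.

```python
# Illustrative sketch only: names and kernel signs are assumptions,
# not the definitions used in the provided task1.py / utils.py.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]


def zero_pad(img, pad_y, pad_x):
    """Return a copy of img (a list of rows) surrounded by zeros."""
    width = len(img[0])
    padded = [[0] * (width + 2 * pad_x) for _ in range(pad_y)]
    for row in img:
        padded.append([0] * pad_x + list(row) + [0] * pad_x)
    padded += [[0] * (width + 2 * pad_x) for _ in range(pad_y)]
    return padded


def flip_kernel(kernel):
    """Flip a kernel both vertically and horizontally."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(img, kernel):
    """2D convolution with zero padding; output has the same size as img."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = flip_kernel(kernel)
    padded = zero_pad(img, kh // 2, kw // 2)
    out = []
    for i in range(len(img)):
        out_row = []
        for j in range(len(img[0])):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += padded[i + u][j + v] * flipped[u][v]
            out_row.append(acc)
        out.append(out_row)
    return out


def gradient_magnitude(gx, gy):
    """Combine horizontal and vertical responses into an edge magnitude."""
    return [[(a * a + b * b) ** 0.5 for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]
```

Note that zero padding produces strong responses along the image border wherever the image itself is bright there, which is one reason the choice of padding matters when comparing against reference outputs.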
2. Character Detection [65 points]
The goal of this task is to experiment with template matching algorithms. Specifically, the task is to find a specific character (or set of characters) in a given image, which is named “proj1task2.pgm” and is stored in “./data/”. You are required to implement a function named “detect” in “task2.py”, which detects a character in an image. The function “detect” takes as input a given image and a template (that you will create) containing a character, and returns the coordinates (i.e., the coordinates of the top-left pixel) of the character contained in the template.
This task is composed of the following three subtasks:
• Detect character “a” (lower case “a”). [3 points]
• Detect character “b” (lower case “b”). [3 points]
• Detect character “c” (lower case “c”). [3 points]
You need to create your own templates. The templates containing the characters “a”, “b” and “c” should be named “a.pgm”, “b.pgm” and “c.pgm” respectively, and should be stored in “./data/”. Note that only ONE template will be used to detect all instances of a particular (lowercase) character, but your program may, for example, scale or rotate it.
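One common way to do template matching is zero-mean normalized cross-correlation (NCC), sketched below in plain Python. This is only one possible approach, not necessarily the intended one. The function names, the (row, column) ordering of the returned coordinates, and the threshold of 0.9 are assumptions made for illustration; check “task2.py” for the output format that “detect” is actually expected to produce.

```python
# Illustrative sketch only: names, threshold and coordinate order are assumptions.

def ncc_score(image, template, top, left):
    """Zero-mean normalized cross-correlation between the template and the
    image patch whose top-left pixel is at (top, left). Result is in [-1, 1]."""
    th, tw = len(template), len(template[0])
    n = th * tw
    patch = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    tmpl = [template[r][c] for r in range(th) for c in range(tw)]
    mean_p = sum(patch) / n
    mean_t = sum(tmpl) / n
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, tmpl))
    var_p = sum((p - mean_p) ** 2 for p in patch)
    var_t = sum((t - mean_t) ** 2 for t in tmpl)
    if var_p == 0 or var_t == 0:
        # A flat patch or flat template carries no shape information.
        return 0.0
    return num / (var_p * var_t) ** 0.5


def match_template(image, template, threshold=0.9):
    """Return the (row, col) of every top-left position whose NCC score
    reaches the threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            if ncc_score(image, template, top, left) >= threshold:
                hits.append((top, left))
    return hits
```

Because NCC subtracts the mean and divides by the patch energy, it is less sensitive to overall brightness and contrast than a raw sum of squared differences. Raising the threshold reduces false positives at the cost of missed characters.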
Evaluation
The F1 measure is the harmonic mean of precision and recall, where precision is the number of true positives divided by the number of detections reported as positive, and recall is the number of true positives divided by the number of positives that actually exist.
The F1 measure will be used as the metric. You are not expected to find all the characters; you can assume that an F1 measure of 0.6 will earn full credit.
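As a quick way to check your own results against this definition, the computation can be sketched as follows. The function name and argument names are hypothetical, and how a detection is matched to a ground-truth character (for example, how much positional error is tolerated) is not specified here, so the counts are taken as given.

```python
# Illustrative sketch only: the function and argument names are hypothetical.

def f1_score(true_positives, num_detections, num_ground_truth):
    """F1 = harmonic mean of precision and recall.

    precision = true_positives / num_detections
    recall    = true_positives / num_ground_truth
    """
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_detections
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with 12 characters present, 8 reported detections and 6 of them correct, precision is 0.75 and recall is 0.5, so F1 = 2 × 0.75 × 0.5 / 1.25 = 0.6.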
Hints:
• You may consider designing a grayscale template that emphasizes some pixels more than others.
• You may consider combining several “exact” templates in an appropriate way so that you can recognize different variations of the three lowercase characters in question.
• Once you read in your base template, your program can further manipulate it, for example by resizing it.
• After you have a base system, you may also experiment with templates that contain only edges (detecting edges in both the image and the template). This might give better results than using the original image and template, because edges preserve only the scales and shapes of characters, which are what matter for template matching. Distracting information about the characters, e.g., colors and fonts, is partially eliminated in an image and template that contain only edges. (The functions that we provide in “utils.py” and the functions that you implement in “task1.py” could be used to detect edges.)
• PGM is a simple raw format for grayscale images; additional information can be found here: http://netpbm.sourceforge.net/doc/pgm.html
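If you want to build or inspect a small template by hand, the plain-text (“P2”) variant of PGM is easy to write and read without any library. The sketch below assumes that variant only; the provided image may well use the binary (“P5”) variant, which this code does not handle, and the function names are hypothetical. “cv2.imread()” remains the simplest way to load either variant.

```python
# Illustrative sketch only: handles the plain-text P2 variant of PGM,
# not the binary P5 variant. Function names are hypothetical.

def write_plain_pgm(path, pixels, maxval=255):
    """Write a list of rows of integer gray values as a plain (P2) PGM file."""
    height, width = len(pixels), len(pixels[0])
    with open(path, "w", encoding="ascii") as f:
        # Header: magic number, width and height, maximum gray value.
        f.write("P2\n%d %d\n%d\n" % (width, height, maxval))
        for row in pixels:
            for value in row:
                f.write("%d\n" % value)


def read_plain_pgm(path):
    """Read a plain (P2) PGM file; return (pixels, maxval)."""
    tokens = []
    with open(path, "r", encoding="ascii") as f:
        for line in f:
            # Everything after '#' on a line is a comment.
            tokens.extend(line.split("#", 1)[0].split())
    if tokens[0] != "P2":
        raise ValueError("not a plain (P2) PGM file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [int(t) for t in tokens[4:4 + width * height]]
    pixels = [values[r * width:(r + 1) * width] for r in range(height)]
    return pixels, maxval
```

Note that the header lists the width before the height, which is the opposite of the usual (rows, columns) indexing of an image array.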
Project Guidelines
• Do not modify the code provided.
• Do not use any API provided by opencv (cv2) and numpy (np) in your code (except “np.sqrt()”, “np.zeros()”, “np.ones()”, “np.multiply()”, “np.divide()”, “cv2.imread()”, “cv2.imshow()”, “cv2.imwrite()”, and “cv2.resize()”).
• Do not import any additional libraries (function, module, etc.) except native Python packages, e.g., pdb, os, sys.
• You will upload your project to UBLearns and to Autograder (https://autograder.cse.buffalo.edu/), so please try both early in the process to make sure you understand how they work.
Ask for the TA’s approval if you would like to use a programming language other than Python, C, C++, C# and Java.