CS549 - Computer Vision - Fall 2006, Wednesdays from 6pm to 9pm

Prof. Matthew Ward
FL134, 831-5671, matt@cs.wpi.edu
Office Hours: Monday and Thursday at 11:00, Tuesday and Friday at 2:00; other times by appointment

Textbook: Computer Vision, by L. Shapiro and G. Stockman (Prentice Hall)

Additional Resources: All documents for the course will be made available via this web page. I will also put other books on computer vision on reserve in the library, in case you are interested in alternate presentations of a given topic.

Overview: Computer Vision is the study of the theory and practice of extracting knowledge from digital images. It is sometimes also referred to as Image Understanding. It draws on concepts and techniques from several fields, including computer graphics, image processing, pattern recognition, and artificial intelligence, and indeed prior exposure to any of these other fields is beneficial to anyone interested in computer vision.

In this course we will study and, in some cases, implement a number of algorithms for extracting features from images and matching them to 2-D and 3-D models. This, in a sense, is the inverse of what we do in a course on 3-D graphics, where we start with the model and try to render a realistic scene. Thus those of you who have taken CS543 or its equivalent will recognize some of the math and data structures in this course.

Projects: Project due dates are given in the schedule. I've tried to leave adequate time to finish the projects, but be sure not to delay starting them. Late penalties will be assessed unless you get permission (for good cause) at least one week in advance to turn your project in late. If you find you are having difficulty figuring out how to get started on a project, please see me ASAP. The projects are described below.

Exams: There will be two exams given for this course. The first exam will be held in the ninth week of class (the actual date may vary, due to snow or illness-based cancellations) and will be worth 40% of your exam grade. The second will be held during the last class and will count for the remaining 60%. For each exam you will be permitted one sheet of 8.5'' x 11'' paper for notes. If you do poorly on the first exam and much better on the second exam, I will weight the second exam more heavily. The converse, however, is not true.

Grading: You MUST obtain a passing average on both the exams and the projects in order to pass the course, which generally means obtaining a grade of 65 or more after scaling. Your projects can contribute plus or minus one grade to your final grade. Thus if you receive a B average on the exams, your project grade could elevate this to an A or drop it to a C (or leave it as a B). Note that you could pass the exams and fail the course if you do very poorly on the projects.

Facilities: You can use whatever computer you have at your disposal, as long as your projects can be demonstrated on a machine on campus. You will need to collect a number of images for your projects. For some, you might be able to find suitable images on the web. For others, you will need to capture images of different objects in different orientations, locations, and lighting conditions. The ATC has digital cameras that you can borrow if you don't have one at your disposal. You may need to convert the images from one format to another, depending on how they are captured and what platform you are using for your projects. ImageMagick (http://www.imagemagick.org/script/index.php) is freeware for converting between formats that runs on most platforms. It is already installed on the Linux machines in the CS department, though that might not be the most recent release.

Software Resources: Projects must be implemented as stand-alone executable programs, not just calls to MATLAB or other such environments. For C++, I suggest the CImg Library (http://cimg.sourceforge.net/). It runs on Linux, Windows, and Macs, and comes with many sample programs that can be used as templates for class projects. For Java, the Image class supports an extensive selection of methods, including reading, displaying, and performing pixel operations. In both of these cases, I expect students to implement their own image operators rather than just searching for an existing module that does all the work. Some of the projects would be too easy otherwise! If you wish to do your projects in a different language or environment from the ones listed above, please clear it with me first. For example, you could certainly use the ImageMagick libraries to do your basic reading and writing of images and do all the other processing yourself. ImageMagick provides bindings for more than 10 different programming languages.

Schedule:

September 6: Introduction, Image Formation and Representation
Readings: Ch. 1 and 2

September 13: Binary Image Analysis
Readings: Ch. 3

September 20: Intro to Pattern Recognition
Readings: Ch. 4
Project 1 due

September 27: Image Processing
Readings: Ch. 5

October 4: Color and Shading
Readings: Ch. 6

October 11: Texture and Motion
Readings: Ch. 7 and 9
Project 2 due

October 18: Image Segmentation
Readings: Ch. 10

October 25: 2-D Matching
Readings: Ch. 11

November 1: Exam 1 (Prof. Ward is away)
Project 3 due

November 8: 3-D from 2-D Images
Readings: Ch. 12

November 15: 3-D Transformations and Reconstruction
Readings: Ch. 13

November 29: 3-D Models and Matching
Readings: Ch. 14
Project 4 due

December 6: Case Studies
Readings: Ch. 16

December 13: Exam 2
Project 5 due

Project Details:

Project 1 - Getting Started:

The majority of projects for the course will involve developing algorithms for the recognition of a particular object (supplied by me or proposed by you) from a variety of views. As a starting point, in this project you will create a library of images of your object from different orientations, with different lighting conditions, and with different backgrounds (try 3 of each, for a total of 27 images). Once you create the images, write a function to generate binary views of your images using different thresholds. Experiment with different levels to find one or more values that result in the best separation of your object from the background. You may decide to go back and test other lighting configurations or backgrounds to improve your results. Write a brief summary of which configurations seem to work best and worst. Create a web page with your gallery of images, the thresholded binary images, and your summary. For this assignment, you can just e-mail me the URL for this web page.
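As a rough illustration of the thresholding step, here is a minimal sketch written in Python for brevity (your submission should still be a stand-alone program in an approved language). The image is assumed to be a grey-scale grid of values from 0 to 255 stored as a list of rows; the function names, the sample image, and the threshold levels are all hypothetical.

```python
def threshold(image, level):
    """Return a binary image: 1 where the pixel exceeds `level`, else 0."""
    return [[1 if pixel > level else 0 for pixel in row] for row in image]


def foreground_fraction(binary):
    """Fraction of pixels marked as foreground (value 1)."""
    total = sum(len(row) for row in binary)
    return sum(sum(row) for row in binary) / total


# Hypothetical 4x4 grey-scale image: a bright object on a dark background,
# plus one stray mid-grey pixel in the corner.
image = [
    [10, 12, 11, 9],
    [14, 200, 210, 13],
    [12, 205, 198, 10],
    [11, 13, 12, 90],
]

# Try several levels and see how much of the image is kept as foreground.
for level in (50, 100, 150):
    binary = threshold(image, level)
    print(level, foreground_fraction(binary))
```

Comparing the foreground fraction across levels is only one crude way to judge a threshold; looking at the binary images themselves is usually more informative.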

Project 2 - Convolution and Edge Detection:

Many operations in image processing can be accomplished via convolution, whereby each pixel in the original image is replaced by a weighted sum of its neighboring pixels. The weightings are specified by a grid of values, and different weightings result in different effects on the original image. Smoothing and blurring, for example, can be accomplished by either a uniform or Gaussian-shaped weight pattern, as long as the sum of the weights is 1. Boundary or edge detection can be performed using patterns of positive and negative weights, so that regions of similar pixel values will result in low values after convolution and regions of transition will produce large positive or negative results.
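The weighted-sum idea can be sketched as follows. This is a minimal Python illustration, not a reference implementation: the image is assumed to be a list of rows of numbers, the kernel is assumed square with odd size, and border pixels are handled by clamping coordinates to the image edge, which is only one of several reasonable choices. The names convolve, BOX_3X3, and LAPLACIAN are hypothetical.

```python
def convolve(image, kernel):
    """Replace each pixel by a weighted sum of its neighbours.

    The kernel is applied without flipping (strictly a correlation); for the
    symmetric kernels below the result is identical to true convolution.
    """
    rows, cols = len(image), len(image[0])
    k = len(kernel) // 2  # half-width of the (square, odd-sized) kernel
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    # Clamp neighbour coordinates so border pixels reuse edge values.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += kernel[dr + k][dc + k] * image[rr][cc]
            out[r][c] = total
    return out


# Uniform smoothing pattern: the nine weights sum to 1.
BOX_3X3 = [[1 / 9] * 3 for _ in range(3)]

# Edge-type pattern with positive and negative weights that sum to 0.
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

flat = [[5] * 4 for _ in range(4)]   # region of identical pixel values
step = [[0, 0, 10, 10] for _ in range(4)]  # vertical transition in the middle

smooth = convolve(flat, BOX_3X3)
edges_flat = convolve(flat, LAPLACIAN)
edges_step = convolve(step, LAPLACIAN)
```

On the flat image the edge pattern yields zeros everywhere, while on the step image it yields a positive and a negative response on either side of the transition.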

For this assignment, you will implement your own smoothing and edge detection operators and test them on images (both binary and grey-scale/color) in your gallery. Clearly, for the binary images, smoothing will result in values between black and white, and edge detection should find complete boundaries around each region. For grey-scale or color images, the results may not be as satisfying, especially with edge detection. You can choose any of the smoothing and edge detection convolution patterns found in the book or discussed in class. If your software environment already has such operations built-in, you can use these to verify that your code works, but I expect you to implement the convolutions on your own. Once you have completed the implementation and testing, run the operators on the images in your gallery. Write a brief summary of your tests, identifying the configurations that you believe produced the best results. You may want to generate some new images or try a couple of different edge detection operators to get the best results. You can put your results on a web page and send me the URL. Also, please e-mail me your source code for this project (the subject line should give your name followed by "Project 2").
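To make the binary-image case concrete, here is a small Python sketch of one possible edge operator. The Sobel patterns are used purely as an example of a standard pair of edge detection weights; the function names and the test image are hypothetical, and border pixels are again handled by clamping.

```python
import math

# Sobel patterns: one responds to left-right change, the other to up-down change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def weighted_sum(image, kernel, r, c):
    """3x3 weighted sum centred on (r, c); coordinates are clamped at the border."""
    rows, cols = len(image), len(image[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), rows - 1)
            cc = min(max(c + dc, 0), cols - 1)
            total += kernel[dr + 1][dc + 1] * image[rr][cc]
    return total


def edge_magnitude(image):
    """Gradient magnitude at every pixel, combining the two Sobel responses.

    The kernels are applied without flipping; that changes only the sign of
    each response, not the magnitude computed here.
    """
    return [
        [math.hypot(weighted_sum(image, SOBEL_X, r, c),
                    weighted_sum(image, SOBEL_Y, r, c))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Hypothetical binary image: a 3x3 white square on a black background.
square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

magnitude = edge_magnitude(square)
```

For this image the response is zero at the centre of the square and non-zero on every pixel of its outer ring, which is the complete-boundary behaviour expected for binary input.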

Project 3 - Image Segmentation:

For most non-trivial computer vision tasks, an image must be divided up, or "segmented", into disjoint regions. Object recognition can then be based on the boundaries of the regions, the shape, color, or texture of regions, or inter-region relationships, such as "contains" or "is adjacent to". We saw in Project 1 that thresholding is one rather simplistic method for separating objects from the background, and in Project 2 we used edge detection to identify the location and orientation of likely boundaries between objects or parts of objects. The goal of this project is to extract more useful representations of the objects in the scene so that we can perform recognition in Project 4.

There are several general strategies for segmentation, and you may need to implement more than one to find a method that gives you good results for your object. Histogram-based methods look for good colors or intensities at which to separate your object into groups. These methods, while somewhat easy to implement, often suffer in situations with uneven lighting, shadows, or textured surfaces. Boundary tracking methods start with the strongest edges (highest contrast) and try to follow the boundary until the starting point is encountered again. The problems with these methods include false edges and edges wider than one pixel. Region growing techniques start with seed locations and merge pixels into groups based on similarities in color or texture. These techniques often need to be post-processed with splitting or merging operations to reduce large regions enveloping several adjacent object parts as well as small regions that are really part of a single surface patch. In all cases, the result of segmentation should be an array that assigns each pixel to a particular segment. Run your resulting algorithm on the images in your gallery (not the binary ones!), and summarize which combination of algorithm and image configuration seems to give the most consistent results. Please submit results as in Project 2.
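As one illustration of the expected output format, here is a minimal Python sketch of region growing on a grey-scale image. It makes several simplifying assumptions that you are free to change: seeds are taken in scan order, neighbours are 4-connected, a pixel joins a region if its value is within a fixed tolerance of the seed value, and no splitting or merging is done afterwards. The function name, tolerance, and sample image are hypothetical.

```python
from collections import deque


def segment(image, tolerance):
    """Grow 4-connected regions of pixels within `tolerance` of each seed value.

    Returns (labels, count): `labels` has the same shape as `image` and assigns
    every pixel a segment number starting at 0.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 means "not yet assigned"
    count = 0
    for sr in range(rows):
        for sc in range(cols):
            if labels[sr][sc] != -1:
                continue  # already claimed by an earlier region
            seed_value = image[sr][sc]
            labels[sr][sc] = count
            queue = deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and abs(image[nr][nc] - seed_value) <= tolerance):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count


# Hypothetical image with three roughly uniform patches.
image = [
    [10, 11, 50, 52],
    [12, 10, 51, 50],
    [90, 91, 50, 49],
]

labels, count = segment(image, tolerance=5)
```

The returned labels grid is the per-pixel segment assignment described above; on real images you would likely need the splitting or merging post-processing that this sketch omits.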

Project 4 - Object Recognition:

We've now reached the point where we can try to implement a system to recognize a particular object. This may be based on edge models, corner/junction models, polygon/shape models, surface (3-D) models, structured light, or any of the other processes we've covered in class or in the readings. You should test your program using several views of your object as well as views of other objects. You should choose other objects that share some characteristics with your object to make it a "fair" test. It is OK for the system to make mistakes, both in failing to recognize your object and also in erroneously classifying other objects as your object. In your summary, you should describe your positive and negative results, and in the latter case, describe how you might improve your recognition rate given more time on the project. Indicate the limitations you believe exist in your program in terms of the robustness under different viewing and lighting conditions and in the viewing of similar objects. Please submit results as in Projects 2 and 3.
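When summarizing your positive and negative results, it may help to tally the two kinds of mistakes separately. The Python sketch below is one hypothetical way to do so; it assumes each test image is recorded as a pair (is it really your object, did the system say it was), and that you test at least one view of your object and at least one other object. The sample results are invented for illustration.

```python
def tally(results):
    """Count outcomes from (is_target, recognized) pairs, one pair per test image."""
    counts = {"true_pos": 0, "false_neg": 0, "false_pos": 0, "true_neg": 0}
    for is_target, recognized in results:
        if is_target:
            counts["true_pos" if recognized else "false_neg"] += 1
        else:
            counts["false_pos" if recognized else "true_neg"] += 1
    return counts


def rates(counts):
    """Return (recognition rate on your object, false-alarm rate on other objects)."""
    targets = counts["true_pos"] + counts["false_neg"]
    others = counts["false_pos"] + counts["true_neg"]
    return counts["true_pos"] / targets, counts["false_pos"] / others


# Hypothetical test run: four views of your object, four views of similar objects.
results = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]

counts = tally(results)
recognition_rate, false_alarm_rate = rates(counts)
```

Reporting both numbers makes it clear whether the system fails by missing your object or by accepting look-alikes.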

Project 5 - Research Paper:

Computer Vision is a broad and active research area. For this project (which can be done at any time during the semester), you should read, summarize, and compare three articles by different authors on a specific topic in computer vision that have been published in refereed journals or conferences in the past 5 years. You should choose a topic that is of interest to you; if you are wondering if a particular topic is acceptable, just ask me. Your resulting paper should be 5-10 pages in length, and can include figures from the articles (properly cited, of course) to help a reader better understand the concepts and techniques described in the papers. Some good sources of papers are provided below and are accessible via the library, either in physical or electronic form.

Sources for Articles on Computer Vision

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Computer Vision and Image Understanding
International Journal of Computer Vision
Image and Vision Computing
Pattern Recognition
IEEE Conference on Computer Vision and Pattern Recognition
International Conference on Pattern Recognition
International Conference on Computer Vision
http://www.cs.cmu.edu/~cil/vision.html
http://homepages.inf.ed.ac.uk/rbf/CVonline/

Matthew Ward 2006-08-17