CS549 - Computer Vision - Fall 2006, Wednesdays from 6pm to 9pm
Prof. Matthew Ward
FL134, 831-5671, matt@cs.wpi.edu
Office Hours: Monday and Thursday at 11:00, Tuesday and Friday at 2:00;
others by appointment
Textbook: Computer Vision, by L. Shapiro and G. Stockman (Prentice Hall)
Additional Resources: All documents for the course will be made available via this web page. I will also put other books on computer vision on reserve in the library, in case you are interested in alternate presentations of a given topic.
Overview: Computer Vision is the study of the theory and practice of extracting knowledge from digital images. It is sometimes also referred to as Image Understanding. It draws on concepts and techniques from several fields, including computer graphics, image processing, pattern recognition, and artificial intelligence; prior exposure to any of these fields is beneficial to anyone interested in computer vision.
In this course we will study and, in some cases, implement a number of algorithms for extracting features from images and matching them to 2-D and 3-D models. This, in a sense, is the inverse of what we do in a course on 3-D graphics, where we start with the model and try to render a realistic scene. Thus those of you who have taken CS543 or its equivalent will recognize some of the math and data structures in this course.
Projects: Project due dates are given in the schedule. I've tried to leave adequate time to finish the projects, but be sure not to delay starting them. Late penalties will be assessed unless you get permission (for good cause) at least one week in advance to turn your project in late. If you find you are having difficulty figuring out how to get started on a project, please see me ASAP. The projects are described below.
Exams: There will be two exams given for this course. The first exam will be held on the ninth week of class (the actual date may vary due to snow- or illness-related cancellations) and will be worth 40% of your exam grade. The second will be held on the last class and count for the remaining 60%. For each exam you will be permitted one sheet of 8.5'' x 11'' paper for notes. If you do poorly on the first exam and much better on the second exam, I will weight the second exam more heavily. The converse, however, is not true.
Grading: You MUST obtain a passing average on both the exams and the projects in order to pass the course, which generally means obtaining a grade of 65 or more after scaling. Your projects can contribute plus or minus one grade to your final grade. Thus if you receive a B average on the exams, your project grade could elevate this to an A or drop it to a C (or leave it as a B). Note that you could pass the exams and fail the course if you do very poorly on the projects.
Facilities: You can use whatever computer you have at your disposal, as long as your projects can be demonstrated on a machine on campus. You will need to collect a number of images for your projects. For some, you might be able to find suitable images on the web. For others, you will need to capture images of different objects in different orientations, locations, and lighting conditions. The ATC has digital cameras that you can borrow if you don't have one at your disposal. You may need to convert the images from one format to another, depending on how they are captured and what platform you are using for your projects. ImageMagick (http://www.imagemagick.org/script/index.php) is free software for converting between formats that runs on most platforms. It is already installed on the Linux machines in the CS department, though that might not be the most recent release.
Software Resources: Projects must be implemented as stand-alone executable programs, not just calls to MATLAB or other such environments. For C++, I suggest the CImg Library (http://cimg.sourceforge.net/). It runs on Linux, Windows, and Macs, and comes with many sample programs that can be used as templates for class projects. For Java, the standard Image class supports an extensive selection of methods, including reading, displaying, and performing pixel operations. In both cases, I expect students to implement their own image operators rather than just searching for an existing module that does all the work. Some of the projects would be too easy otherwise! If you wish to do your projects in a different language or environment from the ones listed above, please clear it with me first. For example, you could certainly use the ImageMagick libraries to do your basic reading and writing of images and do all the other processing yourself; ImageMagick has bindings for more than 10 programming languages.
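To give a concrete sense of how a CImg-based project might start, here is a minimal sketch that loads, smooths, saves, and displays an image. It assumes the CImg API roughly as documented on the library's site; the file names are placeholders, and method names may differ between CImg releases.

    // Minimal CImg sketch: load an image, smooth it, save and display it.
    // "sample.ppm" is a placeholder; CImg reads PNM/BMP natively and relies
    // on external tools for formats such as JPEG and PNG.
    #include "CImg.h"
    using namespace cimg_library;

    int main() {
        CImg<unsigned char> img("sample.ppm");            // load from disk
        CImg<unsigned char> smooth = img.get_blur(2.0f);  // blur, sigma = 2
        smooth.save("sample_blur.ppm");                   // write result
        CImgDisplay disp(smooth, "Blurred image");        // open a window
        while (!disp.is_closed()) disp.wait();            // until closed
        return 0;
    }

Per the expectations above, built-in routines like get_blur() should serve only as scaffolding or as a check on your own operators, not as a substitute for implementing them.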
Schedule:
September 6: Introduction, Image Formation and Representation
Readings: Ch. 1 and 2
September 13: Binary Image Analysis
Readings: Ch. 3
September 20: Intro to Pattern Recognition
Readings: Ch. 4
Project 1 due
September 27: Image Processing
Readings: Ch. 5
October 4: Color and Shading
Readings: Ch. 6
October 11: Texture and Motion
Readings: Ch. 7 and 9
Project 2 due
October 18: Image Segmentation
Readings: Ch. 10
October 25: 2-D Matching
Readings: Ch. 11
November 1: Exam 1 (Prof. Ward is away)
Project 3 due
November 8: 3-D from 2-D Images
Readings: Ch. 12
November 15: 3-D Transformations and Reconstruction
Readings: Ch. 13
November 29: 3-D Models and Matching
Readings: Ch. 14
Project 4 due
December 6: Case Studies
Readings: Ch. 16
December 13: Exam 2
Project 5 due
Project Details:
Project 2: For this assignment, you will implement your own smoothing and edge detection operators and test them on the images (both binary and grey-scale/color) in your gallery. For the binary images, smoothing will produce values between black and white, and edge detection should find complete boundaries around each region. For grey-scale or color images, the results may not be as satisfying, especially with edge detection. You can choose any of the smoothing and edge detection convolution masks found in the book or discussed in class. If your software environment already has such operations built in, you can use them to verify that your code works, but I expect you to implement the convolutions on your own (a sketch follows below). Once you have completed the implementation and testing, run the operators on the images in your gallery. Write a brief summary of your tests, identifying the configurations that you believe produced the best results. You may want to generate some new images or try a couple of different edge detection operators to improve the results. Put your results on a web page and send me the URL. Also, please e-mail me your source code for this project (the subject line should give your name followed by "Project 2").
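As a starting point, here is a minimal sketch of a hand-rolled 3x3 convolution and a Sobel-style edge magnitude, assuming a grayscale image stored as a flat row-major array of floats; the clamped border handling and the particular kernels are illustrative choices, not requirements.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Convolve a grayscale image (row-major, w*h floats) with a 3x3 kernel.
    // Border pixels are handled by clamping coordinates (edge replication).
    std::vector<float> convolve3x3(const std::vector<float>& img,
                                   int w, int h, const float k[3][3]) {
        std::vector<float> out(img.size());
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                float sum = 0.0f;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int sx = std::min(std::max(x + dx, 0), w - 1);
                        int sy = std::min(std::max(y + dy, 0), h - 1);
                        sum += k[dy + 1][dx + 1] * img[sy * w + sx];
                    }
                out[y * w + x] = sum;
            }
        return out;
    }

    // Sobel edge magnitude: convolve with the horizontal and vertical
    // gradient kernels, then combine the two responses per pixel.
    std::vector<float> sobelMagnitude(const std::vector<float>& img,
                                      int w, int h) {
        const float gx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
        const float gy[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
        std::vector<float> dx = convolve3x3(img, w, h, gx);
        std::vector<float> dy = convolve3x3(img, w, h, gy);
        std::vector<float> mag(img.size());
        for (std::size_t i = 0; i < img.size(); ++i)
            mag[i] = std::sqrt(dx[i] * dx[i] + dy[i] * dy[i]);
        return mag;
    }

A 3x3 box kernel (all entries 1/9) fed to the same convolve3x3 routine covers the smoothing half of the assignment; swapping in other masks from the book only changes the kernel values.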
Project 3: There are several general strategies for segmentation, and you may need to implement more than one to find a method that gives good results for your object. Histogram-based methods look for colors or intensities at which to separate your object into groups; while fairly easy to implement, they often suffer in situations with uneven lighting, shadows, or textured surfaces. Boundary tracking methods start with the strongest edges (highest contrast) and try to follow the boundary until the starting point is encountered again; their problems include false edges and edges wider than one pixel. Region growing techniques start with seed locations and merge pixels into groups based on similarities in color or texture; they often need to be post-processed with splitting or merging operations to break up large regions that envelop several adjacent object parts and to absorb small regions that are really part of a single surface patch (a region-growing sketch follows below). In all cases, the result of segmentation should be an array that assigns each pixel to a particular segment. Run your resulting algorithm on the images in your gallery (not the binary ones!), and summarize which combination of algorithm and image configuration seems to give the most consistent results. Please submit results as in Project 2.
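As one illustration of the strategies above, here is a minimal sketch of region growing, using the same flat-array image representation as the Project 2 sketch; the 4-connectivity, fixed threshold, and running-mean similarity test are all illustrative choices.

    #include <cmath>
    #include <queue>
    #include <utility>
    #include <vector>

    // Grow one region from a seed pixel: 4-connected neighbors are merged
    // whenever their intensity lies within `threshold` of the region's
    // running mean. Returns a mask with 1 for pixels in the grown region.
    std::vector<int> growRegion(const std::vector<float>& img, int w, int h,
                                int seedX, int seedY, float threshold) {
        std::vector<int> mask(img.size(), 0);
        std::queue<std::pair<int, int> > frontier;
        frontier.push(std::make_pair(seedX, seedY));
        mask[seedY * w + seedX] = 1;
        double sum = img[seedY * w + seedX];  // running totals for the mean
        int count = 1;

        const int dx[4] = {1, -1, 0, 0};
        const int dy[4] = {0, 0, 1, -1};
        while (!frontier.empty()) {
            std::pair<int, int> p = frontier.front();
            frontier.pop();
            for (int i = 0; i < 4; ++i) {
                int nx = p.first + dx[i], ny = p.second + dy[i];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                int idx = ny * w + nx;
                if (mask[idx]) continue;       // already assigned
                float mean = static_cast<float>(sum / count);
                if (std::fabs(img[idx] - mean) <= threshold) {
                    mask[idx] = 1;             // merge into the region
                    sum += img[idx];
                    ++count;
                    frontier.push(std::make_pair(nx, ny));
                }
            }
        }
        return mask;
    }

Repeating this from a fresh seed in each still-unlabeled area, with a distinct label per pass, yields the per-pixel segment array described above; the splitting and merging post-processing can then operate on that array.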
Sources for Articles on Computer Vision
~cil/vision.html