In this project, we learned more about image homographies and how to stitch images together into mosaics, similar to a phone's "Panorama" mode. It was really exciting to learn how this works, since I've always wondered how our phones can do it.
First, I shot pictures around campus, particularly in the CSUA, in BWW, and in my apartment. I used my iPhone 15 to take the pictures and Adobe to resize them. All relevant pictures are included below alongside their corresponding rectifications/mosaics, so for ease of reading I won't include them again here. I also made sure to follow the center-of-projection advice from lecture: rotating the camera about its own center of projection between shots, rather than pivoting around the person holding it, so that the shots are related by a homography.
I started by computing the homography. I first picked corresponding points in the two images (in this case, the corners of the picture on the wall) using the Correspondence tool. From lecture, we know that a homography H maps a point p in one image to its corresponding point p' in the other image, i.e. p' = Hp in homogeneous coordinates. I set up the resulting system of linear equations, with one pair of equations per correspondence, and then used numpy's linear least squares to solve it and recover the homography.
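To make that setup concrete, here is a minimal sketch of the least-squares homography solve described above; the function and variable names are my own, not the exact project code.

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate H such that dst ~ H @ src in homogeneous coordinates.

    src_pts, dst_pts: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    A sketch of the linear system described above, solved with least squares.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        # Each correspondence gives two rows of the system A h = b, derived
        # from x' = (h1 x + h2 y + h3) / (h7 x + h8 y + 1) and similarly for y'.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)  # fix the scale by setting h9 = 1
```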
From there, I implemented image warping. This was similar to Project 3, except that instead of an affine transformation, I used the homography matrix I had just computed. I first took the four corners of the image and applied the homography to them (warping the corners). From the warped corners I found the minimum and maximum x and y, which let me set up a translation matrix so that all transformed points land inside the visible output image. Then, using the warped corners, the resulting output height and width, and skimage's polygon function, I found the relevant output pixels, mapped them back through the homography in the same way as last project's affine warp, and displayed the output.
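A rough sketch of that warping procedure is below, assuming a color image, nearest-neighbor sampling, and a simple bounding-box translation; the helper structure is mine, not the exact project code.

```python
import numpy as np
from skimage.draw import polygon

def warp_image(im, H):
    """Warp im by homography H; also return the translation applied.

    Returns the warped image plus (tx, ty), the shift that moves the warped
    corners into positive coordinates so the whole result is visible.
    """
    h, w = im.shape[:2]
    # Forward-map the four corners to find the output bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]]).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    min_xy = np.floor(warped.min(axis=1)).astype(int)
    max_xy = np.ceil(warped.max(axis=1)).astype(int)
    tx, ty = -min_xy
    out_w, out_h = max_xy - min_xy

    out = np.zeros((out_h, out_w, im.shape[2]), dtype=im.dtype)
    # Rasterize the warped quadrilateral, then inverse-map each output pixel.
    rr, cc = polygon(warped[1] + ty, warped[0] + tx, out.shape[:2])
    Hinv = np.linalg.inv(H)
    src = Hinv @ np.stack([cc - tx, rr - ty, np.ones_like(rr)])
    sx = (src[0] / src[2]).round().astype(int).clip(0, w - 1)
    sy = (src[1] / src[2]).round().astype(int).clip(0, h - 1)
    out[rr, cc] = im[sy, sx]  # nearest-neighbor sampling for simplicity
    return out, (tx, ty)
```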
To verify that the warping worked, I then performed rectification, which means warping a known rectangular object in the image onto an actual rectangle. I selected the four corners of a rectangular object in my image, defined a target rectangle by manually entering its corner pixel coordinates, computed the homography between the two point sets, and warped the image onto that rectangle to get a clean rectification. I did this for two images, and below are the results.
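As a small illustration, rectification is just the two helpers sketched above applied to hand-picked points; the coordinates here are made up for the example, and `im` is assumed to be the loaded photo.

```python
import numpy as np

# Four clicked corners of a rectangular object (illustrative values only).
clicked = np.array([[312, 140], [690, 180], [668, 560], [298, 520]], dtype=float)
# Target rectangle, entered manually as pixel coordinates.
target = np.array([[0, 0], [400, 0], [400, 300], [0, 300]], dtype=float)

H = compute_homography(clicked, target)
rectified, _ = warp_image(im, H)
```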
Then, finally, came the most exciting part of the project: blending the images together! This step took a lot of conceptual understanding. Let's say I wanted to blend image1 and image2. First, I warped image1 onto image2 and recorded the translation that the warp required. Then, I applied that same translation to image2, took the maximum extents of the two images to size the overall blended canvas, and placed both images (aligned, since they share the same translation) into that canvas. Unfortunately, simply adding them leaves an overlap region between the two images that looks very strange, so I used a two-band Laplacian-pyramid-style blending scheme to smooth the transition there. To do this, I first computed the area of overlap by finding the pixels in the blended canvas covered by both images. Then, for each image, I applied a 2D Gaussian convolution to get a low-frequency version, and subtracted that from the original image to get the high-frequency version. Next, I computed a distance transform for both images over the region of overlap. For the high-frequency band, each overlap pixel takes the value from whichever image has the larger distance-transform value (i.e., the pixel farthest from that image's edge); for the low-frequency band, I computed a weighted average with weights given by the corresponding distance transforms. Adding the low- and high-frequency bands back together in the overlap region makes the transition between the two images much smoother. Below are the outcomes.
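Here is a condensed sketch of that two-band blending step, assuming both images have already been translated onto a shared canvas (zeros wherever an image is absent); the sigma and epsilon values are illustrative, not the ones I tuned.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def two_band_blend(canvas1, canvas2):
    """Two-band blend of two same-size float RGB canvases.

    Assumes each canvas is zero outside its image region, so a nonzero pixel
    marks coverage by that image.
    """
    mask1 = canvas1.sum(axis=2) > 0
    mask2 = canvas2.sum(axis=2) > 0
    overlap = mask1 & mask2

    # Distance from each covered pixel to the edge of its image region.
    d1 = distance_transform_edt(mask1)
    d2 = distance_transform_edt(mask2)

    # Split each image into low- and high-frequency bands.
    low1 = gaussian_filter(canvas1, sigma=(2, 2, 0))
    low2 = gaussian_filter(canvas2, sigma=(2, 2, 0))
    high1, high2 = canvas1 - low1, canvas2 - low2

    out = canvas1 + canvas2  # correct outside the overlap, where one canvas is zero

    # Overlap: weighted-average the low band, winner-take-all the high band.
    w1 = (d1 / (d1 + d2 + 1e-8))[..., None]
    low = low1 * w1 + low2 * (1 - w1)
    high = np.where((d1 > d2)[..., None], high1, high2)
    out[overlap] = (low + high)[overlap]
    return out
```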
In Part B of the project, we follow the paper “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al. This lets us build mosaics automatically, instead of relying on manually clicking corresponding points in the images and matching them up. The coolest thing I learned from this part was how to use ANMS to pick the corners that were key to the image. Often these corresponded to the corners I had chosen manually, and sometimes they didn't, and it was interesting to see how the algorithm decides which corners to keep based on the relative corner strengths of nearby points. I hadn't really thought of this kind of approach before, and it was fun to implement it to find evenly spaced, strong corners to compare across images.
We start by detecting corner features in the images. We use the provided Harris corner code to generate Harris corners and display them on the BWW image, like so.
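The detector itself came from the provided starter code, but a rough skimage equivalent of this step (not the provided implementation) would look something like this.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

gray = rgb2gray(im)               # im: the BWW photo, loaded elsewhere
harris = corner_harris(gray, sigma=1)   # Harris corner response map
# Local maxima of the response become the candidate corners (row, col).
coords = peak_local_max(harris, min_distance=1, threshold_rel=0.01)
```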
Then, we apply the ANMS (adaptive non-maximal suppression) algorithm, which condenses the set of corner points we look at. We compute the distance between every pair of points using the provided dist2 function, then compute a suppression radius for each point: the distance to the nearest point whose scaled corner strength is sufficiently larger. We sort the points by suppression radius and keep those with the largest radii; here we use a c_robust value of 0.9 and keep the top 500 interest points, as in the paper. These become the new set of corners used for feature extraction and matching. Because this favors points that are spatially spread out across the image, it is more likely (as explained in the paper) that matching features exist in both images and the points won't be dropped during feature matching.
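A sketch of the ANMS step as described above; here the pairwise distances are computed directly with numpy rather than the provided dist2 helper.

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (N, 2) corner coordinates; strengths: (N,) Harris responses.
    Keeps the n_keep corners with the largest suppression radii.
    """
    # Pairwise squared distances between all corners.
    diffs = coords[:, None, :] - coords[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)

    # Corner i is suppressed by corner j if c_robust * strength_j > strength_i.
    stronger = c_robust * strengths[None, :] > strengths[:, None]
    d2_masked = np.where(stronger, d2, np.inf)

    # Suppression radius = distance to the nearest sufficiently-stronger corner.
    radii = d2_masked.min(axis=1)
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```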
Now, for each corner, we take the 40x40 region around the point, resize it to 8x8, and standardize it (subtracting the mean and dividing by the standard deviation). Then we flatten each patch and stack them all together, which gives us the extracted feature descriptor for every corner. Below are some examples of what the first six feature descriptors look like.
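A minimal sketch of this descriptor extraction, assuming integer (row, col) corner coordinates and skipping corners that sit too close to the border.

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(gray, coords, patch=40, out=8):
    """8x8 axis-aligned descriptors from 40x40 windows, as described above.

    gray: grayscale image; coords: (N, 2) integer (row, col) corner locations.
    Returns the stacked descriptors and the coordinates that were kept.
    """
    half = patch // 2
    descs, kept = [], []
    for r, c in coords:
        if r - half < 0 or c - half < 0 or r + half > gray.shape[0] or c + half > gray.shape[1]:
            continue  # window would fall off the image
        window = gray[r - half:r + half, c - half:c + half]
        small = resize(window, (out, out), anti_aliasing=True)
        small = (small - small.mean()) / (small.std() + 1e-8)  # bias/gain normalize
        descs.append(small.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```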
Using these feature descriptors from both images, we then implement feature matching, as described in the paper. We compute the SSD between all pairs of feature descriptors from image 1 and image 2. Then, for each descriptor, we compare its 1-NN and 2-NN distances: if the 1-NN distance divided by the 2-NN distance is below a Lowe's-style ratio threshold of 0.4 (a value I took from the graph in the paper), we consider it a match between the images. This works because the ratio test treats the 1-NN as the potential correct match and the 2-NN as an approximation of the distance to an incorrect (outlier) match, so the ratio indicates how likely the 1-NN is to be correct. For the two images visualized here, these are the points we get from feature matching, i.e. the corners that match across the two images. Reading them from left to right, you can see where the correspondences line up, and they look largely correct.
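A sketch of the ratio-test matching described above, assuming the two descriptor arrays come from the previous step.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.4):
    """SSD matching with Lowe's ratio test.

    desc1, desc2: (N1, 64) and (N2, 64) descriptor arrays.
    Returns an array of (i, j) index pairs that pass the ratio test.
    """
    matches = []
    for i, d in enumerate(desc1):
        ssd = ((desc2 - d) ** 2).sum(axis=1)  # SSD to every descriptor in image 2
        nn = np.argsort(ssd)[:2]              # indices of the 1-NN and 2-NN
        if ssd[nn[0]] / (ssd[nn[1]] + 1e-12) < ratio:
            matches.append((i, nn[0]))
    return np.array(matches)
```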
Next, we use the RANSAC algorithm. We take the pairs of matched points from the previous step and repeatedly select four pairs at random. For each random sample, we compute a homography using the same least-squares logic as in Project 4A and apply it to the matched points from the first image. If the SSD between a transformed feature point and its target feature point is less than a threshold (which I chose through some parameter tuning to be 1), we count that point as an inlier. If the number of inliers for this homography is greater than the count from the previous best random sample, we keep this homography as the current best. We run this for 100 iterations and output the best homography. There are no images to show here, since the result is applied directly in the next part.
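A sketch of that RANSAC loop, reusing the compute_homography helper from Part A; the inlier threshold of 1 is the tuned value mentioned above.

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=100, thresh=1.0):
    """4-point RANSAC over matched points.

    pts1, pts2: (N, 2) matched (x, y) points. thresh is the squared-distance
    inlier cutoff; the homography with the most inliers wins.
    """
    best_H, best_count = None, -1
    homog1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous image-1 points
    for _ in range(n_iters):
        sample = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        proj = (H @ homog1.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        err = ((proj - pts2) ** 2).sum(axis=1)  # SSD reprojection error
        count = int((err < thresh).sum())
        if count > best_count:
            best_H, best_count = H, count
    return best_H
```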
Finally, we use the RANSAC homography to produce a mosaic. We take the homography returned by RANSAC, use it to warp the image (as in Project 4A), and create the blended mosaic from that warped image, exactly as described in Project 4A. The only difference is that the correspondences are now automatically generated rather than manually selected. Below, we show the blended mosaics from the Part B pipeline side by side with the Part A mosaics built from manually selected points.
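Tying the Part B pieces together, a rough end-to-end usage of the helpers sketched earlier might look like this; im1 and im2 are the two photos to mosaic, and the actual project code is organized differently.

```python
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def detect_and_describe(im):
    """Harris corners -> ANMS -> 8x8 descriptors, using the sketches above."""
    gray = rgb2gray(im)
    harris = corner_harris(gray, sigma=1)
    coords = peak_local_max(harris, min_distance=1, threshold_rel=0.01)
    coords = anms(coords, harris[coords[:, 0], coords[:, 1]])
    return extract_descriptors(gray, coords)

desc1, kept1 = detect_and_describe(im1)
desc2, kept2 = detect_and_describe(im2)
matches = match_features(desc1, desc2)

# Kept coordinates are (row, col); the homography code expects (x, y), so flip.
pts1 = kept1[matches[:, 0]][:, ::-1].astype(float)
pts2 = kept2[matches[:, 1]][:, ::-1].astype(float)
H = ransac_homography(pts1, pts2)
warped1, (tx, ty) = warp_image(im1, H)  # then translate im2 by (tx, ty) and blend as in Part A
```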
From this set of images, there are some interesting observations to make. First, for the CSUA board image, looking at the whiteboard area, the blend produced by Part B's automatically detected corners is actually better than the one from Project 4A: the side of the whiteboard appears sharper and less blurry. The reason is that, looking back at the feature-matching output, some of the automatically matched points lay on the whiteboard side, which let that region align more precisely, whereas when selecting manually I only picked points on the left side, so that part of the output came out blurrier. The algorithm found better alignment points there than I did. However, this is not true for every image. For the box, for example, the mosaic built from manually selected points looks better than the automatic one, since I was able to pick good corner points directly on the box so that it aligns well, whereas the box comes out very blurry in my Project 4B output. Ultimately, I think that manually selecting points, when you have the time and effort to put into choosing many good correspondences, can beat automatic detection, but automatic detection is a much quicker solution when you have less time, and it can sometimes find correspondences that you would miss if you were rushing.