Augmented Reality is no longer confined to our dreams; it has become part of our tech-infused lives. So we won't take the trouble of explaining what this piece of utopian art is. What we will do is flaunt our pretty legs and talk about one AR app in particular – Rev Eye. Call us shameless, or maybe not, because right now Rev Eye is one of the most feature-rich AR apps on the market! So, let's talk cool, geeky tech details!
We use a few feature detection algorithms as the raw ingredients on the first layer, cooked with app-development languages and garnished with a polished UI, turning into a beautiful dish called Rev Eye. Guys and girls can give it a shot for the damn high price of $0, but the feel you get is worth around 10^6 dollars (yea, I got a technical figure over here, woooHoHOOOO).
A couple of years back, a cool watch was all you needed to pass for a grown-up in a student bar; now you need smart wearable technology, not any old watch that can just show the time and sparkle some diamonds. So, what are wearable technologies all about? What is Augmented Reality about? What are gaming consoles like Oculus Rift and Project Morpheus about? Voice search? Gesture commands? Let's explore together, my fellow technoholics!
Now, let's get strictly into the technical aspects. We use SIFT and STAR for object detection. When we say object, it can be anything our database is trained to recognize. Bon voyage, we are now heading into the deep blue sea. Before entering the actual SIFT algorithm, one has to be aware of two concepts:
Keypoint detection: Keypoints are points of interest identified across both image and scale dimensions using a "saliency" criterion, in order to boost computational efficiency. Keypoints are detected in the octave layers of the image pyramid as well as in the layers in between. The location and scale of each keypoint are obtained in the continuous domain via quadratic function fitting. Whoa! Lots of technical specs!
Keypoint description: A sampling pattern consisting of points lying on appropriately scaled concentric circles is applied in the neighbourhood of each keypoint to retrieve gray values; by processing local intensity gradients, the feature's characteristic direction is determined.
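With those two roles in mind, here's a minimal sketch of how both detectors can be spun up in OpenCV. This assumes the opencv-contrib-python package, since STAR lives in the contrib modules, and the image filename is just a placeholder:

```python
import cv2

# hypothetical input image -- swap in your own file
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT detects keypoints AND computes 128-dimensional descriptors
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# STAR (CenSurE) is a keypoint detector only; it needs to be paired
# with a separate descriptor step if descriptions are required
star = cv2.xfeatures2d.StarDetector_create()
star_keypoints = star.detect(img, None)

print(len(keypoints), descriptors.shape, len(star_keypoints))
```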
Images may vary in size and clarity, and a lot of noise may interfere. To withstand such variations and still give the same results, we subject the real-time image to a series of procedures. The first step is to apply a Gaussian blur and scale the image.
Octaves and Scales:
The number of octaves depends on the size of the original image. While programming SIFT, you'll have to decide for yourself how many octaves you want; however, David Lowe, the creator of SIFT, suggests that 4 octaves and 5 blur levels are ideal for the algorithm. In the next step, we'll use all these octaves to generate Difference of Gaussian images.
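Here's a simplified sketch of building such a pyramid. Real SIFT blurs incrementally and is fussier about how it downsamples, but the idea is the same; the parameter defaults follow Lowe's suggestion above:

```python
import cv2
import numpy as np

def gaussian_pyramid(img, n_octaves=4, n_scales=5, sigma0=1.6, k=2 ** 0.5):
    """Build octaves of progressively blurred images; each new octave
    starts from a copy of the image at half the previous resolution."""
    pyramid = []
    base = img.astype(np.float32)
    for _ in range(n_octaves):
        # blur grows by a factor k from one level to the next
        octave = [cv2.GaussianBlur(base, (0, 0), sigma0 * k ** s)
                  for s in range(n_scales)]
        pyramid.append(octave)
        # halve the resolution for the next octave
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2))
    return pyramid
```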
Difference of Gaussian:
Once the DoG images are found, they are searched for local extrema over scale and space. For example, one pixel in an image is compared with its 8 neighbours as well as the 9 pixels in the next scale and the 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint, which basically means that the keypoint is best represented at that scale.
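A rough NumPy-only sketch of that 26-neighbour check, assuming the octave layout from the pyramid sketch above (it only makes sense for interior pixels and interior scales):

```python
import numpy as np

def difference_of_gaussians(octave):
    """Subtract adjacent blur levels of one octave to get DoG layers."""
    return [octave[i + 1] - octave[i] for i in range(len(octave) - 1)]

def is_potential_keypoint(dogs, s, y, x):
    """True if pixel (y, x) in DoG layer s beats all 26 neighbours:
    its 8 in-plane neighbours plus the 9 pixels in the scale above
    and the 9 pixels in the scale below."""
    value = dogs[s][y, x]
    cube = np.stack([layer[y - 1:y + 2, x - 1:x + 2]
                     for layer in dogs[s - 1:s + 2]])
    return value == cube.max() or value == cube.min()
```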
Once potential keypoint locations are found, they have to be refined to get more accurate results. Lowe used a Taylor series expansion of the scale space to get a more accurate location of each extremum; if the intensity at that extremum is less than a threshold value, it is rejected as a low-contrast keypoint.
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created (it is weighted by gradient magnitude and by a Gaussian-weighted circular window with σ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram is taken, and any peak above 80% of it is also used to calculate an orientation. This creates keypoints with the same location and scale but different directions, which contributes to the stability of matching.
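Here's a hedged sketch of that orientation histogram, NumPy only. The `patch` argument is assumed to be a square grayscale window already cut out around the keypoint:

```python
import numpy as np

def dominant_orientations(patch, sigma):
    """36-bin orientation histogram over a square patch centred on a
    keypoint, weighted by gradient magnitude and by a Gaussian window
    (sigma = 1.5 x the keypoint's scale). Returns every orientation
    whose bin reaches 80% of the highest peak."""
    dy, dx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0

    h, w = patch.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    gauss = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

    hist, _ = np.histogram(angle, bins=36, range=(0.0, 360.0),
                           weights=magnitude * gauss)
    # each surviving bin index maps back to a 10-degree-wide orientation
    return [b * 10.0 for b in np.flatnonzero(hist >= 0.8 * hist.max())]
```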
Now the keypoint descriptor is created. A 16×16 neighbourhood around the keypoint is taken and divided into 16 sub-blocks of 4×4 size. For each sub-block, an 8-bin orientation histogram is created, so a total of 128 bin values are available. These are represented as a vector to form the keypoint descriptor. In addition, several measures are taken to achieve robustness against illumination changes, rotation, etc.
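A toy sketch of the bookkeeping, just to make the 16 × 8 = 128 arithmetic concrete. Real SIFT additionally rotates the patch to the keypoint's orientation and interpolates samples across bins:

```python
import numpy as np

def toy_descriptor(magnitude16, angle16):
    """Toy 128-value descriptor: split a 16x16 patch of gradient
    magnitudes and orientations into 4x4 sub-blocks and take an
    8-bin orientation histogram in each (16 blocks x 8 bins = 128)."""
    desc = []
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            mag = magnitude16[by:by + 4, bx:bx + 4]
            ang = angle16[by:by + 4, bx:bx + 4]
            hist, _ = np.histogram(ang, bins=8, range=(0.0, 360.0),
                                   weights=mag)
            desc.extend(hist)
    vec = np.asarray(desc, dtype=np.float32)
    # normalising the vector gives robustness to illumination changes
    return vec / (np.linalg.norm(vec) + 1e-7)
```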
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases, the second-closest match may be very near to the first; this may happen due to noise or other reasons. In that case, the ratio of the closest distance to the second-closest distance is taken, and if it is greater than 0.8, the match is rejected.
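In OpenCV this ratio test boils down to a few lines. The filenames are placeholders, and the 0.8 threshold follows Lowe's paper:

```python
import cv2

sift = cv2.SIFT_create()
# hypothetical filenames -- a query object and a scene to search
_, query_desc = sift.detectAndCompute(
    cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE), None)
_, scene_desc = sift.detectAndCompute(
    cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE), None)

# for each query descriptor, fetch its two nearest neighbours, then
# keep the match only if the closest clearly beats the runner-up
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(query_desc, scene_desc, k=2)
        if m.distance < 0.8 * n.distance]
print(f"{len(good)} matches survived the ratio test")
```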
This is what we are envisioning for Rev Eye in the future:
1) Real-time training
2) Reverse image search against our database
3) Scale and rotation invariance
4) 360-degree orientation with minimal tracking error
Basically, Rev Eye is set to become a physical search engine – Google for the real world. We're gonna change the way you see your world.
Drop us a line or a comment and tell us what you think about it all! email@example.com (CTO – Rev Eye), firstname.lastname@example.org (R&D Engineer – Rev Eye)