Image recognition is computationally demanding, and an album cover database is too large to store on a phone. For these reasons, we implemented our system as a client-server architecture (Fig. 1).
Figure 1 - MusicGuide Architecture
[Diagram: the phone sends a photo of the cover to the server; the server sends data to the phone.]
The user interface on the phone lets the user take a 500×500-pixel photo with the device camera. The photo is automatically uploaded to our server, which performs the matching and data aggregation and then sends the results back to the phone as an HTML file. This file contains the product rating and track samples from Amazon, together with the average critic score and review excerpts from Metacritic, with links to the full reviews. The file can also be saved on the phone for later viewing.
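The client side of this exchange can be sketched as follows. This is a minimal illustration and not our actual client code: the server URL, the form field name `cover`, the cache file name, and the function names are all hypothetical, and the sketch assumes the photo is uploaded as a standard multipart/form-data POST.

```python
import urllib.request
import uuid
from pathlib import Path

# Hypothetical endpoint; the real server address is not given in the text.
SERVER_URL = "http://example.com/musicguide/match"


def build_upload_request(jpeg_bytes, url=SERVER_URL):
    """Build a multipart/form-data POST request carrying the cover photo."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        b"--" + boundary.encode("ascii") + b"\r\n",
        # The field name "cover" is an assumption made for this sketch.
        b'Content-Disposition: form-data; name="cover"; filename="cover.jpg"\r\n',
        b"Content-Type: image/jpeg\r\n\r\n",
        jpeg_bytes,
        b"\r\n--" + boundary.encode("ascii") + b"--\r\n",
    ])
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_result(html_bytes, cache_dir, name="last_result.html"):
    """Store the returned HTML page so it can be viewed again later."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    path.write_bytes(html_bytes)
    return path


def lookup_album(jpeg_bytes, cache_dir, url=SERVER_URL):
    """Upload the photo, read the HTML result, and cache it locally."""
    request = build_upload_request(jpeg_bytes, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return save_result(response.read(), cache_dir)
```

Separating request construction and caching from the network call keeps the two local steps easy to check without a running server; only `lookup_album` touches the network.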
Our server has two main tasks: image matching and data collection. The image matching component uses David Lowe’s Scale-Invariant Feature Transform (SIFT), which has been shown to have good precision and is invariant to image scale and rotation and robust to moderate changes in illumination and viewpoint. We use a modified version of Lowe’s own implementation. Once a match is found, the server retrieves the user rating and track samples for the album from Amazon through the Amazon Web Services API, and parses the HTML of the album’s product page on Metacritic to extract the review excerpts, the links to the full reviews, and the average critic score. Because the input image is small, the total data exchanged between the phone and the server averages about 100 Kb. Most of this is the input image and the matching cover image from our database, which the server sends back so that the user can verify that the match is correct.
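To make the matching step concrete, the sketch below shows one common way to pick a cover from SIFT descriptors: count the query descriptors that pass the nearest-neighbour ratio test described in Lowe's SIFT paper (threshold 0.8) and return the database cover with the most such matches. This is an assumption made for illustration; the text above does not state which matching strategy our modified C implementation uses, and the function names and the dictionary layout of the database are hypothetical. Real SIFT descriptors have 128 dimensions, whereas the usage example uses 2-D points for readability.

```python
import math


def ratio_test_matches(query, candidate, ratio=0.8):
    """Count query descriptors whose nearest neighbour in `candidate`
    is clearly closer than the second nearest (ratio test)."""
    if len(candidate) < 2:
        return 0  # the ratio test needs at least two neighbours
    count = 0
    for q in query:
        dists = sorted(math.dist(q, c) for c in candidate)
        if dists[0] < ratio * dists[1]:
            count += 1
    return count


def best_cover(query, database, ratio=0.8):
    """Return (album_id, match_count) for the cover with the most matches.

    `database` maps a hypothetical album id to its list of descriptors.
    """
    if not database:
        return None, 0
    scores = {album: ratio_test_matches(query, descs, ratio)
              for album, descs in database.items()}
    album = max(scores, key=scores.get)
    return album, scores[album]
```

For example, with `query = [(0.0, 0.0), (10.0, 10.0)]` and a database holding `"album_a": [(0.1, 0.0), (10.0, 10.1), (50.0, 50.0)]` and `"album_b": [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]`, `best_cover(query, database)` returns `("album_a", 2)`: both query points have an unambiguous nearest neighbour in `album_a`, while in `album_b` the two closest descriptors are almost equally distant and fail the ratio test. This brute-force search is quadratic in the number of descriptors; a production system would typically use an approximate nearest-neighbour index.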
The user interface on the phone is implemented in Python. Data collection on the server is implemented in VB.NET, whereas the object recognition algorithm is written in C for efficiency. We also built a web interface to the server using ASP.NET. Our own user interface provides a practical way to perform the search, but through this web interface the server can also be accessed from any phone with a web browser.