QU Analytics has hired you as an algorithmic marketing analyst. QU is a consulting organization specializing in marketing analytics solutions. Your client is a large e-tailer (CouchSmart) that has millions of products in its catalog. They intend to enhance the user experience of their clientele by providing rich and engaging interfaces that customers can use without leaving their couches! They are considering implementing Visual Search and have reached out to QU Analytics to assist in prototyping such a solution.
Since the data is still being collected, they intend to use Cdiscount's data as a proxy (https://www.kaggle.com/c/cdiscount-image-classification-challenge). They intend to try out a couple of different approaches and get recommendations on which approach to implement.
The specification of the project is as follows:
Ingestion and pre-processing:
Download the dataset and process it using the sample code provided[1].
Sample data using xsv[5] so you can prototype. Get at least 100 categories and 100 products in each category.
Similarity Search:
Preprocess the images as needed by each method. You can have 3 separate files.
Implement Version 1 of Similarity search using the method proposed in [3].
Implement Version 2 of Similarity search using the Facebook method proposed in [6]. See [2] for how they used it.
Implement Version 3 of Similarity search using the Spotify-Annoy method proposed in [4].
Methods:
You should implement 2 methods:
Single lookup: Given an image identifier, retrieve the k most similar images.
Bulk: Generate a JSON file with the k most similar images for each image. See [4] and [7] for examples.
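The two methods can be prototyped before any of the three similarity-search versions is wired in. The following is a minimal sketch, assuming each image has already been reduced to a numeric feature vector; it uses exact brute-force cosine similarity as a stand-in, not the methods from [3], [4], or [6]. The function names (`top_k_similar`, `bulk_similar`), the `embeddings` dictionary, and the toy image identifiers are all hypothetical.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(image_id, embeddings, k=5):
    """Single lookup: ids of the k images most similar to image_id."""
    query = embeddings[image_id]
    scored = [
        (cosine_similarity(query, vec), other_id)
        for other_id, vec in embeddings.items()
        if other_id != image_id  # never return the query image itself
    ]
    # Highest similarity first; ties are broken by id so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:k]]


def bulk_similar(embeddings, k=5):
    """Bulk: map every image id to the ids of its k most similar images."""
    return {
        image_id: top_k_similar(image_id, embeddings, k)
        for image_id in embeddings
    }


# Toy 2-d feature vectors (assumed data, for illustration only).
embeddings = {
    "img_a": [1.0, 0.0],
    "img_b": [0.9, 0.1],
    "img_c": [0.0, 1.0],
}

print(top_k_similar("img_a", embeddings, k=1))
print(json.dumps(bulk_similar(embeddings, k=2), indent=2))
```

Brute force scales quadratically in the bulk case, so it is only practical on the sampled prototype data; it can still serve as a ground truth against which the approximate versions are compared.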
Search:
Install and configure Elasticsearch
Using the JSON output from the Bulk method, index the data so you can query Elasticsearch for the k most similar images.
Reference app:
Build a reference app (a simple app using Streamlit or Flask[8]) to enable searches. See [9] for ideas.
Your reference app should have the following 2 modes:
When an indexed image (from a dropdown or randomly shown on the site) is selected, return k similar images from the Elasticsearch index.
Provide an interface for uploading a new image. The upload calls the lookup function and returns the k images most similar to the new image. (You can put some samples on S3/Google Drive and use those as links to point to the images.)
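How the first mode reads from Elasticsearch depends on how the bulk JSON was indexed in the Search step. One possible layout, sketched below under stated assumptions, stores one document per image, with the image identifier as the document `_id` and its neighbours in a `similar` field. The index name `similar-images`, the field names, and the helper `to_bulk_ndjson` are hypothetical; the only Elasticsearch-specific fact relied on is the newline-delimited action/source format accepted by the `_bulk` endpoint.

```python
import json


def to_bulk_ndjson(neighbors, index_name):
    """Turn {image_id: [similar ids]} into a body for the _bulk endpoint.

    Each document contributes two lines: an "index" action line, then the
    document source. The body must end with a trailing newline.
    """
    lines = []
    for image_id, similar_ids in neighbors.items():
        # Action line: index this document under the image id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": image_id}}))
        # Source line: field names are an assumed layout, not a fixed schema.
        lines.append(json.dumps({"image_id": image_id, "similar": similar_ids}))
    return "\n".join(lines) + "\n"


# Assumed example of the Bulk method's output.
neighbors = {
    "img_a": ["img_b", "img_c"],
    "img_b": ["img_a", "img_c"],
}

payload = to_bulk_ndjson(neighbors, "similar-images")
print(payload)
```

With this layout, the first mode reduces to fetching the selected image's document by id and reading its `similar` field, while the second mode cannot use the precomputed index and has to run the single-lookup path on the uploaded image's features.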