In this project, you will design and implement a mini search engine. You are probably familiar with Google, Bing or Yahoo, which are some of the most popular search engines on the Web. The task performed by a search engine is, as the name says, to search through a collection of documents. Given a set of texts and a query, the search engine locates all documents that contain the keywords in the query. The problem may therefore be reduced to a search problem, which can be solved efficiently with appropriate data structures.
Your task is to design and implement an algorithm that searches a collection of documents. A minimum of 10 documents should be used. You are free to select the data structures and algorithms that you consider most efficient for this task. Of course, you will have to justify your decisions.
First, you will process the documents and store their content (i.e., words / tokens) in the data structures that you selected (in information retrieval, this phase is called indexing). Next, for every input query, you will process the query and search for its keywords in the documents, using the previously implemented data structures and an algorithm of your choice (this phase is called retrieval). For each such query, you will have to display the documents that satisfy the query.
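As an illustration of the two phases, here is a minimal sketch that uses an inverted index, i.e. a dictionary mapping each token to the set of documents containing it. This is only one possible choice, not a required design. The function names (tokenize, build_index, lookup), the sample documents and the tokenization rule (lowercased alphanumeric runs) are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing phase: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def lookup(index, keyword):
    """Retrieval phase for one keyword: ids of matching documents (may be empty)."""
    return set(index.get(keyword.lower(), set()))


# Hypothetical sample collection; the project itself requires at least 10 documents.
documents = {
    "doc1": "Data structures make search efficient.",
    "doc2": "A search engine indexes documents.",
    "doc3": "Hash tables and tries are data structures.",
}

index = build_index(documents)
print(sorted(lookup(index, "search")))  # ['doc1', 'doc2']
```

With this layout, a single-keyword lookup is one dictionary access on average, at the cost of building the index once up front. Other structures, such as tries or balanced search trees, offer different trade-offs and may be equally defensible.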
The queries may contain simple Boolean operators, namely AND and OR, which behave like the well-known logical operators of the same names. For instance, the query “Keyword1 AND Keyword2” should retrieve all documents that contain both of these keywords. The query “Keyword1 OR Keyword2” should instead retrieve all documents that contain at least one of the two keywords.
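One way to picture the Boolean operators is as set operations on the documents matched by each keyword: AND corresponds to intersection and OR to union. The sketch below assumes an index already exists as a dictionary from keyword to a set of document ids. It also assumes that a query is a flat sequence of keywords separated by AND or OR and is evaluated strictly left to right, since the text above does not define precedence or parentheses. The function name evaluate and the sample index are hypothetical.

```python
def evaluate(query, index):
    """Evaluate a flat query such as 'k1 AND k2 OR k3' from left to right.

    Assumptions: operators are written in upper case, there are no
    parentheses, and no operator has precedence over the other.
    """
    tokens = query.split()
    if not tokens:
        return set()
    # Start from a copy of the first keyword's document set.
    result = set(index.get(tokens[0].lower(), set()))
    # The remaining tokens are read as (operator, keyword) pairs;
    # a trailing operator with no keyword is ignored.
    for op, word in zip(tokens[1::2], tokens[2::2]):
        postings = index.get(word.lower(), set())
        if op == "AND":
            result &= postings  # keep documents containing both
        elif op == "OR":
            result |= postings  # keep documents containing either
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# Hypothetical pre-built index: keyword -> ids of documents containing it.
index = {
    "search": {"doc1", "doc2"},
    "data": {"doc1", "doc3"},
    "engine": {"doc2"},
}

print(sorted(evaluate("search AND data", index)))  # ['doc1']
print(sorted(evaluate("search OR data", index)))   # ['doc1', 'doc2', 'doc3']
```

If a different evaluation order is wanted, for example AND binding tighter than OR, the query would have to be parsed into an expression tree first. Whichever convention is chosen should be stated and justified.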