Table of Contents
Introduction
In this tutorial, we will learn how to create a simple search engine using Python. We will build a basic search engine that allows users to enter a query and retrieve relevant documents based on the query. By the end of this tutorial, you will have a good understanding of how search engines work and how to implement one using Python.
Prerequisites
To follow along with this tutorial, you should have a basic understanding of Python programming and web development concepts. Familiarity with HTML, CSS, and JavaScript will be beneficial but not required.
Setup
Before we start coding, let’s make sure we have the necessary tools and libraries set up.
-
Install Python: If you don’t have Python installed on your system, you can download and install it from the official Python website (https://www.python.org/downloads/).
-
Install Flask: Flask is a lightweight web framework for Python. We will use it to create a simple web interface for our search engine. To install Flask, open your terminal or command prompt and enter the following command:
pip install flask
-
Install Whoosh: Whoosh is a fast search engine library for Python. We will use it to perform the actual search operations in our search engine. To install Whoosh, enter the following command:
pip install whoosh
With our setup complete, let’s move on to creating our search engine.
Creating the Search Engine
-
Initialize the Flask App: Create a new Python file and import the necessary modules.
from flask import Flask, render_template, request import os from whoosh.index import create_in, open_dir from whoosh.fields import Text, Schema app = Flask(__name__) app.config['UPLOAD_FOLDER'] = 'index'
-
Define the Search Route: Create a route in your Flask app that will handle the search request.
@app.route('/search', methods=['POST']) def search(): query = request.form['query'] # Perform search operation results = perform_search(query) return render_template('search_results.html', results=results)
-
Define the Indexing Route: Create another route that will handle the indexing of documents.
@app.route('/index', methods=['POST']) def index(): files = request.files.getlist('files[]') # Create or open the index directory if not os.path.exists(app.config['UPLOAD_FOLDER']): os.mkdir(app.config['UPLOAD_FOLDER']) ix = create_in(app.config['UPLOAD_FOLDER'], Schema(content=Text)) for file in files: # Index the document doc = ix.writer() doc.add_document(content=file.read().decode('utf-8')) doc.commit() return 'Indexing completed successfully!'
-
Implement the Search Function: Inside the
perform_search
function, perform the search operation using Whoosh.def perform_search(query): ix = open_dir(app.config['UPLOAD_FOLDER']) # Perform the search operation with ix.searcher() as searcher: query = QueryParser("content", ix.schema).parse(query) results = searcher.search(query) # Extract the relevant information from the search results output = [] for result in results: output.append(result['content']) return output
-
Create HTML Templates: Create two HTML templates,
index.html
andsearch_results.html
, to display the search form and search results.<!-- index.html --> <html> <body> <h2>Search Engine</h2> <form action="/index" method="post" enctype="multipart/form-data"> <input type="file" name="files[]" multiple> <input type="submit" value="Index"> </form> <form action="/search" method="post"> <input type="text" name="query"> <input type="submit" value="Search"> </form> </body> </html>
<!-- search_results.html --> <html> <body> <h2>Search Results</h2> <ul> </ul> </body> </html>
-
Run the Application: Finally, run the Flask application by adding the following lines at the end of your Python file.
if __name__ == '__main__': app.run(debug=True)
Congratulations! You have created a simple search engine using Python. You can test it by running the app and accessing the search form through your browser. Upload some files to index, and then enter a query to retrieve the relevant documents.
Conclusion
In this tutorial, we have learned how to create a simple search engine using Python. We started by setting up the necessary tools and libraries. Then, we implemented the search engine functionality using Flask for the web interface and Whoosh for the search operations. Finally, we created HTML templates to display the search form and search results.