Creating an Open Source Search Platform: Search Engines with AI - Swirl
Swirl is an open-source search engine written in Python, powered by Large Language Models (LLMs) such as ChatGPT along with ML and NLP algorithms.
Table of contents
- Swirl on GitHub 👇
- The Hunt for Open Source Search Platforms 🕵️‍♀️
- Dealing with Many Data Sources, Databases, and Data Silos ☹
- Using LLMs with Search and Bringing Multiple Data Sources Together 🤝
- Retrieval Augmented Generation (RAG) using Swirl ✨
- Get Started with Swirl 💻
Without further ado, let me introduce Swirl. Swirl is an open-source search platform built with Python and Django. It unifies searches across databases (SQL and NoSQL), cloud services, search providers, data silos, and tools like Miro, Jira, and GitHub.
With Swirl, users run a single query, and Swirl instantly pools and presents relevant data from multiple platforms in one consolidated UI.
The Hunt for Open Source Search Platforms 🕵️‍♀️
Search is everywhere in our daily lives. Giants like Google Search, Bing, and DuckDuckGo make it easy to find information at the click of a button.
However, businesses, startups, and developers who want to add search to their own platforms without being bound to these major players have fewer choices. Most enterprise-level search engines come with licensing fees or restrictions. Integration options such as Google's Programmable Search Engine and Algolia are powerful, but they may not cater to the specific needs of every business, especially where customization and self-hosting are concerned.
This is where open-source search engines come in. For anyone trying to integrate search into their platform, Swirl is one of the best choices. It is built in Python and is highly customizable. It is free and open source under the Apache 2.0 License, which means developers and businesses can use and modify it without any licensing costs. Teams can also contribute to its development through improvements, bug fixes, and feature enhancements.
Dealing with Many Data Sources, Databases, and Data Silos ☹
As any startup or company grows, so does its data, and so does the difficulty of finding the right information. Organizations inevitably accumulate data in many forms: traditional documents, code repositories, spreadsheets, and more structured SQL and NoSQL databases. The real challenge, however, is not just storing this large volume of data but retrieving the right information efficiently when it is needed.
So, which database is our document in? Is it in USE_CASE_1 or USE_CASE_2?
A typical scenario: you need some data but don't know where it lives.
The diversity of data sources adds another layer of complexity. Imagine sifting through a vast library where books, journals, handwritten notes, and digital records are all stored haphazardly. Sounds daunting, right? That's precisely the scenario many businesses face today. Different data types, coupled with isolated data silos, can make it a Herculean task for employees to locate the right information promptly.
Swirl connects to multiple data sources and searches all of them at once. It acts as a centralized hub, enabling streamlined searches across every integrated data source. This simplifies the search process and ensures that no crucial information gets overlooked because of where it is stored or what format it is in.
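To make the "centralized hub" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `Connector` interface in which every source exposes the same `search()` method. The class names, the `docs` table, and the record fields are illustrative only and are not Swirl's actual connector API. One connector wraps a SQL database (SQLite from the standard library) and the other stands in for a NoSQL store or SaaS tool.

```python
import sqlite3
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Hit:
    source: str  # name of the silo the hit came from
    title: str
    body: str


class Connector(Protocol):
    """Hypothetical interface: every data source exposes the same search()."""

    name: str

    def search(self, query: str) -> list[Hit]: ...


class SQLiteConnector:
    """Searches a SQL table (assumed schema: docs(title TEXT, body TEXT))."""

    def __init__(self, name: str, conn: sqlite3.Connection) -> None:
        self.name = name
        self.conn = conn

    def search(self, query: str) -> list[Hit]:
        pattern = f"%{query}%"
        # Parameterized query; SQLite's LIKE is case-insensitive for ASCII.
        rows = self.conn.execute(
            "SELECT title, body FROM docs WHERE title LIKE ? OR body LIKE ?",
            (pattern, pattern),
        ).fetchall()
        return [Hit(self.name, title, body) for title, body in rows]


class InMemoryConnector:
    """Stands in for a NoSQL store or a SaaS tool such as an issue tracker."""

    def __init__(self, name: str, records: list[dict]) -> None:
        self.name = name
        self.records = records

    def search(self, query: str) -> list[Hit]:
        q = query.lower()
        return [
            Hit(self.name, r["summary"], r["details"])
            for r in self.records
            if q in r["summary"].lower() or q in r["details"].lower()
        ]


def search_all(connectors: list[Connector], query: str) -> list[Hit]:
    """The hub: send one query to every connector and pool the hits."""
    hits: list[Hit] = []
    for connector in connectors:
        hits.extend(connector.search(query))
    return hits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT, body TEXT)")
    conn.execute("INSERT INTO docs VALUES ('Pricing policy', 'Annual plans are discounted')")
    tickets = InMemoryConnector(
        "tickets", [{"summary": "Update pricing page", "details": "Show annual plans"}]
    )
    for hit in search_all([SQLiteConnector("sql", conn), tickets], "pricing"):
        print(hit.source, "|", hit.title)
```

Running it prints one hit from each silo for the single query "pricing". Keeping the interface uniform is what lets a hub add new sources without changing the search loop.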
Using LLMs with Search and Bringing Multiple Data Sources Together 🤝
Swirl distributes user queries to search engines, databases, and other enterprise cloud services using their existing APIs and standards-based security mechanisms like OAuth2. Swirl asynchronously normalizes and re-ranks the unified results using large language models.
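The distribute, normalize, and re-rank flow can be sketched with `asyncio`. This is a simplified illustration under stated assumptions: the two providers, their payload fields, and the `db://` URLs are made up, and the keyword-overlap `relevance()` function is only a stand-in for the language-model re-ranking Swirl actually performs.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    source: str
    title: str
    snippet: str
    url: str
    score: float = 0.0


# Two hypothetical providers that return differently shaped payloads.
# They ignore `query` and return canned data, purely for the demo.
async def web_provider(query: str) -> list[dict]:
    await asyncio.sleep(0.01)  # simulate network latency
    return [
        {"headline": "Intro to vector search", "text": "How vector search works",
         "link": "https://example.com/vector"},
        {"headline": "Company picnic", "text": "Photos from the picnic",
         "link": "https://example.com/picnic"},
    ]


async def db_provider(query: str) -> list[dict]:
    await asyncio.sleep(0.02)
    return [{"name": "Search roadmap", "content": "Plans for keyword search", "id": 42}]


def normalize(source: str, raw: dict) -> Result:
    """Map each provider's own field names onto one common schema."""
    if source == "web":
        return Result(source, raw["headline"], raw["text"], raw["link"])
    return Result(source, raw["name"], raw["content"], f"db://records/{raw['id']}")


def relevance(query: str, result: Result) -> float:
    """Stand-in scorer: fraction of query terms found in title + snippet.

    Swirl re-ranks with language models; this keyword overlap only marks
    where that step sits in the pipeline.
    """
    terms = query.lower().split()
    if not terms:
        return 0.0
    text = f"{result.title} {result.snippet}".lower()
    return sum(term in text for term in terms) / len(terms)


async def federated_search(query: str) -> list[Result]:
    providers = {"web": web_provider, "db": db_provider}
    # Fan out: every provider is queried concurrently, not one after another.
    payloads = await asyncio.gather(*(fetch(query) for fetch in providers.values()))
    results = [
        normalize(source, raw)
        for source, payload in zip(providers, payloads)
        for raw in payload
    ]
    for r in results:
        r.score = relevance(query, r)
    # Re-rank the merged list so the best hits surface regardless of origin.
    return sorted(results, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    for r in asyncio.run(federated_search("vector search")):
        print(f"{r.score:.2f} {r.source}: {r.title} ({r.url})")
```

Because `asyncio.gather` awaits all providers concurrently, total latency is roughly that of the slowest source rather than the sum of all of them. With the query "vector search", the re-ranked list places the database hit between the two web hits, showing that ranking happens after the sources are merged.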
Let's understand how Swirl works.
1. The user configures the data sources that Swirl connects to.
2. The user enters a query.
3. Swirl sends the query to each source.
4. Swirl collects the responses and uses LLMs to find the best results.
5. Swirl then gathers the citations in an asynchronous pipeline.
6. It fetches the top results and creates a prompt.
7. It sends the prompt and the retrieved data to ChatGPT (or any other LLM).
8. Swirl returns answers with ChatGPT insights.
A diagram explaining how Swirl searches multiple data sources and returns results with insights from ChatGPT, configured here as the LLM.
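The last three steps above (building a prompt from the top results, sending it to an LLM, and returning the answer) can be sketched as follows, assuming results have already been retrieved and scored. The `build_prompt()` and `answer()` helpers and the prompt wording are hypothetical. The model call is passed in as a plain function so the sketch does not depend on any particular LLM client, and `fake_llm()` is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    title: str
    snippet: str
    url: str
    score: float


def build_prompt(question: str, results: list[Result], top_k: int = 3) -> str:
    """Keep the top_k results and number them so the model can cite [1], [2], ..."""
    top = sorted(results, key=lambda r: r.score, reverse=True)[:top_k]
    sources = "\n".join(
        f"[{i}] {r.title}: {r.snippet} ({r.url})" for i, r in enumerate(top, start=1)
    )
    return (
        "Answer the question using only the numbered sources below, "
        "and cite them like [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, results: list[Result], llm: Callable[[str], str]) -> str:
    """`llm` is any function that sends a prompt to a model (ChatGPT or another
    LLM) and returns the generated text; the actual client call is left out."""
    return llm(build_prompt(question, results))


def fake_llm(prompt: str) -> str:
    # Placeholder so the example runs offline.
    return "Use the install guide [1]."


if __name__ == "__main__":
    hits = [
        Result("Team offsite", "Agenda for June", "https://example.com/offsite", 0.1),
        Result("Install guide", "Steps to run Swirl locally", "https://example.com/install", 0.9),
    ]
    print(build_prompt("How do I run Swirl?", hits, top_k=1))
    print(answer("How do I run Swirl?", hits, fake_llm))
```

Numbering the sources in the prompt is what makes citations possible: the model can refer to `[1]`, and that marker can be traced back to a URL.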
The whole search process is simplified, and setting up Swirl is straightforward. In the words of Raman Ramanenkou of Sense:
“Setting up and running Swirl in a Docker container is incredibly straightforward: it takes just a few minutes.”
~ Raman Ramanenkou of Sense
Data privacy and security are essential in search. In any organization, people should only be able to find information they are allowed to access. Swirl incorporates OAuth2, so access to and visibility of data are tightly controlled by the credentials of the person searching. Sensitive information stays restricted, and only those with the necessary permissions can view it.
Therefore, someone who doesn't have access to critical files cannot search for them or even know that they exist.
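A minimal sketch of the principle, assuming each source enforces its own permissions: the hub forwards the searching user's own token instead of a shared service account, so each source only returns what that user may see. The `Source`, `Document`, and `search_as_user()` names are hypothetical, and the token lookup is a dictionary standing in for real OAuth2 token validation by an identity provider.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    readers: set[str]  # users allowed to see this document


@dataclass
class Source:
    """Hypothetical source that enforces its own access control."""

    name: str
    docs: list[Document]
    tokens: dict[str, str] = field(default_factory=dict)  # access token -> user

    def search(self, query: str, token: str) -> list[str]:
        user = self.tokens.get(token)
        if user is None:
            raise PermissionError(f"{self.name}: invalid or expired token")
        # Only documents this user may read are ever returned.
        return [
            d.title for d in self.docs
            if query.lower() in d.title.lower() and user in d.readers
        ]


def search_as_user(sources: list[Source], query: str, tokens: dict[str, str]) -> list[str]:
    """Forward the searching user's own token to each source.

    `tokens` maps source name -> this user's access token, so every source
    filters by this user's permissions, not by a shared admin account.
    """
    hits: list[str] = []
    for source in sources:
        token = tokens.get(source.name)
        if token is None:
            continue  # the user has not authorized this source, so it is skipped
        hits.extend(source.search(query, token))
    return hits


if __name__ == "__main__":
    wiki = Source(
        "wiki",
        [Document("Salary bands 2024", {"hr_lead"}), Document("Salary FAQ", {"hr_lead", "dev"})],
        {"tok-hr": "hr_lead", "tok-dev": "dev"},
    )
    print(search_as_user([wiki], "salary", {"wiki": "tok-dev"}))
    print(search_as_user([wiki], "salary", {"wiki": "tok-hr"}))
```

With the token for `dev`, only "Salary FAQ" comes back; "Salary bands 2024" is never returned, so that user cannot tell it exists.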
Retrieval Augmented Generation (RAG) using Swirl ✨
Retrieval Augmented Generation (RAG) is a technique that combines information retrieval with text generation. In simpler terms, RAG first retrieves relevant data through search and then uses a Large Language Model to craft an answer grounded in that data. A widely known example is Bing AI Chat. By applying RAG to your own data sets, you can create a dynamic knowledge base.
Swirl makes it easy to create a chatbot for your data. You can combine the power of ChatGPT Enterprise with Swirl to generate answers on the fly. You don't need an extra database to store a copy of your information: just search.
Swirl retrieves accurate information and returns answers complete with citations to the original documents. This can supercharge your productivity by putting reliable, referenced information at your fingertips.
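One way to turn citation markers in a generated answer back into links is sketched below. It assumes the prompt numbered its sources `[1]`, `[2]`, and so on, and that the model cites them in that form; `cited_sources()` is a hypothetical helper, not part of Swirl.

```python
import re


def cited_sources(answer: str, sources: list[str]) -> list[str]:
    """Return the source URLs an answer cites with [n] markers, in citation order.

    `sources` is the numbered list that was put in the prompt (index 0 is [1]).
    Out-of-range markers are ignored, and each source is listed only once.
    """
    cited: list[str] = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if 1 <= n <= len(sources) and sources[n - 1] not in cited:
            cited.append(sources[n - 1])
    return cited


if __name__ == "__main__":
    urls = ["https://example.com/install", "https://example.com/license"]
    text = "Swirl is Apache 2.0 licensed [2] and runs in Docker [1][2]. See [7]."
    print(cited_sources(text, urls))
```

Ignoring out-of-range numbers keeps an invented citation such as `[7]` from turning into a broken or misleading link.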
For example, we searched for Sid Probstein using ChatGPT alone and then searched again using Swirl's RAG pipeline. With Swirl, the link at the bottom left points to the document used to generate the answer.
Get Started with Swirl 💻
If you want to try Swirl and get it up and running at no cost, head to the GitHub page and check it out. 👇
GitHub 🔗: Swirl on GitHub 🌌
💿 Installation instructions: Getting started with Swirl
📃 Documentation: Swirl Wiki
Swirl in action with the Galaxy UI (a snapshot of the running Swirl UI)
Swirl is a community-driven 👩‍💻 open-source project. We welcome anyone interested in building a search platform and contributing to the project's development. If you want to learn about the project, contribute, or begin your open-source journey with us, we'd gladly guide you and help you create your first open-source contribution. 🤗
Please give us a Star 🌟 on GitHub.
Follow us on Twitter/𝕏 for updates.