InternGPT: Expanding Interactions with ChatGPT Beyond Language
ChatGPT has changed the way we communicate with AI, and a new paradigm is emerging that builds on this foundation. InternGPT offers a richer interactive experience: users control ChatGPT not only with language but also by pointing, which extends what visual communication with AI can do.
InternGPT, often abbreviated as iGPT, is more than a simple visual interactive system. The name stands for interaction, nonverbal, and ChatGPT, and the system blends these components into a single interaction experience. At its core, it lets users work with ChatGPT through a pointing device, by clicking, dragging, and drawing, which extends ChatGPT's capabilities beyond text.
InternGPT is an open-source project developed by researchers at OpenGVLab, Shanghai AI Laboratory, and it builds on ChatGPT. Because ChatGPT serves as its language backbone, InternGPT keeps the chatbot's abilities, from text generation and translation to creative writing and answering questions, and adds visual tools on top of them.
Although InternGPT is still under development, it shows promise as a tool for a broad range of applications. Its ability to work alongside other software, such as image editing tools, is what could give it impact across industries: combining text-based conversation with direct visual manipulation could open up many new workflows.
Installing InternGPT requires a few prerequisites: Linux, Python 3.8+, PyTorch 1.12+, CUDA 11.6+, and at least 17 GB of GPU memory to load the basic tools. After creating and activating a Python environment, install the Python dependencies with pip.
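The prerequisites above can be checked before downloading any model weights. The following is a minimal sketch and not part of InternGPT: the function names `parse_version`, `check_prerequisites`, and `probe_environment` are hypothetical, the thresholds are copied from the list above, and "17G" is assumed to mean 17 GiB. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which is a reasonable proxy but not the same as the installed driver.

```python
import platform
import sys

# Minimum requirements as listed in the article (17 GB assumed to mean GiB).
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 12)
MIN_CUDA = (11, 6)
MIN_GPU_MEM_GB = 17


def parse_version(text: str) -> tuple:
    """Turn '1.12.1+cu116' into (1, 12, 1), ignoring local/pre-release suffixes."""
    core = text.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_prerequisites(system, python_version, torch_version, cuda_version, gpu_mem_gb):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if system != "Linux":
        problems.append(f"Linux required, found {system}")
    if parse_version(python_version) < MIN_PYTHON:
        problems.append(f"Python 3.8+ required, found {python_version}")
    if torch_version is None:
        problems.append("PyTorch not installed (1.12+ required)")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch 1.12+ required, found {torch_version}")
    if cuda_version is None:
        problems.append("no CUDA build of PyTorch detected (11.6+ required)")
    elif parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA 11.6+ required, found {cuda_version}")
    if gpu_mem_gb < MIN_GPU_MEM_GB:
        problems.append(f"{MIN_GPU_MEM_GB} GB GPU memory required, found {gpu_mem_gb:.1f} GB")
    return problems


def probe_environment() -> dict:
    """Collect the values to check from the running machine."""
    info = {
        "system": platform.system(),
        "python_version": platform.python_version(),
        "torch_version": None,
        "cuda_version": None,
        "gpu_mem_gb": 0.0,
    }
    try:
        import torch  # optional: only present once PyTorch is installed
    except ImportError:
        return info
    info["torch_version"] = torch.__version__
    info["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        # Total memory of the first GPU in GiB; free memory will be lower.
        info["gpu_mem_gb"] = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return info


if __name__ == "__main__":
    issues = check_prerequisites(**probe_environment())
    for issue in issues:
        print("-", issue)
    sys.exit(1 if issues else 0)
```

Keeping the comparison logic in a pure function separate from the probing makes it easy to test without a GPU, and the script exits non-zero when any requirement is missing, so it can gate an install script.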
Next, launch the iChat Gradio service with the required tools, such as HuskyVQA, SegmentAnything, and ImageOCRRecognition. To enable the voice assistant, one more step is needed: generating a certificate with OpenSSL.
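The two launch steps can be assembled programmatically, which helps when tools need to be spread across GPUs. This is a hedged sketch: the launcher name `app.py` and its `--load`, `--port`, and `--https` options, along with the `Tool_device` format of the load string, are assumptions based on the project README at the time of writing and should be checked against the current repository. The `openssl req` options are standard OpenSSL flags. The helper names `build_cert_command` and `build_launch_command` are hypothetical.

```python
import shlex
import sys
from typing import Dict, List

# Assumption: tools are passed to app.py as "Name_device" pairs joined by commas.
BASIC_TOOLS: Dict[str, str] = {
    "HuskyVQA": "cuda:0",
    "SegmentAnything": "cuda:0",
    "ImageOCRRecognition": "cuda:0",
}


def build_cert_command(days: int = 365) -> List[str]:
    """Self-signed certificate (key.pem / cert.pem) for serving over HTTPS."""
    return [
        "openssl", "req", "-x509", "-newkey", "rsa:4096",
        "-keyout", "key.pem", "-out", "cert.pem",
        "-sha256", "-days", str(days),
        "-nodes",  # leave the private key unencrypted (OpenSSL 3 also accepts -noenc)
    ]


def build_launch_command(tools: Dict[str, str], port: int = 3456, https: bool = False) -> List[str]:
    """Assemble the Gradio service command, pinning each tool to a device."""
    load_spec = ",".join(f"{name}_{device}" for name, device in tools.items())
    cmd = [sys.executable, "-u", "app.py", "--load", load_spec, "--port", str(port)]
    if https:
        cmd.append("--https")  # assumed flag for the voice-assistant setup
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them; pass the lists to
    # subprocess.run(..., check=True) to execute them instead.
    print(shlex.join(build_cert_command()))
    print(shlex.join(build_launch_command(BASIC_TOOLS, https=True)))
```

Under the same assumptions, moving a heavy tool to a second GPU is a one-line change, for example `"SegmentAnything": "cuda:1"`.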
InternGPT is more than a new way of interacting with ChatGPT. Its features include multi-modal dialogue about images: users can upload an image and then hold a conversation about it, asking questions such as "What is in the image?" or "What color is the background of the image?"
InternGPT also supports interactive image operations. Users can select a specific area of an image and then run optical character recognition on that region, or remove or replace the masked area. Because the user points at exactly the region to change, these operations give finer control over image editing and generation than text instructions alone.
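The select-then-operate flow can be illustrated with a toy model. InternGPT relies on learned models here (SegmentAnything turns a click into a mask, and other models read text or fill a removed region); the sketch below deliberately swaps those for simple stand-ins, a flood fill for segmentation and a constant fill for removal, to show how a click becomes a mask, then a bounding box, then a crop that could be handed to an OCR tool. All function names are hypothetical and images are plain grayscale grids.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]
Mask = List[List[bool]]


def mask_from_click(image: Grid, row: int, col: int, tolerance: int = 10) -> Mask:
    """Toy stand-in for a promptable segmenter: grow a 4-connected region of
    similar intensity outward from the clicked pixel."""
    h, w = len(image), len(image[0])
    seed = image[row][col]
    mask = [[False] * w for _ in range(h)]
    mask[row][col] = True
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not mask[nr][nc]
                    and abs(image[nr][nc] - seed) <= tolerance):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def bounding_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), bottom/right exclusive.
    Assumes a non-empty mask (a click always selects at least one pixel)."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image: Grid, box: Tuple[int, int, int, int]) -> Grid:
    """Cut out the boxed region, e.g. to pass it to an OCR model."""
    top, left, bottom, right = box
    return [line[left:right] for line in image[top:bottom]]


def fill_masked(image: Grid, mask: Mask, value: int = 0) -> Grid:
    """'Remove' the selection by overwriting it with a constant; a real
    system would generate plausible content here instead."""
    return [[value if m else px for px, m in zip(line, mline)]
            for line, mline in zip(image, mask)]


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0],
        [0, 200, 205, 0, 0],
        [0, 198, 200, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    mask = mask_from_click(image, 1, 1)
    box = bounding_box(mask)
    print(box, crop(image, box))
    print(fill_masked(image, mask))
```

Clicking the bright patch selects its four pixels, the crop isolates just that patch, and the fill erases it, which mirrors at toy scale the OCR and remove operations described above.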
InternGPT can also create new images from a single audio file, or from a combination of audio and text. In addition, users can draw on a digital whiteboard and have InternGPT generate images from their scribbles, which opens up further creative possibilities.
The InternGPT source code is available on GitHub.
As a new interactive interface for ChatGPT, InternGPT is a pioneering effort that could change the way we interact with AI. By weaving together language and pointing, it narrows the gap between text and visual communication. As development progresses, we look forward to seeing the range of applications this tool enables.