Parsir

Un-chat the GPT: a direct-touch interface for reading and comprehension.

As AI continues to shape our tools and workflows, one of the most exciting frontiers is redefining how we interact with large language models (LLMs). While the dominant paradigm in AI-driven interfaces is conversational—think chatbots and virtual assistants—this approach is often a mismatch for workflows that demand focus, exploration, and depth. With Parsir, we set out to build an alternative: a document interaction tool that reimagines reading through multi-touch gestures, AI-enhanced visualizations, and spatial exploration.

Why Move Beyond Chat-Based Interfaces?

Chat-based interfaces excel at single-turn queries or simple interactions, but they fall short in contexts requiring nuanced, multi-faceted engagement. When reading complex documents—be they research papers, contracts, or manuals—users don’t just ask questions; they navigate, annotate, restructure, and synthesize. Traditional PDF readers fail to leverage the potential of LLMs in these scenarios, and chatbots, while powerful, impose cognitive overhead by forcing users to translate their intent into conversational queries.

Parsir offers an alternative vision: an interface where interaction is direct and exploratory, designed to amplify the user’s cognitive flow rather than disrupt it. By integrating gestures, touch, and contextual AI tools, Parsir moves away from “talking to” AI and toward collaborating with it.


Designing Parsir: Key Features and Their Foundations

Parsir is fundamentally an interaction-first application, where AI augments natural user actions rather than dictating the interface. Here’s how it works:

1. Zoom as a Semantic Tool: Summarization and Detail Retrieval

In Parsir, zooming isn’t just for adjusting the viewport—it’s a semantic tool. When users zoom out, Parsir generates summaries of the visible text using LLMs, distilling content into concise overviews. Conversely, zooming in dynamically reveals additional layers of detail, such as contextual notes, references, or AI-driven explanations.

This approach leverages the physical metaphor of focus and scale, aligning user actions with the mental processes of abstraction and exploration. By integrating LLM capabilities into spatial gestures, we created an intuitive way for users to engage with large, dense texts without breaking their flow.
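
To make this concrete, here is a minimal sketch of the zoom-to-semantics mapping. The `Summarizer` interface, the zoom thresholds, and the debounce delay are illustrative assumptions, not Parsir's actual implementation:

```typescript
// Sketch: mapping pinch-zoom scale to semantic level of detail.
// `Summarizer` stands in for whatever LLM backend serves summaries;
// the thresholds and debounce delay are illustrative, not Parsir's
// real values.

interface Summarizer {
  summarize(text: string, targetWords: number): Promise<string>;
}

type DetailLevel = "summary" | "full" | "expanded";

function detailLevelFor(zoom: number): DetailLevel {
  if (zoom < 0.5) return "summary";  // zoomed out: condense
  if (zoom > 1.5) return "expanded"; // zoomed in: surface notes, references
  return "full";                     // normal reading view
}

// Debounce so a continuous pinch triggers one LLM call, not one per frame.
function makeZoomHandler(
  llm: Summarizer,
  render: (text: string) => void,
  delayMs = 250,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let lastLevel: DetailLevel = "full";

  return (zoom: number, visibleText: string) => {
    const level = detailLevelFor(zoom);
    if (level === lastLevel) return; // no semantic change, nothing to do
    lastLevel = level;

    clearTimeout(timer);
    timer = setTimeout(async () => {
      if (level === "summary") {
        render(await llm.summarize(visibleText, 60));
      } else {
        render(visibleText); // "expanded" would fetch extra detail similarly
      }
    }, delayMs);
  };
}
```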


2. Swipe-to-Transform: Restructuring Information

A frequent challenge when working with documents is reshaping information for better understanding or presentation. In Parsir, users can swipe across paragraphs to transform them:

  • Swiping left condenses the text into bulleted lists, ideal for summarizing key points.
  • Swiping right reimagines the content visually, generating images or infographics using diffusion models tailored to the document’s themes.

These transformations are designed to be reversible and iterative, giving users the freedom to experiment with restructuring without fear of losing the original content. This feature emerged from user studies showing a common need to distill dense materials into more digestible formats.
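
One way the swipe dispatcher and its reversibility guarantee might be sketched; `condenseToBullets` and `illustrate` are hypothetical stand-ins for the LLM and diffusion-model services:

```typescript
// Sketch: swipe directions dispatching to reversible transformations.
// `condenseToBullets` (LLM) and `illustrate` (diffusion model) are
// hypothetical stand-ins for Parsir's backends.

type Rendered =
  | { kind: "text"; body: string }
  | { kind: "bullets"; items: string[] }
  | { kind: "image"; url: string };

interface TransformBackends {
  condenseToBullets(text: string): Promise<string[]>;
  illustrate(text: string): Promise<string>; // returns an image URL
}

class Paragraph {
  private history: Rendered[] = []; // prior states kept for reversibility

  constructor(private current: Rendered) {}

  async onSwipe(dir: "left" | "right", api: TransformBackends) {
    if (this.current.kind !== "text") return; // only transform plain text
    const source = this.current.body;

    this.history.push(this.current);
    this.current =
      dir === "left"
        ? { kind: "bullets", items: await api.condenseToBullets(source) }
        : { kind: "image", url: await api.illustrate(source) };
  }

  undo() {
    const prev = this.history.pop();
    if (prev) this.current = prev; // the original content is never lost
  }

  view(): Rendered {
    return this.current;
  }
}
```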


3. Gestures as Queries: Direct, Actionable Interaction

Parsir treats gestures not as simple navigation tools but as queries in themselves. For example:

  • Highlighting a phrase can trigger real-time expansions or definitions, depending on user preferences.
  • Dragging across a section enables AI-driven comparative analysis, perfect for identifying patterns or contradictions in reports or papers.

The goal is to make interaction as seamless as flipping a page, ensuring users remain immersed in the document while accessing powerful AI tools.
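
Conceptually, each gesture compiles into a structured query rather than a chat message. In the sketch below, the gesture names, query schema, and preference field are assumptions for illustration:

```typescript
// Sketch: gestures compiled into structured queries. The gesture
// names, query schema, and preference field are assumptions made
// for illustration.

type Gesture =
  | { kind: "highlight"; phrase: string }
  | { kind: "dragAcross"; sections: string[] };

type Query =
  | { task: "define" | "expand"; input: string }
  | { task: "compare"; inputs: string[] };

interface Preferences {
  // what a highlight should trigger, per the user's settings
  highlightAction: "define" | "expand";
}

function gestureToQuery(g: Gesture, prefs: Preferences): Query {
  switch (g.kind) {
    case "highlight":
      // The user never types a question; the gesture *is* the query.
      return { task: prefs.highlightAction, input: g.phrase };
    case "dragAcross":
      // Multiple sections swept in one drag become a comparison task.
      return { task: "compare", inputs: g.sections };
  }
}
```

Because the result is a structured query rather than free text, answers can be rendered inline at the gesture's location instead of in a separate chat transcript.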


Engineering Challenges and Innovations

Building Parsir posed several unique challenges at the intersection of HCI and AI:

  1. Real-Time Responsiveness: Integrating LLMs and diffusion models into a fluid, touch-driven interface required optimizing latency so that gestures felt natural and instantaneous. This involved creating lightweight, on-device models for common tasks while reserving cloud-based inference for more complex computations (see the routing sketch after this list).
  2. Context-Aware AI: Unlike chatbots that rely on explicit queries, Parsir had to infer user intent from spatial and temporal interaction patterns. This necessitated a multi-layered design combining gesture recognition with contextual prompts tailored to document structure and user history.
  3. Balancing Automation and Control: One key design principle was ensuring users always felt in control of the AI. By emphasizing reversible transformations and offering visual previews, Parsir encourages experimentation without locking users into AI-generated outputs.
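
As one example of the latency work described in point 1, a router might keep short, common tasks on-device and send everything else to the cloud. The heuristic and both interfaces below are illustrative assumptions, not Parsir's production logic:

```typescript
// Sketch: routing between an on-device model and a cloud model.
// The cost heuristic and both interfaces are illustrative assumptions.

interface Model {
  run(prompt: string): Promise<string>;
}

type Task = "summarize" | "define" | "compare";

function makeRouter(local: Model, cloud: Model, localBudgetChars = 2000) {
  return (task: Task, prompt: string): Promise<string> => {
    // Heuristic: short, common tasks stay on-device so gestures feel
    // instantaneous; long or complex ones go to cloud inference.
    const cheap =
      (task === "summarize" || task === "define") &&
      prompt.length <= localBudgetChars;
    return cheap ? local.run(prompt) : cloud.run(prompt);
  };
}
```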

Why This Matters: The Future of Reading

At its core, Parsir is an exploration of what it means to read and work with documents in a world increasingly mediated by AI. Traditional interfaces treat documents as static artifacts, offering tools for navigation but little else. Chat-based AI systems, on the other hand, treat documents as black boxes, accessible only through specific queries.

Parsir bridges this gap. By embedding AI into the act of reading itself, it empowers users to interact with information in ways that feel both natural and transformative. The result is not just a more powerful document reader, but a new model for how we might think about AI interfaces: as collaborative partners, embedded seamlessly into the tools we use every day.

In 2024, reading is no longer passive. With Parsir, it becomes dynamic, interactive, and deeply personal—a glimpse into the future of human-computer interaction.