Visualizing SPARQL Query Plans with Python and Textualize
Welcome to my little journey of visualizing profiled SPARQL query plans using Python and Textualize! In this blog post, I’ll walk you through my experience building a text-based user interface (TUI) to bring profiled SPARQL query plans from Stardog to life. But before we dive into the details, let me give you a sneak peek of the final product:
A sneak peek of what I built
All the source code for the project can be found on GitHub.
The Challenge of Reading Profiled SPARQL Query Plans in Stardog
I have been working at Stardog now for almost 4 years and while I don’t work on the platform’s internals (e.g. the query engine), I have had to interact with query plans numerous times - usually while trying to optimize my poorly written SPARQL queries 😊. Stardog has always been able to produce and surface SPARQL query plans to users, and semi-recently added the ability to include profiling information in the query plan to help users reduce and eliminate bottlenecks in their queries. This capability is part of Stardog’s query plan service and, like many other features, is available via an HTTP endpoint, the Stardog CLI, and Stardog Studio (an IDE-like experience for interacting with Stardog). This is all well and good, except that the output from the profiler can get pretty verbose and a bit unwieldy to visually inspect.
For example, here’s the abbreviated CLI output from a profiled query.
The Stardog CLI and Stardog Studio return the profiled query plan in plain text containing information about the profiled plan, the query profiled, and a tree representing the query plan with profiler information about individual nodes, also known as “operators”.
From the Stardog Docs:
The plan is arranged in a hierarchical, tree-like structure which grows left to right. The nodes, called operators, represent units of data processing during evaluation. They correspond to evaluations of graph patterns or solution modifiers as defined in SPARQL 1.1 specification. All operators can be regarded as functions which may take some data as input and produce some data as output. All input and output data is represented as streams of solutions, that is, sets of bindings of the form x -> value where x is a variable used in the query and value is some RDF term (IRI, blank node, or literal). Examples of operators include scans, joins, filters, unions, etc.
While the text format is packed with useful information, it is unstructured and hard to parse programmatically. The larger the plan is, the harder these get to visually inspect as well. Wouldn’t it be great if we could get the profiled query in some sort of a structured format like JSON so we could potentially make a tool to more easily inspect the plans? Well it turns out the HTTP endpoint can actually return the plan in JSON in addition to plain text!
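To make the JSON option concrete, here is a minimal sketch that builds (but does not send) a request asking for the plan as JSON. The endpoint path, the `query` and `profile` parameter names, and the `build_profile_request` helper are assumptions for illustration, so check the Stardog HTTP API docs before relying on them. Authentication is left out.

```python
import urllib.parse
import urllib.request


def build_profile_request(endpoint, database, query):
    """Build (but do not send) a request for a profiled plan as JSON."""
    # ASSUMPTION: the "/{database}/explain" path and the "query"/"profile"
    # parameter names are illustrative, not confirmed by this post.
    params = urllib.parse.urlencode({"query": query, "profile": "true"})
    url = f"{endpoint.rstrip('/')}/{database}/explain?{params}"
    # The Accept header asks for JSON instead of the plain-text rendering.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


request = build_profile_request(
    "http://localhost:5820", "mydb", "SELECT * { ?s ?p ?o } LIMIT 10"
)
print(request.full_url)
print(request.get_header("Accept"))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and feeding the response body to `json.loads`.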
Introducing Textualize
For the past couple years, I’ve had a lot of fun building TUIs like jqp and guard-dog. All of my TUIs to date have been built using Go leveraging Charmbracelet’s bubbletea framework, but I’ve been using quite a bit of Python at work and was looking for an excuse to leverage it in an additional project. I’d heard interesting things about Textual, a Python library from Textualize for building TUIs, and thought I’d give it a shot in my attempt to better visualize profiled Stardog query plans.
“Rapid Application Development”
Textualize positions itself as a “Rapid Application Development” framework - a point that really resonated with me during the creation of my SPARQL profiler TUI. While there was an initial learning curve grasping Textualize’s framework, once I understood the main concepts, the experience was really remarkable. Working within Python and employing Textual’s CSS-like styles proved to be a real delight. While I hold deep admiration for bubbletea (and everything the Charmbracelet team makes) from my previous TUI endeavors, Textualize not only provides an exceptional developer experience, but just being able to use Python is a big plus for prototyping (at least for me). This became especially advantageous when dealing with sizable JSON payloads representing profiled SPARQL query plans. While I’m sure achieving similar results in Go is feasible, I found Textualize enabled me to rapidly prototype and implement a functional solution over a weekend of hacking.
What I Built
I developed a SPARQL Profiler TUI to make it easier to inspect and analyze the SPARQL query plans produced by Stardog. I tried to take all that really useful information returned by Stardog and organize it in a more navigable and intuitive way.
See the high-level features of the TUI in the sections below.
CLI Entrypoint
I quickly constructed a CLI using click in order to take input from the user to communicate with their Stardog server and launch the TUI. This is the entrypoint. Below is the usage of the CLI:
Foldable Tree-Like User Interface
The heart of the profiler TUI is an intuitive, foldable tree-like user interface. Leveraging Textualize’s powerful tree widget, I created a visual representation of the SPARQL query plan. This interface offers users a structured and navigable view of the plan’s hierarchy.
The real heavy lifting here was transforming the JSON output from Stardog, which contains the profiler information and query plan, into my custom tree widget.
See the full code for the custom widget on GitHub.
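As a rough illustration of that transformation (not the project’s code), the sketch below walks a nested plan and yields each operator with its depth, which is the order a tree widget would add nodes in. The payload and its `plan`, `label`, and `children` keys are made up for the example; the real Stardog JSON uses its own structure.

```python
import json

# ASSUMPTION: a made-up payload; the real Stardog JSON uses its own keys.
SAMPLE = json.loads("""
{
  "plan": {
    "label": "Projection(?s, ?o)",
    "children": [
      {"label": "HashJoin(?s)", "children": [
        {"label": "Scan[POSC](?s, <urn:p>, ?o)", "children": []},
        {"label": "Scan[PSOC](?s, <urn:q>, ?x)", "children": []}
      ]}
    ]
  }
}
""")


def walk(node, depth=0):
    """Yield (depth, label) pairs, parents before their children."""
    yield depth, node["label"]
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


# Print the plan as an indented outline, one operator per line.
for depth, label in walk(SAMPLE["plan"]):
    print("  " * depth + label)
```

In a Textual tree, the same recursion would call `add` on the parent node for operators that have children and `add_leaf` for those that don’t, instead of printing.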
Plan Tree Nodes
Each node in the tree provides essential information for debugging query performance:
- Cardinality Estimations: The optimizer’s estimate of the number of results an operator will generate. These are contained in square brackets [] - the same way Stardog displays them in query plans. A common cause of suboptimal query plans is cardinality mis-estimations, i.e. when the number of results that the optimizer thought an operator would generate and the actual number of results are several orders of magnitude apart.
- Memory: The amount of allocated managed memory used by a node/plan operator. This is only reported for pipeline-breaking node/plan operators which materialize intermediate results in memory.
- Intermediate Number of Results: The number of intermediate query results generated by each node/plan operator. The profiler detects when some part of the node/plan operator’s output is skipped over. In this case, “with gaps” is appended to the results.
- Wall Time: The time spent on the server evaluating the node/plan operator. The percentage of the total wall time that the plan operator accounts for is displayed next to the amount of time in parentheses (). To provide a comprehensive view of wall time when folding a section of the tree, the TUI dynamically calculates and displays the total wall time of a node’s children when a parent node is folded.
Here’s a snippet from the source code showcasing the wall time calculation. This is fairly straightforward to do in Python.
Upon folding a parent node, the TUI calculates and displays the wall time of the plan node/operator's children on the parent node's label.
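The roll-up itself can be sketched in a few lines. This is an illustration rather than the project’s code: the node shape (`label`, `wall_time_ms`, `children`) and the helper names are hypothetical, and it assumes each operator’s wall time excludes the time spent in its children.

```python
def subtree_wall_time(node):
    """Wall time of a node plus all of its descendants, in milliseconds."""
    return node["wall_time_ms"] + sum(
        subtree_wall_time(child) for child in node.get("children", [])
    )


def children_wall_time(node):
    """Combined wall time of everything hidden when `node` is folded."""
    return sum(subtree_wall_time(child) for child in node.get("children", []))


def folded_label(node, plan_total_ms):
    """Label for a folded node, with its children's time rolled up."""
    hidden_ms = children_wall_time(node)
    pct = 100 * hidden_ms / plan_total_ms if plan_total_ms else 0.0
    return f"{node['label']} (children: {hidden_ms} ms, {pct:.1f}%)"


# ASSUMPTION: hypothetical node shape; each wall_time_ms excludes children.
plan = {
    "label": "HashJoin(?s)",
    "wall_time_ms": 5,
    "children": [
        {"label": "Sort(?s)", "wall_time_ms": 10, "children": [
            {"label": "Scan[POSC]", "wall_time_ms": 20, "children": []},
        ]},
        {"label": "Scan[PSOC]", "wall_time_ms": 15, "children": []},
    ],
}

print(folded_label(plan, plan_total_ms=subtree_wall_time(plan)))
# HashJoin(?s) (children: 45 ms, 90.0%)
```

Recursing over the whole subtree, rather than only the direct children, means a folded node accounts for every operator it hides.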
Color-Coded Output for Key Information
To enhance the interpretability of the query plan, I color-coded the label for each node/plan operator. By leveraging Rich (also from Textualize), I applied distinct colors and styles to various elements in the tree view. This approach helps users quickly identify and understand key aspects such as memory usage, cardinality estimations, and more.
Moreover, I prepended a visual indicator (💥) to a node/plan operator’s label to highlight nodes that serve as pipeline breakers. Identifying these pipeline breakers is crucial for optimizing query performance, as they can have significant implications, often related to memory pressure.
From the Stardog Docs:
Not all operators can produce output solutions as soon as they get first input solutions from their children nodes. Some need to accumulate intermediate results before sending output. Such operators are called pipeline breakers, and they are often the culprits for performance problems, typically resulting from memory pressure. It is important to be able to spot them in the plan since they can suggest either: a) a way to re-formulate the query to help the planner, or b) a way to make the query more precise by specifying extra constants where they matter.
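To show how the pieces of a node label fit together, here is a plain-string sketch with the Rich styling left out. The `Operator` fields and the exact label layout are hypothetical, and treating “memory was reported” as the signal for a pipeline breaker is an assumption based on the note above that memory is only reported for pipeline-breaking operators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operator:
    # ASSUMPTION: hypothetical fields, not Stardog's actual JSON schema.
    name: str
    cardinality: int            # optimizer's estimate, shown in []
    results: int                # actual intermediate results
    wall_time_ms: int
    wall_pct: float             # share of total wall time, shown in ()
    memory: Optional[str] = None
    with_gaps: bool = False

    @property
    def is_pipeline_breaker(self):
        # ASSUMPTION: memory is only reported for pipeline breakers.
        return self.memory is not None


def make_label(op):
    """Compose a single-line label for one plan operator."""
    fields = []
    if op.memory is not None:
        fields.append(f"memory: {op.memory}")
    fields.append(f"results: {op.results}" + (" with gaps" if op.with_gaps else ""))
    fields.append(f"wall time: {op.wall_time_ms} ms ({op.wall_pct:.1f}%)")
    marker = "💥 " if op.is_pipeline_breaker else ""
    return f"{marker}{op.name} [{op.cardinality}] " + ", ".join(fields)


join = Operator("HashJoin(?s)", 1200, 950, 12, 40.0, memory="1.2M")
scan = Operator("Scan[POSC](?s, <urn:p>, ?o)", 500, 480, 3, 10.0, with_gaps=True)
labels = [make_label(join), make_label(scan)]
# labels[0]: 💥 HashJoin(?s) [1200] memory: 1.2M, results: 950, wall time: 12 ms (40.0%)
# labels[1]: Scan[POSC](?s, <urn:p>, ?o) [500] results: 480 with gaps, wall time: 3 ms (10.0%)
```

Building the label from separate fields like this makes it easy to style each field on its own when the pieces are later wrapped in Rich text.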
Tabs to Switch Between Query and Profiled Plan
It can be useful to view the original query you profiled alongside the query plan. Once again, I was able to leverage Textual’s built-in widgets to accomplish this. Textual has a Tabbed Content widget which allows you to switch between content panes via a row of tabs. You can click on the tabs to switch between them or use the keyboard shortcuts displayed in the footer to do so.
The footer is yet another built-in widget. It’s like the control center for all your Textual bindings, neatly displaying all the bindings you’ve set up in your app. All you’ve got to do to set up bindings is utilize the “magic” BINDINGS attribute in your Textual App and/or widget and define some actions for those key bindings.
Details Panels
High-Level Profiler Information Panel
The TUI primarily focuses on the tree widget for navigating the query plan. However, I also added a panel that sits right above the tree widget displaying high-level profiler information. This panel provides users with a concise overview of the query execution, displaying key performance metrics such as memory usage, execution time, and the number of results returned. The information is extracted from the JSON payload returned by Stardog, specifically from the profiler top-level key.
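A minimal sketch of that extraction follows. Only the top-level `profiler` key comes from the description above; the fields inside it (`memory`, `time_ms`, `results`) and the `summarize` helper are hypothetical stand-ins for whatever the real payload contains.

```python
import json

# ASSUMPTION: the fields inside "profiler" are invented for illustration.
PAYLOAD = json.loads("""
{
  "profiler": {"memory": "2.1M", "time_ms": 30, "results": 950},
  "plan": {}
}
""")

# (payload key, title shown in the panel)
SUMMARY_FIELDS = [
    ("memory", "Memory"),
    ("time_ms", "Execution time (ms)"),
    ("results", "Results"),
]


def summarize(payload):
    """Return one display line per metric, tolerating missing keys."""
    profiler = payload.get("profiler", {})
    return [f"{title}: {profiler.get(key, 'n/a')}" for key, title in SUMMARY_FIELDS]


for line in summarize(PAYLOAD):
    print(line)
```

Falling back to “n/a” keeps the panel readable if a metric is absent from a given plan.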
Plan Node Details Panel
The foldable tree-like user interface provides a comprehensive visual representation of the entire query execution flow, showcasing detailed information about each node. However, due to its expansive nature, it may occupy a substantial amount of horizontal space.
To address this, you can utilize the Plan Node Details Panel, which offers a more focused view. This panel can be toggled to hide or show, allowing you to concentrate specifically on the details of the highlighted tree node. This proves especially beneficial when node labels extend beyond the visible area, requiring horizontal scrolling to explore the complete content of the tree.
I was able to just add a message handler such that when a tree node is highlighted (in focus), the “Node Details” panel (PlanNodeDetails widget) is updated to display the node information.
Takeaways
In tackling this project, I discovered that Textualize proved to be an excellent library for what I set out to build. Its abundance of built-in widgets simplifies the initial stages of app composition, while its flexibility allows for the relatively straightforward creation of custom widgets. While I wouldn’t categorize this project as entirely production-ready, it only demanded a weekend’s worth of effort to achieve useful functionality. For those well-versed in Python, averse to dealing with frontend complexities, and inclined towards constructing command-line tools, Textualize comes highly recommended from me.