A summary of a larger story tells a story of its own. If you just cranked through 40,000 words, what did you write? This experiment is a follow-on to an article written in 2023, which feels like an age ago. Back then, large language models (LLMs) had limited context windows, which made summarizing long, complicated documents somewhat challenging, in part because of training and other costs. We even built our own model to offset some of these challenges. And then Microsoft dropped the Phi series of models.
Phi-3 is advertised as not just another LLM (technically, it’s an SLM); it’s designed to be accurate, small, and lightweight, able to handle tasks like summarization—fairly easily on the surface. And the released benchmarks speak to high accuracy while using fewer computational resources. If you’re building applications, it’s meant to achieve results comparable to larger models without prohibitive costs or hardware demands.
Efficiency often takes a backseat to spectacle. So our team wanted to give this a spin on older machines and smaller devices, where resource constraints are a reality. Data locality matters too: many authors are concerned about data privacy and about having their work stolen for training. And, let’s be transparent, API (token) costs add up quickly depending on document size.
However, Phi-3 can introduce an element of variability into its results. Each time the same document is run through Phi-3, subtle changes in phrasing and sentence choice are possible. This stems from Phi-3’s design as a language model that generates text based on probability, making its summaries dynamic and more human-like in their composition, yet less consistent. Sure, we love creativity, but it can be challenging for users looking for deterministic results. While this flexibility can add depth, it also means the summaries may not be identical across multiple runs.
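To make the source of that variability concrete, here is a minimal sketch of the difference between always taking the most likely next word and drawing one according to its probability. It is a toy, not Phi-3: the candidate words, their scores, and the function names are all made up for illustration.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_greedy(tokens, logits):
    """Deterministic: always return the highest-scoring token."""
    best = max(range(len(tokens)), key=lambda i: logits[i])
    return tokens[best]

def pick_sampled(tokens, logits, temperature=1.0, rng=random):
    """Stochastic: draw one token in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidates for the next word of a summary, with made-up scores.
tokens = ["opened", "launched", "began", "started"]
logits = [2.0, 1.6, 1.2, 0.9]

greedy_runs = {pick_greedy(tokens, logits) for _ in range(20)}
sampled_runs = {pick_sampled(tokens, logits) for _ in range(20)}

print("greedy :", greedy_runs)   # the same single token on every run
print("sampled:", sampled_runs)  # usually several different tokens
```

Repeat that choice for every word in a summary and small differences compound, which is why two runs over the same document can read differently. Passing a seeded generator (for example `rng=random.Random(0)`) makes the sampled path repeatable in this toy.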
BERT and Sumy are more consistent. BERT provides deterministic summaries by pulling the most contextually significant sentences. This extractive approach, guided by clustering, ensures that each time a text is processed, the same sentences will be selected. Similarly, Sumy, relying on traditional algorithms like TextRank, efficiently selects key sentences based on word frequency and sentence positioning. Without the influence of randomness, Sumy guarantees consistent, predictable outputs.
In our previous application, the design point was summarization of long text documents, novel length or longer; we work with writers who do that sort of task. What also makes Phi-3 intriguing is that it has a long context window. The expanded window helps process more information at once, reducing the need for chunking and reprocessing to achieve results.
We’re summarizing documents, not solving world peace or sequencing the genome. However, each time we ran our initial test document, The Durham Report, through the model, it either took an immense amount of time or our small hardware rig spun out. Or it was just costing too much to run. Even with improved efficiency, LLMs are still not equipped for every task, mostly due to cost. For these to work locally, everyone will need a new computer. Maybe that’s the goal.
So, we took a different approach, augmenting with TF-IDF.
TF-IDF (Term Frequency-Inverse Document Frequency) distills what really matters in a document by balancing the familiar against the rare. It’s not just about how often a word shows up; there’s a deeper calculus at play. By comparing the frequency of terms within the document to how often they appear across a wider set of documents, TF-IDF gives you a clearer sense of what makes the document unique. This isn’t just filtering out stop words like “and” or “the”; it’s spotlighting which portions of the text stand apart mathematically. TF-IDF effectively prioritizes the words and phrases that have the most impact, cutting through repetitive clutter.
It also runs fast. But it requires a comparison point, because it is normally used to compare documents against one another.
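For readers who want to see the arithmetic, here is a minimal sketch of one common TF-IDF formulation: term frequency multiplied by the log of (number of documents / number of documents containing the term). Libraries often apply smoothing or normalization on top of this, so treat the exact numbers as illustrative. The function name and the three-sentence corpus are made up for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / total terms in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up three-document corpus for illustration.
corpus = [
    "the investigation opened in july",
    "the report reviewed the investigation",
    "the footnotes cite the report",
]

for doc_weights in tf_idf(corpus):
    top_term = max(doc_weights, key=doc_weights.get)
    print(top_term, round(doc_weights[top_term], 3))
```

Note that “the” appears in every document, so its weight works out to zero without any stop-word list, while a word found in only one document gets the highest weight. That is the comparison point at work: a term is only “rare” relative to the other documents in the set.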
Here, we adapted the approach by segmenting the document using Python, either by text block or paragraph, so the segments could be compared against one another, and then pulled the top 10 paragraphs within the text. This mathematical approach may sacrifice some context, as it’s not guided by the deeper semantic understanding that an LLM like Phi-3 can provide. However, it reduces the monstrosity to a manageable size.
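A rough sketch of that segment-and-rank step follows. It is not our production code: it assumes paragraphs are separated by blank lines, treats each paragraph as its own “document” for the IDF calculation, and scores a paragraph by summing its term weights. The function names, the tokenizer, and the scoring rule are all illustrative choices.

```python
import math
import re
from collections import Counter

def split_paragraphs(text):
    """Split on blank lines and drop empty segments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def tokenize(paragraph):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", paragraph.lower())

def top_paragraphs(text, k=10):
    """Return the k highest-scoring paragraphs, kept in document order."""
    paragraphs = split_paragraphs(text)
    tokenized = [tokenize(p) for p in paragraphs]
    n = len(paragraphs)

    # Each paragraph acts as a "document" for the IDF calculation.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero
        # Assumption: a paragraph's score is the sum of its term weights.
        score = sum((c / total) * math.log(n / doc_freq[t]) for t, c in counts.items())
        scores.append(score)

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [paragraphs[i] for i in sorted(ranked)]

# Made-up sample: three repetitive paragraphs and one that stands apart.
sample = (
    "The report covers the opening of the investigation.\n\n"
    "The report covers the opening of the investigation.\n\n"
    "Analysts questioned whether early statements were reliable.\n\n"
    "The report covers the opening of the investigation."
)
print(top_paragraphs(sample, k=1))
```

In the sample, the repeated paragraphs score low because their words show up almost everywhere, and the one distinct paragraph rises to the top. Returning the winners in their original order keeps the condensed text readable when it is handed to a model afterwards.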

The text in the Durham Report is complex, with footnotes and formatting challenges left over from its conversion from a PDF document. This method downsizes it to a few K, which lowers both the input cost and the processing time of the LLM. Then, we ran Phi-3 over this condensed file to summarize the report in two sentences, similar to our Sumy and BERT approaches. This is what it came up with:
The FBI abruptly opened Crossfire Hurricane without rigorous analysis or crediting Papadopoulus’s statements which were deemed potentially unreliable; the investigation prioritized broader political suspicions over a focused inquiry into potential Russian coordination.
Perfect? No. But it’s efficient. And it completed the task more quickly than BERT and our experiment with K-Scoring (Sumy still wins the efficiency game). Depending on your point of view, there is more than one way to tell a story.
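The two-stage flow above (condense first, then ask the model for two sentences) can be sketched as follows. The prompt wording is hypothetical, not the prompt we used, and the model call is passed in as a plain function so the sketch runs without any model installed. In practice `generate` would wrap whatever runtime hosts Phi-3.

```python
from typing import Callable

def build_prompt(condensed_text: str, sentences: int = 2) -> str:
    """Wrap the condensed text in a short instruction for the model."""
    return (
        f"Summarize the following text in {sentences} sentences.\n\n"
        f"{condensed_text.strip()}\n\nSummary:"
    )

def summarize(condensed_text: str, generate: Callable[[str], str], sentences: int = 2) -> str:
    """Send the prompt to whatever model `generate` wraps and tidy the reply."""
    return generate(build_prompt(condensed_text, sentences)).strip()

# Stand-in for a real model call; it only reports the prompt size it received.
def fake_generate(prompt: str) -> str:
    return f" (model reply to a {len(prompt)}-character prompt) "

condensed = "Paragraph one.\n\nParagraph two."
print(summarize(condensed, fake_generate))
```

Keeping the model behind a single function makes it easy to swap a local model for a hosted one, and the size of the prompt string is a quick way to see how much the condensing step saved.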
Old Tech and Future Trends
We can expect more innovations that continue to bridge the gap between performance and efficiency. However, mathematical approaches like TF-IDF will always win the speed battle—it’s just math. These deterministic methods will likely stay relevant at scale for preprocessing and other tasks, at least for now.
Other Notes:
- Phi-3 generated the summary; we used the smallest model. Technically, it’s an SLM.
- TF-IDF did the preprocessing, trimming the report down to its top 10 paragraphs.
- There is always the efficiency game; this approach could replace the deterministic models, or you could call them on their own.
- Theoretically, Sumy would probably have the best results combined with an LLM.
- Benchmarks? Deterministic models run fast, so the results come back within seconds even on older laptops or underpowered machines. Phi-3 did complete its task, but it took more than a few cycles.
- We haven’t decided if any of this product will be open-sourced in the future. It’s not complicated to set up.
