An Awful Story

Well, there is always a story to find, especially in the summarization of long-form documents. This is a follow-on to a previous article on Old School Summarization using LLMs and TF-IDF. Previously, we leveraged the Fable Forge, an author-assist tool under development designed to analyze long-form text documents.

But what about CRM datasets?

Where do we find them, and how do we start? The beauty of paying taxes is that government agencies produce massive amounts of data, publishing it for consumption. Despite calls to displace the Deep State, there is an undeniable value in transparency. The challenge is knowing where to find use cases of interest and sorting through what’s provided.

The Consumer Financial Protection Bureau (CFPB), which manages the Consumer Complaint Database (CCDB), was created as part of the Dodd-Frank Wall Street Reform and Consumer Protection Act. The agency has faced numerous legal challenges. Many large banks have supported these suits, which center on mortgage lending, credit reporting, and consumer protection rules. We’re not stepping into these political waters, at least not intentionally.

But the dataset is immense. Again, did we mention it was freely available? Yeah, we did.

Here, consumer complaints are published after the company responds and confirms a commercial relationship with the consumer, or after 15 days, whichever comes first. However, the database is not a statistical sample of all consumer experiences. Obviously, the sentiment is skewed because, well, these are complaints. Ultimately, it gives insights into problems with consumer experiences. Politicians and agencies can choose to use this feedback or not. Either way, it’s our tax dollars in action.

Acropolis Photo Altered With An Image Generator To Resemble A Bank

In pulling the data, we grabbed a snapshot from 2023 to the present. For reference, any person or organization can take the time machine back to 2011. Consumer complaints have been on the rise since COVID, and the CFPB highlights these trends in its annual reports. The complaints are infuriating. Cheating and stealing are prevalent, along with widespread consumer frustration.

For analysis, we took thousands of entries and converted them to raw text. A Python script reviewed each, converting it into a paragraph. This collection of paragraphs was subsequently compiled into a text file. It’s still raw and unfiltered, similar to the government reports highlighted in the previous article.
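The conversion step can be sketched roughly as follows. This is a minimal illustration, not the script used for the article: the function names are hypothetical, and the column names ("Company", "Product", "Issue", "Consumer complaint narrative") are assumed to match the headers in the CFPB's public CSV export.

```python
import csv
import io


def complaint_to_paragraph(row):
    """Flatten one complaint record into a single plain-text paragraph."""
    # Column names are assumptions based on the public CSV export.
    company = (row.get("Company") or "an unnamed company").strip()
    product = (row.get("Product") or "an unspecified product").strip()
    issue = (row.get("Issue") or "an unspecified issue").strip()
    # Collapse line breaks and repeated spaces inside the narrative.
    narrative = " ".join((row.get("Consumer complaint narrative") or "").split())
    lead = f"Complaint about {company} regarding {product} ({issue})."
    return f"{lead} {narrative}".strip()


def csv_to_paragraphs(handle):
    """Read complaint rows from an open CSV handle, skipping empty narratives."""
    reader = csv.DictReader(handle)
    return [
        complaint_to_paragraph(row)
        for row in reader
        if (row.get("Consumer complaint narrative") or "").strip()
    ]
```

Writing the returned list to disk with one paragraph per line, separated by blank lines, yields the kind of raw text file described above. Rows without a narrative are dropped here because there is nothing to summarize in them; that filter is a design choice for this sketch.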

Then, the Fable Forge sorted the entries using a TF-IDF document comparison method. It’s fast. But note, this dataset is far larger than the previous one: hundreds of thousands of entries.
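For readers unfamiliar with the technique, here is one way a TF-IDF comparison could rank entries. It is a standard-library sketch under stated assumptions, not the Fable Forge's implementation: the function names, the smoothed IDF formula, and the idea of ranking by cosine similarity to the corpus centroid are all choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: terms that appear in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_core_entries(docs, k=3):
    """Pick the k entries most similar to the average (centroid) vector."""
    vectors = tfidf_vectors(docs)
    centroid = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] += weight / len(vectors)
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    return [docs[i] for i in ranked[:k]]
```

Entries that share vocabulary with many other entries score closest to the centroid, so off-topic records fall to the bottom of the ranking. On hundreds of thousands of entries, a vectorized library would likely be a better fit than these pure-Python dictionaries.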

Once that was completed, the Fable Forge reduced the highlights to a cluster of entries, what we like to call the heart of the story. A couple of LLMs then pored over that document. We used a local version of Phi and the OpenAI API. Here is the summary:

The dominant trend is a growing dissatisfaction and frustration from consumers towards financial institutions, particularly in relation to their handling of fraud and financial scams. Consumers consistently express feelings of betrayal, citing how financial institutions, like Bank of America, Wells Fargo, and others, failed to protect them despite their legal duty and the red flags associated with fraudulent transactions. The complaints emphasize negligence in the banks’ risk management and fraud detection systems, as well as a lack of accountability and transparency in their responses. Victims argue that these financial institutions are more concerned with protecting their own interests than addressing the vulnerabilities of their customers, especially in the evolving digital landscape where scams are becoming more sophisticated.

But if we were going to tell a truly awful story based on the data, it might look like this:


They trusted the bank. Every cent they had went into an investment that wasn’t real. When the truth came, they called, begging for help.

The bank said no.

Now, the money’s gone, and the future they dreamed of is gone with it. All that’s left is the silence after the call, when all hope fades into a nothing.


It’s not a Hemingway micro-tale. Do you know the one? “For sale: baby shoes, never worn.” Unfortunately, our awful story is real life; it’s not some fictional apocryphal attribution. Hundreds of thousands of entries, and the connecting thread isn’t uplifting. But by casting a light and telling a simple story, things can change. One can hope.

Notes:


  • On performance, the tooling script took a few minutes to run rather than seconds. It’s a larger document than any special counsel report covered in previous summarization articles.
  • One could use a deterministic approach using Anthropic’s Haiku, OpenAI-Mini, or Phi to flag portions of the text with certain sentiment levels, and then query the flagged portions using a different model for summarization. However, that approach would be expensive, despite lower token costs.
  • The summarization highlighted certain offenders—notably Bank of America and Wells Fargo. The results may be skewed due to the dataset and methodology.
  • CFPB funding lawsuits head to the Supreme Court.
  • What’s a fictional apocryphal attribution? We couldn’t find any attribution of the baby shoes story to Hemingway.