Using LLMs to automatically summarize long RFC discussions

Could LLMs be used for this?

GPT-4 is the best for now and would probably make for the most fitting prototype, but a summarizer utility of this kind could eventually be completely open source.


I don't think that is a good idea. LLMs are very good at convincingly telling bullshit (hallucinating) and misrepresenting your standpoints. They can't take blog posts and discussions on other platforms into account unless you explicitly feed them in, which requires you to read a lot of the discussion on the RFC anyway. They also probably aren't as capable of distinguishing between trusted community members and trolls, as this requires context across multiple platforms. And finally, I don't want what I say being fed into an LLM.


If Rust started using LLMs, we think folks would start poisoning their posts to trip up the LLM, similar to what artists have started doing with their art lately.

Whether doing so is ethical is, of course, debatable, but at the same time so is the use of LLMs. So... lose-lose?

I believe there could be some good use cases at some point in the future; the LLM summary should not replace reading the actual discussion, but the LLM could give a structured overview of which points were discussed where in the thread. Long threads often discuss many points in a somewhat unstructured manner, so an auto-generated “overview + table of contents” for the thread seems potentially quite useful. Of course, that's only my personal take; I also assume that existing, especially free, LLMs might not excel at doing what I’m describing reliably, and if or when they can, such tooling would need to be tested and its practical value demonstrated before being actually integrated.[1]

Regarding concerns of “I don’t want my words being read and processed by an LLM”: I don’t believe there’s much that can be done about that in the context of a public discussion on the internet. If it’s public, anyone can already read it and copy it into any AI of their liking.

  1. Also, even if Rust doesn't adopt such tooling directly, then given Microsoft’s newfound love for LLMs — judging by their latest presentations, they aim to integrate them into essentially as many products as possible — auto-generated summaries/overviews for issue threads might sooner or later become a tool that GitHub offers to all users on all (long) issue threads anyway. ↩︎


So, I suppose the value proposition here is "sure, it will be worse than a human-written summary, but no human is reading the whole discussion and writing a good summary, because it is too much work – an LLM would be less work, making a summary happen when it otherwise wouldn't"?

I suggest the following:

If there is no human who could summarize the discussion, that's a problem which we ought to solve, rather than just mitigating the downsides by getting a single (unreliable, exploitable) answer from an opaque LLM.

I think there are, theoretically, some very interesting possibilities for ML-assisted tools to help people understand long discussions. Perhaps a tool could [similar to what steffahn suggested] automatically group posts into categories, highlighting what it thinks are the most salient points and grouping long tangents into smaller boxes. As long as the tool's users can easily examine and correct its decisions, that could be very valuable, including in helping a human understand the discussion enough that they could legitimately write a good summary.
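To make the "group posts into categories" idea concrete, here is a minimal, stdlib-only sketch of the crudest possible version: cluster posts greedily by keyword overlap (Jaccard similarity), so a reader can at least see which posts are likely discussing the same point. All names here (`group_posts`, `jaccard`, the threshold value) are my own invention for illustration — this is a toy heuristic, not any existing tool, and a real version would need something much better than bag-of-words.

```python
# Toy sketch: group forum posts by crude keyword overlap, so a reader
# can see which posts likely discuss the same point. Hypothetical code,
# not an existing tool; the threshold and stopword list are arbitrary.

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "it", "that", "for"}

def keywords(post: str) -> frozenset[str]:
    """Lowercased word set, minus common stopwords and non-alphabetic tokens."""
    return frozenset(
        w for w in post.lower().split() if w.isalpha() and w not in STOPWORDS
    )

def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity: |intersection| / |union| (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def group_posts(posts: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Greedily assign each post to the first group whose representative
    (the group's first post) it resembles; otherwise start a new group."""
    kw = [keywords(p) for p in posts]
    groups: list[list[int]] = []
    for i, k in enumerate(kw):
        for g in groups:
            if jaccard(k, kw[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

posts = [
    "unsafe code should require an audit before stabilization",
    "an audit of unsafe code seems required before stabilization",
    "the syntax bikeshed: should we use a keyword or an attribute here",
]
print(group_posts(posts))  # → [[0, 1], [2]]
```

The point of such a sketch is exactly the property argued for above: every grouping decision is inspectable (it's just a set intersection), so a human can check and override it — unlike an opaque LLM summary.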

But the current generation of LLMs is opaque, and the opaqueness is a major problem: they don't give you a way to check their work other than rereading the whole discussion yourself, and if you do detect that they got the wrong answer, they don't give you a way to fix it other than writing the whole summary yourself. So the low-hanging fruit will be to leave the LLM summary exactly as-is; some people will then read the LLM summary instead of the discussion, and in response, some participants will learn to post in a way that makes the LLM highlight their own posts rather than others' (whether maliciously or, maybe worse, innocently). Of course, similar perverse incentives exist with human summarizers, but I'm not enthusiastic about inviting in a batch of new-and-different problems when the benefits are not necessarily better.