LLMs: The Noise And The Value

December 12, 2023


Everyone is into generative AI these days. We all suffer from ChatGPT syndrome. ChatGPT entered our lives and now nothing will ever be the same. Scammers and entrepreneurs alike, students and marketers, tech bro influencers and YouTubers, everyone has something to say about generative AI. Everyone is a prompt engineer now. The hype is strong. Maybe less strong than it was ten months ago, but still…

I rode the hype like a Napoleonic steed, demo after demo, tutorial after tutorial. Here, I will try to present what I learned in the process.

First of all, is it me or have we exhausted all the possible demo apps we could imagine? I did a lot of them. And now, when I look at my AI-dominated Twitter feed (sorry not sorry Elon), I can’t help but notice the same demos or slightly different versions of the same demos over and over again. With ChatGPT, OpenAI opened Pandora’s box, and it’s empty now, it seems. Here are the core ideas you will see again and again:

  1. RAG-like demos: chat with a PDF / website content / YouTube video / whatever. I am guilty here. I recently built [Discute](www.discute.co) to help people chat with their knowledge base.

  2. AI writing tools: summarise documents, generate essays, articles, and marketing copy with LLMs. Here I seriously do not know why someone would pay for an additional solution when ChatGPT or Bard or whatever proprietary LLM chat interface can already help you do that. I guess the key here is marketing and laziness. With enough marketing spend, you can essentially convince people to pay for your product to get a slightly more convenient experience than ChatGPT (a bunch of forms for users to state their preferences), capture a nice part of the market, and just ride along until people realize how low the quality of the generated articles is and either churn or stay (if they do not care about quality).

  3. An AI component added to an existing type of product: here I put all the AI-powered things: AI-powered customer service tools, AI-powered HR tools, etc. The problem with these solutions is that they have no moat, since every incumbent in their market can easily add the same sprinkles of AI to its own product, and incumbents have already figured out distribution. In the short to medium term, with enough marketing investment thanks to VC backing, you can probably capture enough users to feel like your idea may be working. And if you are an indiehacker with a significant audience on social media, you can probably build something profitable with your small cost structure. But in the long term, the only way to win is to build an intrinsically better product, not just an AI-powered one.

  4. Search + LLM: Here you will find perplexity.ai and similar solutions (for example, the equivalent of SERPAPI but with a semantic search approach). These apps have some value. It is frankly nice to just ask a question and get an answer without having to go through a lot of links first. As a user, you have to stay aware of hallucination issues. But I find it manageable here, since you can check sources most of the time. The only problem for the promoters of these software solutions is that ChatGPT can browse the internet now. Bing Chat and Bard can do that too. So, I don’t see a world where Perplexity takes over Google. Google can do what Perplexity does and it has already figured out the distribution and branding. I do think this will have an impact on Google search ad revenues. I can’t remember the last time I used Google search instead of ChatGPT + browse to get a quick search result on a complex issue. I bet I am not the only one whose search behavior has changed dramatically.

  5. AI Agents: They probably represent the top of the hype curve. The idea behind LLM-based agents is to use LLMs as reasoning engines capable of finding the optimal way to solve a problem with little or no supervision, i.e. autonomously (a minimal sketch of the loop appears further down). People are really excited about this. Who wouldn’t want an AI agent that can choose your next destination, buy your airline tickets, and book the best hotel according to your request and your budget? It would certainly be a huge productivity boost, right? The only problem: LLMs are not reasoning engines, and even the most capable of them (GPT-4) exhibits unpredictable failure modes when prompted with reasoning tasks. Don’t take my word for it. Just read research articles like the one below:

*The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”* (Berglund et al., 2023)

How can we trust a system that doesn’t know that “A is B” means “B is A” to make decisions on our behalf? Prompt engineering, you might be tempted to say, will help us steer the model in the right direction. But given that the same prompt doesn’t even yield the same result every time, it is legitimate to doubt that. And here is what Yann LeCun said back in February:

"My unwavering opinion on current (auto-regressive) LLMs

  1. They are useful as writing aids.
  2. They are "reactive" & don't plan nor reason.
  3. They make stuff up or retrieve stuff approximately.
  4. That can be mitigated but not fixed by human feedback.
  5. Better systems will come" ([Yann LeCun](https://x.com/ylecun/status/1625118108082995203))
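
For concreteness, the “agent” pattern behind most of these demos is little more than a loop: the model is asked to pick an action, the program executes it, and the result is fed back into the context until the model declares it is done. Here is a minimal sketch, assuming the OpenAI Python client (openai>=1.0); the tool, prompt format, and parsing are illustrative placeholders, not any particular framework’s API:

```python
# Minimal sketch of an LLM "agent" loop: the model picks a tool, the program
# runs it, and the observation is appended to the conversation.
# Tool, prompt format, and stopping rule are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

def search_flights(query: str) -> str:
    # Placeholder tool: a real agent would call an actual flight API here.
    return "Cheapest flight found: PAR -> LIS, 89 EUR, departing 2024-01-15"

TOOLS = {"search_flights": search_flights}

messages = [
    {"role": "system",
     "content": "You can call a tool by replying 'CALL <tool> <input>'. "
                "Available tools: search_flights. "
                "Reply 'DONE <answer>' when you are finished."},
    {"role": "user", "content": "Find me a cheap flight to Lisbon."},
]

for _ in range(5):  # hard cap, otherwise the loop can run (and bill) forever
    reply = client.chat.completions.create(
        model="gpt-4", messages=messages
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})

    if reply.startswith("DONE"):
        print(reply)
        break

    tool_name, _, tool_input = reply.removeprefix("CALL ").partition(" ")
    result = TOOLS.get(tool_name, lambda _: "unknown tool")(tool_input)
    messages.append({"role": "user", "content": f"Observation: {result}"})
```

Every single step depends on the model both reasoning correctly and following the expected format, which is exactly where the unpredictable failure modes bite.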

Believers and scam artists

There are a lot of believers. People who genuinely think they can manage hallucinations better than anyone else, through smart prompting strategies. Or people who really think that some clever combination of AI agents can finally make them useful at scale. And maybe that will be the case in some distant future. But after playing extensively with LLMs, my conviction is that it is not yet the case, not even close. There are so many products out there with powerful marketing videos, but once you try the app, you immediately see the limitations.

*Faith and Fate: Limits of Transformers on Compositionality* (Dziri et al., 2023)

There are scam artists. People who see the hype as an opportunity to make a few bucks, reality be damned. I was recently contacted by a company claiming they had found a way to reduce hallucinations in LLM responses. They wanted me to write an article about their product, since I have managed to grow a not-so-small audience writing about tech on Medium. I was immediately skeptical. When I tried their solution, I realized they were just using another LLM call to check the consistency of the answers from the first LLM. And it was obviously not effective, because hallucinations are an intrinsic characteristic of how LLMs are trained. Read my article on AGI if you want to better understand that.

[There will be no AGI](https://fsndzomga.medium.com/there-will-be-no-agi-d9be9af4428d)
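
For what it is worth, the “hallucination reduction” they were selling boiled down to this pattern: one LLM answers, and a second LLM call is asked whether that answer is consistent with the source. A minimal sketch of that pattern, assuming the OpenAI Python client (openai>=1.0); the prompts and model name are illustrative:

```python
# Sketch of the "check the LLM with another LLM" pattern.
# Prompts and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

def ask_with_self_check(question: str, context: str) -> str:
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    ).choices[0].message.content

    verdict = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Is the following answer fully supported by the "
                              "context? Reply YES or NO.\n\n"
                              f"Context:\n{context}\n\nAnswer:\n{answer}"}],
    ).choices[0].message.content

    # The "verifier" is itself an LLM, so it inherits the same failure modes.
    if verdict.strip().upper().startswith("YES"):
        return answer
    return "I don't know."
```

The obvious catch: the verifier hallucinates for exactly the same reasons the first model does, so at best this shuffles the problem around.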

Another company contacted me, claiming they had found a way to essentially get rid of all hallucinations. I was immediately skeptical, but decided to play along out of curiosity. They used vague language like: “we have found the way to build a system that will take all the good things from different LLMs, get rid of all the bad things from them, and finally put all those together in a coherent way.” Quite a project! The only problem is there is no way to systematically “get rid of all the bad things” from LLMs. I quickly realized these people were bullshitters, incapable of even knowing that what they pretended to have accomplished was simply impossible. They just wanted me to do some free work for them. The proof: they vanished into silence as soon as I asked for consulting fees for subsequent work. Always ask for consulting fees. It is the best way to deter scam artists.

So who are the winners?

There are always winners in a hype cycle. The big winners for now are primarily companies that train and/or serve foundational models (OpenAI, Anthropic, Google, Microsoft…). They are the big winners in terms of VC funding (for startups) and additional revenues (for OpenAI and established companies like Google and Microsoft). That is probably why even Perplexity is trying to position itself as an LLM serving platform too.

Perplexity now offering API access

Still, the additional-revenue part is quite murky. I recently read an article about Microsoft trying to find ways to reduce the cost of serving LLMs. It hints at the fact that there might be some profitability issues there, at least for now.

Every time you build a SaaS that uses API calls to OpenAI, the more people use your SaaS, the more OpenAI wins, and if you are lucky, you get a piece of the pie in the process. Also, every time you launch a process via an autonomous AI agent that will take minutes or hours, and most importantly, consume a lot of tokens in the process, before giving you a faulty answer, OpenAI makes money. My hypothesis is that part of what fuels the AI agent hype is the fact that foundational model companies and their influential DevRels know well that all that excitement and useless usage is good for their bottom line. So, they ride along.
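
A quick back-of-envelope calculation shows why this matters. The numbers below are illustrative assumptions; GPT-4 (8K context) pricing at the time of writing was roughly $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens:

```python
# Rough cost of a single "autonomous agent" run that loops for a while.
# All numbers are illustrative assumptions, not measurements.
steps = 30                       # agent-loop iterations before it stops
prompt_tokens_per_step = 3_000   # the growing context is re-sent every step
completion_tokens_per_step = 500

cost = steps * (prompt_tokens_per_step / 1000 * 0.03
                + completion_tokens_per_step / 1000 * 0.06)
print(f"~${cost:.2f} per run")   # ~$3.60, whether the final answer is right or not
```

Multiply that by thousands of curious users replaying the same demo, and the token bill starts to look a lot like a revenue line for the model provider.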

Other winners are indiehackers or small teams of bootstrappers with a sizeable audience. Since these people are well versed in the art of rapidly producing web apps, they were able to capitalize on the hype with the so-called GPT-wrappers, with little or no marketing spend (because their audience on social media can be useful to quickly validate ideas before pushing ads). And since they do not incur a lot of overhead, they are certainly more profitable than their VC-backed counterparts who rose to sky-high valuations in January, only to have massive layoffs in July.

Incumbents are also winners. They can easily add some sprinkles of LLMs to their existing products and surf on the hype to capture new users.

So what’s next?

I have recently come to feel that LLMs are becoming a solution, or rather a technology, in search of a problem. We want to put them everywhere, even where they are not useful. We are still at the peak of inflated expectations, but I feel like the trough of disillusionment is no longer far away.

Gartner Hype Cycle

A chat interface on top of a foundational model is, in my opinion, the most useful application of LLMs. With that interface, you can easily add web search, tool usage, or code execution (like the data analysis mode of ChatGPT). That interface is powerful and versatile. I get good value for my money by using the premium version of ChatGPT. GPT wrappers that can be used as writing aids, to summarize or paraphrase, are also useful. But they depend on a lot of marketing, since everything they do can be done with ChatGPT directly. Their value is basically the UI. Some use cases remain underrated: classification, entity recognition, question answering, data labeling, paraphrasing, and machine translation. LLMs can be very useful there, allowing companies to deploy features that make existing products better. A company that builds email client software can, for example, boost its email classification by using private LLMs in compliance with the GDPR.
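
To make that concrete, here is a minimal sketch of LLM-based email classification against a privately hosted open-source model exposed through an OpenAI-compatible endpoint (for example with a server such as vLLM). The endpoint URL, model name, and label set are placeholders, not a real deployment:

```python
# Minimal sketch: email classification with a privately hosted LLM.
# The base_url, model name, and label set are illustrative placeholders.
from openai import OpenAI

# Local OpenAI-compatible server; the API key is a dummy value for such setups.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

LABELS = ["billing", "support", "newsletter", "spam", "other"]

def classify_email(subject: str, body: str) -> str:
    prompt = (
        f"Classify the email below into exactly one of: {', '.join(LABELS)}.\n"
        "Answer with the label only.\n\n"
        f"Subject: {subject}\n\n{body}"
    )
    response = client.chat.completions.create(
        model="mistral-7b-instruct",   # placeholder name for a local model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                 # keep the labels as stable as possible
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "other"  # guard against free-form replies

print(classify_email("Invoice #482 overdue", "Please settle the balance by Friday."))
```

Because the model runs on infrastructure the company controls, no email content ever leaves its perimeter, which is the whole point for GDPR-sensitive data.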

A big hurdle for the increased usage of LLMs in enterprises is data privacy. Despite foundational model companies’ reassurances, most companies are still, rightfully, worried about the prospect of exposing so much of their data through an external company’s API. Deploying open-source models in a secure and private way might be a solution, but then comes the cost of serving the model, unless we collectively move toward smaller, expert models. I have come to the conclusion that LLMs can help build features of a product. But if all your product has to offer is centered around some API calls to an LLM, then it probably won’t survive once the hype recedes.

If you want to start a powerful and enduring business, you have to think about the problem you want to solve first. During a hype cycle, people think about technology first and try to fit that technology into every problem they imagine might exist. That approach may work, especially at the beginning of the hype, and for first movers. But my opinion is that it is not a robust way to build great startups.