

Re LLM summaries: I’ve noticed that too. For some of my classes shortly after the ChatGPT boom, we were allowed to bring along summaries. I tried feeding ChatGPT the source text and telling it to condense it into a sentence or two. Often it would just give a generic short summary of the topic without actually using the concepts described in the original text.
Also, a minor nitpick, but be wary of the term “accuracy”. It is a terrible metric for most use cases, and when a company advertises that its AI has high accuracy, they’re likely hiding something. For example, let’s say we wanted to develop a model that can detect cancer in medical images. If our test set consists of 1% cancer images and 99% normal tissue, then 99% accuracy is trivially achieved by a model that just predicts “no cancer” every time. A lot of the more interesting problems have class imbalances far worse than this one, too.
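To make the cancer example concrete, here is a quick sketch in plain Python. The numbers are made up for illustration (1,000 images, 10 of them cancer), and the "model" is just a list of zeros, not a real classifier. Recall, the share of actual cancer cases the model catches, is one of the metrics that exposes the problem.

```python
# Hypothetical test set: 1% cancer (label 1), 99% normal tissue (label 0).
labels = [1] * 10 + [0] * 990

# A "model" that predicts "no cancer" for every single image.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that the model flagged as positive."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(labels, predictions)
rec = recall(labels, predictions)

print(f"accuracy: {acc:.2%}")  # accuracy: 99.00%
print(f"recall:   {rec:.2%}")  # recall:   0.00%
```

So the model scores 99% accuracy while missing every single cancer case, which is why I’d ask for recall, precision, or a confusion matrix before trusting a headline accuracy number.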
In my experience, it is good at regexes of simple to medium complexity. For the harder ones it becomes fairly useless, though, at best providing a decent starting point to begin debugging from.