Fix/trailing commas from chatgpt #239

Closed
faceleg wants to merge 5 commits into microsoft:main from faceleg:fix/trailing-commas-from-chatgpt

faceleg commented Apr 18, 2024

ChatGPT sometimes returns JSON with trailing commas, which breaks the parser. The repair attempts do not take this into account.

I've copied in the strip-trailing-commas function from https://github.com/nokazn/strip-json-trailing-commas/blob/main/src/index.ts (MIT) and added a test to prove it works.

Trailing commas are stripped here:

const jsonText = stripJsonTrailingCommas(responseText.slice(startIndex, endIndex + 1));

Example broken response:

{
  "items": [
    {
      "id": 1,
      "text": "驳回",
      "exampleSentences": [
        "法官驳回了他的上诉请求。",
        "公司决定驳回他的辞职申请。",
        "政府部门驳回了他的建议。",
      ],
      "partsOfSpeech": "verb"
    },
    {
      "id": 2,
      "text": "驳回",
      "exampleSentences": [
        "他对这个提案的驳回感到失望。",
        "这个决定的驳回引起了公众的不满。",
        "他的建议被驳回了,让他感到沮丧。",
      ],
      "partsOfSpeech": "noun"
    }
  ]
}
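
For readers who want to see what stripping trailing commas from a response like the one above might involve, here is a minimal sketch. It is not the function copied in this PR: `stripTrailingCommasSketch` is a hypothetical name, and the approach (a character scan that skips string contents instead of a regex) is an assumption chosen so that commas inside string values are left alone.

```typescript
// Hypothetical helper for illustration only; not the code from this PR.
// Drops a comma when the next non-whitespace character is `]` or `}`,
// and never touches characters inside JSON string literals.
function stripTrailingCommasSketch(text: string): string {
    let out = "";
    let inString = false;
    let i = 0;
    while (i < text.length) {
        const ch = text.charAt(i);
        if (inString) {
            out += ch;
            if (ch === "\\" && i + 1 < text.length) {
                // Copy the escaped character verbatim so `\"` does not end the string.
                out += text.charAt(i + 1);
                i += 2;
                continue;
            }
            if (ch === '"') {
                inString = false;
            }
            i++;
            continue;
        }
        if (ch === '"') {
            inString = true;
        } else if (ch === ",") {
            // Look ahead past whitespace to the next significant character.
            let j = i + 1;
            while (j < text.length && /\s/.test(text.charAt(j))) {
                j++;
            }
            const next = text.charAt(j);
            if (next === "]" || next === "}") {
                // Trailing comma: skip it, keep the whitespace that follows.
                i++;
                continue;
            }
        }
        out += ch;
        i++;
    }
    return out;
}

// Usage: the comma inside the string "b,]" is preserved; the two trailing commas are removed.
const sampleResponse = '{"items": ["a", "b,]", ], }';
const parsedSample = JSON.parse(stripTrailingCommasSketch(sampleResponse));
```

The scan only removes commas that sit directly before a closing bracket or brace, so other kinds of malformed output still fail in `JSON.parse` as before.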

Example prompt that generated this response:

You are a helpful vocabulary learning assistant who helps users generate example sentences in Mandarin for language learning. You understand that in Mandarin, words can serve different parts of speech depending on context.

Please find the possible usages this word: 休想, and generate 3 example sentences for each usage.

The sentences should be medium or longer length and complexity of HSK5 or higher. Each sentence must contain the the word. All sentences provided for the word must be unique.

JSON must be returned as an array of objects, with one object per part of speech for the word. You must return valid JSON. The array of sentences must not have a trailing comma.

This is the project I'm using TypeChat on: https://github.com/faceleg/ankiai, forked from https://github.com/mhujer/ankiai.

faceleg (Author) commented Apr 18, 2024

@microsoft-github-policy-service agree

robgruen (Contributor) commented Jun 1, 2026

I'm wary of adding arbitrary regex parsing to the LLM-generated text. What if my payload is invalidly formatted JSON by design? This change would modify that text as well, without hesitation, potentially breaking my use case. A more correct fix would be in the JSON parsing itself.

LLMs have also gotten much better since this PR was first submitted, so I wonder whether this still occurs as often. There are also some merge conflicts. I will therefore close this out; if you'd like to pursue it, please submit a new PR. I wonder if we could address this with prompt engineering rather than taking indiscriminate action on the text of the response?
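
One way to limit the "indiscriminate action" concern, sketched here purely as an illustration and not as anything TypeChat does, is to parse the response unchanged first and apply a repair only if that parse fails. The names `ParseResult`, `parseJsonWithFallback`, and `naiveRepair` are hypothetical, and the regex repair is deliberately simplistic. This does not help callers whose payload is intentionally not valid JSON.

```typescript
// Hypothetical types and helpers for illustration; not part of TypeChat's API.
type ParseResult =
    | { success: true; data: unknown; repaired: boolean }
    | { success: false; message: string };

// Try the text as-is; only run `repair` when the first parse throws.
function parseJsonWithFallback(text: string, repair: (t: string) => string): ParseResult {
    try {
        return { success: true, data: JSON.parse(text), repaired: false };
    } catch {
        // The unmodified text is not valid JSON; fall through to the repair attempt.
    }
    try {
        return { success: true, data: JSON.parse(repair(text)), repaired: true };
    } catch (e) {
        return { success: false, message: e instanceof Error ? e.message : String(e) };
    }
}

// A naive, regex-based repair (assumption): removes a comma before `]` or `}`.
// It is not string-aware, which is why it runs only as a fallback.
const naiveRepair = (t: string): string => t.replace(/,(\s*[\]}])/g, "$1");

// Valid JSON containing ",]" inside a string is parsed without being modified.
const okResult = parseJsonWithFallback('{"note": "a,]"}', naiveRepair);
// Invalid JSON with a trailing comma is repaired and then parsed.
const fixedResult = parseJsonWithFallback('{"xs": [1, 2,]}', naiveRepair);
```

The `repaired` flag lets a caller log or reject repaired responses, which keeps the behavior opt-in rather than silent.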

robgruen closed this Jun 1, 2026