OpenAI Assistants with citations like【4:2†source】and citeturnXfileY

python
Ethan Jackson

When streaming a run with OpenAI Assistants like this:

    openai.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=payload.question
    )
    run = openai.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        stream=True,
        tool_choice={"type": "file_search"},
    )
    streamed_text = ""
    for event in run:
        if event.event == "thread.message.delta":
            delta_content = event.data.delta.content
            if delta_content and delta_content[0].type == "text":
                text_fragment = delta_content[0].text.value
                streamed_text += text_fragment
                yield {"data": text_fragment}
        if event.event == "thread.run.completed":
            break

the citations come through as raw placeholders such as 【4:2†source】 or citeturnXfileY.


How to fix it?

Answer

The approach I used was to fetch the final message once streaming has finished:

    messages = openai.beta.threads.messages.list(thread_id=thread_id)

and then apply the following regex substitution to its text (`raw_text`):

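    import re

    def clean_citations(raw_text):
        # Matches both placeholder styles: citeturn0file1 and 【4:2†source】
        pattern = r"(citeturn\d+file\d+|【\d+:\d+†source】)"
        citation_index = 0

        def replace_placeholder(match):
            # nonlocal only works inside an enclosing function, hence the wrapper
            nonlocal citation_index
            citation_index += 1
            return f"[{citation_index}]"

        return re.sub(pattern, replace_placeholder, raw_text)

    assistant_reply_cleaned = clean_citations(raw_text)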

This replaces the placeholders (such as 【4:2†source】 or citeturnXfileY) with [1], [2], and so on, in order of appearance.
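Numbering by order of appearance gives a repeated placeholder a fresh number each time it occurs. If the same source should keep the same number, one possible variation (not part of the approach described above) is to remember each placeholder in a dictionary. The names `PLACEHOLDER` and `number_citations` are hypothetical, and the sample string is made up for illustration.

```python
import re

# Same two placeholder styles as above: citeturn0file1 and 【4:2†source】
PLACEHOLDER = re.compile(r"citeturn\d+file\d+|【\d+:\d+†source】")


def number_citations(raw_text):
    """Replace placeholders with [n]; identical placeholders share one number."""
    numbers = {}  # placeholder text -> assigned citation number

    def replace(match):
        key = match.group(0)
        if key not in numbers:
            numbers[key] = len(numbers) + 1
        return f"[{numbers[key]}]"

    return PLACEHOLDER.sub(replace, raw_text), numbers


sample = "Alpha【4:2†source】, beta citeturn0file1, alpha again【4:2†source】."
cleaned, numbers = number_citations(sample)
print(cleaned)  # Alpha[1], beta [2], alpha again[1].
```

The returned `numbers` mapping can then be used to build a footnote list that pairs each `[n]` with its original placeholder.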
