fix: instruct LLM to trust title/summary over garbled OCR content

Scanned passport PDFs have completely garbled OCR text, which leads the
LLM to conclude they are not passports even though the AI-generated
title and summary identify them correctly. Add an explicit instruction
to the document prompt telling the LLM to trust the title and summary
fields over the Content field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Christian Gick
2026-03-05 16:43:13 +02:00
parent ae059749c4
commit 0d83d3177e

bot.py

@@ -425,15 +425,19 @@ class DocumentRAG:
         parts.append("")  # blank line between docs
         parts.append("IMPORTANT INSTRUCTIONS FOR DOCUMENT RESPONSES:\n"
-                     "1. Answer the user's question using ALL the document content above.\n"
+                     "1. Answer the user's question using ALL the documents above.\n"
                      "2. These are FRESH search results — they override anything from chat history.\n"
                      " If previous messages said 'only one passport' but documents show more, trust the documents.\n"
-                     "3. You MUST include a source link for EVERY document you reference.\n"
-                     "4. Format links as markdown: [Document Title](url)\n"
-                     "5. Place the link right after mentioning or quoting the document.\n"
-                     "6. If a document has no link, skip the link but still reference the title.\n"
-                     "7. Never show raw URLs without markdown formatting.\n"
-                     "8. List ALL matching documents, not just the first one.")
+                     "3. TRUST the document TITLE and SUMMARY — they are AI-generated and accurate.\n"
+                     " The Content field may be garbled OCR from scanned PDFs (random characters, broken text).\n"
+                     " If the title says 'Christian's Passport' and summary says 'passport belonging to Christian',\n"
+                     " then it IS a passport — even if the content looks like gibberish.\n"
+                     "4. You MUST include a source link for EVERY document you reference.\n"
+                     "5. Format links as markdown: [Document Title](url)\n"
+                     "6. Place the link right after mentioning or quoting the document.\n"
+                     "7. If a document has no link, skip the link but still reference the title.\n"
+                     "8. Never show raw URLs without markdown formatting.\n"
+                     "9. List ALL matching documents, not just the first one.")
         return "\n".join(parts)
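
For context, a minimal sketch of how a prompt builder of this shape might put the title and summary next to the raw OCR content, so the trust instruction has something to point at. This is not the code in bot.py: the `Doc` record, its field names, the `build_context` function, the per-document labels, and the example URL are all assumptions made for illustration, and the instruction text is shortened.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical search-result record; the field names are assumptions."""
    title: str
    summary: str
    content: str
    url: str = ""


def build_context(docs: List[Doc]) -> str:
    """Assemble a prompt section: one block per document, then instructions."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        parts.append(f"Document {i}")
        # Title and summary come first so they are easy for the model to find.
        parts.append(f"Title: {doc.title}")
        parts.append(f"Summary: {doc.summary}")
        if doc.url:
            parts.append(f"Link: {doc.url}")
        # Content is passed through unchanged, even if the OCR output is garbled.
        parts.append(f"Content: {doc.content}")
        parts.append("")  # blank line between docs
    # Shortened paraphrase of the instruction block, not the committed wording.
    parts.append(
        "IMPORTANT INSTRUCTIONS FOR DOCUMENT RESPONSES:\n"
        "1. Answer the user's question using ALL the documents above.\n"
        "2. TRUST the document TITLE and SUMMARY over the Content field,\n"
        "   which may be garbled OCR from a scanned PDF."
    )
    return "\n".join(parts)


# A scanned passport whose OCR text is unusable but whose metadata is fine.
scan = Doc(
    title="Christian's Passport",
    summary="passport belonging to Christian",
    content="P<x7 ~~ 0O1l |||| #q",
    url="https://example.invalid/docs/1",
)
prompt = build_context([scan])
print(prompt)
```

Under these assumptions the garbled text still reaches the model, but it arrives labeled as Content and preceded by a readable title and summary, which is what the added instruction relies on.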