fix: instruct LLM to trust title/summary over garbled OCR content
Scanned passport PDFs have completely garbled OCR text that makes the LLM think they're not passports, even though the AI-generated title and summary correctly identify them. Added explicit instruction to trust title/summary fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
18
bot.py
18
bot.py
@@ -425,15 +425,19 @@ class DocumentRAG:
|
|||||||
parts.append("") # blank line between docs
|
parts.append("") # blank line between docs
|
||||||
|
|
||||||
parts.append("IMPORTANT INSTRUCTIONS FOR DOCUMENT RESPONSES:\n"
|
parts.append("IMPORTANT INSTRUCTIONS FOR DOCUMENT RESPONSES:\n"
|
||||||
"1. Answer the user's question using ALL the document content above.\n"
|
"1. Answer the user's question using ALL the documents above.\n"
|
||||||
"2. These are FRESH search results — they override anything from chat history.\n"
|
"2. These are FRESH search results — they override anything from chat history.\n"
|
||||||
" If previous messages said 'only one passport' but documents show more, trust the documents.\n"
|
" If previous messages said 'only one passport' but documents show more, trust the documents.\n"
|
||||||
"3. You MUST include a source link for EVERY document you reference.\n"
|
"3. TRUST the document TITLE and SUMMARY — they are AI-generated and accurate.\n"
|
||||||
"4. Format links as markdown: [Document Title](url)\n"
|
" The Content field may be garbled OCR from scanned PDFs (random characters, broken text).\n"
|
||||||
"5. Place the link right after mentioning or quoting the document.\n"
|
" If the title says 'Christian's Passport' and summary says 'passport belonging to Christian',\n"
|
||||||
"6. If a document has no link, skip the link but still reference the title.\n"
|
" then it IS a passport — even if the content looks like gibberish.\n"
|
||||||
"7. Never show raw URLs without markdown formatting.\n"
|
"4. You MUST include a source link for EVERY document you reference.\n"
|
||||||
"8. List ALL matching documents, not just the first one.")
|
"5. Format links as markdown: [Document Title](url)\n"
|
||||||
|
"6. Place the link right after mentioning or quoting the document.\n"
|
||||||
|
"7. If a document has no link, skip the link but still reference the title.\n"
|
||||||
|
"8. Never show raw URLs without markdown formatting.\n"
|
||||||
|
"9. List ALL matching documents, not just the first one.")
|
||||||
return "\n".join(parts)
|
return "\n".join(parts)
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user