Abstract: Software reverse engineering plays a crucial role in identifying design patterns and reconstructing software architectures by analyzing system implementations and producing abstract ...
We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To tackle the long context brought by HTML, we propose Lossless HTML Cleaning and Two-Step ...