OBPIH-7884 Test downloaded documents' content #90
…d row value capture
…add writeBufferToFile function
…dler in PutawayDetailsPage
const recipientName = rowValues.recipient?.name;
if (!_.isNil(recipientName)) {
  await this.recipientSelect.findAndSelectOption(recipientName);
}
A TypeScript error was introduced here a long time ago, so here's the fix.
"eslint-plugin-playwright": "~1.0.1",
"eslint-plugin-promise": "~6.0.0",
"eslint-plugin-simple-import-sort": "~10.0.0",
"pdfjs-dist": "~3.11.174",
Is this https://github.com/mozilla/pdfjs-dist? It looks like it is no longer supported as of 2024.
return content.items
  .map((item) => ('str' in item ? (item as TextItem).str : ''))
  .join(' ');
})
So if I'm understanding correctly, this pulls all the text out of the PDF as a single string, and then pdfContainsValues does a string search to check whether the PDF contains the expected text?
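To make the question above concrete, here is a minimal sketch of a substring check over already-extracted PDF text. This is not the PR's actual `pdfContainsValues` implementation, which is not shown in this conversation; the names `normalizeWhitespace`, `findMissingValues`, and `textContainsValues` are hypothetical, and collapsing whitespace before comparing is an assumption about what such a check might want.

```typescript
// Hypothetical helpers: a sketch of "extract text, then search it",
// not the implementation under review.

// Collapse runs of whitespace so line breaks in the PDF text
// do not break a match on a multi-word value.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

// Return the expected values that do NOT appear in the extracted text.
function findMissingValues(pdfText: string, expected: string[]): string[] {
  const haystack = normalizeWhitespace(pdfText);
  return expected.filter(
    (value) => !haystack.includes(normalizeWhitespace(value)),
  );
}

// True only when every expected value appears somewhere in the text.
function textContainsValues(pdfText: string, expected: string[]): boolean {
  return findMissingValues(pdfText, expected).length === 0;
}
```

Returning the list of missing values, rather than only a boolean, would let a failing test report which value was absent. A plain substring search like this cannot tell where in the document a value appears, only that it appears somewhere.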
ewaterman left a comment
The tests themselves look good from what I can tell. The only concern is that the new dependency is deprecated. I don't know whether there's a similar alternative with proper support.
A brief Google search tells me Mozilla's PDF.js is a standard solution for reading PDF contents.
The example code I found:
// Assumes pdfjsLib refers to the already-imported PDF.js library.
async function extractTextFromPdf(urlOrBuffer) {
  const loadingTask = pdfjsLib.getDocument(urlOrBuffer);
  const pdf = await loadingTask.promise;
  let fullText = "";
  // Loop through every page (pages are 1-indexed) to extract text snippets
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const textContent = await page.getTextContent();
    // Concatenate individual text items into a single page string
    const pageText = textContent.items.map(item => item.str).join(" ");
    fullText += pageText + "\n";
  }
  return fullText;
}
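The example above reads `item.str` directly, while the diff under review guards with `'str' in item`. A minimal sketch of that guarded join, written against local stand-in types so it runs without the library: `TextItemLike`, `MarkedContentLike`, `joinPageItems`, and `joinPages` are hypothetical names, and the assumption that some items may lack a `str` field is inferred from the guard in the diff rather than confirmed here.

```typescript
// Local stand-ins for the item shapes (assumptions for illustration;
// the real library ships its own types).
interface TextItemLike {
  str: string;
}
interface MarkedContentLike {
  type: string;
}
type PageItem = TextItemLike | MarkedContentLike;

// Join one page's items into a string, treating items without `str` as empty.
function joinPageItems(items: PageItem[]): string {
  return items.map((item) => ('str' in item ? item.str : '')).join(' ');
}

// Join all pages, one line per page, mirroring the per-page loop above.
function joinPages(pages: PageItem[][]): string {
  return pages.map(joinPageItems).join('\n');
}
```

The `in` check narrows the union type, so no cast is needed in this sketch. Items without `str` still contribute a separator, so the output can contain consecutive spaces.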