[api-minor] Remove the normalizeWhitespace option in the PDFPageProxy.{getTextContent, streamTextContent} methods (issue 14519, PR 14428 follow-up) #14527

Merged: 1 commit, Feb 3, 2022
2 changes: 0 additions & 2 deletions src/core/document.js
@@ -438,7 +438,6 @@ class Page {
extractTextContent({
handler,
task,
- normalizeWhitespace,
includeMarkedContent,
sink,
combineTextItems,
@@ -469,7 +468,6 @@ class Page {
stream: contentStream,
task,
resources: this.resources,
- normalizeWhitespace,
includeMarkedContent,
combineTextItems,
sink,
4 changes: 1 addition & 3 deletions src/core/evaluator.js
@@ -2163,7 +2163,6 @@ class PartialEvaluator {
task,
resources,
stateManager = null,
- normalizeWhitespace = false,
combineTextItems = false,
includeMarkedContent = false,
sink,
@@ -2642,7 +2641,7 @@ class PartialEvaluator {
textChunk.prevTransform = getCurrentTextTransform();
}

- if (glyph.isWhitespace && normalizeWhitespace) {
+ if (glyph.isWhitespace) {
// Replaces all whitespaces with standard spaces (0x20), to avoid
// alignment issues between the textLayer and the canvas if the text
// contains e.g. tabs (fixes issue6612.pdf).
@@ -3023,7 +3022,6 @@ class PartialEvaluator {
task,
resources: xobj.dict.get("Resources") || resources,
stateManager: xObjStateManager,
- normalizeWhitespace,
combineTextItems,
includeMarkedContent,
sink: sinkWrapper,
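
The evaluator hunk above makes the whitespace replacement unconditional. As a rough illustration of the effect described in the code comment (every whitespace character becomes a standard space, 0x20), here is a minimal sketch. It is not the PDF.js implementation: `normalizeToStandardSpaces` is a hypothetical name, and using the `\s` character class is an assumption that may not match how PDF.js decides `glyph.isWhitespace`.

```javascript
// Hypothetical sketch, NOT the PDF.js implementation: replace each
// whitespace character in a string with a standard space (0x20).
function normalizeToStandardSpaces(text) {
  // Assumption: `\s` (tabs, newlines, no-break spaces, ...) approximates
  // what the evaluator treats as whitespace glyphs.
  return text.replace(/\s/g, " ");
}

const sample = "col1\tcol2\u00A0col3";
const normalized = normalizeToStandardSpaces(sample);
console.log(JSON.stringify(normalized));
```

Because each whitespace character maps to exactly one space, the string length is unchanged, which is the property that keeps text positions stable.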
1 change: 0 additions & 1 deletion src/core/worker.js
@@ -740,7 +740,6 @@ class WorkerMessageHandler {
handler,
task,
sink,
- normalizeWhitespace: data.normalizeWhitespace,
includeMarkedContent: data.includeMarkedContent,
combineTextItems: data.combineTextItems,
})
10 changes: 6 additions & 4 deletions src/display/api.js
@@ -1069,8 +1069,6 @@ class PDFDocumentProxy {
* Page getTextContent parameters.
*
* @typedef {Object} getTextContentParameters
- * @property {boolean} normalizeWhitespace - Replaces all occurrences of
- * whitespace with standard spaces (0x20). The default value is `false`.
* @property {boolean} disableCombineTextItems - Do not attempt to combine
* same line {@link TextItem}'s. The default value is `false`.
* @property {boolean} [includeMarkedContent] - When true include marked
@@ -1585,11 +1583,13 @@ class PDFPageProxy {
}

/**
+ * NOTE: All occurrences of whitespace will be replaced by
+ * standard spaces (0x20).
+ *
* @param {getTextContentParameters} params - getTextContent parameters.
* @returns {ReadableStream} Stream for reading text content chunks.
*/
streamTextContent({
- normalizeWhitespace = false,
disableCombineTextItems = false,
includeMarkedContent = false,
} = {}) {
@@ -1599,7 +1599,6 @@ class PDFPageProxy {
"GetTextContent",
{
pageIndex: this._pageIndex,
- normalizeWhitespace: normalizeWhitespace === true,
combineTextItems: disableCombineTextItems !== true,
includeMarkedContent: includeMarkedContent === true,
},
@@ -1613,6 +1612,9 @@ class PDFPageProxy {
}

/**
+ * NOTE: All occurrences of whitespace will be replaced by
+ * standard spaces (0x20).
+ *
* @param {getTextContentParameters} params - getTextContent parameters.
* @returns {Promise<TextContent>} A promise that is resolved with a
* {@link TextContent} object that represents the page's text content.
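
For callers, the API change above means the `normalizeWhitespace` key can simply be dropped from the parameters passed to `getTextContent` or `streamTextContent`. The sketch below shows one way to tidy legacy call sites. `withoutNormalizeWhitespace` is a hypothetical helper, not part of PDF.js. Judging only from the destructured signature in the diff, a leftover key would presumably be ignored, so this cleanup is about clarity rather than correctness.

```javascript
// Hypothetical helper (not part of PDF.js): returns a copy of the
// parameters without the removed `normalizeWhitespace` option.
function withoutNormalizeWhitespace(params = {}) {
  // Rest destructuring copies every key except `normalizeWhitespace`.
  const { normalizeWhitespace, ...rest } = params;
  return rest;
}

const legacyParams = { normalizeWhitespace: true, includeMarkedContent: true };
const params = withoutNormalizeWhitespace(legacyParams);
console.log(params);
// The cleaned object could then be passed on, e.g.
// page.getTextContent(params), assuming `page` is a PDFPageProxy.
```

The helper returns a new object and leaves the original untouched, so it can be applied at a single call site without affecting other code that shares the same options object.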
1 change: 0 additions & 1 deletion test/driver.js
@@ -644,7 +644,6 @@ class Driver {
// The text builder will draw its content on the test canvas
initPromise = page
.getTextContent({
- normalizeWhitespace: true,
includeMarkedContent: true,
})
.then(function (textContent) {
1 change: 0 additions & 1 deletion test/unit/api_spec.js
@@ -1966,7 +1966,6 @@ describe("api", function () {
it("gets text content", async function () {
const defaultPromise = page.getTextContent();
const parametersPromise = page.getTextContent({
- normalizeWhitespace: true,
disableCombineTextItems: true,
});

4 changes: 1 addition & 3 deletions web/pdf_find_controller.js
@@ -551,9 +551,7 @@ class PDFFindController {
return this._pdfDocument
.getPage(i + 1)
.then(pdfPage => {
- return pdfPage.getTextContent({
- normalizeWhitespace: true,
- });
+ return pdfPage.getTextContent();
})
.then(
textContent => {
1 change: 0 additions & 1 deletion web/pdf_page_view.js
@@ -701,7 +701,6 @@ class PDFPageView {
return finishPaintTask(null).then(() => {
if (textLayer) {
const readableStream = pdfPage.streamTextContent({
- normalizeWhitespace: true,
includeMarkedContent: true,
});
textLayer.setTextContentStream(readableStream);