Open
Labels: bug (Something isn't working)
Description
Describe the bug
When a PDF contains a table, rga finds no match for a pattern such as foo.*bar when bar is in a later column of the same row as foo.
To Reproduce
All the PDFs I have are my financial documents, so I found a publicly accessible one that shows the difference:
$ wget -o /dev/null https://assets.accessible-digital-documents.com/uploads/2017/01/sample-tables.pdf
$ ls
sample-tables.pdf
$ rga 'Financial.*22.5'
As you can see, this produces no output, but there is a match to be found if the .* is allowed to span columns:
$ pdfgrep 'Financial.*22.5' -r
./sample-tables.pdf:Policy functions Financial 22.5 30.57
./sample-tables.pdf:Policy functions Financial 22.5 30.57
./sample-tables.pdf:Policy functions Financial 22.5 30.57
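One possible explanation, offered only as a guess, is that the text rga searches has each table cell on its own line, so a line-by-line regex never sees Financial and 22.5 together. The sketch below simulates that with a hand-written `cells` variable; the cell-per-line layout is an assumption, not captured rga output, and the variable names are made up for illustration.

```shell
# Hypothetical extraction result: each table cell on its own line.
# This layout is an assumption, not confirmed rga output.
cells='Policy functions
Financial
22.5
30.57'

# Line-oriented search: no single line holds both "Financial" and "22.5".
# grep -c prints the number of matching lines; "|| true" ignores the
# non-zero exit status grep returns when nothing matches.
per_line=$(printf '%s\n' "$cells" | grep -c 'Financial.*22.5' || true)

# Join the cells into one space-separated row, then search again.
joined=$(printf '%s\n' "$cells" | paste -sd' ' - | grep -c 'Financial.*22.5' || true)

echo "per-line matches: $per_line"
echo "joined-row matches: $joined"
```

If the assumption holds, the first count is 0 and the second is 1, which would be consistent with the difference between the rga and pdfgrep results above.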
Operating System and Version
Xubuntu 24.04 LTS
Output of rga --version
ripgrep-all 0.10.9
What I searched
I searched open issues for "table" and "column" before posting and found #232, which did not help me. (Edit: I wasn't quite sure whether that was the same issue.) Apologies if this has already been reported and I missed it.