
[Transaction.hpp] restore trans->is_closing_cached struct member #2070


Open · christopherlam wants to merge 1 commit into stable from txn-isclosing-cached
Conversation

@christopherlam (Contributor) commented Mar 28, 2025

Because trans->is_closing is heavily accessed during account balance calculations, it's reasonable to cache the kvp value.

The cached member was removed in 8a7c539.

time ./gnucash-cli datafile.gnucash -R run --name "Account Summary"

benchmarks improve from 5.5s to 5.0s
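For context, here is a minimal sketch of the caching pattern this PR restores. The member name mirrors the PR title, but the accessor and the KVP lookup helper are simplified, hypothetical stand-ins for the real code in Transaction.hpp/Transaction.cpp:

```cpp
#include <optional>

struct Transaction
{
    // ... other members elided ...
    // Cache for the KVP-backed flag: empty means "not looked up yet".
    mutable std::optional<bool> is_closing_cached;
};

// Hypothetical stand-in for the comparatively slow KVP lookup that reads the
// transaction's is-closing slot.
static bool lookup_is_closing_kvp (const Transaction*) { return false; }

// Simplified accessor: hit the KVP store once, then serve the cached value
// on every subsequent call during balance calculations.
bool trans_is_closing (const Transaction* trans)
{
    if (!trans->is_closing_cached)
        trans->is_closing_cached = lookup_is_closing_kvp (trans);
    return *trans->is_closing_cached;
}
```

Any write to the underlying slot would also have to reset is_closing_cached, which is part of the maintenance cost of caching discussed below.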

@christopherlam force-pushed the txn-isclosing-cached branch 3 times, most recently from c1181a0 to 0ed5e12 on March 28, 2025 at 14:06
because trans->is_closing is heavily accessed during account balance
calculations, and it's reasonable to cache the kvp value.
@jralls (Member) commented Mar 28, 2025

Where's the profiling data showing that a) there's a problem, b) that caching improves it, and c) that caching is better than the kvp direct-access you wrote?

@christopherlam (Contributor, Author)

Where's the profiling data showing that a) there's a problem b) that caching improves it, and c) that caching is better than the kvp direct-access you wrote?

See the first comment, which is the only profiling I've done.

#2070 (comment)

@jralls (Member) commented Mar 28, 2025

Either I need a bigger file or you need a faster computer. ;-) With a 54MB (uncompressed) book, average of the last 4 of 5 runs each (because the first run was several hundred ms longer in both cases):

         stable:   PR 2070:
real:     1.954     1.905
user:     1.684     1.631
system:   0.385     0.388

Not exactly compelling. BTW, profiling with Xcode Instruments shows 75% of the time is spent loading the db. The report takes only 388 ms.

@christopherlam (Contributor, Author)

I've done more profiling, and the modes are still 5.5s without caching and 5.1s with caching.
Maybe we should park this. I'm still trying to optimize loading, which is heavily dragged down by the xaccTransBegin|CommitEdit burden.

@jralls (Member) commented Apr 1, 2025

If you're always getting the same times, it means that you're always running the same code on the same data file. That I got different results is probably down to my having a different data file, and might have to do with the fact that it doesn't have any closing transactions, because I never close the book in that file. Mind, even your 9% improvement isn't exactly compelling.

Running gnucash-cli wrapped in time isn't profiling. Profiling means using a profiler, and to improve user experience that has to be timed profiling. The Linux way is explained in excruciating detail in that link. The easy way is to get a Mac and use its free Instruments application (part of Xcode): a single app that does all the work and doesn't require a special build that can mess with the compiler and linker optimizations and so yield misleading results.

What you get when you use a profiler is an extended stack trace that shows how long the program spent in each function of each execution branch. That enables you to quickly identify problem functions--or to recognize that there aren't any problem functions and that what you're asking the computer to do just takes that long. That might lead you to reconsider the design to see if you can get the same result with less work; see my comment on loading the file being the dominant time-user. IIRC it's because xaccAccountCommitEdit is sorting its splits after every transaction insertion. Sorting calls xaccTransOrder, and one of the things it orders on is isClosingTrans. Some observations:
1. When loading, the splits are probably already in order.
2. You changed the Account split container from a GList to a std::vector last year. Sorting linked lists is faster than sorting vectors. That's largely ameliorated by 1, but
3. Even when the splits are already in order, a sort does a pairwise comparison of every split in the list/vector (see the sketch below). The split and transaction comparison functions are pretty expensive, so it's slow.
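A small, self-contained illustration of point 3 (plain C++, not GnuCash code): even when the input is already in order, std::sort still runs its full comparison schedule, whereas a linear is-sorted check needs only n-1 comparator calls.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main ()
{
    std::vector<int> splits (10000);
    std::iota (splits.begin (), splits.end (), 0);   // already sorted, like splits on load

    long sort_cmps = 0, check_cmps = 0;

    // std::sort can't know the range is sorted without comparing, so an
    // expensive comparator (think xaccTransOrder) is still called many times.
    std::sort (splits.begin (), splits.end (),
               [&sort_cmps] (int a, int b) { ++sort_cmps; return a < b; });

    // A linear check makes exactly n-1 comparisons.
    bool ordered = std::is_sorted (splits.begin (), splits.end (),
                                   [&check_cmps] (int a, int b) { ++check_cmps; return a < b; });

    std::printf ("sorted=%d, std::sort comparisons=%ld, std::is_sorted comparisons=%ld\n",
                 ordered, sort_cmps, check_cmps);
    return 0;
}
```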

An obvious way to speed that up is calling xaccAccountBeginEdit on every account before starting to load the transactions, but while our XML files are laid out with the transactions last--and have to be, because the accounts all have to be created before the transactions can be--there's no way for a SAX parser to know that. The SQL backend knows what order it's loading objects in, so it can do those optimizations.

That aside, caching KVP merely works around GValue's inefficiency. You've already written an actual fix that gets rid of GValue. Use it. It's a maintainability improvement that doesn't need profiling for support.

@christopherlam (Contributor, Author)

From my understanding the xaccAccountBegin/CommitEdit is already happening... maybe in

and the split sorting isn't a hotspot. Whenever I've profiled loading, the slowdowns have always been in xaccTransBegin/CommitEdit.

I'll see about porting the GValue->kvp change to the transaction.cpp functions, but I feel it's more of a code-prettifying exercise than a CPU optimization. IIUC GValue is a simple struct with a datatype and union data; see https://github.com/GNOME/glib/blob/f1a498e178709dcd5c26303a5b54ae5007733fc5/gobject/gvalue.h#L113
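For reference, the full round trip for a single boolean through a GValue looks roughly like this (standard GLib calls only, shown to make the per-access overhead concrete; the property-style plumbing around it in the engine adds more on top):

```cpp
#include <glib-object.h>

gboolean read_flag (void)
{
    GValue value = G_VALUE_INIT;

    g_value_init (&value, G_TYPE_BOOLEAN);   // tag the union with its type
    g_value_set_boolean (&value, TRUE);      // copy the value in

    gboolean flag = FALSE;
    if (G_VALUE_HOLDS_BOOLEAN (&value))      // run-time type test
        flag = g_value_get_boolean (&value); // copy the value out

    g_value_unset (&value);                  // release anything the GValue holds
    return flag;
}
```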

@jralls (Member) commented Apr 16, 2025

From my understanding the xaccAccountBegin/CommitEdit is already happening.

OK, that's good.

IIUC GValue is a simple struct with a datatype and union data

Right, a discriminated or tagged union. It still has to be initialized, have a value copied in, have its type tested, and the value copied out. std::variant is another example of the same thing, with the advantage that C++ can resolve the type testing at compile time and maybe optimize out the extra copying. But using a templated retrieval function instead is faster still.
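A hedged sketch of that comparison (a toy tagged union, not the engine's actual KvpValue API): with std::variant plus a templated getter, the "which type" question is answered at compile time and only the presence check remains at run time.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Toy KVP value: a tagged union like GValue, but the set of alternatives is
// known to the compiler.
using KvpValue = std::variant<std::monostate, bool, std::int64_t, std::string>;

// Templated retrieval: no runtime type registration, no boxing/unboxing.
template <typename T>
T get_or_default (const KvpValue& value, T fallback = T{})
{
    if (auto* p = std::get_if<T> (&value))
        return *p;
    return fallback;
}

// Example use: bool closing = get_or_default<bool> (slot_value);
```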

The other time waster here is resolving the KVP path. We should be able to move that to compile time with some constexpr pixie dust.
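One way that "constexpr pixie dust" could look, as a sketch (the slot key below is illustrative, not necessarily the real one): the path components, and anything derived purely from them, become compile-time constants instead of being rebuilt on every lookup.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

// Compile-time KVP path: no per-call allocation just to name the slot.
inline constexpr std::array<std::string_view, 1> is_closing_path{"book_closing"};

// Anything derived purely from the path, e.g. a hash for an index keyed on
// slot names, can be evaluated entirely at compile time (FNV-1a shown here).
constexpr std::uint64_t fnv1a (std::string_view s)
{
    std::uint64_t h = 14695981039346656037ull;
    for (char c : s)
        h = (h ^ static_cast<unsigned char> (c)) * 1099511628211ull;
    return h;
}

static_assert (fnv1a (is_closing_path[0]) != 0, "resolved at compile time");
```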
