Currently, "completion on type name" is not very fast, and a major part of the time is spent in unit.codeComplete(). The codeComplete implementation calls the search engine to find the available types that match the typed prefix. Digging deeper into the underlying index engine, I found that its bottleneck is IO. In the CPU profiling result for PatternSearchJob, most of the CPU time is spent reading index files from disk:
86.9% of CPU time is spent reading document names from the index file.
9.9% of CPU time is spent reading the typeDecls category from the index file.
This conclusion matches the metrics I observed by adding logpoints to the search engine source code. On my Mac (MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports), 2.9 GHz Dual-Core Intel Core i5), I tested with the index file of the JDK library jrt-fs.jar (which has a 46M index file) as an example. PatternSearchJob takes more than 130ms to get query results from this JDK index file (IO time is not stable, and sometimes it is 200~300ms). Of that total, about 40ms is spent reading the typeDecls category table from the index file, and about 90ms is spent reading document names from the index file. Since each dependency has a separate index file, and a typical spring-boot project may have more than 100 index files, the cumulative time to query all index files can be slow even though the search engine uses a parallel search strategy.
Optimization:
Reduce the frequency of FileInputStream.open
The profiling shows that FileInputStream.open under EntryResult.getDocumentNames() takes 72.6% of CPU time. The reason is that EntryResult.getDocumentNames() is a high-frequency operation. For example, searching for types with the prefix S can return an EntryResult list of 2k+ entries. The current implementation of DiskIndex.readDocumentName() opens and closes the input stream to read a single document name, which means every call to EntryResult.getDocumentNames() probably triggers a FileInputStream.open operation. The relevant code path runs through SearchPattern.findIndexMatches(), EntryResult.getDocumentNames(), and DiskIndex.readDocumentName(). The proposed optimization: can we read each index file only once to get the document names of all the EntryResults in that index?
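To make the difference concrete, here is a minimal sketch of "open per lookup" versus "open once per file". It does not use the real DiskIndex file format or API: the class BatchedNameReader, its methods, and the length-prefixed file layout are all hypothetical and chosen only for illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: this is NOT the real DiskIndex file format or API.
final class BatchedNameReader {

    /** Writes names as length-prefixed UTF strings and returns the byte offset of each one. */
    static long[] writeNames(Path file, List<String> names) throws IOException {
        long[] offsets = new long[names.size()];
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (int i = 0; i < names.size(); i++) {
                offsets[i] = out.size(); // number of bytes written so far
                out.writeUTF(names.get(i));
            }
        }
        return offsets;
    }

    /** Per-lookup style: the file is opened and closed for every single name. */
    static String readOne(Path file, long offset) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset);
            return in.readUTF();
        }
    }

    /** Batched style: the file is opened and closed once for all requested names. */
    static List<String> readAll(Path file, long[] offsets) throws IOException {
        List<String> result = new ArrayList<>(offsets.length);
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            for (long offset : offsets) {
                in.seek(offset);
                result.add(in.readUTF());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("names", ".idx");
        try {
            List<String> names = List.of(
                    "java/lang/String.class", "java/lang/System.class", "java/util/Scanner.class");
            long[] offsets = writeNames(file, names);

            List<String> oneByOne = new ArrayList<>();
            for (long offset : offsets) {
                oneByOne.add(readOne(file, offset)); // N opens for N names
            }
            List<String> batched = readAll(file, offsets); // 1 open for N names

            System.out.println("same results: " + oneByOne.equals(batched));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With 2k+ entries per query, the per-lookup style pays for 2k+ open/close pairs, while the batched style pays for one per index file. Whether the real code can be restructured this way depends on how the EntryResults are collected, which this sketch does not model.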
Explore adding a cache to DiskIndex to avoid IO.
The core index query class in JDT is DiskIndex, and it looks like it was designed to save memory. It caches very little and has to re-read the index file for each index query. This IO is particularly expensive for large index files, such as the JDK's index files.
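One possible shape for such a cache is a small, bounded LRU map in front of the expensive read. The sketch below is an assumption about how this could look, not existing JDT code: the class CategoryTableCache, its loader function, and the entry limit are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: not part of JDT; names and sizes are illustrative only.
final class CategoryTableCache<K, V> {
    private final int maxEntries;
    private final Function<K, V> loader; // stands in for the expensive disk read
    private final LinkedHashMap<K, V> entries;
    private int loads;

    CategoryTableCache(int maxEntries, Function<K, V> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order once the limit is exceeded.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > CategoryTableCache.this.maxEntries;
            }
        };
    }

    /** Returns the cached value, loading it (and counting the load) on a miss. */
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    /** Number of times the loader was actually invoked. */
    synchronized int loadCount() {
        return loads;
    }
}
```

Bounding the number of entries keeps the memory cost predictable, which matters given that DiskIndex appears to be designed to save memory. A real cache would also need to be invalidated whenever an index file is rewritten, which this sketch leaves out.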