completion: optimize the index engine for the scenario "complete on type name" #1846


Closed
testforstephen opened this issue Aug 19, 2021 · 2 comments · Fixed by #1902

Comments


testforstephen commented Aug 19, 2021

See a typical case of "complete on type name":

[screenshot: completion proposals shown after typing a type-name prefix]

Currently "complete on type name" is not very fast, and a major part of the time is spent in unit.codeComplete(). The codeComplete implementation calls the search engine to find the available types matching the typed prefix. Digging into the underlying index engine, I found that its bottleneck is IO. Here is the CPU profiling result for PatternSearchJob; you can see that most of the CPU time is spent reading index files from disk:

  • 86.9% of CPU time is spent reading document names from the index file.
  • 9.9% of CPU time is spent reading the typeDecls category from the index file.

[screenshot: CPU profiling call tree for PatternSearchJob]

This conclusion matches the metrics I observed by adding logpoints to the search engine source code. On my MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports, 2.9 GHz Dual-Core Intel Core i5), I tested against the index file of the JDK library jrt-fs.jar (a 46 MB index file). PatternSearchJob takes more than 130 ms (IO time is not stable; sometimes it can be 200~300 ms) to get query results from this index file. Of that total, about 40 ms is spent reading the typeDecls category table and about 90 ms reading document names. Since each dependency has a separate index file, and a typical Spring Boot project may have more than 100 index files, the cumulative query time across all index files can be slow even with a parallel search strategy.
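To make the cumulative cost concrete, here is a back-of-envelope model. The per-file latency, file count, and thread-pool size are illustrative assumptions based on the numbers above, not measurements of JDT itself:

```java
// Rough model of total query latency: N index files queried on a fixed-size
// worker pool, each query costing roughly the same per-file latency.
public class QueryCostEstimate {
	static long estimateMillis(int indexFiles, long perFileMillis, int parallelism) {
		// ceil(indexFiles / parallelism) "waves" of parallel per-file queries
		long waves = (indexFiles + parallelism - 1) / parallelism;
		return waves * perFileMillis;
	}

	public static void main(String[] args) {
		// ~100 index files at ~130 ms each on 8 threads
		System.out.println(estimateMillis(100, 130, 8)); // prints 1690
	}
}
```

Even with 8-way parallelism, the assumed numbers put a single completion request well over a second, which is why reducing per-file IO matters.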

Proposed optimizations:

  • Reduce the frequency of FileInputStream.open
    The profiling shows that the FileInputStream.open operation under EntryResult.getDocumentNames() takes 72.6% of CPU time, because EntryResult.getDocumentNames() is called at high frequency. For example, searching types with the prefix S can produce an EntryResult list of 2k+ entries. In the current implementation of DiskIndex.readDocumentName(), the input stream is opened and closed to read a single document name, so every call to EntryResult.getDocumentNames() likely triggers a FileInputStream.open. The optimization question is: can we read each index file only once to get the document names of all the EntryResults in that index?

SearchPattern.findIndexMatches():

```java
// From org.eclipse.jdt.core.search.SearchPattern
public void findIndexMatches(Index index, IndexQueryRequestor requestor, SearchParticipant participant, IJavaSearchScope scope, IProgressMonitor monitor) throws IOException {
	if (monitor != null && monitor.isCanceled()) throw new OperationCanceledException();
	try {
		index.startQuery();
		SearchPattern pattern = currentPattern();
		EntryResult[] entries = pattern.queryIn(index);
		if (entries == null) return;

		String containerPath = index.containerPath;
		char separator = index.separator;
		long start = System.currentTimeMillis();
		for (int i = 0, l = entries.length; i < l; i++) {
			if (monitor != null && monitor.isCanceled()) throw new OperationCanceledException();

			EntryResult entry = entries[i];
			SearchPattern decodedResult = pattern.getBlankPattern();
			decodedResult.decodeIndexKey(entry.getWord());
			if (pattern.matchesDecodedKey(decodedResult)) {
				// TODO (kent) some clients may not need the document names
				String[] names = entry.getDocumentNames(index);
				for (int j = 0, n = names.length; j < n; j++)
					acceptMatch(names[j], containerPath, separator, decodedResult, requestor, participant, scope, monitor);
			}
		}
		Util.verbose(index.getIndexLocation().fileName() + " - cook result: " + (System.currentTimeMillis() - start) + "ms");
	} finally {
		index.stopQuery();
	}
}
```

EntryResult.getDocumentNames():

```java
// From org.eclipse.jdt.internal.core.index.EntryResult
public String[] getDocumentNames(Index index) throws java.io.IOException {
	if (this.documentTables != null) {
		int length = this.documentTables.length;
		if (length == 1 && this.documentNames == null) { // have a single table
			Object offset = this.documentTables[0];
			int[] numbers = index.diskIndex.readDocumentNumbers(offset);
			String[] names = new String[numbers.length];
			for (int i = 0, l = numbers.length; i < l; i++)
				names[i] = index.diskIndex.readDocumentName(numbers[i]);
			return names;
		}

		for (int i = 0; i < length; i++) {
			Object offset = this.documentTables[i];
			int[] numbers = index.diskIndex.readDocumentNumbers(offset);
			for (int j = 0, k = numbers.length; j < k; j++)
				addDocumentName(index.diskIndex.readDocumentName(numbers[j]));
		}
	}

	if (this.documentNames == null)
		return CharOperation.NO_STRINGS;

	String[] names = new String[this.documentNames.elementSize];
	int count = 0;
	Object[] values = this.documentNames.values;
	for (int i = 0, l = values.length; i < l; i++)
		if (values[i] != null)
			names[count++] = (String) values[i];
	return names;
}
```

DiskIndex.readDocumentName():

```java
// From org.eclipse.jdt.internal.core.index.DiskIndex
synchronized String readDocumentName(int docNumber) throws IOException {
	if (this.cachedChunks == null)
		this.cachedChunks = new String[this.numberOfChunks][];

	int chunkNumber = docNumber / CHUNK_SIZE;
	String[] chunk = this.cachedChunks[chunkNumber];
	if (chunk == null) {
		boolean isLastChunk = chunkNumber == this.numberOfChunks - 1;
		int start = this.chunkOffsets[chunkNumber];
		int numberOfBytes = (isLastChunk ? this.startOfCategoryTables : this.chunkOffsets[chunkNumber + 1]) - start;
		if (numberOfBytes < 0)
			throw new IllegalArgumentException();
		this.streamBuffer = new byte[numberOfBytes];
		this.bufferIndex = 0;
		InputStream file = this.indexLocation.getInputStream();
		try {
			file.skip(start);
			if (file.read(this.streamBuffer, 0, numberOfBytes) != numberOfBytes)
				throw new IOException();
		} catch (IOException ioe) {
			this.streamBuffer = null;
			throw ioe;
		} finally {
			file.close();
			this.indexLocation.close();
		}
		int numberOfNames = isLastChunk ? this.sizeOfLastChunk : CHUNK_SIZE;
		chunk = new String[numberOfNames];
		try {
			readChunk(chunk, null, 0, numberOfNames);
		} catch (IOException ioe) {
			this.streamBuffer = null;
			throw ioe;
		}
		this.cachedChunks[chunkNumber] = chunk;
	}
	this.streamBuffer = null;
	return chunk[docNumber - (chunkNumber * CHUNK_SIZE)];
}
```
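The "read each index file only once" idea could be sketched as follows. The on-disk layout here (a count followed by UTF entries) is a simplified assumption, not the real DiskIndex chunk format; the point is the access pattern: one stream open and one sequential pass per query, after which every document-name lookup is served from memory instead of re-opening the file per chunk.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch (not a JDT class): batch-load all document names in a
// single sequential read, then answer readDocumentName() without any IO.
class BatchedDocumentNames {
	private final String[] names;

	BatchedDocumentNames(InputStream file) throws IOException {
		try (DataInputStream in = new DataInputStream(file)) {
			int count = in.readInt();          // assumed header: entry count
			this.names = new String[count];
			for (int i = 0; i < count; i++)
				this.names[i] = in.readUTF();  // one sequential pass, one open/close
		}
	}

	String readDocumentName(int docNumber) {
		return this.names[docNumber];          // no IO on the hot path
	}
}
```

Whether the extra memory is acceptable would depend on index size; for a 46 MB JDK index the name area is only a fraction of the file, while the per-query FileInputStream.open churn disappears entirely.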
  • Explore adding a cache to DiskIndex to avoid IO.
    The core index query class of JDT is DiskIndex, whose design appears to prioritize saving memory: it caches very little and has to re-read the index file for each query. That IO is expensive for large index files, such as the JDK's.
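A minimal sketch of the caching idea, assuming a bounded LRU of decoded document-name chunks that survives across queries so repeated completions against the same large index skip the disk. ChunkCache is illustrative; it is not a class in JDT:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded chunk cache: keyed by chunk number, evicts the
// least-recently-used chunk once maxChunks is exceeded, so memory use
// stays capped even for very large indexes.
class ChunkCache {
	private final int maxChunks;
	private final Map<Integer, String[]> chunks;

	ChunkCache(int maxChunks) {
		this.maxChunks = maxChunks;
		// accessOrder=true makes LinkedHashMap track least-recently-used order
		this.chunks = new LinkedHashMap<Integer, String[]>(16, 0.75f, true) {
			@Override
			protected boolean removeEldestEntry(Map.Entry<Integer, String[]> eldest) {
				return size() > ChunkCache.this.maxChunks;
			}
		};
	}

	synchronized String[] get(int chunkNumber) {
		return this.chunks.get(chunkNumber);
	}

	synchronized void put(int chunkNumber, String[] chunk) {
		this.chunks.put(chunkNumber, chunk);
	}
}
```

The existing cachedChunks field in DiskIndex already caches within a single query; the difference in this sketch is keeping a bounded cache alive between queries, trading a capped amount of heap for fewer file reads.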
@testforstephen
Contributor Author

The upstream issue: https://bugs.eclipse.org/bugs/show_bug.cgi?id=575562

// cc: @rgrunber

@fbricon
Contributor

fbricon commented Oct 11, 2021

Fixed with #1902.
