completion: optimize the index engine for the scenario "complete on type name" #1846

Closed

Description

@testforstephen

See a typical case for "complete on type name":
[screenshot: a typical "complete on type name" completion]

Currently "completion on type name" is not very fast, and a major part of the time is spent in unit.codeComplete(). The codeComplete implementation calls the search engine to search for the available types matching the prefix string. Digging deeper into the underlying index engine, I found that its bottleneck is IO. Here is the CPU profiling result for PatternSearchJob; you can see that most of the CPU time is spent reading index files from disk:

  • 86.9% of CPU time is spent reading document names from the index file.
  • 9.9% of CPU time is spent reading the typeDecls category from the index file.

[screenshot: CPU profiling result for PatternSearchJob]

This conclusion matches the metrics I observed by adding logpoints to the search engine source code. On my MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports, 2.9 GHz dual-core Intel Core i5), I tested with the index file of the JDK library jrt-fs.jar (a 46 MB index file) as an example. PatternSearchJob takes more than 130 ms (IO time is not stable; it can reach 200~300 ms) to get query results from this JDK index file. Of that total, 40 ms goes to reading the typeDecls category table and 90 ms to reading document names from the index file. Since each dependency has a separate index file, and a typical Spring Boot project may have more than 100 index files, the cumulative query time across all index files can be slow even with the search engine's parallel search strategy: 100 files at ~130 ms each is roughly 13 s of aggregate IO, which still works out to over a second per completion request at 10-way parallelism.

Optimizations:

  • Reduce the frequency of FileInputStream.open
    The profiling shows that the FileInputStream.open operation under EntryResult.getDocumentNames() takes 72.6% of CPU time, because EntryResult.getDocumentNames() is a high-frequency operation: the EntryResult list for a type search with the prefix S can contain 2k+ entries. In the current implementation of DiskIndex.readDocumentName(), the input stream is opened and closed to read a single document name, so nearly every call to EntryResult.getDocumentNames() triggers a FileInputStream.open. The optimization question: can we read each index file only once to get the document names of all the EntryResults in that index? (See the sketch after the listings below.)

SearchPattern.findIndexMatches():

// From org.eclipse.jdt.core.search.SearchPattern
public void findIndexMatches(Index index, IndexQueryRequestor requestor, SearchParticipant participant, IJavaSearchScope scope, IProgressMonitor monitor) throws IOException {
	if (monitor != null && monitor.isCanceled()) throw new OperationCanceledException();
	try {
		index.startQuery();
		SearchPattern pattern = currentPattern();
		EntryResult[] entries = pattern.queryIn(index);
		if (entries == null) return;

		String containerPath = index.containerPath;
		char separator = index.separator;
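		// timing added here (via logpoint) to log how long cooking the query entries into matches takes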
		long start = System.currentTimeMillis();
		for (int i = 0, l = entries.length; i < l; i++) {
			if (monitor != null && monitor.isCanceled()) throw new OperationCanceledException();

			EntryResult entry = entries[i];
			SearchPattern decodedResult = pattern.getBlankPattern();
			decodedResult.decodeIndexKey(entry.getWord());
			if (pattern.matchesDecodedKey(decodedResult)) {
				// TODO (kent) some clients may not need the document names
				String[] names = entry.getDocumentNames(index);
				for (int j = 0, n = names.length; j < n; j++)
					acceptMatch(names[j], containerPath, separator, decodedResult, requestor, participant, scope, monitor);
			}
		}
		Util.verbose(index.getIndexLocation().fileName() + " - cook result: " + (System.currentTimeMillis() - start) + "ms");
	} finally {
		index.stopQuery();
	}
}

EntryResult.getDocumentNames():

// From org.eclipse.jdt.internal.core.index.EntryResult
public String[] getDocumentNames(Index index) throws java.io.IOException {
	if (this.documentTables != null) {
		int length = this.documentTables.length;
		if (length == 1 && this.documentNames == null) { // have a single table
			Object offset = this.documentTables[0];
			int[] numbers = index.diskIndex.readDocumentNumbers(offset);
			String[] names = new String[numbers.length];
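			// each lookup below goes through DiskIndex.readDocumentName(), which re-opens the index file whenever the chunk is not cached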
			for (int i = 0, l = numbers.length; i < l; i++)
				names[i] = index.diskIndex.readDocumentName(numbers[i]);
			return names;
		}

		for (int i = 0; i < length; i++) {
			Object offset = this.documentTables[i];
			int[] numbers = index.diskIndex.readDocumentNumbers(offset);
			for (int j = 0, k = numbers.length; j < k; j++)
				addDocumentName(index.diskIndex.readDocumentName(numbers[j]));
		}
	}

	if (this.documentNames == null)
		return CharOperation.NO_STRINGS;

	String[] names = new String[this.documentNames.elementSize];
	int count = 0;
	Object[] values = this.documentNames.values;
	for (int i = 0, l = values.length; i < l; i++)
		if (values[i] != null)
			names[count++] = (String) values[i];
	return names;
}

DiskIndex.readDocumentName():

// From org.eclipse.jdt.internal.core.index.DiskIndex
synchronized String readDocumentName(int docNumber) throws IOException {
	if (this.cachedChunks == null)
		this.cachedChunks = new String[this.numberOfChunks][];

	int chunkNumber = docNumber / CHUNK_SIZE;
	String[] chunk = this.cachedChunks[chunkNumber];
	if (chunk == null) {
		boolean isLastChunk = chunkNumber == this.numberOfChunks - 1;
		int start = this.chunkOffsets[chunkNumber];
		int numberOfBytes = (isLastChunk ? this.startOfCategoryTables : this.chunkOffsets[chunkNumber + 1]) - start;
		if (numberOfBytes < 0)
			throw new IllegalArgumentException();
		this.streamBuffer = new byte[numberOfBytes];
		this.bufferIndex = 0;
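		// per-chunk open: this is the FileInputStream.open hot spot in the profile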
		InputStream file = this.indexLocation.getInputStream();
		try {
			file.skip(start);
			if (file.read(this.streamBuffer, 0, numberOfBytes) != numberOfBytes)
				throw new IOException();
		} catch (IOException ioe) {
			this.streamBuffer = null;
			throw ioe;
		} finally {
			file.close();
			this.indexLocation.close();
		}
		int numberOfNames = isLastChunk ? this.sizeOfLastChunk : CHUNK_SIZE;
		chunk = new String[numberOfNames];
		try {
			readChunk(chunk, null, 0, numberOfNames);
		} catch (IOException ioe) {
			this.streamBuffer = null;
			throw ioe;
		}
		this.cachedChunks[chunkNumber] = chunk;
	}
	this.streamBuffer = null;
	return chunk[docNumber - (chunkNumber * CHUNK_SIZE)];
}
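
A possible shape for that first optimization, as a minimal sketch rather than a patch: open the index file once, read the whole document-names region into memory, and decode every chunk from that buffer. It reuses the DiskIndex fields shown above; readAllDocumentNames is a hypothetical helper, and it assumes readChunk decodes from this.streamBuffer starting at this.bufferIndex when passed a null stream, as readDocumentName above does.

// Hypothetical helper for DiskIndex, not existing JDT code
synchronized void readAllDocumentNames() throws IOException {
	if (this.cachedChunks == null)
		this.cachedChunks = new String[this.numberOfChunks][];

	// The document-name chunks lie contiguously between the first chunk offset
	// and the start of the category tables, so a single read covers all of them.
	int start = this.chunkOffsets[0];
	int numberOfBytes = this.startOfCategoryTables - start;
	if (numberOfBytes < 0)
		throw new IllegalArgumentException();
	this.streamBuffer = new byte[numberOfBytes];
	InputStream file = this.indexLocation.getInputStream(); // one open per index file
	try {
		file.skip(start);
		if (file.read(this.streamBuffer, 0, numberOfBytes) != numberOfBytes)
			throw new IOException();
	} catch (IOException ioe) {
		this.streamBuffer = null;
		throw ioe;
	} finally {
		file.close();
		this.indexLocation.close();
	}
	// Decode each chunk from the in-memory buffer; no further IO is needed.
	for (int chunkNumber = 0; chunkNumber < this.numberOfChunks; chunkNumber++) {
		if (this.cachedChunks[chunkNumber] != null)
			continue;
		boolean isLastChunk = chunkNumber == this.numberOfChunks - 1;
		int numberOfNames = isLastChunk ? this.sizeOfLastChunk : CHUNK_SIZE;
		this.bufferIndex = this.chunkOffsets[chunkNumber] - start;
		String[] chunk = new String[numberOfNames];
		readChunk(chunk, null, 0, numberOfNames);
		this.cachedChunks[chunkNumber] = chunk;
	}
	this.streamBuffer = null;
}

With a helper like this, readDocumentName() could populate all chunks on the first call and serve every later lookup from this.cachedChunks, so a query touching 2k+ EntryResults would cost one FileInputStream.open instead of one per uncached chunk.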
  • Explore adding a cache to DiskIndex to avoid IO.
    The core index query class of JDT is DiskIndex, whose design intent appears to be saving memory: it keeps very little cached and has to re-read the index file for every query. That IO is particularly expensive for large index files, such as the JDK's. (A caching sketch follows this list.)
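
A minimal sketch of that caching direction, assuming we are free to add a field to DiskIndex: hold the decoded document-name chunks behind a SoftReference, so repeated queries against the same index skip the disk read while the garbage collector can still reclaim the memory under pressure. The chunkCache field and getCachedChunks method are hypothetical, not existing JDT code.

import java.lang.ref.SoftReference;

// Hypothetical memory-sensitive cache for DiskIndex: unlike this.cachedChunks,
// which the current design cannot reuse across queries, this survives until
// memory pressure forces the GC to clear it.
private SoftReference<String[][]> chunkCache;

synchronized String[][] getCachedChunks() {
	String[][] chunks = this.chunkCache == null ? null : this.chunkCache.get();
	if (chunks == null) {
		chunks = new String[this.numberOfChunks][]; // entries filled lazily by readDocumentName()
		this.chunkCache = new SoftReference<>(chunks);
	}
	return chunks;
}

The same treatment could apply to the typeDecls category table, which accounts for the remaining ~10% of the read time in the profile above.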
