When profiling my application, I noticed a significant performance difference between the following two usages of ObjectOpenHashSet:
new ObjectOpenHashSet<>();
and
new ObjectOpenHashSet<>(initialCapacity);
Unfortunately, I can't share the original code (it comes from a real application), but here is a minimal example:
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class WordsProvider {
    // Reads words.txt from the classpath, one word per line.
    public static List<String> getWords() {
        List<String> words = new ArrayList<>();
        InputStream inputStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("words.txt");
        if (inputStream == null) {
            throw new RuntimeException("words.txt not found");
        }
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line);
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading words.txt", e);
        }
        return words;
    }
}
Set<String> words = new ObjectOpenHashSet<>(WordsProvider.getWords());
// case 1: very slow ~40 seconds
Set<String> copy = new ObjectOpenHashSet<>();
for (String word : words) {
    // && rather than ||, so a null word is skipped instead of throwing a NullPointerException
    if (word != null && !word.isBlank()) {
        copy.add(word);
    }
}


Set<String> words = new ObjectOpenHashSet<>(WordsProvider.getWords());
// case 2: very fast ~80 ms
Set<String> copy = new ObjectOpenHashSet<>(words.size());
for (String word : words) {
    // && rather than ||, so a null word is skipped instead of throwing a NullPointerException
    if (word != null && !word.isBlank()) {
        copy.add(word);
    }
}


I understand that the performance issue was caused by a bug in our code. Still, the size of the difference (from 40 seconds down to 80 milliseconds) was surprising and seemed worth a look. The word list I used to test these scenarios is attached (8 MB, about 1.1M words).
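The report does not say why the default-capacity copy is so slow. One plausible explanation, assumed here and not confirmed for fastutil, is that iterating one linear-probing table in slot order and inserting into a smaller table with the same hash function fills the destination unevenly, which produces long probe runs until the next resize. The sketch below is a toy model of that effect in plain Java. `ProbeSet`, `CopyOrderDemo`, the `probes` counter, the use of int keys instead of strings, and the murmur-style `mix` function are all hypothetical and are not fastutil's implementation.

```java
import java.util.function.IntConsumer;

/** Hypothetical toy set of positive ints: power-of-two table, linear probing, 0 marks an empty slot. */
final class ProbeSet {
    private static final float LOAD_FACTOR = 0.75f;
    private int[] table;
    private int size;
    long probes; // slots inspected by add(), including re-insertion during growth

    ProbeSet(int expected) {
        int capacity = 16;
        while (capacity * LOAD_FACTOR < expected) capacity <<= 1;
        table = new int[capacity];
    }

    // Murmur3-style finalizer (an assumption, not fastutil's hash) so home slots look random.
    private static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    boolean add(int key) {
        int mask = table.length - 1;
        int pos = mix(key) & mask;
        probes++;
        while (table[pos] != 0) {
            if (table[pos] == key) return false;
            pos = (pos + 1) & mask; // linear probing: try the next slot
            probes++;
        }
        table[pos] = key;
        if (++size > table.length * LOAD_FACTOR) grow();
        return true;
    }

    // Doubles the table and re-inserts every key.
    private void grow() {
        int[] old = table;
        table = new int[old.length * 2];
        size = 0;
        for (int key : old) if (key != 0) add(key);
    }

    // Visits keys in the order they sit in the table, like a plain table scan.
    void forEachInSlotOrder(IntConsumer action) {
        for (int key : table) if (key != 0) action.accept(key);
    }
}

class CopyOrderDemo {
    /** Returns {probes when copying into a default-capacity set, probes when presized}. */
    static long[] run(int n) {
        ProbeSet source = new ProbeSet(0);
        for (int key = 1; key <= n; key++) source.add(key);

        ProbeSet grown = new ProbeSet(0);      // analogous to case 1
        source.forEachInSlotOrder(grown::add);

        ProbeSet presized = new ProbeSet(n);   // analogous to case 2
        source.forEachInSlotOrder(presized::add);

        return new long[] {grown.probes, presized.probes};
    }

    public static void main(String[] args) {
        long[] r = run(40_000);
        System.out.println("default capacity: " + r[0] + " probes");
        System.out.println("presized:         " + r[1] + " probes");
    }
}
```

In this toy model the default-capacity copy should need far more probes than the presized one, because keys arrive sorted by their slot in the larger table and so pile into one end of the smaller table. Whether the same mechanism accounts for the 40 seconds measured above would need to be checked against fastutil itself.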