Skip to content

improve String parsing performance with CharSequenceReader #916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Simulant87
Copy link
Contributor

I implemented a CharSequenceReader from scratch, which is noticable faster in parsing a String input than the current default implementation.

performance test implementation below.

Measurement on master branch:
I disabled my CPU turbo boost, to have a better comparison and less variance between the runs. Also with turbo boost enabled the results are comparable, just faster and with more variance.

435ms per parsing on average.

Measurement on improved branch:

354ms per parsing on average.

--> 18% performance improvement.

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Main {

    public static void main(String[] args) throws IOException {
        // TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
        InputStream inputStream = new FileInputStream("C:/JSON-java/train-v1.1.json");
        String content = inputStreamToString(inputStream);
        JSONObject object = new JSONObject(content);
        // JVM warm up: 10 iterations
        for (int i = 0; i < 10; i++) {
            object = new JSONObject(content);
            System.out.println("warmup: " + i);
        }

        // measured runs
        long start = System.currentTimeMillis();
        int iterations = 100;
        for (int i = 0; i < iterations; i++) {
            object = new JSONObject(content);
            System.out.println("test: " + i);
        }
        long end = System.currentTimeMillis();

        long totalTime = end - start;
        System.out.println(totalTime + "ms for " + iterations + " iterations of parsing JSONObject. " + totalTime / iterations + "ms per parsing on average.");
    }

    private static String inputStreamToString(InputStream inputStream) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int length;
        while ((length = inputStream.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }
        return result.toString(StandardCharsets.UTF_8.name());
    }

}

@stleary
Copy link
Owner

stleary commented Nov 22, 2024

Appreciate the effort, but no, thanks. Improved performance is not itself a compelling reason to modify the code.

@stleary stleary closed this Nov 22, 2024
@Simulant87
Copy link
Contributor Author

I am sad to hear that and in the end I will accept your decision and not submit any further PRs for it, but please hear me out: I totally understand if there is currently no time to review it. But maybe you can give it a try later on.

I know that simplicity is one of the core values of this project, and therefore the default decision is the reject unnecessary changes. But also a performant implementation is one of the main goals, which this change continues to. There are many other projects depending on this library, which I assume would be happy to have this improvement integrated. As this part of the code is heavily called internally, it is not possible to use this change as an external extension. My implementation is straight forward to review and you accepted other performance improvements as well. I would be very happy if you rethink your initial decision maybe later on.

@Simulant87
Copy link
Contributor Author

For everyone interested: The same performance improvement can also be achieved by passing the improved Reader without synchronisation to an explicit JSONTokener in the constructor of the JSONObject. And there is also an open implementation with the same benefits available by apache commons io.

new JSONObject(new JSONTokener(new org.apache.commons.io.input.CharSequenceReader(content)));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants