gradio 4 critical bug -- all return messages truncated to 65k #6601

Closed · 1 task done
pseudotensor opened this issue Nov 28, 2023 · 14 comments · Fixed by #6693
Labels: API (Related to one of the client libraries or usage of Gradio via API) · bug (Something isn't working) · Regression (Bugs did not exist in previous versions of Gradio)

Comments

@pseudotensor (Contributor)

Describe the bug

In gradio 3 there were never any issues with returning large amounts of text or data. However, in gradio 4 this is totally broken.

This is a super critical bug, given how broadly it applies to all API calls of any return type (text, audio, etc.).

Related: #6319

That is, even once the heartbeat bug is fixed, long output still hits the JSON error mentioned there.

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

server:

import gradio as gr
import random
import time

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")

    def user(user_message, history):
        return "", history + [[user_message, None]]

    def bot(history):
        # ~200k characters (100000 'a's joined by spaces = 199999 chars),
        # well past the 65k point where truncation appears
        bot_message = ' '.join(['a'] * 100000)
        history[-1][1] = bot_message
        yield history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        fn=bot, inputs=chatbot, outputs=chatbot, api_name='bot',
    )
    clear.click(lambda: None, None, chatbot, api_name='clear')

demo.queue()
demo.launch()

client:

import time
from gradio_client import Client

client = Client('http://localhost:7860', serialize=False)

args = [[['Who are you?', None]]]
res = client.predict(*tuple(args), api_name='/bot')
print(res)

Error:

(/data/conda/h2ogpt) jon@pseudotensor:~/h2ogpt$ python testchat_nostream_client.py 
Loaded as API: http://localhost:7860/ ✔
Traceback (most recent call last):
  File "/home/jon/h2ogpt/testchat_nostream_client.py", line 7, in <module>
    res = client.predict(*tuple(args), api_name='/bot')
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 305, in predict
    return self.submit(*args, api_name=api_name, fn_index=fn_index).result()
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 1456, in result
    return super().result(timeout=timeout)
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 869, in _inner
    predictions = _predict(*data)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 894, in _predict
    result = utils.synchronize_async(self._sse_fn, data, hash_data, helper)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 665, in synchronize_async
    return fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args, **kwargs)  # type: ignore
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 1075, in _sse_fn
    return await utils.get_pred_from_sse(
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 342, in get_pred_from_sse
    return task.result()
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 374, in stream_sse
    resp = json.loads(line[5:])
  File "/data/conda/h2ogpt/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/data/conda/h2ogpt/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/data/conda/h2ogpt/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 70 (char 69)

Checking the details, the issue is that all messages of any kind are truncated to no more than 65k bytes. This was never a problem with gradio 3, where I can see large messages work perfectly fine.

For audio and video, this makes gradio 4 and its API dead on arrival.
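
To see why the client fails with that JSONDecodeError, here is a minimal standalone illustration (the payload shape is an assumption, not the exact server message): cutting a JSON string off at a fixed byte boundary leaves an unterminated string literal, exactly as in the traceback above.

import json

# Hypothetical payload shaped like an SSE event; truncating it mid-string
# reproduces the "Unterminated string" error from the traceback.
event = 'data: {"msg": "process_generating", "output": "' + 'a' * 100000 + '"}'
chunk = event[:65536]  # simulate the cut-off at a ~65k transport chunk boundary

try:
    json.loads(chunk[len('data: '):])
except json.JSONDecodeError as e:
    print(e)  # Unterminated string starting at: line 1 column ... (char ...)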

Screenshot

No response

Logs

No response

System Info

gradio==4.7.1
gradio_client==0.7.0

Severity

Blocking usage of gradio

@pseudotensor pseudotensor added the bug Something isn't working label Nov 28, 2023
@pseudotensor (Contributor, Author)

pseudotensor commented Nov 28, 2023

If I try to debug the client against the above running server, and step through stream_sse(), eventually, for no obvious reason, it hits this on the client:

Traceback (most recent call last):
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 869, in _inner
    predictions = _predict(*data)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 894, in _predict
    result = utils.synchronize_async(self._sse_fn, data, hash_data, helper)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 665, in synchronize_async
    return fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args, **kwargs)  # type: ignore
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 1075, in _sse_fn
    return await utils.get_pred_from_sse(
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 342, in get_pred_from_sse
    return task.result()
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 407, in stream_sse
    req.raise_for_status()
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/httpx/_models.py", line 758, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://localhost:7860/queue/data'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500

and on server:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio/routes.py", line 668, in queue_data
    blocks._queue.attach_data(body)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio/queueing.py", line 161, in attach_data
    raise ValueError("Event not found", event_id)
ValueError: ('Event not found', 'c4d06de1de954425bc4fe30e64eee7fe')

It's as if the client-server connection is sensitive to timing in how the async stream is handled. That's not good.

@pseudotensor (Contributor, Author)

pseudotensor commented Nov 28, 2023

My guess is this is the offending changes: #6069

FYI @aliabid94

It seems to be both unstable (timing dependent) and wrong (truncation).

@pseudotensor (Contributor, Author)

Actually, I can't tell where things changed. Maybe the make_predict changes by @pngwn.

@pseudotensor (Contributor, Author)

pseudotensor commented Nov 28, 2023

If I start messing with the stream_sse() code, e.g. just adding a print, I see random changes in behavior: sometimes the print shows the full correct output, sometimes not. All a mess.

i.e. just this debug:

            async for line in response.aiter_text():
                print(len(line), flush=True)
                if line.startswith("data:"):

Then the last length printed before failure is all over the place: sometimes 32761, sometimes 65529; both fail.

But if put instead:

            async for line in response.aiter_text():
                if len(line) > 65000:
                    print(line, flush=True)
                    continue
                if line.startswith("data:"):

Even though this doesn't make it work (nor is it intended to), sometimes I see the full message printed with size 65536.

But of course the "continue" is not valid and leads to other issues, and without the continue I never see the right length.

@pseudotensor (Contributor, Author)

This seems to work:

            async for line in response.aiter_lines():
                print(len(line), flush=True)
                if len(line) == 0:
                    continue
                if line.startswith("data:"):

i.e. aiter_lines() instead of aiter_text(), and skipping zero-length lines.
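
For reference, a minimal standalone sketch of that approach (not the actual gradio_client code; the URL, method, and event handling are assumptions): aiter_text() yields raw transport chunks, which is why lengths like 32761 and 65529 show up, while aiter_lines() reassembles complete lines regardless of chunk boundaries.

import json
import httpx

async def read_sse_events(url: str):
    # Sketch of an SSE reader built on aiter_lines() instead of aiter_text().
    # aiter_lines() buffers across transport chunks, so a long "data:" line
    # arrives whole even when it spans several 32-64k chunks.
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("GET", url) as response:
            async for line in response.aiter_lines():
                if not line:  # blank lines separate SSE events
                    continue
                if line.startswith("data:"):
                    yield json.loads(line[len("data:"):].strip())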

Related: encode/httpx#2310

pseudotensor added a commit to h2oai/h2ogpt that referenced this issue Nov 28, 2023
@pseudotensor (Contributor, Author)

But it's not a perfect fix. I sometimes see:


Traceback (most recent call last):
  File "/home/jon/h2ogpt/gradio_utils/grclient.py", line 264, in submit
    self.refresh_client_if_should()
  File "/home/jon/h2ogpt/gradio_utils/grclient.py", line 210, in refresh_client_if_should
    server_hash = self.get_server_hash()
  File "/home/jon/h2ogpt/gradio_utils/grclient.py", line 203, in get_server_hash
    return super().submit(api_name="/system_hash").result()
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 1456, in result
    return super().result(timeout=timeout)
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/data/conda/h2ogpt/lib/python3.10/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 869, in _inner
    predictions = _predict(*data)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 894, in _predict
    result = utils.synchronize_async(self._sse_fn, data, hash_data, helper)
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 667, in synchronize_async
    return fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args, **kwargs)  # type: ignore
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/client.py", line 1075, in _sse_fn
    return await utils.get_pred_from_sse(
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 342, in get_pred_from_sse
    return task.result()
  File "/data/conda/h2ogpt/lib/python3.10/site-packages/gradio_client/utils.py", line 413, in stream_sse
    raise ValueError(f"Unexpected message: {line}")
ValueError: Unexpected message: {"detail":"Not Found"}

pseudotensor added a commit to h2oai/h2ogpt that referenced this issue Nov 29, 2023
@abidlabs (Member)

Thanks for the detailed report @pseudotensor. Is this only an issue when you are using the Client to make a prediction, or also when you use the Gradio app via the UI?

@abidlabs abidlabs added the Regression Bugs did not exist in previous versions of Gradio label Nov 29, 2023
@pseudotensor (Contributor, Author)

So far I've only seen it in API use, and the only workaround changes so far are on the client side.

@pseudotensor (Contributor, Author)

Even with the workarounds I've made so far, I still hit this and get hangs:

Traceback (most recent call last):
  File "/home/jon/h2ogpt0/gradio_utils/grclient.py", line 268, in submit
    self.refresh_client_if_should()
  File "/home/jon/h2ogpt0/gradio_utils/grclient.py", line 214, in refresh_client_if_should
    server_hash = self.get_server_hash()
  File "/home/jon/h2ogpt0/gradio_utils/grclient.py", line 207, in get_server_hash
    return super().submit(api_name="/system_hash").result()
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/client.py", line 1456, in result
    return super().result(timeout=timeout)
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/client.py", line 869, in _inner
    predictions = _predict(*data)
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/client.py", line 894, in _predict
    result = utils.synchronize_async(self._sse_fn, data, hash_data, helper)
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/utils.py", line 670, in synchronize_async
    return fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args, **kwargs)  # type: ignore
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/client.py", line 1075, in _sse_fn
    return await utils.get_pred_from_sse(
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/utils.py", line 343, in get_pred_from_sse
    return task.result()
  File "/home/jon/miniconda3/envs/h2ogpt0/lib/python3.10/site-packages/gradio_client/utils.py", line 417, in stream_sse
    raise ValueError("Did not receive process_completed message.")
ValueError: Did not receive process_completed message.

@pseudotensor (Contributor, Author)

I'll probably have to revert back to gradio 3. It's just too unstable.

@abidlabs (Member)

We're working through a few issues related to the Client, e.g. #6602

Btw, I was just looking at your Client code and noticed that you have:

import time
from gradio_client import Client

client = Client('http://localhost:7860', serialize=False)

In Gradio 4.x, you don't need to set serialize=False, since the Chatbot component now returns a list of tuples by default. You can just do:

import time
from gradio_client import Client

client = Client('http://localhost:7860')

Will work through the other issues you mentioned here soon.

@pseudotensor (Contributor, Author)

pseudotensor commented Nov 30, 2023

I need serialize=False in general for many other reasons. E.g., if I push an http link into a textbox through the API, it gets converted into a {'path': filename}-like dict, with the filename pointing at a temp file that gets created (not sure whether by the client or the server).

I don't want any of those conversions done. And since the client is global to all API calls, I must disable serialization entirely and get strictly unprocessed inputs from client -> server and server -> client.

@abidlabs (Member)

Got it, okay. We can explore those issues later; let's get this unblocked first.

@freddyaboulton freddyaboulton self-assigned this Dec 4, 2023
@freddyaboulton freddyaboulton added the API Related to the one of the client libraries or usage of Gradio via API label Dec 4, 2023
@freddyaboulton (Collaborator)

Hi @pseudotensor! Thanks for all of the helpful comments you left on this thread as you investigated the issue. I think I have a fix in #6693. I would appreciate it if you could test it out. You can install the client from that PR with:

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@6887fe4e080647f9892b478c35a625b056eddb31#subdirectory=client/python"

That PR only targets the 65k response length issue and the heartbeat issue (#6319). There are still some other issues with the client that we are fixing, and #6556 should hopefully fix a lot of them.
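
Assuming the reproduction above, a quick sanity check of the fix could look like this (the indexing assumes /bot returns the chatbot history as in the repro; adjust as needed):

from gradio_client import Client

client = Client('http://localhost:7860', serialize=False)
res = client.predict([['Who are you?', None]], api_name='/bot')
# ' '.join(['a'] * 100000) is 199999 characters; anything shorter means truncation
print(len(res[-1][1]))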
