
OverflowError: string longer than 2147483647 bytes for large datasets #1223


Open
israel-cj opened this issue Feb 17, 2025 · 0 comments

Hi, I want to upload a large dataset (2.7 GB, 4.6M features), but I get the following error:

Traceback (most recent call last):
  File "publish_dataset.py", line 62, in <module>
    publish_dataset()
  File "publish_dataset.py", line 49, in publish_dataset
    openml_dataset.publish()
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\base.py", line 135, in publish
    response_text = openml._api_calls._perform_api_call(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\_api_calls.py", line 118, in _perform_api_call
    response = _read_url_files(url, data=data, file_elements=file_elements)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\_api_calls.py", line 325, in _read_url_files
    return _send_request(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\openml\_api_calls.py", line 383, in _send_request
    response = session.post(url, data=data, files=files, headers=_HEADERS)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\requests\adapters.py", line 667, in send
    resp = conn.urlopen(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\urllib3\connectionpool.py", line 789, in urlopen
    response = self._make_request(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\urllib3\connectionpool.py", line 495, in _make_request
    conn.request(
  File "C:\Users\20210595\.conda\envs\tableshift\lib\site-packages\urllib3\connection.py", line 455, in request
    self.send(chunk)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\http\client.py", line 972, in send
    self.sock.sendall(data)
  File "C:\Users\20210595\.conda\envs\tableshift\lib\ssl.py", line 1237, in sendall
    v = self.send(byte_view[count:])
  File "C:\Users\20210595\.conda\envs\tableshift\lib\ssl.py", line 1206, in send
    return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
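
The number in the message, 2147483647, is 2**31 - 1, the largest signed 32-bit integer. The error is raised locally in ssl.py, so it appears to come from a single write of more than about 2 GiB to the SSL socket rather than from a size limit reported by the server. Below is a minimal sketch of a pre-flight check, assuming the request body is at least as large as the file on disk. The helper names are hypothetical and not part of openml-python.

```python
import os

# Limit reported in the OverflowError message above: 2**31 - 1 bytes.
SSL_WRITE_LIMIT = 2**31 - 1  # 2147483647


def exceeds_ssl_write_limit(size_in_bytes: int) -> bool:
    """Return True if a payload of this size is larger than the reported limit."""
    return size_in_bytes > SSL_WRITE_LIMIT


def file_exceeds_ssl_write_limit(path: str) -> bool:
    """Hypothetical helper: check a file on disk before trying to publish it."""
    return exceeds_ssl_write_limit(os.path.getsize(path))


# Approximate dataset size from this report, assuming decimal gigabytes (2.7 GB).
print(exceeds_ssl_write_limit(2_700_000_000))
```

A check like this only detects the problem before the upload starts; it does not work around it.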

What is the size limit for datasets in OpenML? I could not find it documented (maybe I did not look long enough).
Is there a way to work around this limitation?
Thanks
