Survaize is a tool that automatically converts "paper" questionnaires into interactive survey apps. It uses a combination of OCR and generative AI vision models to understand the structure of survey questionnaires in order to generate survey apps compatible with data collection platforms like CSPro, Open Data Kit and Survey Solutions.
- Read PDF questionnaires
- Intelligent survey structure recognition using Generative AI
- Conversion to intermediate JSON format
- Export to popular survey platforms (CSPro, ODK, Survey Solutions)
Eventually this will be published to PyPI, but for now follow the instructions in installation.md.
Survaize requires an OpenAI API key. You can specify it with the --api-key parameter or by setting the OPENAI_API_KEY environment variable.
If you do not already have an account on the OpenAI developer platform you will need to sign up to get a key.
Survaize should also work with other LLM providers that have OpenAI-compatible APIs. Provide the appropriate API URL and model name via the --api-url and --api-model arguments or the OPENAI_API_URL and OPENAI_MODEL environment variables. Note that only LLMs that support vision will work.
To use Azure OpenAI you will need to specify the key, URL, API version and deployment name. For example:
OPENAI_API_KEY="XXXXXXXXXXXXXXXXXXXXXXXX"
OPENAI_API_VERSION="2025-04-01-preview"
OPENAI_API_URL="https://myazuredeploy-openai.openai.azure.com/"
OPENAI_API_DEPLOYMENT="my-gpt-4.1-deployment"
Alternatively, you can pass those variables as command line arguments to survaize (run survaize --help for details).
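Before launching Survaize against Azure OpenAI, it can help to confirm that all four settings from the example above are present in the environment. The following is a minimal sketch, not part of Survaize: the helper name `missing_azure_settings` is hypothetical, and the check only looks at environment variables, so it will report settings as missing if you pass them on the command line instead.

```python
import os

# Variable names are taken from the Azure OpenAI example above.
AZURE_VARS = (
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_API_URL",
    "OPENAI_API_DEPLOYMENT",
)


def missing_azure_settings(env=None):
    """Return the names of Azure OpenAI variables that are unset or empty.

    Hypothetical helper for illustration; Survaize does not ship it.
    """
    env = os.environ if env is None else env
    return [name for name in AZURE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All Azure OpenAI settings are present.")
```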
To run Survaize in interactive mode, execute the ui command:
survaize ui
This will start a local web server and open the Survaize UI in your default web browser. You can then upload a questionnaire, and Survaize will read it, analyze its structure, and display the results in the browser. From there you can export the questionnaire to CSPro or other formats.
To convert a PDF questionnaire to CSPro using the command line interface (non-interactive mode), you can use the convert command. The basic syntax is:
survaize convert input_file output_file --format cspro
For example:
survaize convert examples/PopstanHouseholdQuestionnaire.pdf output/PopstanHouseholdSurvey --format cspro
will generate a complete CSPro application (dictionary, forms...) in the directory output/PopstanHouseholdSurvey.
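If you have several questionnaires to convert, the convert command shown above can be driven from a small script. This is a sketch under stated assumptions: the function names and the `questionnaires` and `output` directory names are hypothetical, and it uses only the `survaize convert input_file output_file --format ...` syntax documented above.

```python
import subprocess
from pathlib import Path


def build_convert_command(input_file, output_path, fmt="cspro"):
    """Build the argument list for one `survaize convert` call."""
    return ["survaize", "convert", str(input_file), str(output_path), "--format", fmt]


def batch_commands(pdf_dir, output_dir, fmt="cspro"):
    """Yield one convert command per PDF in pdf_dir, sorted by file name."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # Each questionnaire gets an output path named after the PDF.
        yield build_convert_command(pdf, Path(output_dir) / pdf.stem, fmt)


if __name__ == "__main__":
    # "questionnaires" and "output" are hypothetical directory names.
    for cmd in batch_commands("questionnaires", "output"):
        print("Running:", " ".join(cmd))
        # check=True stops the batch at the first failed conversion.
        subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to print the commands first and review them before any API calls are made.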
Survaize uses JSON as an intermediate format so JSON files can be used as input or output files. The above command could be split into two using an intermediate JSON file:
survaize convert examples/PopstanHouseholdQuestionnaire.pdf output/PopstanHouseholdSurvey.json --format json
survaize convert output/PopstanHouseholdSurvey.json output/PopstanHouseholdSurvey --format cspro
You can even hand-edit the intermediate JSON file before generating the CSPro application.
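Before editing the intermediate file by hand, it may be useful to look at its top-level layout. The sketch below makes no assumption about Survaize's JSON schema, which is not described here: it only reports the top-level keys and their types, and offers a helper to write edited data back. Both function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return a {key: type name} summary of the top level of a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Documents whose root is not an object are summarized by their own type.
    return {"<root>": type(data).__name__}


def save_json(path, data):
    """Write data back as indented UTF-8 JSON, keeping non-ASCII text readable."""
    Path(path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
```

For example, `summarize_json("output/PopstanHouseholdSurvey.json")` would list the top-level keys of the file produced by the first command above. After editing, the second convert command can be run on the saved file as before.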
The CSPro export generates a complete CSPro application including data dictionary, forms, and question text. If using the interactive web UI, the CSPro application will be packaged into a zip file for easy download and you will need to extract it before opening it in CSPro. When using the command line interface, the CSPro export will generate a directory with the CSPro application files.
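The zip file downloaded from the web UI can be extracted with any archive tool. As one option, here is a minimal sketch using Python's standard `zipfile` module; the function name and any file names you pass to it are hypothetical, and the contents of the archive are whatever Survaize produced.

```python
import zipfile
from pathlib import Path


def extract_app(zip_path, target_dir):
    """Extract a downloaded zip into target_dir and return its member names."""
    target = Path(target_dir)
    # Create the destination directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Calling `extract_app("PopstanHouseholdSurvey.zip", "output/PopstanHouseholdSurvey")` (both names are examples only) would leave the application files in the target directory, ready to open in CSPro.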
The ODK export generates an XLSForm file that can be used with ODK Collect or other ODK-compatible tools. The XLSForm will include the questionnaire structure and question text.
The Survey Solutions export generates a zip file containing a JSON file with the questionnaire structure. The Survey Solutions Designer only supports importing files if you are logged in as an administrator. When logged in as an administrator, click the control panel button at the top and then Restore Questionnaire on the left side. You can then upload the zip file generated by Survaize. This is not a great solution, since most people use the public Survey Solutions Designer and do not have admin access. If this is a problem for you, please bring it up with the Survey Solutions team.
This project uses Python with uv as the package manager. To install, see installation.md.
For development workflows, see development.md.
For instructions on publishing to PyPI, see publishing.md.
MIT
- Evals (in progress)
- Correctly handle location question type (produce two fields in CSPro)
- Fill in CAPI question text
- Questionnaire edits in the UI
- Combo box questions (numeric/text/date/location with DK options)
- Other (specify) and other write-ins on single/multi-select
- Matrix/table questions
- Partial date questions (e.g. month/year)
- Multiple language support