jhandley/survaize

Survaize

Survaize is a tool that automatically converts "paper" questionnaires into interactive survey apps. It uses a combination of OCR and generative AI vision models to understand the structure of survey questionnaires in order to generate survey apps compatible with data collection platforms like CSPro, Open Data Kit and Survey Solutions.

Features

  • Read PDF questionnaires
  • Intelligent survey structure recognition using Generative AI
  • Conversion to intermediate JSON format
  • Export to popular survey platforms (CSPro, ODK, Survey Solutions)

Installation

Survaize is not yet published to PyPI. For now, follow the instructions in installation.md.

Setup

Survaize requires an OpenAI API key. You can specify it using the --api-key parameter or by setting the OPENAI_API_KEY environment variable.

If you do not already have an account on the OpenAI developer platform you will need to sign up to get a key.

Survaize should also work with other LLM providers that have OpenAI-compatible APIs. Provide the appropriate API URL and model name via the --api-url and --api-model arguments or the OPENAI_API_URL and OPENAI_MODEL environment variables. Note that only LLMs that support vision will work.
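As a rough illustration of how the environment variables and flags named above correspond, the sketch below collects whichever variables are set and turns them into command line arguments. The helper name `api_args_from_env` is hypothetical and not part of Survaize; only the variable names and flag names come from this README. Where the flags belong on the command line is not stated here, so check survaize --help before relying on it.

```python
import os

# Environment variables and their equivalent flags, as listed in this README.
ENV_TO_FLAG = {
    "OPENAI_API_KEY": "--api-key",
    "OPENAI_API_URL": "--api-url",
    "OPENAI_MODEL": "--api-model",
}


def api_args_from_env(env=None):
    """Return CLI arguments for each setting present in the environment.

    Hypothetical helper for illustration; Survaize reads these variables itself.
    """
    env = os.environ if env is None else env
    args = []
    for name, flag in ENV_TO_FLAG.items():
        value = env.get(name)
        if value:  # skip unset or empty variables
            args += [flag, value]
    return args
```

Passing a plain dict instead of `os.environ` makes the mapping easy to check without touching the real environment.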

To use Azure OpenAI you will need to specify the key, URL, API version and deployment name. For example:

OPENAI_API_KEY="XXXXXXXXXXXXXXXXXXXXXXXX"
OPENAI_API_VERSION="2025-04-01-preview"
OPENAI_API_URL="https://myazuredeploy-openai.openai.azure.com/"
OPENAI_API_DEPLOYMENT="my-gpt-4.1-deployment"

Alternatively, you can pass those variables as command line arguments to survaize (run survaize --help for details).
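Because the Azure setup needs four values rather than one, it can help to confirm they are all present before starting a long conversion. The following is a minimal sketch under that assumption; `missing_azure_vars` is a hypothetical helper, and the variable names are taken from the example above.

```python
import os

# Variable names taken from the Azure OpenAI example in this README.
AZURE_VARS = (
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_API_URL",
    "OPENAI_API_DEPLOYMENT",
)


def missing_azure_vars(env=None):
    """Return the Azure-related variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AZURE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All Azure OpenAI settings are present.")
```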

Running

Interactive Mode

To run Survaize in interactive mode, execute the ui command:

survaize ui

This will start a local web server and open the Survaize UI in your default web browser. You can then upload a questionnaire, and Survaize will read it, analyze its structure, and display the results in the browser. From there you can export the questionnaire to CSPro or other formats.

Non-Interactive Mode

To convert a PDF questionnaire to CSPro using the command line interface (non-interactive mode), you can use the convert command. The basic syntax is:

survaize convert input_file output_file --format cspro

For example:

survaize convert examples/PopstanHouseholdQuestionnaire.pdf output/PopstanHouseholdSurvey --format cspro

will generate a complete CSPro application (dictionary, forms...) in the directory output/PopstanHouseholdSurvey.
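If you want to drive the convert command from a script, one possible approach is to build the argument list in Python and hand it to `subprocess`. This is only a sketch: `build_convert_command` and `run_convert` are hypothetical names, and the command shape is copied from the syntax shown above rather than from any Survaize Python API.

```python
import shutil
import subprocess


def build_convert_command(input_file, output_file, fmt="cspro"):
    """Build the argument list for `survaize convert` as documented above."""
    return ["survaize", "convert", str(input_file), str(output_file), "--format", fmt]


def run_convert(input_file, output_file, fmt="cspro"):
    """Run the conversion, raising if survaize is missing or exits with an error."""
    if shutil.which("survaize") is None:
        raise RuntimeError("survaize was not found on PATH; see installation.md")
    subprocess.run(build_convert_command(input_file, output_file, fmt), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.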

Survaize uses JSON as an intermediate format so JSON files can be used as input or output files. The above command could be split into two using an intermediate JSON file:

survaize convert examples/PopstanHouseholdQuestionnaire.pdf output/PopstanHouseholdSurvey.json --format json
survaize convert output/PopstanHouseholdSurvey.json output/PopstanHouseholdSurvey --format cspro

You can even hand-edit the intermediate JSON file before generating the CSPro application.
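Before editing the intermediate file by hand, it may be useful to see how it is organized. This README does not document the JSON schema, so the sketch below assumes nothing about key names and only reports the top-level structure of whatever file it is given. `summarize_json` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Map each top-level key of a JSON file to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # The root is not an object (for example a list), so report its type only.
    return {"<root>": type(data).__name__}
```

Loading the file with `json.loads` also confirms that a hand-edited file is still valid JSON before it is passed back to the convert command.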

CSPro Export

The CSPro export generates a complete CSPro application including data dictionary, forms, and question text. If using the interactive web UI, the CSPro application will be packaged into a zip file for easy download and you will need to extract it before opening it in CSPro. When using the command line interface, the CSPro export will generate a directory with the CSPro application files.
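The zip file downloaded from the web UI can be extracted with any archive tool. As one option, the sketch below uses Python's standard `zipfile` module; `extract_export` is a hypothetical helper and the paths you pass to it are your own.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, dest_dir):
    """Extract a downloaded export into dest_dir and return the archived names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

The returned list shows which files were in the archive, which makes it easier to locate the application files to open in CSPro.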

ODK Export

The ODK export generates an XLSForm file that can be used with ODK Collect or other ODK-compatible tools. The XLSForm will include the questionnaire structure and question text.

Survey Solutions Export

The Survey Solutions export generates a zip file containing a JSON file with the questionnaire structure. The Survey Solutions Designer only supports importing files if you are logged in as an administrator. When logged in as an administrator, click the control panel button at the top and then Restore Questionnaire on the left side. You can then upload the zip file generated by Survaize. This is not a great solution, since most people use the public Survey Solutions Designer and do not have admin access. If this is a problem for you, please bring it up with the Survey Solutions team.

Development

This project uses Python, with uv as the package manager. To install, see installation.md.

For development workflows, see development.md.

For instructions on publishing to PyPI, see publishing.md.

License

MIT

TODO

  • Evals (in progress)
  • Correctly handle location question type (produce two fields in CSPro)
  • Fills in CAPI question text
  • Questionnaire edits in the UI
  • Combo box questions (numeric/text/date/location with DK options)
  • Other (specify) and other write-ins on single/multi-select
  • Matrix/table questions
  • Partial date questions (e.g. month/year)
  • Multiple language support

About

Convert survey questionnaires to electronic/mobile surveys using AI
