Script package for downloading and parsing election protocols ('valgprotokoll') and meeting minutes ('møtebok')

The scripts are run with PHP. They run in sequence, and each step writes its output to files.

All PDFs are cached in this Git repository, so steps 2 and 3 do not require downloading anything.

The summary pages are here:

https://hnygard.github.io/valgprotokoller/

The JSON files can be seen here:

https://github.com/HNygard/valgprotokoller/tree/master/docs/data-store/json

Requirements for running

php
pdftotext (step 1 / step 1.2 only)

Ubuntu:

apt install php-cli poppler-utils

Commands

php 1-valgprotokoll-downloader.php

Reads URLs from urls.txt, downloads the PDFs and converts them to txt files with pdftotext.
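A minimal sketch of the bookkeeping behind step 1, written in JavaScript (Node.js) rather than PHP and not taken from the project's script. It assumes, hypothetically, that urls.txt holds one URL per line and that each download is cached under a name derived from a SHA-1 hash of its URL; the helper names parseUrlList, cachePaths and pdftotextArgs are illustrative. The last helper only builds the argument list for pdftotext, whose basic form is pdftotext <PDF-file> <text-file>.

```javascript
// Sketch of step 1's bookkeeping, NOT the project's PHP implementation.
// Assumptions (hypothetical): urls.txt holds one URL per line, and each
// download is cached under a file name derived from a hash of its URL.
const crypto = require("crypto");
const path = require("path");

// Split the contents of urls.txt into a de-duplicated list of URLs,
// ignoring blank lines and surrounding whitespace.
function parseUrlList(text) {
  const seen = new Set();
  for (const line of text.split(/\r?\n/)) {
    const url = line.trim();
    if (url !== "") seen.add(url);
  }
  return [...seen];
}

// Derive stable cache paths for one URL: the downloaded PDF and the
// text file that pdftotext would write next to it.
function cachePaths(url, cacheDir) {
  const id = crypto.createHash("sha1").update(url).digest("hex");
  return {
    pdf: path.join(cacheDir, id + ".pdf"),
    txt: path.join(cacheDir, id + ".txt"),
  };
}

// Arguments for `pdftotext <PDF-file> <text-file>`, to be run with e.g.
// child_process.execFileSync("pdftotext", args) once the PDF is on disk.
function pdftotextArgs(paths) {
  return [paths.pdf, paths.txt];
}

const urls = parseUrlList(
  "https://example.com/a.pdf\n\nhttps://example.com/a.pdf\n  https://example.com/b.pdf \n"
);
for (const url of urls) {
  console.log(url, pdftotextArgs(cachePaths(url, "cache")));
}
```

Hashing the URL keeps file names stable between runs, which fits the point that cached PDFs let later steps run without downloading; the real script may name its files differently.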

php 1.2-valgprotokoll-elections-no.php

Reads the PDFs from the elections.no Git repository, which is included as the elections-no.github.io submodule. The PHP script updates the submodule itself by running: git submodule update --remote elections-no.github.io
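A small sketch, again in Node.js and not the project's implementation, of locating the PDFs inside a checked-out copy of the elections-no.github.io submodule. The recursive walk, the case-insensitive .pdf match and skipping .git are assumptions about how such a directory could be scanned; findPdfs is a hypothetical name.

```javascript
// Sketch only: lists the PDF files in a checked-out directory such as the
// elections-no.github.io submodule. The directory layout is an assumption;
// this is not the logic of 1.2-valgprotokoll-elections-no.php.
const fs = require("fs");
const path = require("path");

// Recursively collect paths of files ending in ".pdf" (case-insensitive),
// skipping any .git directory. Results are sorted for stable output.
function findPdfs(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== ".git") found.push(...findPdfs(full));
    } else if (entry.isFile() && entry.name.toLowerCase().endsWith(".pdf")) {
      found.push(full);
    }
  }
  return found.sort();
}

// Only scan when the submodule has been checked out.
if (fs.existsSync("elections-no.github.io")) {
  console.log(findPdfs("elections-no.github.io"));
}
```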

php 2-valgprotokoll-parser.php

Parses all txt files generated by step 1 / step 1.2 and outputs JSON.
Files that fail to parse are ignored. To have errors thrown instead, run: php 2-valgprotokoll-parser.php throw
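The skip-or-throw behaviour of step 2 can be sketched like this. It is a Node.js illustration under assumptions, not the PHP parser: parseFile stands in for the real per-file parser, and detecting the throw argument (any command-line argument equal to throw) is a guess modelled on the command above.

```javascript
// Sketch of the "skip failing files unless `throw` is given" behaviour.
// parseFile is a hypothetical stand-in for the real per-file parser;
// this is not the project's PHP code.

// Parse every file. By default a failing file is recorded and skipped;
// with throwOnError set, the first error propagates and ends the run.
function parseAll(files, parseFile, throwOnError) {
  const results = {};
  const failed = [];
  for (const file of files) {
    try {
      results[file] = parseFile(file);
    } catch (err) {
      if (throwOnError) throw err;
      failed.push({ file, message: err.message });
    }
  }
  return { results, failed };
}

// Mirror the command-line switch: any argument equal to "throw" turns it on.
const throwOnError = process.argv.slice(2).includes("throw");

// Hypothetical parser that fails on one file, to show both outcomes.
const demo = parseAll(
  ["ok.txt", "bad.txt"],
  (file) => {
    if (file === "bad.txt") throw new Error("unexpected layout");
    return { source: file };
  },
  throwOnError
);
console.log(JSON.stringify(demo.results, null, 2));
console.log("Skipped:", demo.failed);
```

Collecting the failures instead of silently dropping them keeps the default run going while still showing which files need attention.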

php 3-valgprotokoll-html-report.php

Creates HTML from the JSON output of step 2.
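A rough sketch of the JSON-to-HTML step in Node.js. The record shape (flat key/value objects), the column names and the helper names escapeHtml and renderTable are hypothetical; the actual report layout comes from 3-valgprotokoll-html-report.php. Escaping is shown because text extracted from PDFs can contain characters such as & and < that would otherwise break the markup.

```javascript
// Sketch of turning parsed JSON into an HTML summary table. The record
// shape (flat key/value objects) and the column names are hypothetical.

// Escape text so values extracted from PDFs cannot break the generated HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one table row per record, with one column per entry in `columns`.
// Missing values become empty cells.
function renderTable(records, columns) {
  const head = columns.map((c) => `<th>${escapeHtml(c)}</th>`).join("");
  const rows = records.map((r) => {
    const cells = columns.map((c) => `<td>${escapeHtml(r[c] ?? "")}</td>`).join("");
    return `<tr>${cells}</tr>`;
  });
  return `<table>\n<tr>${head}</tr>\n${rows.join("\n")}\n</table>`;
}

// Illustrative records only, not real report data.
const records = [
  { municipality: "Municipality A", status: "parsed" },
  { municipality: "Municipality B & C", status: "<parse error>" },
];
console.log(renderTable(records, ["municipality", "status"]));
```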

Grabbing URLs from Google

Search on Google.
Open the browser developer tools and run the following in the console:

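   // Collect all links on the current results page.
   var a = document.querySelectorAll('a');
   var list = '';
   for (var i = 0; i < a.length; i++) {
       var that = a[i];
       // Skip Google's own links, YouTube, Blogger, cached copies and very short hrefs.
       if (
           that.href.indexOf('google.com') === -1
           && that.href.indexOf('google.no') === -1
           && that.href.indexOf('youtube.com') === -1
           && that.href.indexOf('blogger.com') === -1
           && that.href.indexOf('googleusercontent.com') === -1
           && that.href.length > 2) {
           list += "\n" + that.href;
       }
   }
   // Print the collected URLs, one per line.
   console.log(list + "\n");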

Go to the next page of search results and run the snippet again.
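The dev-tools snippet can also be written as plain functions, which makes the filter testable outside the browser and helps merge the output of several result pages. This is a sketch: filterLinks keeps the same exclusion list and length check as the snippet, while mergePages and its removal of duplicate URLs are additions not in the original. Feeding the merged list into urls.txt for step 1 is an assumption; the README does not say how the collected URLs are used.

```javascript
// Sketch: the same filtering as the dev-tools snippet, as plain functions
// that run outside the browser, plus merging output from several pages.
// Removing duplicates is an addition not present in the original snippet.

// Substrings skipped by the snippet (Google's own links and similar).
const EXCLUDED = ["google.com", "google.no", "youtube.com", "blogger.com", "googleusercontent.com"];

// Keep hrefs longer than two characters that contain none of the
// excluded substrings, mirroring the snippet's indexOf checks.
function filterLinks(hrefs) {
  return hrefs.filter(
    (href) => href.length > 2 && EXCLUDED.every((d) => href.indexOf(d) === -1)
  );
}

// Merge the console output copied from several pages into one list,
// dropping blank lines and duplicates while keeping first-seen order.
function mergePages(pageOutputs) {
  const seen = new Set();
  for (const output of pageOutputs) {
    for (const line of output.split("\n")) {
      const url = line.trim();
      if (url !== "") seen.add(url);
    }
  }
  return [...seen].join("\n");
}

const page1 = filterLinks([
  "https://www.google.com/search?q=x",
  "https://example.com/valgprotokoll.pdf",
  "",
]).join("\n");
const page2 = filterLinks([
  "https://example.com/valgprotokoll.pdf",
  "https://example.org/b.pdf",
]).join("\n");
console.log(mergePages([page1, page2]));
```

mergePages also accepts the snippet's raw output as-is, since the leading and trailing newlines it prints are dropped as blank lines.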
