rsssf/scripts

football.db RSSSF (Rec.Sport.Soccer Statistics Foundation) tools & scripts

command line tools

mirror

use mirror to mirror the complete rsssf.org website (40 000+ .html pages, about 800 MB) by following (and recording) all internal links. the command downloads all .html pages (cached and converted to utf-8) and builds up a local (sqlite) mirror.db (about 20 MB) that records the link structure (incoming and outgoing) & page titles in a pages and a links table.

Note

with a one-second delay between downloads, budget about 2 seconds per page (the delay plus the download itself); for 40 000 pages that is 80 000 seconds, or about 22 hours.
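
the run-time estimate above can be checked with a few lines of ruby. this is only a back-of-the-envelope sketch; the helper name `estimated_hours` and the figure of 2 seconds per page (delay plus download) are assumptions for illustration and not part of the mirror tool:

```ruby
# rough run-time estimate for a full mirror run
# assumption: about 2 seconds per page (1 second delay plus the download itself)
def estimated_hours(pages, seconds_per_page)
  (pages * seconds_per_page) / 3600.0   # 3600 seconds per hour
end

hours = estimated_hours(40_000, 2)
puts format("about %.1f hours", hours)
```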

$ ruby mirror/mirror.rb

note - the web pages get (by default) cached in ./cache
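
to illustrate the idea of following and recording all internal links, here is a minimal sketch. it is not the code in mirror/mirror.rb: the `SITE` hash stands in for the real website, and `crawl` is a hypothetical helper that collects a pages table (path and title) and a links table (from and to) in memory instead of in sqlite:

```ruby
require 'set'

# hypothetical in-memory "website": path => [title, outgoing internal links]
SITE = {
  'index.html'   => ['Home',         ['tables.html', 'about.html']],
  'tables.html'  => ['Tables',       ['index.html', 'eng2010.html']],
  'about.html'   => ['About',        ['index.html']],
  'eng2010.html' => ['England 2010', []],
}

# follow all internal links breadth-first, visiting every page once;
# returns [pages, links] mimicking a pages and a links table
def crawl(start, site)
  pages = {}                # path => title
  links = []                # [from, to] pairs
  queue = [start]
  seen  = Set.new([start])

  until queue.empty?
    path = queue.shift
    title, outgoing = site.fetch(path)
    pages[path] = title
    outgoing.each do |target|
      links << [path, target]          # record every link, even to known pages
      next if seen.include?(target)    # but queue each page only once
      seen << target
      queue << target
    end
    # a real run would download here and wait between requests
  end

  [pages, links]
end

pages, links = crawl('index.html', SITE)
puts "#{pages.size} pages, #{links.size} links"
```

with both tables in hand, the incoming links of a page are the rows in links where the page is the target, and the outgoing links are the rows where it is the source.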

Tip

Use the report command in the mirror tool to generate a page & directory statistics summary. Example:

 $ ruby mirror/report.rb

resulting in mirror/SUMMARY.md.

Or use the export command to export all pages to datasets in the comma-separated values (.csv) format. Example:

 $ ruby mirror/export.rb

resulting in pages_html.csv, pages_html_404.csv, pages_pdf.csv, pages_other.csv in the tmp-mirror/ directory.

prepare

use prepare to download pages (unless already cached; use --force to redownload) and convert the tables from .html to .txt.

pass in a pages config (e.g. eng, de, worldcup, etc.) with a list of table files in a comma-separated values (csv) file.

$ ruby prepare/prepare.rb eng
$ ruby prepare/prepare.rb --force eng       ## (force) redownload all

note - the web pages get (by default) cached in ./cache and the converted tables (in .txt) get written to the default outdir ../tables
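
the "download if not cached or forced" rule can be sketched as follows. this is an illustration under assumptions, not the code in prepare/prepare.rb: `fetch_cached` is a hypothetical helper, and the block passed to it stands in for the real download so the example runs without network access:

```ruby
require 'fileutils'
require 'tmpdir'

# return the page text from the cache dir; "download" (via the block) only
# when the cached file is missing or force is true
def fetch_cached(name, cache_dir:, force: false)
  path = File.join(cache_dir, name)
  if force || !File.exist?(path)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, yield(name))     # the block stands in for the real download
  end
  File.read(path)
end

Dir.mktmpdir do |dir|
  downloads = 0
  fake_download = ->(name) { downloads += 1; "<html>#{name}</html>" }

  fetch_cached('eng2010.html', cache_dir: dir, &fake_download)               # not cached: downloads
  fetch_cached('eng2010.html', cache_dir: dir, &fake_download)               # cached: no download
  fetch_cached('eng2010.html', cache_dir: dir, force: true, &fake_download)  # forced: downloads again
  puts "downloads: #{downloads}"
end
```

the second call is served from the cache; only the first and the forced call trigger a download.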

tip: see https://github.com/rsssf/tables for a public online copy / mirror of converted tables in .txt (preserving the original format).

fmtfix

use fmtfix to convert .txt tables (in the original format) to .txt pages (with "autofixes" applied for football.txt parsing)

pass in (i) individual table files, e.g. eng2010.txt or 34f.txt, or (ii) a pages config (e.g. eng, de, worldcup, etc.) with a list of table files in a comma-separated values (csv) file.

$ ruby fmtfix/fmtfix.rb eng2010.txt eng2011.txt
$ ruby fmtfix/fmtfix.rb eng

note - the outdir for a pages config defaults to ./tmp-<slug>, e.g. eng becomes ./tmp-eng and so on; for individual table files the outdir defaults to ./tmp-fmtfix
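
the outdir defaults can be pictured with a small sketch. `default_outdir` is a hypothetical helper, and treating a single argument without a file extension as a pages config is an assumption made for this example; fmtfix/fmtfix.rb may decide differently:

```ruby
# pick the default outdir from the command line arguments
# assumption: a single argument without a file extension is a pages config slug
def default_outdir(args)
  first = args.first
  if args.size == 1 && File.extname(first).empty?
    "./tmp-#{first}"     # pages config, e.g. eng => ./tmp-eng
  else
    "./tmp-fmtfix"       # individual table files
  end
end

puts default_outdir(['eng'])
puts default_outdir(['eng2010.txt', 'eng2011.txt'])
```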

tip: see https://github.com/rsssf/clubs, https://github.com/rsssf/world, and https://github.com/rsssf/worldcup for public online copies / mirrors of the .txt pages with applied "autofixes" for football.txt parsing (look inside the /pages directories).