How to improve your published open spending data

Suggestions on how to publish better open spending data for use by both people and computers.

1. Publish in CSV format

  • CSV is a universal data exchange format for tabular data and can be read by all spreadsheet software and programming languages
  • Don't use PDF for tabular data - tables in a PDF cannot be reliably processed by common software
  • Don't use XLS or XLSX files - not everyone has Microsoft Excel and the formats can cause difficulties for other software
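
As a rough illustration, a CSV file can be produced with nothing more than a standard library. The sketch below uses Python's csv module; the column names (Date, Amount, Supplier), the record layout and the to_csv helper are assumptions made for the example, not a required schema.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def to_csv(records):
    """Return CSV text with one header row and one line per spend record."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    # Hypothetical header row; use the column names your own guidance requires
    writer.writerow(["Date", "Amount", "Supplier"])
    for record in records:
        writer.writerow([
            record["date"].isoformat(),    # unambiguous YYYY-MM-DD date
            f"{record['amount']:.2f}",     # plain number, no currency sign
            record["supplier"],            # quoted automatically if it contains a comma
        ])
    return buffer.getvalue()


example_records = [
    {"date": date(2016, 1, 15), "amount": Decimal("120.5"), "supplier": "Acme Ltd"},
    {"date": date(2016, 1, 20), "amount": Decimal("99"), "supplier": "Smith, Jones & Co"},
]
csv_text = to_csv(example_records)
```

The csv module takes care of quoting, so supplier names that contain commas do not break the column layout.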

2. Publish easily accessible files

  • Preferably one click to download the file - not a succession of separate pages before you get to the download link
  • Use sensible unique file names which relate to the data eg 'LAname-spend-January-2016' not 'download' for every file!
  • Add a version number to the filename if it has to be republished eg 'LAname-spend-January-2016-v1'
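
One way to keep file names consistent is to generate them rather than type them by hand. The following is a minimal sketch only: the spend_filename helper and its parameters are hypothetical, and the '.csv' extension is an assumption based on section 1.

```python
import re
from typing import Optional


def spend_filename(authority: str, month: str, year: int,
                   version: Optional[int] = None) -> str:
    """Build a descriptive file name such as 'LAname-spend-January-2016.csv'."""
    # Collapse spaces and punctuation in the authority name into single hyphens
    slug = re.sub(r"[^A-Za-z0-9]+", "-", authority).strip("-")
    name = f"{slug}-spend-{month}-{year}"
    # Append a version suffix only when one is supplied (ie for a republished file)
    if version is not None:
        name += f"-v{version}"
    return name + ".csv"
```

For example, spend_filename("LAname", "January", 2016, version=1) follows the pattern suggested above with a version suffix added.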

3. Publish clean data

  • Do at least a minimal level of quality control on the data before publishing it *
  • Are the dates in a sensible range? Financial transactions are unlikely to be over 6 years old or several years in the future.
  • Dates should be in a consistent and unambiguous format throughout the file
  • Dates should not be in a 5 digit Excel format - who will understand these?
  • Amount fields should only contain numeric amount characters - not £ signs
  • Amount values should normally exclude VAT
  • Key fields (ie date, amount, supplier name) should not be blank or contain #REF
  • Unusual non-ASCII characters that cannot easily be translated should be avoided
  • Preferably there should be just one header row at the beginning of the file
  • Preferably the header row should be reasonably consistent across similar files (and without simple spelling errors)
  • Preferably there should not be a total row at the end of the file
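
Several of the checks above can be automated before a file is published. The sketch below is one possible approach, not a definitive validator. It assumes columns named Date, Amount and Supplier, dates written as YYYY-MM-DD, and a cut-off of one year ahead for "in the future"; all of these are assumptions chosen for the example, as are the function names.

```python
import csv
import io
import re
from datetime import date, datetime

KEY_FIELDS = ("Date", "Amount", "Supplier")  # hypothetical column names


def check_row(row, today):
    """Return a list of problems found in one row (a dict from csv.DictReader)."""
    problems = []
    # Key fields should not be blank or contain a spreadsheet error such as #REF
    for field in KEY_FIELDS:
        value = (row.get(field) or "").strip()
        if not value or "#REF" in value:
            problems.append(f"{field} is blank or contains #REF")

    raw_date = (row.get("Date") or "").strip()
    if re.fullmatch(r"\d{5}", raw_date):
        problems.append("Date looks like a 5 digit Excel serial number")
    elif raw_date and "#REF" not in raw_date:
        try:
            parsed = datetime.strptime(raw_date, "%Y-%m-%d").date()
        except ValueError:
            problems.append("Date is not in YYYY-MM-DD format")
        else:
            # Assumed range: no more than 6 years old, no more than 1 year ahead
            if parsed.year < today.year - 6 or parsed.year > today.year + 1:
                problems.append("Date is outside the expected range")

    raw_amount = (row.get("Amount") or "").strip()
    if raw_amount and "#REF" not in raw_amount:
        # Digits with an optional minus sign and decimal part; no £ signs or commas
        if not re.fullmatch(r"-?\d+(\.\d+)?", raw_amount):
            problems.append("Amount contains non-numeric characters")
    return problems


def check_csv(text, today=None):
    """Map line numbers to the problems found on each line of CSV text."""
    today = today or date.today()
    reader = csv.DictReader(io.StringIO(text))
    report = {}
    # The header is line 1, so the first data row is line 2
    # (this assumes no field contains an embedded line break)
    for line_number, row in enumerate(reader, start=2):
        problems = check_row(row, today)
        if problems:
            report[line_number] = problems
    return report
```

Calling check_csv on the text of a file returns an empty dictionary when no problems are found, so it can be used as a simple pass/fail gate before publishing.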

4. Publish more data rather than less

  • Publishing just the very basic requirements - Amount, Date and Supplier - is not very useful
  • Directorate/Service, Expenditure Type and Procurement/Merchant category are now mandatory fields
  • Adding extra data helps the user and probably involves little extra work to provide it
  • Standard codes eg for SeRCOP or ProClass are helpful but purely internal codes are probably not worth publishing

5. Publish redacted records

  • Redaction is appropriate and necessary in specific circumstances - see pages 7-8 of the LGA Local Transparency Guidance document (pdf).
  • Records where data has been redacted should still be published with the sensitive data replaced by text like 'Redacted - personal data'
  • The average overall rate of redaction is around 15% - if yours is 0% or 50% then it is worth asking why
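
To see how a file compares, the redaction rate can be estimated by counting the rows that contain the replacement text. This is a rough sketch: the redaction_rate function is hypothetical, and it assumes redacted fields contain the word 'Redacted', as in the example wording above.

```python
import csv
import io


def redaction_rate(text, marker="redacted"):
    """Return the share of data rows in which any field contains the marker text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return 0.0
    redacted = sum(
        1 for row in rows
        # Case-insensitive match; non-text values (eg overflow columns) are skipped
        if any(isinstance(value, str) and marker in value.lower()
               for value in row.values())
    )
    return redacted / len(rows)
```

The result is a fraction between 0 and 1, so multiply by 100 to get a percentage.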

6. Provide licence and other explanatory information

  • Always provide the licence details - preferably the Open Government Licence
  • Provide additional explanatory notes about the data and any relevant issues eg if VAT is included, say so and explain why

7. Make use of published standards and guidance


* AppGov will provide a free quality control service in some circumstances

For more information or to make suggestions please contact info@appgov.org