Technically opening data means putting in place the tool through which the data is made available for use. This has most typically taken the form of an open data portal, such as data.gov or data.gov.uk. Dataportals.org lists data portals from around the world. In South Africa, examples of government open data portals include Municipal Money, the Department of Environmental Affairs GID Data Catalogue, or Stats SA's Quarterly Labour Force Survey.
As you can see from those examples and from the list in the link above, the current state of open data in South Africa varies widely. There are data catalogues, links to websites where some data is stored, data sitting behind a visualisation tool that may or may not be directly accessible, contact details for requesting the data offline, and true open data portals. So what are the standards for technically opening data?
To a large degree, then, data's technical openness depends on its users. If every presumed user of a certain dataset has a computer, access to the internet, the skills to work with spreadsheets and a command of English, then releasing the dataset in CSV format on an English-language open data portal is open data.
If some of those users do not have access to the internet, do not have a computer, do not speak English as a first language, or lack the skills to process and work with a tabular dataset, then open data in this form is not actually open to them.
As mentioned at the beginning of the toolkit, South Africa is going online at a rapid rate, the skills barrier to using a basic tabular dataset is fairly low, and most of our citizens speak conversational English (without diminishing the negative context behind why this is the status quo).
This report is a good example of how a city (New York) undertook research to understand "data poverty" and the barriers that exist to reaching everyone with open data.
With this in mind, here is a list of standards from the Open Data Handbook about Technical Openness:
Data should be priced at no more than a reasonable cost of reproduction, preferably as a free download from the Internet. This pricing model is achievable because your agency should not incur any cost when it provides data for use.
The data should be available as a complete set. If you have a register which is collected under statute, the entire register should be available for download. A web API or similar service may also be very useful, but it is not a substitute for bulk access.
Re-use of data held by the public sector should not be subject to patent restrictions. More importantly, providing machine-readable formats allows for the greatest re-use. To illustrate this, consider statistics published as PDF (Portable Document Format) documents, often used for high-quality printing. While these statistics can be read by humans, they are very hard for a computer to use. This greatly limits the ability of others to re-use that data.
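The access and machine-readability standards above can be illustrated with a minimal sketch in standard-library Python. It assumes a hypothetical register (the `REGISTER` records, field names and provinces are placeholders, not real data) and a hypothetical paginated web API (`api_page`). Publishing the complete register as one CSV file lets a re-user fetch everything at once and analyse it directly, whereas rebuilding it through the API takes one request per page.

```python
import csv
import io
from collections import Counter

# Hypothetical statutory register: names and provinces are placeholders.
REGISTER = [
    {"id": i, "name": f"Entity {i}", "province": "Gauteng" if i % 2 else "Western Cape"}
    for i in range(1, 251)
]
FIELDS = ["id", "name", "province"]


def bulk_export(records):
    """Serialise the complete register as a single CSV document (bulk access)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def api_page(records, page, page_size=50):
    """Hypothetical API: return one page of records, as many web APIs do."""
    start = (page - 1) * page_size
    return records[start:start + page_size]


def fetch_all_via_api(records, page_size=50):
    """Rebuild the full register page by page, counting the calls needed."""
    rows, page, calls = [], 1, 0
    while True:
        batch = api_page(records, page, page_size)
        calls += 1
        if not batch:  # an empty page signals the end of the data
            break
        rows.extend(batch)
        page += 1
    return rows, calls


def count_by_province(csv_text):
    """A re-user's analysis: parse the CSV directly, with no retyping from a PDF."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["province"] for row in reader)


csv_text = bulk_export(REGISTER)
rows, calls = fetch_all_via_api(REGISTER)
print(calls)  # 6: five full pages plus one empty page
print(count_by_province(csv_text))
```

With 250 records and 50 records per page, the API route needs six calls (the last returns an empty page), while the bulk file is a single download. The page size and stopping rule are illustrative choices; real APIs differ.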
Here are a few policies that will be of great benefit:
Keep it simple,
There are many different ways to make data available to others. The most natural way in the Internet age is online publication, and there are many variations on this model. At its most basic, agencies make their data available via their own websites and a central catalogue directs visitors to the appropriate source. However, there are alternatives.
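The basic model, where agencies host their own data and a central catalogue points visitors to it, can be sketched as a small index of descriptions and links. This is a minimal illustration only: the `CatalogueEntry` fields, the entries and the `example.org` URLs are hypothetical. The design choice it shows is that the catalogue stores metadata and links rather than the data itself, so each agency remains the authoritative source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    title: str
    publisher: str
    fmt: str
    url: str  # points at the publishing agency's own website


# Hypothetical entries; URLs use the reserved example.org domain.
CATALOGUE = [
    CatalogueEntry("Municipal budgets", "Agency A", "CSV",
                   "https://agency-a.example.org/budgets.csv"),
    CatalogueEntry("Land cover map", "Agency B", "Shapefile",
                   "https://agency-b.example.org/landcover.zip"),
    CatalogueEntry("Labour survey tables", "Agency C", "CSV",
                   "https://agency-c.example.org/labour.csv"),
]


def search(catalogue, keyword):
    """Return entries whose title mentions the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [entry for entry in catalogue if keyword in entry.title.lower()]


def redirect_target(catalogue, title):
    """Return the source URL the catalogue would send a visitor to, or None."""
    for entry in catalogue:
        if entry.title == title:
            return entry.url
    return None


print(redirect_target(CATALOGUE, "Labour survey tables"))
# https://agency-c.example.org/labour.csv
```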
When connectivity is limited or the data is extremely large, distribution through other channels can be warranted. This section also discusses such alternatives, which can help keep prices very low.