Austrian Open Data Day: Students Co-Create Open Data Ideas!

World Open Data Day is a global event that has the goal to promote the use of open data world-wide and develop new ideas on improving life quality and empowering local communities through the use of open data. In practice, this is the day, when many people get together to exchange their experience, experiment, have fun and develop practical applications and visualisations with open data.

Vienna University of Economics and Business in collaboration with the Linked Data Lab, TU Vienna, hosted a workshop-hackathon targeted at students, who were encouraged to develop and share their open data ideas. It was one in the Open Data Day event series organised in Vienna on March 3rd.

The focus of the workshop was on the Open Data in Austria represented by the Austria’s Open Data portals: data.gv.at and opendataportal.at. Students together with the university researchers shared their ideas, proposed innovative solutions and voted for the most exciting and promising projects to work on: fake news verification with open data, playlist suggestion to work on the open datasets, putting the local data on the map, etc.

In less than 5 hours the participants exercised in creativity, their programming and design skills and managed to already present the first prototypes of their ideas:

1. Open Data Buildings Map Visualisation

One of the student teams focused on a use case to visualise and search for „local data around you“ on a map. They implemented an open data visualisation in the Jupyter Notebook environment. The work was subdivided into aggregation and conversion of the relevant data sources, for instance, historical buildings and their addresses from the Vienna history wiki (Thanks to Bernhard Krabina from KDZ for the help!) another part of the team looked into the visualisation libraries for Python. The final Python script loads the building coordinates from a CSV file, fixes the formatting issues and marks the locations on the OpenStreetMap using folium library.

Here is how the Open Data Buildings Map of Vienna looks like:

The interactive version of this map visualisation along with the source code for its implementation is available online.

2. News Verification Browser Extension

Another favourite open-data-in-use idea was the fake news verification application, the topic which is in high demand everywhere in the world right now. The students designed a mock-up sketch of a Chrome browser extension that will provide the means to crowd-source online news annotations and also contrast the figures provided in the news articles with the analogous figures available from the open data sources. It is indeed a highly ambitious and relevant project idea that contains several research questions on how to enable the correct alignment between the news text and tabular data.

3. Open Data Fact Quiz Generator

The third team further developed the idea of adding the gamification element to the process of exploring and enriching existing open datasets. They took up a challenge of implementing the first prototype of the game with a purpose that would generate quiz questions from the open data tables to enable and bootstrap their annotation. The users would play the game by interacting with the datasets to test and expand their knowledge, and, at the same time, contribute to the community by verifying and improving the quality of open data.

Big thanks to the organisers and participants of the workshop for the great event!

More info about the Open Data Day ’17 in Vienna:
1. opendataday.org
2. open.wien.gv.at/site/open-data-day-2017-veranstaltungen-in-wien
3. digitalcity.wien/offene-daten-begreifen-open-data-day-2017

Dogs in Vienna. Part 3: Open Data Dashboard Tutorial

This is a sample script showing how open data can be analysed and demonstrated using Jupyter Notebooks and Declarative Widgets. We take the dog statistics data in Vienna as a sample use case to demonstrate common approaches to analyse open data. The final dashboard and an interactive development environment (IDE) with all the tutorial notebooks are available from our temporal Jupyter Notebook Server.

Open Data Story

It is useful to define a set of possible research questions that define the goal of the data study and refine them along the way since the availablity of data suggests possible ways to combine and explore it.

Research Questions

  1. Which Vienna districts are most fond of Wiener dogs?
  2. How many Wiener dogs are there in my district?

This time we do not only find answers to our questions, but also create a web dashboard with interactive visualization to share our findings with others.

Get the Data

We described how to load and preprocess the dataset in the previous post. It is often not that trivial as it may seem and involves a lot of data wrangling and debugging in order to find and eliminate possible errors or inconsistences in the dataset.

This step should not be underestimated since it defines the final result of our data analysis.
Remember: „Garbage in, garbage out!“

In [198]:
# Load libraries
import pandas as pd # CSV file processing
import numpy as np # vector and matrix manipulation

# Load the csv file from the open data portal
# dataset description: https://www.data.gv.at/katalog/dataset/stadt-wien_anzahlderhundeprobezirkderstadtwien/resource/b8d97349-c993-486d-b273-362e0524f98c
data_path = 'https://www.wien.gv.at/finanzen/ogd/hunde-wien.csv'
# Look up the row file and specify the dataset format, e.g. delimiters
data = pd.read_csv(data_path, delimiter=';', skiprows=1, thousands=',', encoding='latin-1')

# Correct individual values in the dataset
data.loc[1914, 'Anzahl'] = 1510
data.loc[5347, 'Anzahl'] = 2460

# Carefully select the string separator, including spaces!
separate_breeds = data['Dog Breed'].str.split(' / ', expand=True)
separate_breeds.columns = ["Breed_1", "Breed_2"]
data = pd.concat([data, separate_breeds], axis=1)

# Correct encoding for special characters in german alphabet
def to_utf(x):
    return x.encode('latin-1').decode('utf8') if isinstance(x, str) else x   
data = data.applymap(to_utf)

# Aggregate
data = data.groupby(['DISTRICT_CODE', 'Breed_1'])['Anzahl'].aggregate(np.sum).reset_index()
data.columns = ["District", "Dog_Breed", "Dog_Count"]

# Check the top of the table to make sure the dataset is loaded correctly 
data.head()
Out[198]:
District Dog_Breed Dog_Count
0 90100 Afghanischer Windhund 1
1 90100 Amerikanischer Pit-Bullterrier 1
2 90100 Amerikanischer Staffordshire-Terrier 5
3 90100 Australian Shepherd Dog 3
4 90100 Australian Terrier 1

Show the Data

Interactive Table

In [199]:
# Load library for the interactive visualizations
import declarativewidgets
declarativewidgets.init()

Import widgets

In [200]:
%%html
<link rel="import" href="urth_components/urth-viz-table/urth-viz-table.html" is='urth-core-import'>
<link rel="import" href="urth_components/paper-input/paper-input.html" is='urth-core-import' package='PolymerElements/paper-input'>

Write functions to load and process data in the table

In [202]:
# Match pattern
def filter_by_pattern(df, pattern):
    """Filter a DataFrame so that it only includes rows where the Dog Breed
    column contains pattern, case-insensitive.
    """
    return df[df['Dog_Breed'].str.contains(pattern, case=False)]

# Load data
def dogs_table(pattern=''):
    """Build a DataFrame.   
    """
    # Use match pattern
    df = data.pipe(filter_by_pattern, pattern)     
    return df
In [218]:
%%html
<template is="urth-core-bind">
    <paper-input value="{{pattern}}" label="Filter by dog breed" ></paper-input>
</template>

<template is="urth-core-bind">

    <urth-core-function ref="dogs_table"  
                        arg-pattern="{{pattern}}" 
                        result="{{dogs_table}}" 
                        limit="1600 "
                        delay="500" 
                        auto>
    </urth-core-function>
    
    <urth-viz-table datarows="{{ dogs_table.data }}" 
                    rows-visible="5" 
                    selection="{{dog_selection}}" 
                    columns="{{ dogs_table.columns }}" 
                    selection-as-object>
    </urth-viz-table>
    
</template>

Interactive Bar Chart

In [204]:
# Create Multi-index
district_stats = data.set_index(['District', 'Dog_Breed'])
# Calculate percentages
breed_percents = (district_stats.div(district_stats.sum(axis=0, level=0), level=0) * 100).round(1).reset_index()
# Rename column
breed_percents = breed_percents.rename(columns = {'Dog_Count':'Dog_Percent'})
# Preview
breed_percents.head()
Out[204]:
District Dog_Breed Dog_Percent
0 90100 Afghanischer Windhund 0.2
1 90100 Amerikanischer Pit-Bullterrier 0.2
2 90100 Amerikanischer Staffordshire-Terrier 1.1
3 90100 Australian Shepherd Dog 0.6
4 90100 Australian Terrier 0.2
In [206]:
breed = 'Dackel'
# Filter
breed_districts = breed_percents[(breed_percents['Dog_Breed'] == breed)]
# Remove column
breed_districts = breed_districts.drop('Dog_Breed', axis=1)
# Sort
breed_districts = breed_districts.sort_values(ascending=False, by='Dog_Percent')
# Rename column
breed_districts = breed_districts.rename(columns = {'Dog_Percent':'Percent_of_' + breed})
breed_districts.head()
Out[206]:
District Percent_of_Dackel
454 90400 3.6
1971 91500 3.3
777 90700 3.2
2266 91700 3.2
573 90500 3.1

Create function to load percents per district given the breed

In [215]:
# Filter data
def dogs_bar_chart(breed='Dackel'):
    """Build a DataFrame.   
    """
    # Filter
    df = breed_percents[(breed_percents['Dog_Breed'] == breed)]
    # Use match pattern
#     df = breed_percents.pipe(filter_by_pattern, breed)
    # Remove column
    df = df.drop('Dog_Breed', axis=1)
    # Sort
    df = df.sort_values(ascending=False, by='Dog_Percent')
    # Rename column
    df = df.rename(columns = {'Dog_Percent':'Percent_of_' + breed})  
    return df

Import bar chart widget

In [216]:
%%html
<link rel="import" href="urth_components/urth-viz-bar/urth-viz-bar.html" is='urth-core-import'>
In [217]:
%%html
<template is="urth-core-bind">
    <urth-core-function ref="dogs_bar_chart"  
                        arg-breed="{{dog_selection.Dog_Breed}}" 
                        result="{{df}}" 
                        limit="1600 "
                        delay="500" 
                        auto>
    </urth-core-function>
    <urth-viz-bar xlabel="Districts" ylabel="% to the total number of dogs in the district" datarows="{{df.data}}" columns="{{df.columns}}"></urth-viz-bar>
</template>

Lessons Learned

Dogs in Vienna

Based on the data available we were able to provide comprehensive answers to the set of research questions proposed in the introduction.

  1. The true fans of Wiener dogs live in the 4th district of Vienna.
  2. Wiener dogs are underreprestented in Leoplodstadt (2nd district). They constitute only 2% of the dog population.

Steps

1. Find datasets, e.g. CSV files from open data portals
2. Refine: identify column separator, thousands separator, rows to skip, string encoding, etc.
3. Aggregate: group by different attributes, e.g. district or type, and sum up the counts.
+ 4. Show the row data table for the user to be able to interact with the data.
5. Calculate proportions to the total sum in the group.
6. Slice: filter out rows, e.g. by district or type.
+ 7. Show sorted stats as a bar chart.

Prerequisites

To run this script on a local machine you need:

  • Python 3.4.
  • pandas
  • numpy
  • jupyter_declarativewidgets

Inspired by

  1. Health Inspections Dashboard
  2. tmpnb: deploy temporal Jupyter Notebook server
  3. Wheelan, Charles J. Naked Statistics. 2013

Bird’s-Eye View upon the Austrian Open Data Landscape with wordcloud

To get a grasp of the content available from the Austrian Open Data portals (data.gv.at and opendataportal.at) we generated two word clouds separately for each of the portals. We retrieved the metadata through the WU’s Open Data Portal Watch API and processed all the dataset titles using wordcloud.

Here is the visualization of the total 2,297 (as of 31 December 2016) dataset titles at data.gv.at :

The 373 titles from the independent Austrian Open Data portal opendataportal.at look as follows.

Check out our open source code on GitHub and stay tuned for further updates!

About the project: The CommuniData project aims to bring Open Data closer to local communities, i.e. enhance usability of Open Data and its accessibility for non-expert users. CommuniData is a cooperation between the WU Vienna, the companies cbased and webLyzard, and the Urban Renewal Office of the Vienna’s Second District. It is supported by the Austrian Research Promotion Agency FFG.