Understanding Open Data: Data Analysis
Working with Data
Open data can be used to answer questions, discover insights, and create solutions. Here are four main aspects to consider when working with open data, or with data in general:
1. Data Discovery: Find sources of open data and search for related datasets.
2. Data Comprehension: Understand how datasets are structured and what information they contain.
3. Data Analysis: Use tools and methods to help analyze the data.
4. Data Presentation: Create visualizations and charts.
In this third installment of our Working with Data series, we build further on our Understanding Open Data: The Citizen's Guide Story. We provide information to enhance your understanding of data and introduce some basic skills to help you use open data. This Story introduces you to the process of data analysis using open data.
3. Data Analysis
What is Data Analysis?
Data analysis is a process of preparing and working with data to gain insights.
You may use data analysis to, for example,
- calculate birth and death statistics for a region of Nova Scotia;
- explore the amount of grant funding allotted to organizations in Nova Scotia for greenhouse gas reductions; or
- discover which year saw the highest number of prospective immigrants in a particular county.
Data analysis makes it possible to expand your knowledge of a particular topic to inform business, policy, and personal decisions.
Analysis of Portal Data
Many of the datasets in the Nova Scotia Open Data Portal are aggregated. That is, the data is “rolled up”, or summarized, and provides, for example, the annual graduation rate in Nova Scotia or the total number of licensees in a region. Datasets may be aggregated to protect the privacy of individuals or to reduce their size and complexity so they are easier to read. In some cases, aggregated datasets answer your questions without the need for further analysis. In other cases, you may need to analyse the data further to gain insights.
The following sections provide some guidance to help you prepare and start analysing data both within and outside the Open Data Portal!
Before you begin: What question do you want to answer?
To select the most appropriate datasets and data analysis approach, you need
a clear sense of the question you want to answer. What interests you?
For example, while preparing to write a business proposal, you may need to answer the following question: "On average, how many babies are born annually in Shelburne County?"
With this question in mind, you would search for datasets relating to births in Nova Scotia (see Data Discovery) and examine the data (see Data Comprehension) to see whether it can help you answer your question. The following sections show how to answer this question, and others you may have, using the data and built-in functionality available in the portal.
Let's get started!
The portal has a data tool, the Exploration Canvas, to help you analyse data contained on the portal. To access the Exploration Canvas, click "Actions" at the top of the primer page of a dataset and select "Query data" from the drop-down menu (see screenshot below).
The Exploration Canvas has two main components: the Data Table and the
Visual Query Editor (VQE). Use both the Data Table and Visual Query
Editor to explore and shape the data (see screenshot below).
Data Table
The dataset, with its rows and columns, appears in the top portion of the Exploration Canvas. In the Data Table, you can scroll through the data side to side and up and down; larger datasets may have multiple pages that you can toggle through. Use the search bar at the top right of the page to search all the cells of the dataset; only the rows containing the word or phrase will be displayed. Click the Export button to the right of the search bar to export the data contained in the table. For quick access to functions such as filter and sort, click the three dots to the right of a column name (see screenshot below).
Visual Query Editor
You may choose to bypass the Data Table and do all your data querying directly in the Visual Query Editor. Select the filter, group and aggregate, and column manager icons at the bottom left of the page to explore their functionality. To adjust the amount of space the Visual Query Editor takes up on the page, click and drag to resize the bottom section (see screenshot below).
Functionality
The Exploration Canvas allows you to filter and shape data and to calculate basic descriptive statistics. Before diving into analysis, it is useful to understand which functions are available through the Exploration Canvas and what they do. Functions include filter, sort, change column order, group, aggregate, and exclude columns from the query. Quick start videos are available to teach you the basics.
Filter: Filtering on specific columns reduces the amount of visible data so you can focus on a subset, or portion, of the data. For example, if you are only interested in the data relating to a particular county or time period, you can filter the data so that you see only the data of interest.
Sort: To sort is to arrange the data in a specific column in a meaningful order. For example, you can sort a column of text-based data alphabetically, numeric data from lowest to highest (Sort Ascending), or dates from most to least recent (Sort Descending). You can use the Data Table to sort single columns or use the Column Manager (described below) in the Visual Query Editor to sort by multiple columns.
Group: Grouping a dataset by a column collects the rows that share the same value in that column, so that the data for each group can be summarized. For example, you can group data by year. The group function is often used in conjunction with the aggregate function.
Aggregate: Aggregating data allows you to combine data into summary statistics such as average, count, sum, and maximum or minimum. For example, you can discover the number of babies born within a certain period.
Column Manager: There may be columns included in a dataset that are not useful to you. You can manage the columns included in the data through the Column Manager function. You can also sort by multiple columns by specifying the sort order in the Column Manager.
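For readers who later work with data outside the portal, the same functions can be sketched in a few lines of Python. This is a minimal illustration, not a description of how the Exploration Canvas works internally: it assumes each row is a dictionary keyed by column name, and the helper names (`filter_rows`, `sort_rows`, `group_rows`, `aggregate`, `keep_columns`) are hypothetical.

```python
# Sketch of the Exploration Canvas functions applied to in-memory rows.
# ASSUMPTION: each row is a dict keyed by column name; helper names are hypothetical.
from collections import defaultdict
from typing import Any, Callable, Iterable

Row = dict[str, Any]


def filter_rows(rows: Iterable[Row], column: str, value: Any) -> list[Row]:
    """Filter: keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def sort_rows(rows: Iterable[Row], *columns: str, descending: bool = False) -> list[Row]:
    """Sort: order rows by one or more columns.

    Simplification: one direction applies to every column, whereas the
    Column Manager lets you choose a direction per column.
    """
    return sorted(rows, key=lambda row: tuple(row[c] for c in columns), reverse=descending)


def group_rows(rows: Iterable[Row], column: str) -> dict[Any, list[Row]]:
    """Group: collect the rows that share each distinct value of `column`."""
    groups: defaultdict[Any, list[Row]] = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


def aggregate(rows: Iterable[Row], column: str, func: Callable[[list[Any]], Any]) -> Any:
    """Aggregate: reduce one column to a summary value, e.g. with min, max, sum or len."""
    return func([row[column] for row in rows])


def keep_columns(rows: Iterable[Row], *columns: str) -> list[Row]:
    """Column Manager: keep only the listed columns, in the order given."""
    return [{c: row[c] for c in columns} for row in rows]
```

For example, assuming `rows` held the births dataset with hypothetical column names `county` and `live_births`, `aggregate(filter_rows(rows, "county", "Shelburne"), "live_births", statistics.mean)` would mirror the filter-then-aggregate steps in the first worked example below.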
Data Analysis Examples
The following are three examples of data analysis that can be
completed on the portal using the built-in functionality of the Exploration
Canvas.
Babies Born in Nova Scotia
To answer the question "On average, how many babies are born annually in Shelburne County?", navigate to the NS Births and Deaths with Rates and Natural Increase dataset. Using the Visual Query Editor of the Exploration Canvas:
- Filter by the County column and select Shelburne.
- Click Apply.
- Aggregate the Live Births column and select the “average of rows” calculation.
- Click Apply.
- View the Data Table for the results.
In Shelburne County, for the period 2014 to 2021, the average number of births per year was 114.2. To complete additional calculations, such as the average number of deaths in Shelburne over the same period, select an additional column to aggregate. The screenshot below shows the averages of both births and deaths in Shelburne County for 2014 to 2021.
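The same filter and aggregate steps can also be expressed as a query against the portal's SODA API (see Connect via API below), using the SoQL clauses `$where` and `$select`. The sketch below only builds the request URL. The dataset identifier `xxxx-xxxx` and the field names `county` and `live_births` are hypothetical placeholders; the real values are listed on the dataset's API page, and `avg()` assumes the births column is stored as a number.

```python
# Sketch: the Shelburne births query as a SODA (SoQL) request URL.
# ASSUMPTION: "xxxx-xxxx", "county" and "live_births" are hypothetical
# placeholders; replace them with the values shown on the dataset's API page.
from urllib.parse import quote, urlencode

BASE_URL = "https://data.novascotia.ca/resource"


def soql_url(dataset_id: str, **clauses: str) -> str:
    """Turn keyword clauses (where=..., select=...) into $where, $select, ... parameters."""
    params = {f"${name}": value for name, value in clauses.items()}
    # Keep $, parentheses, commas and quotes readable; spaces become %20.
    query = urlencode(params, safe="$(),'", quote_via=quote)
    return f"{BASE_URL}/{dataset_id}.json?{query}"


url = soql_url(
    "xxxx-xxxx",                   # hypothetical dataset identifier
    where="county = 'Shelburne'",  # Filter: County is Shelburne
    select="avg(live_births)",     # Aggregate: average of rows
)
print(url)
```

Requesting that URL should return a single JSON row holding the average, which you can compare with the result shown in the Data Table.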
Funding granted
If you want to know how much funding you might expect if you apply for a Low Carbon Communities Grant, you can explore the support provided to other organizations in the past through the Low Carbon Communities Grant Recipients dataset. Using the Visual Query Editor:
- Group by the Category column.
- Aggregate the # Amount Awarded by Program column and select the “minimum of” calculation.
- Add the # Amount Awarded by Program column twice more, selecting the “maximum of” and then “average of rows” calculations.
- Click Apply.
- View the Data Table for the results.
The Data Table displays the minimum, maximum,
and average amounts awarded by the program to organizations by Category (e.g.,
Advanced Buildings). See screenshot below.
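The grouped query above can also be written in SoQL for the portal's SODA API: Group corresponds to `$group`, and each aggregate becomes a function in `$select`. This is a sketch that only builds the URL; the dataset identifier `yyyy-yyyy` and the field names `category` and `amount_awarded_by_program` are hypothetical placeholders for the values on the dataset's API page.

```python
# Sketch: grant amounts grouped by category, as a SODA (SoQL) request URL.
# ASSUMPTION: "yyyy-yyyy", "category" and "amount_awarded_by_program" are
# hypothetical placeholders for the dataset's real identifier and field names.
from urllib.parse import quote, urlencode

amount = "amount_awarded_by_program"  # hypothetical API field name
params = {
    # Aggregate: minimum, maximum and average of the amount column
    "$select": f"category, min({amount}), max({amount}), avg({amount})",
    # Group: one output row per category
    "$group": "category",
}
url = (
    "https://data.novascotia.ca/resource/yyyy-yyyy.json?"
    + urlencode(params, safe="$(),'", quote_via=quote)
)
print(url)
```

In SoQL, every column in `$select` that is not wrapped in an aggregate function must also appear in `$group`, which is why `category` is listed in both.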
Prospective immigrants
Prospective immigrants to Nova Scotia
indicate their intended county of residence when completing their application
for a permanent resident visa. To find out what year saw the highest number of
prospective immigrants in Cape Breton County:
- Filter the County Where Nominees Intend to Settle dataset by the column County and select Cape Breton.
- Click Apply.
- Sort the Nominees column in descending order.
- Click Apply.
The results of this query show that more prospective immigrants intended to settle in Cape Breton in 2022 than in any other year (the query covers the years 2012 to 2022).
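In SoQL, this filter-then-sort query maps to `$where` and `$order`. Adding `$limit` keeps only the top row, which is the year with the most nominees. The dataset identifier `zzzz-zzzz` and the field names `county` and `nominees` are hypothetical. Sorting also assumes that `nominees` is stored as a number, because a text column would sort alphabetically.

```python
# Sketch: the year with the most Cape Breton nominees, as a SODA (SoQL) request URL.
# ASSUMPTION: "zzzz-zzzz", "county" and "nominees" are hypothetical placeholders.
from urllib.parse import quote, urlencode

params = {
    "$where": "county = 'Cape Breton'",  # Filter: County is Cape Breton
    "$order": "nominees DESC",           # Sort: Nominees, descending
    "$limit": "1",                       # keep only the first (largest) row
}
url = (
    "https://data.novascotia.ca/resource/zzzz-zzzz.json?"
    + urlencode(params, safe="$(),'", quote_via=quote)
)
print(url)
```

If two years tie for the highest count, `$limit=1` returns only one of them, so drop the limit when you want to check for ties.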
Return to your work later?
If you want to continue exploring the dataset later without recreating your work, you can 1) export the data contained in the Data Table, 2) sign in to the portal and save your data view, or 3) copy the view's URL.
Export
To explore a derived view using your own
software, you can export it. Click Export at the top right of the screen and
select your preferred data format (e.g., CSV, JSON) or copy the API endpoint.
Save
If you have created an account and are signed in to the Open Data Portal, you can name and create a derived view and return to it later (see screenshot below). Anyone can sign up for an account with the Nova Scotia Open Data Portal. You can also "publish" your derived view, which allows you to create visualizations based on it (see Data Presentation). Derived views created by community members, whether drafts or published, are private and not visible to the public.
Copy URL
If you prefer not to export your derived view or sign in and save the view, there is another option. The URL captures the exploration parameters you set while using the Exploration Canvas. Click the address bar at the top of your browser, select the entire URL, and copy it. By saving or sharing that URL, you can return to the view later or allow someone else to see the derived view.
Data analysis outside the Open Data Portal
While you can calculate some basic descriptive statistics on the Nova Scotia Open Data Portal using the tools and procedures described above, you may need to do more to answer your question.
Extracting the data from the portal for use in other applications may be necessary if you want to combine it with data from other sources (e.g., Canadian census, research data) or to apply more complex statistical techniques (e.g., linear regression, analysis of variance). A number of tools are available for data analysis (e.g., QGIS, Google Sheets, Microsoft Excel, Tableau Desktop, R). There are three main ways to extract data from the Open Data Portal: export, connect via OData, and connect via API.
Export Dataset
Click "Export" to download a dataset in an available format (e.g., CSV, CSV for Excel, SHP).
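Once a dataset has been exported as CSV, a few lines using Python's standard `csv` module can repeat a portal calculation or extend it. This minimal sketch averages one column over the rows that match a filter. The function name `average_where` and the column headers mentioned below are hypothetical, and values are assumed to be plain numbers without thousands separators.

```python
import csv
from statistics import mean
from typing import Iterable


def average_where(lines: Iterable[str], filter_column: str, filter_value: str,
                  value_column: str) -> float:
    """Average `value_column` over CSV rows whose `filter_column` equals `filter_value`.

    `lines` can be an open file or any iterable of CSV text lines with a header row.
    Rows with an empty value are skipped rather than counted as zero.
    """
    values = [
        float(row[value_column])
        for row in csv.DictReader(lines)
        if row[filter_column] == filter_value and row[value_column] != ""
    ]
    if not values:
        raise ValueError(f"no rows where {filter_column} == {filter_value!r}")
    return mean(values)
```

Suppose the export is saved as a hypothetical `births.csv`, opened with `newline=""` and `encoding="utf-8-sig"` (which also tolerates a leading byte-order mark if the file has one), and has hypothetical headers `County` and `Live Births`. In that case, `average_where(f, "County", "Shelburne", "Live Births")` should match the average computed in the portal for the same rows.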
Connect via OData
Click "..." and select "Access Data via OData" to connect directly to a dataset on the portal from a tool of your choice (for more, go to Open a Socrata Dataset In...).
Connect via API
Click "API" to gain programmatic access to data through an API (for more, go to SODA Developers).
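For programmatic access, a SODA endpoint can be read using only Python's standard library. In this sketch, the function names `fetch_rows` and `column_as_floats` are hypothetical, and the URL is whatever endpoint the "API" button gives you, optionally with SoQL parameters added. Two caveats, based on our understanding of the SODA documentation: JSON responses often return numeric values as strings, so convert them before doing arithmetic; and results are paged, so large datasets may need `$limit` and `$offset`.

```python
import json
from typing import Any
from urllib.request import urlopen


def fetch_rows(url: str, timeout: float = 30.0) -> list[dict[str, Any]]:
    """Download a SODA JSON response, which is a list of objects (one per row)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def column_as_floats(rows: list[dict[str, Any]], column: str) -> list[float]:
    """Convert one column to numbers, skipping rows where it is missing or empty."""
    return [float(row[column]) for row in rows if row.get(column) not in (None, "")]
```

`column_as_floats` is a separate step because SODA omits a field from a row's JSON object when that value is empty. Using `row.get` avoids a `KeyError` in that case.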
Use datasets available on the Nova
Scotia Government's Open Data Portal to try analyzing open data!