Understanding Open Data: Data Analysis

Working with Data

Open data can be used to answer questions, discover insights, and create solutions. Here are four main aspects to consider when working with open data or data in general:

1.   Data Discovery: Find sources of open data and search for related datasets.
2.   Data Comprehension: Understand how datasets are structured and what information they contain.
3.   Data Analysis: Use tools and methods to help analyze the data.
4.   Data Presentation: Create visualizations and charts.
In this third installment of our Working With Data series, we build further on our Understanding Open Data: The Citizen's Guide Story. We will provide you with information to enhance your understanding of data and introduce some basic skills to help you use open data. This Story introduces the process of data analysis using open data.

3. Data Analysis

What is Data Analysis?

Data analysis is a process of preparing and working with data to gain insights.
You may use data analysis to, for example,
  • calculate birth and death statistics for a region in Nova Scotia;
  • explore the amount of grant funds allotted to organizations in Nova Scotia for greenhouse gas reductions; or
  • discover what year saw the highest number of prospective immigrants in a particular county.
Data analysis makes it possible to expand your knowledge on a particular topic to inform business, policy, and personal decisions.

Analysis of Portal Data

Many of the datasets in the Nova Scotia Open Data Portal are aggregated. That is, the data is “rolled up” or summarized and provides, for example, the annual graduation rate in Nova Scotia or the total number of licensees in a region. Datasets may be aggregated to protect the privacy of individuals or to reduce their size and complexity so that they are easier to read. In some cases, aggregated datasets provide answers to your questions without the need for further analysis. In other cases, it may be possible and necessary to analyse the data to gain insights.
The following sections provide some guidance to help you prepare and start analysing data both within and outside the Open Data Portal!

Before you begin: What question do you want to answer?

To select the most appropriate datasets and data analysis approach, you need a clear sense of the question you want to answer. What interests you?
For example, while preparing to write a business proposal, you may need to answer the following question: "On average, how many babies are born annually in Shelburne County?"

With this question in mind, you will search for datasets relating to births in Nova Scotia (see Data Discovery) and examine the data (see Data Comprehension) to see whether it can help you answer your question. The following sections will take you through how to answer the baby question and others you may have using the data and built-in functionality available in the portal.

Let's get started!

The portal has a data tool, the Exploration Canvas, to help you analyse data contained on the portal. To access the Exploration Canvas, click "Actions" at the top of the primer page of a dataset and select “Query data” from the drop-down menu (see screenshot below).
The Exploration Canvas has two main components: the Data Table and the Visual Query Editor (VQE). Use both the Data Table and Visual Query Editor to explore and shape the data (see screenshot below).

Data Table

The dataset, with its rows and columns, appears in the top portion of the Exploration Canvas. You can scroll through the Data Table up and down and side to side; larger datasets may span multiple pages that you can page through. A search bar at the top right of the page searches all the cells of the dataset; only the rows containing the word or phrase will display. Click the Export button to the right of the search bar to export the data contained in the table. For quick access to functions such as filter and sort, click the three dots to the right of a column name (see screenshot below).

Visual Query Editor

You may choose to bypass the Data Table and do all your data querying directly in the Visual Query Editor. Select the filter, group and aggregate, and column manager icons at the bottom left of the page to explore their functionality. To adjust the amount of space the Visual Query Editor takes up on the page, click and drag to resize the bottom section (see screenshot below).

Functionality

The Exploration Canvas allows you to filter, shape, and calculate basic descriptive statistics. Before diving into analysis, it is useful to understand what functions are available through the Exploration Canvas and what they mean. Functions include filter, sort, change column order, group, aggregate, and exclude column from query. Quick start videos are available to teach you the basics.
Filter: Filtering specific columns helps you reduce the amount of visible data to help you focus on a subset or portion of the data. For example, if you are only interested in the data relating to a particular county or a time, you can filter the data to ensure you only see the data of interest.
Sort: To sort is to arrange the data in a specific column in a meaningful order. For example, you can sort a column of text-based data alphabetically, numeric data from lowest to highest (Sort Ascending), or dates from most to least recent (Sort Descending). You can use the Data Table to sort single columns or use the Column Manager (described below) in the Visual Query Editor to sort multiple columns.
Group: By grouping a dataset by a column, it’s possible to summarize data within a column. For example, you can group data by year. The group function is often used in conjunction with the aggregate function.
Aggregate: Aggregating data allows you to combine data into summary statistics such as average, count, sum, and maximum or minimum. For example, you can discover the number of babies born within a certain period.
Column Manager: There may be columns included in a dataset that are not useful to you. You can manage the columns included in the data through the Column Manager function. You can also sort by multiple columns by specifying the sort order using the Column Manager.
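For readers who later work with exported data in their own tools, the filter and multi-column sort functions described above can be sketched in a few lines of Python. This is only an illustration: the rows, column names, and values below are made up and do not come from any portal dataset.

```python
# Hypothetical rows; column names and values are illustrative only.
rows = [
    {"county": "Shelburne", "year": 2020, "live_births": 100},
    {"county": "Halifax", "year": 2021, "live_births": 4000},
    {"county": "Shelburne", "year": 2021, "live_births": 110},
    {"county": "Halifax", "year": 2020, "live_births": 3900},
]

# Filter: keep only the rows for one county.
shelburne = [r for r in rows if r["county"] == "Shelburne"]

# Sort by multiple columns: county ascending, then year descending.
ordered = sorted(rows, key=lambda r: (r["county"], -r["year"]))

print(len(shelburne))
print([(r["county"], r["year"]) for r in ordered])
```

Negating the year in the sort key is a simple way to mix ascending and descending order when the second column is numeric.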

Data Analysis Examples

The following are three examples of data analysis that can be completed on the portal using the built-in functionality of the Exploration Canvas.

Babies Born in Nova Scotia

To answer the question, "On average, how many babies are born annually in Shelburne County?", navigate to the NS Births and Deaths with Rates and Natural Increase dataset. Using the Visual Query Editor of the Exploration Canvas:
  • Filter by the County column and select Shelburne.
  • Click Apply.
  • Aggregate the Live Births column and select the “average of rows” calculation.
  • Click Apply.
  • View the Data Table for the results.
In Shelburne, for the period 2014 through 2021, the average number of births per year was 114.2. To complete additional calculations, such as the average number of deaths during the same period in Shelburne, select an additional column to aggregate. The screenshot below shows the average of both births and deaths in Shelburne County for 2014-2021.
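The same filter-then-average logic can be sketched outside the portal. The Python below is a minimal illustration, assuming rows with "County", "Live Births", and "Deaths" columns; the "Year" and "Deaths" column names and every number shown are placeholders, not the real figures from the dataset.

```python
from statistics import mean

# Illustrative rows only: these are NOT the real figures from the dataset.
rows = [
    {"County": "Shelburne", "Year": 2014, "Live Births": 120, "Deaths": 180},
    {"County": "Shelburne", "Year": 2015, "Live Births": 110, "Deaths": 190},
    {"County": "Halifax", "Year": 2014, "Live Births": 4000, "Deaths": 3000},
]


def average_for_county(rows, county, column):
    """Filter rows to one county, then average the chosen column."""
    values = [r[column] for r in rows if r["County"] == county]
    return mean(values)


avg_births = average_for_county(rows, "Shelburne", "Live Births")
avg_deaths = average_for_county(rows, "Shelburne", "Deaths")
print(avg_births, avg_deaths)
```

Calling the helper once per column mirrors selecting an additional column to aggregate in the Visual Query Editor.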

Funding granted

If you want to know how much funding you might expect when applying for a Low Carbon Communities Grant, you can explore the support provided to other organizations in the past through the Low Carbon Communities Grant Recipients dataset.
  • Group by the Category column.
  • Aggregate by the # Amount Awarded by Program column and select the “minimum of” calculation.
  • Add the # Amount Awarded by Program column twice more, selecting the “maximum of” and then “average of rows” calculations.
  • Click Apply.
  • View the Data Table for the results.
The Data Table displays the minimum, maximum, and average amounts awarded by the program to organizations by Category (e.g., Advanced Buildings). See screenshot below.
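As a rough equivalent of the group and aggregate steps above, the following Python sketch groups awards by category and computes the minimum, maximum, and average for each group. The amounts and the "Example Category" label are invented for illustration; only the "Category" column and the "Advanced Buildings" value are taken from the text above.

```python
from collections import defaultdict
from statistics import mean

# Illustrative awards only; all amounts and "Example Category" are made up.
awards = [
    {"Category": "Advanced Buildings", "Amount Awarded by Program": 50000},
    {"Category": "Advanced Buildings", "Amount Awarded by Program": 70000},
    {"Category": "Example Category", "Amount Awarded by Program": 20000},
]

# Group: collect the award amounts under each category.
groups = defaultdict(list)
for award in awards:
    groups[award["Category"]].append(award["Amount Awarded by Program"])

# Aggregate: one summary row per category.
summary = {
    category: {"min": min(amounts), "max": max(amounts), "avg": mean(amounts)}
    for category, amounts in groups.items()
}
print(summary)
```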

Prospective immigrants

Prospective immigrants to Nova Scotia indicate their intended county of residence when completing their application for a permanent resident visa. To find out what year saw the highest number of prospective immigrants in Cape Breton County:
The results of this query indicate that more prospective immigrants intended to reside in Cape Breton County in 2022 than in any other year (the query covers the years 2012 to 2022).
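One way a "which year was highest" question could be structured is to filter to the county, total the counts per year, and then pick the year with the largest total. The Python sketch below illustrates that approach under stated assumptions: the column names ("county", "year", "count") and all numbers are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows; column names and counts are illustrative only.
rows = [
    {"county": "Cape Breton", "year": 2020, "count": 10},
    {"county": "Cape Breton", "year": 2021, "count": 25},
    {"county": "Cape Breton", "year": 2021, "count": 5},
    {"county": "Halifax", "year": 2020, "count": 300},
]

# Filter to one county and sum the counts for each year.
totals = Counter()
for r in rows:
    if r["county"] == "Cape Breton":
        totals[r["year"]] += r["count"]

# Pick the year with the largest total.
peak_year, peak_total = max(totals.items(), key=lambda kv: kv[1])
print(peak_year, peak_total)
```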

Return to your work later?

If you want to continue to explore the dataset later but don’t want to have to recreate your work, you can 1) export the data contained in the data table, 2) sign into the portal and save your data view, or 3) copy the view’s URL.

Export

To explore a derived view using your own software, you can export it. Click Export at the top right of the screen and select your preferred data format (e.g., CSV, JSON) or copy the API endpoint.
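Once a view has been exported as CSV, it can be read with Python's standard csv module. The sketch below uses an in-memory string as a stand-in for the exported file; the column names and values are assumptions for illustration. With a real export, you would open the downloaded file instead.

```python
import csv
import io

# Stand-in for an exported file. With a real export, replace this with:
#   open("your_export.csv", newline="", encoding="utf-8")
exported = io.StringIO(
    "County,Year,Live Births\n"
    "Shelburne,2020,100\n"
    "Shelburne,2021,110\n"
)

# DictReader yields every value as text, so convert numeric columns explicitly.
reader = csv.DictReader(exported)
rows = [
    {
        "County": r["County"],
        "Year": int(r["Year"]),
        "Live Births": int(r["Live Births"]),
    }
    for r in reader
]
print(rows)
```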

Save

If you have created an account and are signed into the Open Data Portal, you can create and name a derived view and return to it later (see screenshot below). Anyone can sign up for an account with the Nova Scotia Open Data Portal. You can also "publish" your derived view, which will allow you to create visualizations based on it (see Data Presentation). Derived views created by community members, whether draft or published, are private and not visible to the public.

Copy URL

If you prefer not to export your derived view or sign in and save the view, there is another option. The URL captures the exploration parameters you set while using the Exploration Canvas. Click the address bar at the top of your browser, select the entire URL and copy it. By saving or sharing that URL, you can return to the view later or allow someone else to see the derived view.

Data analysis outside the Open Data Portal

While you can calculate some basic descriptive statistics on the Nova Scotia Open Data Portal using the tools and procedures described above, you may need to do more to answer your question.
Extracting the data from the portal for use in other applications may be necessary if you want to combine it with data from other sources (e.g., Canadian census, research data) or to apply more complex statistical techniques (e.g., linear regression, analysis of variance). There are a number of tools available for data analysis (e.g., QGIS, Google Sheets, Microsoft Excel, Tableau Desktop, R). There are three main ways to extract data from the open data portal: export, connect via OData, and connect via API.

Export Dataset

Click "Export" to download a dataset in an available format (e.g., CSV, CSV for Excel, SHP).

Connect via OData

Click "..." and select "Access Data via OData" to connect directly from a tool of your choice to a dataset on the portal (for more, go to Open a Socrata Dataset In...).

Connect via API

Click "API" to gain programmatic access to data through an API (for more, go to SODA Developers).
Screenshot showing option to download data in various formats
Screenshot showing option to access a dataset via OData by copying the OData Endpoint option
Screenshot showing option to access a dataset via SODA API by copying the API Endpoint
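To give a sense of what programmatic access can look like, the sketch below builds a query URL from a copied API endpoint using SODA query parameters ($select, $where, $limit). The endpoint, dataset identifier, and field names here are placeholders, not real values; substitute the API endpoint you copied from the portal. The helper function build_soda_url is hypothetical and only assembles the URL; it does not send a request.

```python
from urllib.parse import urlencode


def build_soda_url(endpoint, select=None, where=None, limit=None):
    """Append SODA query parameters to a copied API endpoint (hypothetical helper)."""
    params = {}
    if select:
        params["$select"] = select
    if where:
        params["$where"] = where
    if limit is not None:
        params["$limit"] = str(limit)
    return f"{endpoint}?{urlencode(params)}" if params else endpoint


# Placeholder endpoint and field names: replace with values from the portal.
url = build_soda_url(
    "https://example.org/resource/abcd-1234.json",
    select="county, year, live_births",
    where="county = 'Shelburne'",
    limit=100,
)
print(url)
```

The resulting URL can be opened in a browser or fetched from a script or tool of your choice.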

Use datasets available on the Nova Scotia Government's Open Data Portal to try analyzing open data!

Nova Scotia Open Data Portal Survey
Thank you for visiting the Nova Scotia Open Data Portal. Please click on the link above to tell us how you use open data, about your visit today, and about your use of open data portals in general. This information will help us find ways to improve the portal to better meet visitors' needs. Your survey responses are anonymous.