<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>tagjob &amp;mdash; Sean Barnett</title>
    <link>https://seanbarnett.id.au/tag:tagjob</link>
    <description>Coffee, basketball, programming</description>
    <pubDate>Sun, 07 Jun 2026 18:20:20 +0000</pubDate>
    <image>
      <url>https://i.snap.as/FOPXss01.png</url>
      <title>tagjob &amp;mdash; Sean Barnett</title>
      <link>https://seanbarnett.id.au/tag:tagjob</link>
    </image>
    <item>
      <title>Ingesting the Reference Geoscape Datasets</title>
      <link>https://seanbarnett.id.au/ingesting-geoscape-datasets?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[This post forms part of the ongoing #TagJob project.&#xA;&#xA;In the previous post I introduced two Geoscape datasets that have been made available on the Australian Government&#39;s data.gov.au website: National Roads and Administrative Boundaries. The datasets are distributed in two different formats, neither of which is optimal for my intended spatial processing model. A first task is then to transform the data to a common format, and one that has the right performance characteristics for the project.&#xA;&#xA;My spatial processing model will cache required metadata and geometry in RAM, trading significantly higher memory requirements in exchange for significantly faster data access. Loading data into memory requires a high-performance storage engine, and for that I have selected DuckDB.&#xA;&#xA;DuckDB is highly performant in terms of storage and execution, and is further recommended for this application by a trait that might often be seen as a limitation: it&#39;s an embedded database. So, while it can&#39;t do the client-server dance, DuckDB will deliver data to my application without an intermediate network and the overheads that brings. Better still, DuckDB&#39;s spatial extension - and particularly GDAL integration - makes it reasonably trivial to ingest both National Roads in GDB format and Administrative Boundaries in SHP format. For example:&#xA;create table map_feature_state_polygon as&#xA;select * from ST_Read(&#39;ACT_STATE_POLYGON_shp.dbf&#39;);&#xA;However, the code I&#39;ve written does get just a little more complicated. Firstly, the datasets are distributed in a hierarchical directory structure, sometimes with separate files (or actually sets of files) for each state or territory. So I&#39;m fishing through the directory hierarchy for those files, and then joining their contents into single tables.&#xA;&#xA;And secondly, I have elected to &#34;normalise out&#34; coded values and recurring text values (e.g. road names), replacing them with integer foreign keys. My rationale is thus:&#xA;&#xA;this is how I&#39;ll store the data in memory once loaded (to save space), and so it avoids doing any such conversion during the load&#xA;notwithstanding DuckDB&#39;s storage smarts, I&#39;m still hoping for space efficiency on disk&#xA;&#xA;I am initially focusing on the following datasets / layers, but may add more down the track:&#xA;&#xA;National Roads (4,340,757 rows)&#xA;Administrative Boundaries&#xA;  State Polygon (12,844 rows)&#xA;  Local Government Area Polygon (2,210 rows)&#xA;  Locality Polygon (15,782 rows)&#xA;&#xA;The code for this article is in the TagJobSpatial repository here.&#xA;&#xA;On my MacBook Pro M1 Max processor the load takes approximately 1 minute, and the resultant DuckDB database is about 2.5 gigabytes.&#xA;&#xA;Tags: #TagJob #Geospatial #DuckDB]]&gt;</description>
      <content:encoded><![CDATA[<p><em>This post forms part of the ongoing <a href="https://seanbarnett.id.au/tag:TagJob" class="hashtag"><span>#</span><span class="p-category">TagJob</span></a> project.</em></p>

<p>In <a href="https://seanbarnett.id.au/geoscape-datasets-data-gov-au-https-data-gov-au">the previous post</a> I introduced two Geoscape datasets that have been made available on the Australian Government&#39;s <a href="https://data.gov.au">data.gov.au</a> website: National Roads and Administrative Boundaries. The datasets are distributed in two different formats, neither of which is optimal for my intended spatial processing model. A first task is then to transform the data to a common format, and one that has the right performance characteristics for the project.</p>

<p>My spatial processing model will cache required metadata and geometry in RAM, trading significantly higher memory requirements in exchange for significantly faster data access. Loading data into memory requires a high-performance storage engine, and for that I have selected DuckDB.</p>

<p>DuckDB is highly performant in terms of storage and execution, and is further recommended for this application by a trait that might often be seen as a limitation: it&#39;s an embedded database. So, while it can&#39;t do the client-server dance, DuckDB will deliver data to my application without an intermediate network and the overheads that brings. Better still, DuckDB&#39;s spatial extension – and particularly GDAL integration – makes it reasonably trivial to ingest both National Roads in GDB format and Administrative Boundaries in SHP format. For example:</p>

<pre><code>create table map_feature_state_polygon as
select * from ST_Read(&#39;ACT_STATE_POLYGON_shp.dbf&#39;);
</code></pre>

<p>However, the code I&#39;ve written does get just a little more complicated. Firstly, the datasets are distributed in a hierarchical directory structure, sometimes with separate files (or actually sets of files) for each state or territory. So I&#39;m fishing through the directory hierarchy for those files, and then joining their contents into single tables.</p>
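
<p>A minimal sketch of this kind of merge, assuming the per-state files share a schema. Only the ACT file name appears above; the <code>geoscape</code> directory and the NSW file name are hypothetical, and the sketch illustrates the idea rather than reproducing the actual load script:</p>

<pre><code>-- Load DuckDB&#39;s spatial extension (provides ST_Read).
install spatial;
load spatial;

-- List every state polygon file under a hypothetical download directory.
select file
from glob(&#39;geoscape/**/*_STATE_POLYGON_shp.dbf&#39;);

-- Join the per-state files into a single table, matching columns by name.
-- The NSW file name is assumed by analogy with the ACT one.
create or replace table map_feature_state_polygon as
select * from ST_Read(&#39;ACT_STATE_POLYGON_shp.dbf&#39;)
union all by name
select * from ST_Read(&#39;NSW_STATE_POLYGON_shp.dbf&#39;);
</code></pre>

<p>Matching columns by name rather than by position guards against files whose attribute order differs from one state to the next.</p>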

<p>And secondly, I have elected to “normalise out” coded values and recurring text values (e.g. road names), replacing them with integer foreign keys. My rationale is thus:</p>
<ul><li>this is how I&#39;ll store the data in memory once loaded (to save space), and so it avoids doing any such conversion during the load</li>
<li>notwithstanding DuckDB&#39;s storage smarts, I&#39;m still hoping for space efficiency on disk</li></ul>
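
<p>The normalisation step can be sketched roughly as follows. The table names, the key column <code>street_name_id</code> and the sample rows are all hypothetical; only <code>road_id</code> and <code>full_street_name</code> are column names from the dataset:</p>

<pre><code>-- Hypothetical rows standing in for the freshly ingested road data.
create table road_raw as
select * from (values
    (1, &#39;SMITH STREET&#39;),
    (2, &#39;JONES ROAD&#39;),
    (3, &#39;SMITH STREET&#39;)
) as t(road_id, full_street_name);

-- Lookup table: one row per distinct name, keyed by a dense integer.
create table street_name as
select row_number() over (order by full_street_name) as street_name_id,
       full_street_name
from (select distinct full_street_name from road_raw);

-- The main table keeps only the integer foreign key.
-- A left join preserves roads whose name is null.
create table road as
select r.road_id, n.street_name_id
from road_raw r
left join street_name n using (full_street_name);
</code></pre>

<p>Each repeated string is then stored once in the lookup table, and the main table carries a small integer in its place.</p>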

<p>I am initially focusing on the following datasets / layers, but may add more down the track:</p>
<ul><li>National Roads (4,340,757 rows)</li>
<li>Administrative Boundaries
<ul><li>State Polygon (12,844 rows)</li>
<li>Local Government Area Polygon (2,210 rows)</li>
<li>Locality Polygon (15,782 rows)</li></ul></li></ul>

<p>The code for this article is in the TagJobSpatial repository <a href="https://bitbucket.org/tagsoftware/tagjobspatial/raw/e38c489b7f21061cec5a31ffadd60bafd470d6b2/etc/geoscape_load.sql">here</a>.</p>

<p>On my MacBook Pro (M1 Max) the load takes approximately 1 minute, and the resulting DuckDB database is about 2.5 gigabytes.</p>

<p>Tags: <a href="https://seanbarnett.id.au/tag:TagJob" class="hashtag"><span>#</span><span class="p-category">TagJob</span></a> <a href="https://seanbarnett.id.au/tag:Geospatial" class="hashtag"><span>#</span><span class="p-category">Geospatial</span></a> <a href="https://seanbarnett.id.au/tag:DuckDB" class="hashtag"><span>#</span><span class="p-category">DuckDB</span></a></p>
]]></content:encoded>
      <guid>https://seanbarnett.id.au/ingesting-geoscape-datasets</guid>
      <pubDate>Sun, 07 Jun 2026 12:17:29 +0000</pubDate>
    </item>
    <item>
      <title>Meet the Reference Geoscape Datasets</title>
      <link>https://seanbarnett.id.au/geoscape-datasets-data-gov-au-https-data-gov-au?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[This post forms part of the ongoing #TagJob project.&#xA;&#xA;The project uses as reference data Geoscape Datasets for Australian roads and administrative boundaries. Over the years, I have worked with these datasets under commercial licence, but happily they are now available for public use under the Australian Government&#39;s Data and Digital Government Strategy. Let&#39;s take a look at them.&#xA;&#xA;Geoscape National Roads&#xA;&#xA;Note: National Roads © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International license (CC BY 4.0).&#xA;&#xA;The Geoscape National Roads dataset is a digital representation of the road network of Australia, inclusive of bus ways, walking trails, cycleways, and even ferry routes. The dataset has rich metadata, inclusive of road hierarchy (e.g. national highway, arterial, local road) and travel direction. One facet I like is that each row - and its associated geometry - is defined intersection-to-intersection (e.g. an actual road might be broken into multiple rows, with each row being the segment between two intersections).&#xA;&#xA;The open edition of this dataset varies subtly from the commercially licensed edition. Firstly, the metadata is almost (but not quite) identical:&#xA;&#xA;road_id - primary key&#xA;contributor_id&#xA;jurisdiction_control - for example Aboriginal Land Council, Botanical Gardens, Commonwealth, Council, Department of Environment and Conservation, Department of Transport, Dept of Defence, etc&#xA;operator - for example Port Authority, SAG: Forestry SA, SAG: Water, Southern Rural Water, etc&#xA;date_created, date_modified&#xA;national_route, state_route&#xA;full_street_name (upper case)&#xA;street_name (upper case) / street_name_label (title case)&#xA;street_type (upper case) / street_type_label (title case)&#xA;street_suffix (upper case) / street_suffix_label (title case)&#xA;street_alias_name (upper case) / street_alias_type (upper case) / street_alias_suffix (upper case)&#xA;feature_type - one of DUAL CARRIAGEWAY, FERRY ROUTE, MOTORWAY, PATHWAY, RAMP, ROUNDABOUT, SINGLE CARRIAGEWAY, VEHICULAR_TRACK, or null&#xA;hierarchy - one of ACCESS ROAD, ARTERIAL ROAD, BUSWAY, COLLECTOR ROAD, CYCLEPATH, FERRY, FOOTPATH, LOCAL ROAD, NATIONAL OR STATE HIGHWAY, SUB-ARTERIAL ROAD, VEHICLE TRACK, or null&#xA;subtype - BRIDGE, FERRY, FIRE TRAIL, PATHWAY, RAMP, ROAD, ROUNDABOUT, TUNNEL, or null&#xA;ground_relationship - ABOVE GROUND, BELOW GROUND, ON GROUND, or null&#xA;lane_count - range 1-6 or null&#xA;lane_description - ONE, TWO OR MORE, or null (note all three values apply where lane_count is null)&#xA;one_way - ONE WAY, TWO WAY, null&#xA;status - CLOSED, OPERATIONAL, PROPOSED, UNDER CONSTRUCTION, or null&#xA;surface - SEALED, UNSEALED, or null&#xA;trafficability - 2WD, 4WD, or null&#xA;travel_direction - BOTH, FROM TO, TO FROM, null&#xA;speed - range 10-110, or null&#xA;state, source - ACT, NSW, NT, QLD, SA, TAS, VIC, WA (note: no OT)&#xA;horizontal_accuracy - range 1-250&#xA;shape_length, shape&#xA;&#xA;Secondly, there are more limited data formats and datums. My intention is to work with GDA2020 in GDB format.&#xA;&#xA;Geoscape Administrative Boundaries&#xA;&#xA;Note: Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International license (CC BY 4.0).&#xA;&#xA;The Geoscape Administrative Boundaries dataset is a comprehensive set of national boundaries, including government, statistical and electoral boundaries.&#xA;&#xA;The dataset comprises seven products:&#xA;&#xA;Localities&#xA;Local Government Areas (LGAs)&#xA;Wards&#xA;Australian Bureau of Statistics (ABS) Boundaries&#xA;Electoral Boundaries&#xA;State Boundaries&#xA;Town Points&#xA;&#xA;The ABS Boundaries product is organised in three themes related to 2011, 2016 and 2021 boundary definitions. Of most interest to me is the 2021 ABS Mesh Blocks and Statistical Areas layer within the ABS Boundaries 2021 theme. This layer is described in detail on the ABS website, and comprises:&#xA;&#xA;Australia&#xA;States and Territories (S/T)&#xA;Statistical Areas Level 4 (SA4s)&#xA;Statistical Areas Level 3 (SA3s)&#xA;Statistical Areas Level 2 (SA2s)&#xA;Statistical Areas Level 1 (SA1s)&#xA;Mesh Blocks (MBs)&#xA;Greater Capital City Statistical Areas (GCCSAs) - not part of main structure&#xA;&#xA;Like National Roads, this dataset is available only in limited datums and formats. I am going to work with GDA2020 downloaded in SHP format.&#xA;&#xA;Tags: #TagJob #Geospatial]]&gt;</description>
      <content:encoded><![CDATA[<p><em>This post forms part of the ongoing <a href="https://seanbarnett.id.au/tag:TagJob" class="hashtag"><span>#</span><span class="p-category">TagJob</span></a> project.</em></p>

<p>The project uses as reference data Geoscape Datasets for Australian roads and administrative boundaries. Over the years, I have worked with these datasets under commercial licence, but happily they are now available for public use under the Australian Government&#39;s <a href="https://www.dataanddigital.gov.au">Data and Digital Government Strategy</a>. Let&#39;s take a look at them.</p>

<h2 id="geoscape-national-roads">Geoscape National Roads</h2>

<p><em>Note: National Roads © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International license (CC BY 4.0).</em></p>

<p>The <a href="https://data.gov.au/data/dataset/national-roads">Geoscape National Roads</a> dataset is a digital representation of the road network of Australia, inclusive of bus ways, walking trails, cycleways, and even ferry routes. The dataset has rich metadata, inclusive of road hierarchy (e.g. national highway, arterial, local road) and travel direction. One facet I like is that each row – and its associated geometry – is defined intersection-to-intersection (e.g. an actual road might be broken into multiple rows, with each row being the segment between two intersections).</p>

<p>The open edition of this dataset varies subtly from the commercially licensed edition. Firstly, the metadata is almost (but not quite) identical:</p>
<ul><li>road_id – primary key</li>
<li>contributor_id</li>
<li>jurisdiction_control – for example Aboriginal Land Council, Botanical Gardens, Commonwealth, Council, Department of Environment and Conservation, Department of Transport, Dept of Defence, etc</li>
<li>operator – for example Port Authority, SAG: Forestry SA, SAG: Water, Southern Rural Water, etc</li>
<li>date_created, date_modified</li>
<li>national_route, state_route</li>
<li>full_street_name (upper case)</li>
<li>street_name (upper case) / street_name_label (title case)</li>
<li>street_type (upper case) / street_type_label (title case)</li>
<li>street_suffix (upper case) / street_suffix_label (title case)</li>
<li>street_alias_name (upper case) / street_alias_type (upper case) / street_alias_suffix (upper case)</li>
<li>feature_type – one of DUAL CARRIAGEWAY, FERRY ROUTE, MOTORWAY, PATHWAY, RAMP, ROUNDABOUT, SINGLE CARRIAGEWAY, VEHICULAR_TRACK, or null</li>
<li>hierarchy – one of ACCESS ROAD, ARTERIAL ROAD, BUSWAY, COLLECTOR ROAD, CYCLEPATH, FERRY, FOOTPATH, LOCAL ROAD, NATIONAL OR STATE HIGHWAY, SUB-ARTERIAL ROAD, VEHICLE TRACK, or null</li>
<li>subtype – BRIDGE, FERRY, FIRE TRAIL, PATHWAY, RAMP, ROAD, ROUNDABOUT, TUNNEL, or null</li>
<li>ground_relationship – ABOVE GROUND, BELOW GROUND, ON GROUND, or null</li>
<li>lane_count – range 1-6 or null</li>
<li>lane_description – ONE, TWO OR MORE, or null (note all three values apply where lane_count is null)</li>
<li>one_way – ONE WAY, TWO WAY, null</li>
<li>status – CLOSED, OPERATIONAL, PROPOSED, UNDER CONSTRUCTION, or null</li>
<li>surface – SEALED, UNSEALED, or null</li>
<li>trafficability – 2WD, 4WD, or null</li>
<li>travel_direction – BOTH, FROM TO, TO FROM, null</li>
<li>speed – range 10-110, or null</li>
<li>state, source – ACT, NSW, NT, QLD, SA, TAS, VIC, WA (note: no OT)</li>
<li>horizontal_accuracy – range 1-250</li>
<li>shape_length, shape</li></ul>
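
<p>As a rough sketch of how value lists like these can be checked, the query below profiles one coded column. The <code>road_sample</code> table and its rows are hypothetical stand-ins for the real data; only the column names come from the list above:</p>

<pre><code>-- Hypothetical sample rows, not taken from the dataset.
create table road_sample as
select * from (values
    (1, &#39;LOCAL ROAD&#39;, &#39;SEALED&#39;),
    (2, &#39;LOCAL ROAD&#39;, null),
    (3, &#39;FOOTPATH&#39;, &#39;UNSEALED&#39;)
) as t(road_id, hierarchy, surface);

-- Distinct values of a coded column and how often each occurs.
-- A null value, if present, appears as its own group.
select hierarchy, count(*) as row_count
from road_sample
group by hierarchy
order by row_count desc, hierarchy;
</code></pre>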

<p>Secondly, there are more limited data formats and datums. My intention is to work with GDA2020 in GDB format.</p>

<h2 id="geoscape-administrative-boundaries">Geoscape Administrative Boundaries</h2>

<p><em>Note: Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International license (CC BY 4.0).</em></p>

<p>The <a href="https://www.data.gov.au/data/dataset/geoscape-administrative-boundaries">Geoscape Administrative Boundaries</a> dataset is a comprehensive set of national boundaries, including government, statistical and electoral boundaries.</p>

<p>The dataset comprises seven products:</p>
<ul><li>Localities</li>
<li>Local Government Areas (LGAs)</li>
<li>Wards</li>
<li>Australian Bureau of Statistics (ABS) Boundaries</li>
<li>Electoral Boundaries</li>
<li>State Boundaries</li>
<li>Town Points</li></ul>

<p>The ABS Boundaries product is organised in three themes related to 2011, 2016 and 2021 boundary definitions. Of most interest to me is the 2021 ABS Mesh Blocks and Statistical Areas layer within the ABS Boundaries 2021 theme. This layer is described in detail <a href="https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs/edition-3-july-2021-june-2026/main-structure-and-greater-capital-city-statistical-areas">on the ABS website</a>, and comprises:</p>
<ul><li>Australia</li>
<li>States and Territories (S/T)</li>
<li>Statistical Areas Level 4 (SA4s)</li>
<li>Statistical Areas Level 3 (SA3s)</li>
<li>Statistical Areas Level 2 (SA2s)</li>
<li>Statistical Areas Level 1 (SA1s)</li>
<li>Mesh Blocks (MBs)</li>
<li>Greater Capital City Statistical Areas (GCCSAs) – not part of main structure</li></ul>

<p>Like National Roads, this dataset is available only in limited datums and formats. I am going to work with GDA2020 downloaded in SHP format.</p>

<p>Tags: <a href="https://seanbarnett.id.au/tag:TagJob" class="hashtag"><span>#</span><span class="p-category">TagJob</span></a> <a href="https://seanbarnett.id.au/tag:Geospatial" class="hashtag"><span>#</span><span class="p-category">Geospatial</span></a></p>
]]></content:encoded>
      <guid>https://seanbarnett.id.au/geoscape-datasets-data-gov-au-https-data-gov-au</guid>
      <pubDate>Fri, 05 Jun 2026 22:37:11 +0000</pubDate>
    </item>
  </channel>
</rss>