<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Rtree - Tag - Data Dave's Blog</title><link>https://davidwhittingham.com/tags/rtree/</link><description>Rtree - Tag - Data Dave's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 10 May 2025 09:00:00 +0200</lastBuildDate><atom:link href="https://davidwhittingham.com/tags/rtree/" rel="self" type="application/rss+xml"/><item><title>High grade geospatial processing (feat. crazy fast ducks) 🦆🔥</title><link>https://davidwhittingham.com/posts/performance/</link><pubDate>Sat, 10 May 2025 09:00:00 +0200</pubDate><author><name>Author</name></author><guid>https://davidwhittingham.com/posts/performance/</guid><description><![CDATA[<div class="featured-image">
                <img src="/media/performance/kreuzungen-benchmark.avif" referrerpolicy="no-referrer">
            </div><h2 id="introduction-" class="headerLink">
    <a href="#introduction-" class="header-mark"></a>Introduction 👋</h2><p>So last year, I created a web app called <a href="https://kreuzungen.world" target="_blank" rel="noopener noreferrer">kreuzungen.world</a> that calculates the number of waterways crossed by a gpx route. It was a fun little project to build and led to some interesting encounters.</p>
<figure><a class="lightgallery" href="/media/performance/kreuzungen-screenrecording.avif" title="/media/performance/kreuzungen-screenrecording.avif" data-thumbnail="/media/performance/kreuzungen-screenrecording.avif" data-sub-html="<h2>Kreuzungen - the app that started it all</h2>">
        <img
            
            loading="lazy"
            src="/media/performance/kreuzungen-screenrecording.avif"
            srcset="/media/performance/kreuzungen-screenrecording.avif, /media/performance/kreuzungen-screenrecording.avif 1.5x, /media/performance/kreuzungen-screenrecording.avif 2x"
            sizes="auto"
            alt="/media/performance/kreuzungen-screenrecording.avif">
    </a><figcaption class="image-caption">Kreuzungen - the app that started it all</figcaption>
    </figure>
<p>The app was built in JavaScript in a way that requires no backend. That&rsquo;s right, all the data fetching and processing is done in the browser (no server costs 💸).</p>
<p>It uses the <code>Overpass API</code> to fetch the waterways data from OpenStreetMap and then uses <code>turf.js</code> to calculate the intersections between the route and the waterways. Pretty simple, right?</p>
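<p>To make the data-fetching step concrete, here is a minimal sketch of how an Overpass QL query for waterways inside a route&rsquo;s bounding box could be built. It is written in Python for illustration (the app itself runs in the browser), and the helper names <code>build_overpass_query</code> and <code>route_bbox</code> are hypothetical, not taken from the Kreuzungen code. The real app may query different element types.</p>

```python
def route_bbox(points):
    """Bounding box (south, west, north, east) of a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return min(lats), min(lons), max(lats), max(lons)


def build_overpass_query(south, west, north, east, timeout_s=25):
    """Return an Overpass QL query for ways tagged `waterway` inside a bbox."""
    # Overpass expects bounding boxes in the order south, west, north, east.
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout_s}];"
        f'way["waterway"]({bbox});'
        "out geom;"
    )


# Hypothetical three-point route in Andorra, as (lat, lon) pairs.
route = [(42.50, 1.52), (42.55, 1.60), (42.53, 1.58)]
query = build_overpass_query(*route_bbox(route))
print(query)
```

<p>The resulting string would be sent to an Overpass endpoint, and the returned geometries are what the intersection step then works on.</p>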
<p>What&rsquo;s been surprisingly rewarding is seeing people actually use it! From Japan to Argentina to New Zealand to the USA (where one kayaker uses it to track river crossings). It&rsquo;s humbling to see something I made for fun being used by people across the globe.</p>
<link href="https://unpkg.com/maplibre-gl@5.5.0/dist/maplibre-gl.css" rel="stylesheet" />
<style>
  #kreuzungen-map {
    width: 100%;
    height: 500px;
    margin-bottom: 1.5em;
    border-radius: 8px;
  }
  .maplibregl-popup {
    max-width: 300px;
    font: 12px/20px 'Helvetica Neue', Arial, Helvetica, sans-serif;
  }
</style>

<div id="kreuzungen-map" class="maplibregl-map"></div>

<script src="https://unpkg.com/maplibre-gl@5.5.0/dist/maplibre-gl.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@turf/turf@6.5.0/turf.min.js"></script>
<script>
  
  function getFlagEmoji(countryCode) {
    return countryCode.toUpperCase().replace(/./g, char =>
      String.fromCodePoint(127397 + char.charCodeAt())
    );
  }

  
  function addGlobeProjection(styleJson) {
    return {
      ...styleJson,
      projection: { type: 'globe' }
    };
  }

  
  function updateStyleJson(styleJson, countriesWaterwayData) {
    const updatedStyle = addGlobeProjection(JSON.parse(JSON.stringify(styleJson)));

    const countryColors = {};
    for (const [countryCode, data] of Object.entries(countriesWaterwayData)) {
      if (data.waterways_crossed === 0) {
        countryColors[countryCode] = 'rgba(230, 230, 250, 1)';
      } else {
        const originalLayer = styleJson.layers.find(layer => layer.id === 'countries-fill' && layer.type === 'fill');
        if (originalLayer) {
          const originalColor = originalLayer.paint['fill-color'];
          countryColors[countryCode] = originalColor;
        }
      }
    }

    updatedStyle.layers = updatedStyle.layers.map(layer => {
      if (layer.id === 'countries-fill' && layer.type === 'fill') {
        layer.paint['fill-color'] = [
          'match',
          ['get', 'ADM0_A3'],
          ...Object.entries(countryColors).flat(),
          '#EAB38F'
        ];
      }
      return layer;
    });

    return updatedStyle;
  }
  
  document.addEventListener('DOMContentLoaded', function() {
    const map = new maplibregl.Map({
      container: 'kreuzungen-map',
      center: [0, 20], 
      zoom: 1.2, 
    });
    
    
    function adjustMapForScreenSize() {
      const width = window.innerWidth;
      if (width < 480) {
        map.setZoom(0.8); 
      } else if (width < 768) {
        map.setZoom(1.0); 
      } else {
        map.setZoom(1.2); 
      }
    }
    
    
    map.on('load', adjustMapForScreenSize);
    
    
    window.addEventListener('resize', adjustMapForScreenSize);
    
    
    let isActive = false;
    let rotationInterval;
    
    
    function startRotation() {
      stopRotation(); 
      isActive = false;
      rotationInterval = setInterval(() => {
        if (!isActive) {
          const currentCenter = map.getCenter();
          map.easeTo({
            center: [currentCenter.lng + 2, currentCenter.lat],
            duration: 1000,
            easing: t => t
          });
        }
      }, 300);
    }
    
    
    function stopRotation() {
      isActive = true;
      if (rotationInterval) {
        clearInterval(rotationInterval);
        rotationInterval = null;
      }
    }
    
    
    function resetViewAndRotate() {
      adjustMapForScreenSize(); 
      startRotation();
    }
    
    
    map.on('mousedown', stopRotation);
    map.on('touchstart', stopRotation);
    map.on('dragstart', stopRotation);
    
    
    document.addEventListener('click', (e) => {
      if (!document.getElementById('kreuzungen-map').contains(e.target)) {
        resetViewAndRotate();
      }
    });
    
    
    function checkVisibility() {
      const mapElement = document.getElementById('kreuzungen-map');
      const rect = mapElement.getBoundingClientRect();
      const isVisible = (
        rect.top >= -rect.height &&
        rect.left >= -rect.width &&
        rect.bottom <= (window.innerHeight + rect.height) &&
        rect.right <= (window.innerWidth + rect.width)
      );
      
      if (!isVisible && !isActive) {
        resetViewAndRotate();
      }
    }
    
    
    document.addEventListener('scroll', checkVisibility);
    
    
    map.on('load', startRotation);

    fetch('https://demotiles.maplibre.org/style.json')
      .then(response => response.json())
      .then(customStyle => {
        fetch('https://fly.storage.tigris.dev/hydro-xpid/modelled/country.json')
          .then(response => response.text())
          .then(text => {
            try {
              const data = text.split('\n').filter(line => line.trim() !== '').map(line => JSON.parse(line));
              const waterwaysDict = data.reduce((acc, country) => {
                acc[country.country_code_3] = {
                  waterways_crossed: country.waterway_realtions_crossed,
                  unique_waterways_crossed: country.unique_waterway_realtions_crossed,
                  country_name: country.country,
                  most_popular_waterway: country.most_popular_waterway || 'N/A',
                  country_code_2: country.country_code_2
                };
                return acc;
              }, {});

              const updatedStyle = updateStyleJson(customStyle, waterwaysDict);
              map.setStyle(updatedStyle);

              map.on('load', () => {
                let hoveredCountryId = null;
                const popup = new maplibregl.Popup({
                  closeButton: false,
                  closeOnClick: false
                });

                map.on('mousemove', 'countries-fill', (e) => {
                  if (e.features.length > 0) {
                    if (hoveredCountryId) {
                      map.setFeatureState(
                        { source: 'maplibre', sourceLayer: 'countries', id: hoveredCountryId },
                        { hover: false }
                      );
                    }
                    hoveredCountryId = e.features[0].id;
                    map.setFeatureState(
                      { source: 'maplibre', sourceLayer: 'countries', id: hoveredCountryId },
                      { hover: true }
                    );

                    const countryCode = e.features[0].properties.ADM0_A3;
                    const countryData = waterwaysDict[countryCode];
                    const countryName = e.features[0].properties.ADMIN;

                    if (countryData) {
                      popup.setLngLat(e.lngLat)
                        .setHTML(`
                          <strong>${getFlagEmoji(countryData.country_code_2)} ${countryData.country_name}</strong><br>
                          Waterways Crossed: ${countryData.waterways_crossed}<br>
                          Unique Waterways: ${countryData.unique_waterways_crossed}<br>
                          Popular: ${countryData.most_popular_waterway}
                        `)
                        .addTo(map);
                    } else {
                      popup.setLngLat(e.lngLat)
                        .setHTML(`
                          <strong>${countryName}</strong><br>
                          No waterway data available
                        `)
                        .addTo(map);
                    }
                  }
                });

                map.on('mouseleave', 'countries-fill', () => {
                  if (hoveredCountryId) {
                    map.setFeatureState(
                      { source: 'maplibre', sourceLayer: 'countries', id: hoveredCountryId },
                      { hover: false }
                    );
                  }
                  hoveredCountryId = null;
                  popup.remove();
                });

                map.on('click', 'countries-fill', (e) => {
                  if (e.features.length > 0) {
                    const country = e.features[0];
                    const bbox = turf.bbox(country);
                    map.fitBounds(bbox, {
                      padding: 40,
                      duration: 2000
                    });
                  }
                });
              });
            } catch (error) {
              console.error('Error parsing JSON:', error);
            }
          })
          .catch(error => console.error('Error fetching waterways data:', error));
      })
      .catch(error => console.error('Error fetching style JSON:', error));
  });
</script>
<p><a href="https://kreuzungen.world" target="_blank" rel="noopener noreferrer">kreuzungen.world</a> isn&rsquo;t changing the world, but it is a great example of how open source software and open data can be used to create something (crossing rivers is not going to save lives, but my shred-buddies find it interesting, so that&rsquo;s something).</p>
<h3 id="the-powerful-client-" class="headerLink">
    <a href="#the-powerful-client-" class="header-mark"></a>The powerful client 💪</h3><p>I want to point out my genuine surprise at how fast the browser does the geospatial processing&hellip; It just works, and even on old devices the performance is good enough. We are talking a couple of seconds for most routes, including fetching the waterway data from OSM and rendering the intersecting waterways on a vector map. Waterway fans are happy, I am happy too 😁</p>
<figure><a class="lightgallery" href="/media/performance/kreuzungen-loading.avif" title="/media/performance/kreuzungen-loading.avif" data-thumbnail="/media/performance/kreuzungen-loading.avif" data-sub-html="<h2>Demonstrating the performance of the waterway detection</h2>">
        <img
            
            loading="lazy"
            src="/media/performance/kreuzungen-loading.avif"
            srcset="/media/performance/kreuzungen-loading.avif, /media/performance/kreuzungen-loading.avif 1.5x, /media/performance/kreuzungen-loading.avif 2x"
            sizes="auto"
            alt="/media/performance/kreuzungen-loading.avif">
    </a><figcaption class="image-caption">Demonstrating the performance of the waterway detection</figcaption>
    </figure>
<p>But, I am not completely satisfied&hellip; Recently I have had a craving for speed in my life, don&rsquo;t know why, but I feel it. And fittingly I decided to revisit this problem and see how fast I could push this thing 🚴</p>
<h2 id="i-had-an-idea-" class="headerLink">
    <a href="#i-had-an-idea-" class="header-mark"></a>I had an Idea 💡</h2><p>River data doesn&rsquo;t change that often, so by pre-processing the data and using a storage format optimized for querying, <em><strong>it should be much faster</strong></em>. My train of thought was something like this 🤔</p>
<ul>
<li>Pre-download waterways instead of waiting for the Overpass API 💾</li>
<li>Use duckdb alone for fast geospatial queries 🌍🦆</li>
<li>Implement R-Tree indexing to minimize intersection search space 🎯</li>
<li>Apply two-step filtering: bounding box, then precise intersection 🔍</li>
<li>Add a lightweight API layer with an endpoint for GPX upload 💻</li>
</ul>
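<p>The two-step filtering idea from the list above can be sketched in plain Python: a cheap bounding box test discards most candidates, and only the survivors get the exact segment-by-segment intersection test. This is an illustrative sketch with made-up data and hypothetical function names, not the code used in the project (where DuckDB does this work).</p>

```python
def bbox(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """Step 1: cheap test on two boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _within(p, q, r):
    """True if r lies inside the bounding box of segment pq (collinear case)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, p3, p4):
    """Exact test: do segments p1p2 and p3p4 share at least one point?"""
    o1, o2 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    o3, o4 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within(p1, p2, p3)) or (o2 == 0 and _within(p1, p2, p4))
            or (o3 == 0 and _within(p3, p4, p1)) or (o4 == 0 and _within(p3, p4, p2)))


def lines_intersect(a, b):
    """Step 2: precise test on every pair of segments of two polylines."""
    return any(segments_intersect(a[i], a[i + 1], b[j], b[j + 1])
               for i in range(len(a) - 1) for j in range(len(b) - 1))


def find_crossings(route, waterways):
    """Return (names surviving the bbox filter, names that really cross)."""
    route_box = bbox(route)
    candidates = [n for n, line in waterways.items()
                  if bboxes_overlap(route_box, bbox(line))]
    crossed = [n for n in candidates if lines_intersect(route, waterways[n])]
    return sorted(candidates), sorted(crossed)


route = [(0, 0), (10, 10)]
waterways = {
    "Cross Creek": [(0, 10), (10, 0)],   # crosses the route
    "Near Brook": [(0, 9), (1, 10)],     # inside the route's bbox, but parallel
    "Far River": [(20, 20), (30, 25)],   # rejected by the bbox test alone
}
candidates, crossed = find_crossings(route, waterways)
print(candidates, crossed)
```

<p>Here the bounding box test throws out <code>Far River</code> without any segment maths, and the precise test then rejects <code>Near Brook</code>, which only looked promising.</p>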
<p>Well folks, buckle up, because I took that idea and ran with it and it turned out to be a sprint! 🏃</p>
<h2 id="the-setup-" class="headerLink">
    <a href="#the-setup-" class="header-mark"></a>The Setup 🛠</h2><p>The entire solution is available on <a href="https://github.com/01100100/wasserwege" target="_blank" rel="noopener noreferrer">GitHub</a>. Feel free to adapt it for your own needs.</p>
<p>The solution has two parts:</p>
<ul>
<li>Preparing the database</li>
<li>Wrapping it in an API</li>
</ul>
<p>Actually that&rsquo;s a lie, there is another part&hellip;</p>
<ul>
<li>The benchmarking! ⏳</li>
</ul>
<p>I also included a benchmark with some different GPX files, because, well, timing things makes sense (when you&rsquo;re obsessed with speed 🎽).</p>
<p>Everything is built with open source tools and libraries. If you have any suggestions or feedback, please reach out. I&rsquo;m always eager to learn and improve.</p>
<p>The project structure looks like this:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">├── README.md
</span></span><span class="line"><span class="cl">├── prepare_waterways_data.py   -- data pipeline to prepare the database
</span></span><span class="line"><span class="cl">├── server.py                   -- fastapi wrapper
</span></span><span class="line"><span class="cl">├── benchmark.py                -- benchmark script
</span></span><span class="line"><span class="cl">├── benchmark_logs              -- benchmark logs
</span></span><span class="line"><span class="cl">├── data                        -- data directory
</span></span><span class="line"><span class="cl">│   ├── filtered/               -- filtered waterways <span class="o">(</span>.pbf<span class="o">)</span>
</span></span><span class="line"><span class="cl">│   ├── parquet/                -- filtered waterways <span class="o">(</span>.parquet<span class="o">)</span>
</span></span><span class="line"><span class="cl">│   ├── pond.duckdb             -- <span class="nb">local</span> duckdb database
</span></span><span class="line"><span class="cl">│   └── raw/                    -- raw osm data <span class="o">(</span>.pbf<span class="o">)</span>
</span></span><span class="line"><span class="cl">├── quelle                      -- dbt project
</span></span><span class="line"><span class="cl">│   ├── dbt_project.yml
</span></span><span class="line"><span class="cl">│   ├── models                  -- transformation logic
</span></span><span class="line"><span class="cl">│   └── profiles.yml            -- duckdb setup
</span></span><span class="line"><span class="cl">└── test_data
</span></span><span class="line"><span class="cl">    └── gpx                     -- benchmarking gpx files
</span></span></code></pre></td></tr></table>
</div>
</div><h3 id="the-big-data-preprocessing-pipeline-" class="headerLink">
    <a href="#the-big-data-preprocessing-pipeline-" class="header-mark"></a>The &ldquo;Big Data&rdquo; Preprocessing Pipeline 🚰</h3><p>Let me walk you through the pipeline at the heart of the solution. It transforms raw OpenStreetMap data into a DuckDB table optimized for spatial querying, and it is the source of the Wasserwege API 🌊</p>
<ol>
<li>
<p><strong>Download OSM Data</strong>: Using extracts from <a href="https://download.geofabrik.de/" target="_blank" rel="noopener noreferrer">Geofabrik</a> rather than processing the entire planet file.</p>
</li>
<li>
<p><strong>Filter Waterway Features</strong>: Using <a href="https://github.com/openstreetmap/osmosis" target="_blank" rel="noopener noreferrer">Osmosis</a>, filter for waterway features from the OSM data. This dramatically reduces the data size.</p>
</li>
<li>
<p><strong>Convert to GeoParquet</strong>: Using <a href="https://github.com/GIScience/ohsome-planet" target="_blank" rel="noopener noreferrer">ohsome-planet</a>, convert the filtered data to GeoParquet. The format is columnar and much more efficient for analytical queries. This Java tool is amazing: it does a great job of converting the OSM data into a format that is easy to work with.</p>
</li>
<li>
<p><strong>Build database for querying</strong>: I leverage <a href="https://github.com/duckdb/dbt-duckdb" target="_blank" rel="noopener noreferrer">dbt</a> with the DuckDB adapter. This framework brings structure to data transformation workflows and nicely separates the transformation logic from the data engineering configuration. The result is a setup that&rsquo;s easy to maintain and extend, with minimal boilerplate code.</p>
</li>
</ol>
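<p>As a rough sketch of how the steps above could be chained from Python, the snippet below only builds the download URL and the command lines rather than running them. The file names and the helper functions are assumptions for illustration, and the actual <code>prepare_waterways_data.py</code> may do this differently. The Osmosis filter shown keeps ways tagged <code>waterway</code> plus the nodes they use. The ohsome-planet step is deliberately left out because its command line is not guessed here.</p>

```python
def geofabrik_url(region):
    """URL of a Geofabrik extract, e.g. region='europe/andorra'."""
    return f"https://download.geofabrik.de/{region}-latest.osm.pbf"


def osmosis_filter_cmd(raw_pbf, filtered_pbf):
    """Osmosis command that keeps ways tagged `waterway` and their nodes."""
    return [
        "osmosis",
        "--read-pbf", raw_pbf,
        "--tf", "accept-ways", "waterway=*",
        "--used-node",
        "--write-pbf", filtered_pbf,
    ]


def dbt_run_cmd(project_dir="quelle"):
    """dbt invocation, assuming profiles.yml lives inside the project directory."""
    return ["dbt", "run", "--project-dir", project_dir, "--profiles-dir", project_dir]


def plan_pipeline(region):
    """Collect the steps for one region without executing anything."""
    name = region.split("/")[-1]
    raw = f"data/raw/{name}-latest.osm.pbf"                # assumed file name
    filtered = f"data/filtered/{name}-waterways.osm.pbf"   # assumed file name
    return {
        "download_url": geofabrik_url(region),
        "filter_cmd": osmosis_filter_cmd(raw, filtered),
        # The GeoParquet conversion with ohsome-planet would go here.
        "build_cmd": dbt_run_cmd(),
    }


plan = plan_pipeline("europe/andorra")
for step, value in plan.items():
    print(step, value)
```

<p>Each command list could be handed to <code>subprocess.run(cmd, check=True)</code> once the tools are installed.</p>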
<figure><a class="lightgallery" href="/media/performance/datapipeline.avif" title="/media/performance/datapipeline.avif" data-thumbnail="/media/performance/datapipeline.avif" data-sub-html="<h2>The whole of Andorra is downloaded, filtered and optimized to serve in &lt;10 seconds</h2>">
        <img
            
            loading="lazy"
            src="/media/performance/datapipeline.avif"
            srcset="/media/performance/datapipeline.avif, /media/performance/datapipeline.avif 1.5x, /media/performance/datapipeline.avif 2x"
            sizes="auto"
            alt="/media/performance/datapipeline.avif">
    </a><figcaption class="image-caption">The whole of Andorra is downloaded, filtered and optimized to serve in &lt;10 seconds</figcaption>
    </figure>
<h4 id="r-tree-indexing-" class="headerLink">
    <a href="#r-tree-indexing-" class="header-mark"></a>R-Tree Indexing 🌳</h4><p>The most critical optimization is the creation of an R-Tree spatial index to organize the geometries in a hierarchical tree structure.</p>
<p>When a query like &ldquo;find all waterways that intersect with this route&rdquo; is executed, the R-Tree allows the database to quickly eliminate vast portions of the dataset without checking each waterway individually. This reduces the time complexity from <code>O(n)</code> to something closer to <code>O(log n)</code>.</p>
<p>An index can be created with a simple SQL statement, thanks to the duckdb <a href="https://duckdb.org/docs/stable/extensions/spatial/overview.html" target="_blank" rel="noopener noreferrer">SPATIAL</a> extension:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">waterways_geom_idx</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">waterways</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">RTREE</span><span class="w"> </span><span class="p">(</span><span class="n">geom</span><span class="p">);</span><span class="w">
</span></span></span></code></pre></td></tr></table>
</div>
</div><p>This single line of code provides dramatic performance improvements for spatial queries.</p>
<p><figure><a class="lightgallery" href="/media/performance/stree.webp" title="R-tree use wikipedia" data-thumbnail="/media/performance/stree.webp">
        <img
            
            loading="lazy"
            src="/media/performance/stree.webp"
            srcset="/media/performance/stree.webp, /media/performance/stree.webp 1.5x, /media/performance/stree.webp 2x"
            sizes="auto"
            alt="R-tree use wikipedia">
    </a></figure></p>
<p>Imagine trying to find a group of friends at a festival&hellip; good luck if you have to search through the entire crowd. But if you know they will be at the beach stage, you can skip the masses and search only the people in that vicinity. The R-Tree index organizes the data so that you can quickly narrow the search space down to just the relevant geometries.</p>
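<p>To show the pruning idea in code, here is a deliberately simplified, single-level stand-in for an R-Tree: nearby bounding boxes are grouped into nodes, each node stores the box that encloses its members, and a query skips every node whose enclosing box misses the search area. A real R-Tree is multi-level and balanced, and this is not how DuckDB implements it. The data and function names are made up for illustration.</p>

```python
def overlaps(a, b):
    """True if boxes (min_x, min_y, max_x, max_y) a and b overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes):
    """Smallest box that encloses all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_tree(items, node_size=4):
    """Group (name, box) items into nodes of `node_size`, sorted by min_x."""
    ordered = sorted(items, key=lambda item: item[1][0])
    nodes = []
    for i in range(0, len(ordered), node_size):
        chunk = ordered[i:i + node_size]
        nodes.append((union([box for _, box in chunk]), chunk))
    return nodes


def query(tree, search_box):
    """Return matching names and how many box comparisons were needed."""
    hits, checks = [], 0
    for node_box, chunk in tree:
        checks += 1
        if not overlaps(node_box, search_box):
            continue  # the whole node is skipped: none of its members can match
        for name, box in chunk:
            checks += 1
            if overlaps(box, search_box):
                hits.append(name)
    return hits, checks


# Sixteen made-up waterways laid out along the x axis.
items = [(f"river-{i}", (i * 10, 0, i * 10 + 5, 5)) for i in range(16)]
tree = build_tree(items)
hits, checks = query(tree, (42, 1, 44, 3))
print(hits, checks, "comparisons instead of", len(items))
```

<p>With only one level of grouping, the query already touches half as many boxes as a full scan. Adding more levels is what pushes the cost towards logarithmic.</p>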
<h4 id="dbt-ing-it-all-together-" class="headerLink">
    <a href="#dbt-ing-it-all-together-" class="header-mark"></a>DBT-ing it all together 🧩</h4><p>The <code>waterways.sql</code> model combines the transformations and creation of the R-Tree index.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="err">{{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">config</span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">materialized</span><span class="o">=</span><span class="s2">&#34;table&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">file_format</span><span class="o">=</span><span class="s2">&#34;parquet&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">location_root</span><span class="o">=</span><span class="s2">&#34;../data/processed/&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">pre_hook</span><span class="o">=</span><span class="s2">&#34;DROP INDEX IF EXISTS waterways_geom_idx;&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">post_hook</span><span class="o">=</span><span class="s2">&#34;CREATE INDEX waterways_geom_idx ON {{ this }} USING RTREE (geom);&#34;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="err">}}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">with</span><span class="w"> </span><span class="n">waterway_features</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">select</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">osm_id</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">st_geomfromwkb</span><span class="p">(</span><span class="n">geometry</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">geom</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">tags</span><span class="p">[</span><span class="s1">&#39;name&#39;</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">waterway_name</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">tags</span><span class="p">[</span><span class="s1">&#39;waterway&#39;</span><span class="p">]</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">waterway_type</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">from</span><span class="w"> </span><span class="err">{{</span><span class="w"> </span><span class="k">source</span><span class="p">(</span><span class="s2">&#34;osm&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;waterways&#34;</span><span class="p">)</span><span class="w"> </span><span class="err">}}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">where</span><span class="w"> </span><span class="n">tags</span><span class="p">[</span><span class="s1">&#39;name&#39;</span><span class="p">]</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">from</span><span class="w"> </span><span class="n">waterway_features</span><span class="w">
</span></span></span></code></pre></td></tr></table>
</div>
</div><p>The output of all this is a local duckdb file <code>pond.duckdb</code> with a table <code>waterways</code> that contains all the named waterways, indexed and ready for fast querying.</p>
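<p>A query against that table could look roughly like the sketch below. The column names come from the model above, and <code>ST_Intersects</code> and <code>ST_GeomFromText</code> are functions of the DuckDB spatial extension. The function <code>query_crossings</code> and the <code>FakeConnection</code> stand-in are hypothetical and exist only so the snippet runs without DuckDB installed. With the real thing, you would pass a connection from <code>duckdb.connect("data/pond.duckdb")</code> after loading the spatial extension.</p>

```python
CROSSINGS_SQL = """
SELECT id, waterway_name, waterway_type
FROM waterways
WHERE ST_Intersects(geom, ST_GeomFromText(?))
"""


def query_crossings(conn, route_wkt):
    """Run the intersection query and return one dict per crossed waterway."""
    rows = conn.execute(CROSSINGS_SQL, [route_wkt]).fetchall()
    return [{"id": r[0], "name": r[1], "type": r[2]} for r in rows]


class FakeConnection:
    """Stand-in for a DuckDB connection that returns canned rows."""

    def __init__(self, rows):
        self._rows = rows
        self.last_params = None

    def execute(self, sql, params):
        self.last_params = params  # remember what was bound to the placeholder
        return self

    def fetchall(self):
        return self._rows


conn = FakeConnection([(1, "Example River", "river")])  # made-up row
route_wkt = "LINESTRING (1.5 42.5, 1.6 42.6)"
crossings = query_crossings(conn, route_wkt)
print(crossings)
```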
<h3 id="the-api-" class="headerLink">
    <a href="#the-api-" class="header-mark"></a>The API 🎮</h3><p>The API is built using FastAPI, a modern web framework for building APIs with Python. My aim was to keep this lightweight and hand off the heavy lifting to the database. I added two endpoints to the API:</p>
<ul>
<li><code>/healthcheck</code>: A simple endpoint that checks whether the API is running and healthy, and reports the number of waterways in the database.</li>
<li><code>/process_gpx</code>: This endpoint accepts a GPX file, parses it, and finds the waterways that intersect with the route. It returns the results as JSON.</li>
</ul>
<div class="details admonition tip">
        <div class="details-summary admonition-title">
            <i class="icon fas fa-lightbulb fa-fw"></i>Parsing GPX in Python?<i class="details-icon fas fa-angle-right fa-fw"></i>
        </div>
        <div class="details-content">
            <div class="admonition-content">The GPX file is parsed with the <code>gpxpy</code> library, a simple and efficient GPX parser. The <code>gpx_to_linestring</code> function then extracts the coordinates and builds a LineString geometry, which is passed to the <code>find_waterway_crossings</code> function.</div>
        </div>
    </div>
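<p>For a feel of what the GPX-to-LineString conversion involves, here is a dependency-free sketch. It swaps <code>gpxpy</code> for the standard library&rsquo;s <code>xml.etree.ElementTree</code> so it runs anywhere, and it returns a WKT string rather than a geometry object. The function name <code>gpx_to_wkt_linestring</code> and the sample track are made up, and the project&rsquo;s own <code>gpx_to_linestring</code> may differ.</p>

```python
import xml.etree.ElementTree as ET

# GPX 1.1 documents put their elements in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_to_wkt_linestring(gpx_text):
    """Collect all track points and return them as a WKT LINESTRING (lon lat)."""
    root = ET.fromstring(gpx_text)
    coords = [
        (float(pt.attrib["lon"]), float(pt.attrib["lat"]))
        for pt in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]
    if len(coords) < 2:
        raise ValueError("a LineString needs at least two track points")
    return "LINESTRING (" + ", ".join(f"{lon} {lat}" for lon, lat in coords) + ")"


SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="42.5" lon="1.5"/>
    <trkpt lat="42.6" lon="1.6"/>
  </trkseg></trk>
</gpx>"""

wkt = gpx_to_wkt_linestring(SAMPLE_GPX)
print(wkt)
```

<p>Note the coordinate order: GPX stores <code>lat</code> and <code>lon</code> as attributes, while WKT expects x (longitude) first.</p>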
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@app.post</span><span class="p">(</span><span class="s2">&#34;/process_gpx&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">process_gpx</span><span class="p">(</span><span class="n">file</span><span class="p">:</span> <span class="n">UploadFile</span> <span class="o">=</span> <span class="n">File</span><span class="p">(</span><span class="o">...</span><span class="p">)):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;Process GPX file and find waterway intersections&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">overall_start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Read and parse the uploaded GPX file</span>
</span></span><span class="line"><span class="cl">    <span class="n">contents</span> <span class="o">=</span> <span class="k">await</span> <span class="n">file</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">gpx</span> <span class="o">=</span> <span class="n">gpxpy</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">contents</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># Convert to linestring and find intersections</span>
</span></span><span class="line"><span class="cl">    <span class="n">linestring</span> <span class="o">=</span> <span class="n">gpx_to_linestring</span><span class="p">(</span><span class="n">gpx</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">crossings</span> <span class="o">=</span> <span class="n">find_waterway_crossings</span><span class="p">(</span><span class="n">linestring</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Return results with timing information</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;processing_times_ms&#34;</span><span class="p">:</span> <span class="p">{</span><span class="o">...</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;total_crossings&#34;</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">crossings</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;crossings&#34;</span><span class="p">:</span> <span class="n">crossings</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span></code></pre></td></tr></table>
</div>
</div><h4 id="the-query-" class="headerLink">
    <a href="#the-query-" class="header-mark"></a>The Query ✍️</h4><p>The core query that powers the intersection detection is remarkably simple (thanks to the R-Tree index and DuckDB&rsquo;s spatial functions):</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">WITH</span><span class="w"> </span><span class="n">route_geom_cte</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">SELECT</span><span class="w"> </span><span class="n">ST_GeomFromText</span><span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">geom</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">           </span><span class="n">ST_Envelope</span><span class="p">(</span><span class="n">ST_GeomFromText</span><span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">bbox</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">w</span><span class="p">.</span><span class="n">id</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">w</span><span class="p">.</span><span class="n">waterway_name</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">w</span><span class="p">.</span><span class="n">waterway_type</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">ST_AsGeoJSON</span><span class="p">(</span><span class="n">ST_Intersection</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">.</span><span class="n">geom</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">intersection_geojson</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">waterways</span><span class="w"> </span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">route_geom_cte</span><span class="w"> </span><span class="n">r</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">ST_Intersects</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">.</span><span class="n">bbox</span><span class="p">)</span><span class="w"> </span><span class="c1">-- First fast filter with bounding box
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">ST_Intersects</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">geom</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">.</span><span class="n">geom</span><span class="p">)</span><span class="w"> </span><span class="c1">-- Then precise intersection
</span></span></span></code></pre></td></tr></table>
</div>
</div><p>This query uses the R-Tree index to quickly narrow the waterways down to the candidates whose bounding boxes overlap the route, then performs the exact spatial operation only on those candidates.</p>
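<p>To make the filter-then-refine idea concrete, here is a minimal pure-Python sketch of the same two phases. It is not what DuckDB does internally: the helper names (<code>bbox_of</code>, <code>find_crossings</code>, and friends) and the toy coordinates are made up for illustration, and phase 1 is a plain linear scan, whereas an R-Tree arranges the bounding boxes hierarchically so that most rows are never looked at.</p>

```python
def bbox_of(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a, b):
    """True if two boxes overlap on both axes (touching edges count)."""
    return b[2] >= a[0] and a[2] >= b[0] and b[3] >= a[1] and a[3] >= b[1]


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (0 > cross)


def segments_cross(a, b, c, d):
    """True if segment a-b properly crosses segment c-d.

    Touching and collinear cases are ignored to keep the sketch short.
    """
    return (
        orientation(a, b, c) * orientation(a, b, d) == -1
        and orientation(c, d, a) * orientation(c, d, b) == -1
    )


def lines_cross(route, waterway):
    """Exact (and expensive) test: compare every pair of segments."""
    return any(
        segments_cross(route[i], route[i + 1], waterway[j], waterway[j + 1])
        for i in range(len(route) - 1)
        for j in range(len(waterway) - 1)
    )


def find_crossings(route, waterways):
    """Return (names of crossed waterways, number of bbox candidates)."""
    route_bbox = bbox_of(route)
    # Phase 1 (filter): cheap bounding-box comparison.
    candidates = {
        name: line
        for name, line in waterways.items()
        if bboxes_overlap(bbox_of(line), route_bbox)
    }
    # Phase 2 (refine): exact test, only on the surviving candidates.
    crossed = [name for name, line in candidates.items() if lines_cross(route, line)]
    return sorted(crossed), len(candidates)


# Hypothetical toy data: one route and three waterways.
route = [(0, 0), (10, 10)]
waterways = {
    "river_a": [(0, 10), (10, 0)],    # crosses the route
    "river_b": [(6, 1), (9, 4)],      # inside the route's bbox, runs parallel
    "river_c": [(20, 20), (30, 25)],  # far away, dropped by the bbox filter
}
print(find_crossings(route, waterways))
```

<p>In this toy data <code>river_c</code> is discarded by the cheap bounding-box check alone, so the exact segment-by-segment test only runs for the two remaining candidates.</p>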
<h2 id="quantifying-the-gains-" class="headerLink">
    <a href="#quantifying-the-gains-" class="header-mark"></a>Quantifying the Gains 📈</h2><p>To measure the performance gains, I created a benchmarking script (<code>benchmark.py</code>) that tests different GPX files against the API. The results showed a drastic improvement:</p>
<figure><a class="lightgallery" href="/media/performance/kreuzungen-benchmark.avif" title="/media/performance/kreuzungen-benchmark.avif" data-thumbnail="/media/performance/kreuzungen-benchmark.avif" data-sub-html="<h2>Seeing the performance increase 🧑‍💻</h2>">
        <img
            
            loading="lazy"
            src="/media/performance/kreuzungen-benchmark.avif"
            srcset="/media/performance/kreuzungen-benchmark.avif, /media/performance/kreuzungen-benchmark.avif 1.5x, /media/performance/kreuzungen-benchmark.avif 2x"
            sizes="auto"
            alt="/media/performance/kreuzungen-benchmark.avif">
    </a><figcaption class="image-caption">Seeing the performance increase 🧑‍💻</figcaption>
    </figure>
<p>You can see the performance of the API in action. The benchmark script runs 10 different GPX files of varying length and complexity and measures the time taken to process each one. The whole run finishes before the old browser-based solution has processed even a single route.</p>
<p>This represents a <strong>biggggg</strong> improvement in processing speed compared to the browser-based solution!</p>
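<p>For a rough idea of what such a timing loop can look like, here is a minimal sketch. It is not the actual <code>benchmark.py</code>: the <code>benchmark</code> and <code>fake_process</code> names, the file names, and the <code>repeats</code> parameter are assumptions for illustration, and <code>fake_process</code> just sleeps where a real script would send the file to the API.</p>

```python
import statistics
import time


def benchmark(process, gpx_paths, repeats=3):
    """Time process(path) for each path and return per-file stats in ms."""
    results = {}
    for path in gpx_paths:
        timings_ms = []
        for _ in range(repeats):
            start = time.perf_counter()
            process(path)
            timings_ms.append((time.perf_counter() - start) * 1000)
        results[path] = {
            "min_ms": min(timings_ms),
            "median_ms": statistics.median(timings_ms),
            "max_ms": max(timings_ms),
        }
    return results


def fake_process(path):
    """Hypothetical stand-in for uploading one GPX file to the API."""
    time.sleep(0.01)


if __name__ == "__main__":
    # Hypothetical file names; a real run would point at actual GPX files.
    for path, stats in benchmark(fake_process, ["short.gpx", "long.gpx"]).items():
        print(f"{path}: median {stats['median_ms']:.1f} ms")
```

<p>Repeating each file a few times and reporting the median keeps a single slow run from skewing the picture.</p>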
<h2 id="conclusion-performance-is-a-journey-not-a-destination-" class="headerLink">
    <a href="#conclusion-performance-is-a-journey-not-a-destination-" class="header-mark"></a>Conclusion: Performance is a Journey, Not a Destination 🧘</h2><p>This experiment shows that with different tools and techniques, we can dramatically improve the performance of applications. What was already &ldquo;pretty fast&rdquo; in the browser is now much faster because of a change in architecture 🦆</p>
<p>I&rsquo;ll be honest, I don&rsquo;t even have a feel for what a &ldquo;fast&rdquo; optimized response to this problem should be 😲</p>
<p>There are so many ways this problem can be solved using a computer, and each approach has its own characteristics.</p>
<p>I didn&rsquo;t explore other optimizations, like using a different database or alternative indexing strategies. I could have written low-level code in Rust, but that would take far more time than I had available for this project.</p>
<p>We are lucky to have such fast tools freely available today, and to be able to build on top of really well-made software. This saves time ❤️</p>
<div class="details admonition tip open">
        <div class="details-summary admonition-title">
            <i class="icon fas fa-lightbulb fa-fw"></i>The power of open source collaboration<i class="details-icon fas fa-angle-right fa-fw"></i>
        </div>
        <div class="details-content">
            <div class="admonition-content"><p>Seriously, a lot of the reason why someone like me can solve these problems is thanks to gluing together libraries and tools that are openly available on GitHub.</p>
<p>There are many layers of abstraction in this solution, a lot of code I&rsquo;m not even aware of. Every tool I use is built on top of other tools, which are built on top of others, and so on. Many developers have spent countless hours pondering optimizations at each level, and it&rsquo;s thanks to this combined effort that a single developer like me can build something this efficient. A solution that&rsquo;s more than fast enough for my needs 🙇</p>
<p>Standing on the shoulders of giants isn&rsquo;t just a saying, it&rsquo;s literally how modern software development works. The R-Tree implementation I&rsquo;m using might have taken years to perfect by dedicated algorithm specialists inspired by centuries of mathematicians thinking about geometry, and here I am, the tech-bro activating it with a single line of SQL in a database named after a duck! This ability to reuse the mental work of others, whether you are aware of it or not, makes this field so incredible. ✨</p>
</div>
        </div>
    </div>
<h2 id="takeaways-performance-is-a-journey-not-a-destination-" class="headerLink">
    <a href="#takeaways-performance-is-a-journey-not-a-destination-" class="header-mark"></a>Takeaways: Performance is a Journey, Not a Destination 🧘</h2><p>It&rsquo;s deeply satisfying to see abstract ideas transform into real performance gains. What started as a theoretical hunch about spatial indexing and data processing became a real and measurable improvement. It&rsquo;s a humbling reminder that studying these concepts can actually lead to practical benefits when applied to the right problem.</p>
<p>Not everything in life needs to be fast, but in situations where it matters, a well-chosen index and thoughtful data pipeline might just be the solution you need! 🏁</p>
<h2 id="standing-on-the-shoulders-of-open-giants-" class="headerLink">
    <a href="#standing-on-the-shoulders-of-open-giants-" class="header-mark"></a>Standing on the Shoulders of (Open) Giants 🙇</h2><p>None of this would be possible without the incredible open-source tools and libraries that are available today.</p>
<p><a href="https://www.openstreetmap.org" target="_blank" rel="noopener noreferrer">OpenStreetMap Contributors</a>: Every river and stream was manually mapped by volunteers in one of humanity&rsquo;s most impressive collaborative projects 💚</p>
<p><a href="https://www.geofabrik.de/" target="_blank" rel="noopener noreferrer">Geofabrik</a>: Provides free daily OSM extracts, saving countless hours of redundant processing. Thank you! 🙇</p>
<p><a href="https://osmcode.org/osmium-tool/" target="_blank" rel="noopener noreferrer">Osmium Tool</a>: When OSM data is like IKEA (an overwhelming maze with too many options when all you need is a cup), this tool helps filter just what you need 🕵️‍♀️</p>
<p><a href="https://github.com/GIScience/ohsome-planet" target="_blank" rel="noopener noreferrer">ohsome-planet</a>: Transforms OSM&rsquo;s simply-elegant-but-awkward data model (just nodes with tags and pointers) into the GOAT GeoParquet format 📦</p>
<p><a href="https://duckdb.org" target="_blank" rel="noopener noreferrer">DuckDB</a>: Turns your laptop into a high-performance geospatial data warehouse 🦆</p>
<p><a href="https://www.getdbt.com/" target="_blank" rel="noopener noreferrer">dbt</a>, <a href="https://github.com/Maproom/gpxpy" target="_blank" rel="noopener noreferrer">gpxpy</a>, <a href="https://fastapi.tiangolo.com/" target="_blank" rel="noopener noreferrer">FastAPI</a>: The building blocks that tied everything together into a coherent system 🧰</p>]]></description></item></channel></rss>