home / 24ways

24ways

Custom SQL query returning 2 rows

Query parameters

rowidtitlecontentsyearauthorauthor_slugpublishedurltopic
246 Designing Your Site Like It’s 1998 It’s 20 years to the day since my wife and I started Stuff & Nonsense, our little studio and my outlet for creative ideas on the web. To celebrate this anniversary—and my fourteenth contribution to 24 ways— I’d like to explain how I would’ve developed a design for Planes, Trains and Automobiles, one of my favourite Christmas films. My design for Planes, Trains and Automobiles is fixed at 800px wide. Developing a <frameset> framework I’ll start by using frames to set up the framework for this new website. Frames are individual pages—one for navigation, the other for my content—pulled together to form a frameset. Space is limited on lower-resolution screens, so by using frames I can ensure my navigation always remains visible. I can include any number of frames inside a <frameset> element. I add two rows to my <frameset>; the first is for my navigation and is 50px tall, the second is for my content and will resize to fill any available space. As I don’t want frame borders or any space between my frames, I set frameborder and framespacing attributes to 0: <frameset frameborder="0" framespacing="0" rows="50,*"> […] </frameset> Next I add the source of my two frame documents. I don’t want people to be able to resize or scroll my navigation, so I add the noresize attribute to that frame: <frameset frameborder="0" framespacing="0" rows="50,*"> <frame noresize scrolling="no" src="nav.html"> <frame src="content.html"> </frameset> I do want links from my navigation to open in the content frame, so I give each <frame> a name so I can specify where I want links to open: <frameset frameborder="0" framespacing="0" rows="50,*"> <frame name="navigation" noresize scrolling="no" src="nav.html"> <frame name="content" src="content.html"> </frameset> The framework for this website is simple as it contains only two horizontal rows. Should I need a more complex layout, I can nest as many framesets—and as many individual documents—as I need: <frameset rows="50,*"> <frame name="navigation"> <frameset cols="25%,*"> <frame name="sidebar"> <frame name="content"> </frameset> </frameset> Letterbox framesets were common way to deal with multiple screen sizes. In a letterbox, the central frameset had a fixed height and width, while the frames on the top, right, bottom, and left expanded to fill any remaining space. Handling older browsers Sadly not every browser supports frames, so I should send a helpful message to people who use older browsers asking them to upgrade. Happily, I can do that using noframes content: <noframes> <body> <p>This page uses frames, but your browser doesn’t support them. Please upgrade your browser.</p> </body> </noframes> Forcing someone back into a frame Sometimes, someone may follow a link to a page from a portal or search engine, or they might attempt to open it in a new window or tab. If that page properly belongs inside a <frameset>, people could easily miss out on other parts of a design. This short script will prevent this happening and because it’s vanilla Javascript, it doesn’t require a library such as jQuery: <script type="text/javascript"> if (top == self) { location = 'frameset.html'; } </script> Laying out my page Before starting my layout, I add a few basic background and colour styles. I must include these attributes in every page on my website: <body background="img/container.jpg" bgcolor="#fef7fb" link="#245eab" alink="#245eab" vlink="#3c146e" text="#000000"> I want absolute control over how people experience my design and don’t want to allow it to stretch, so I first need a <table> which limits the width of my layout to 800px. The align attribute will keep this <table> in the centre of someone’s screen: <table width="800" align="center"> <tr> <td>[…]</td> </tr> </table> Although they were developed for displaying tabular information, the cells and rows which make up the <table> element make it ideal for the precise implementation of a design. I need several tables—often nested inside each other—to implement my design. These include tables for a banner and three rows of content: <table width="800" align="center"> <table>[…]</table> <table> <table> <table>[…]</table> </table> </table> <table>[…]</table> <table>[…]</table> </table> The width of the first table—used for my banner—is fixed to match the logo it contains. As I don’t need borders, padding, or spacing between these cells, I use attributes to remove them: <table border="0" cellpadding="0" cellspacing="0" width="587" align="center"> <tr> <td><img src="logo.gif" border="0" width="587" alt="Logo"></td> </tr> </table> The next table—which contains the largest image, introduction, and a call-to-action—is one of the most complex parts of my design, so I need to ensure its layout is pixel perfect. To do that I add an extra row at the top of this table and fill each of its cells with tiny transparent images: <tr> <td><img src="spacer.gif" width="593" height="1"></td> <td><img src="spacer.gif" width="207" height="1"></td> </tr> The height and width of these “shims” or “spacers” is only 1px but they will stretch to any size without increasing their weight on the page. This makes them perfect for performant website development. For the hero of this design, I splice up the large image into three separate files and apply each slice as a background to the table cells. I also match the height of those cells to the background images: <tr> <td background="slice-1.jpg" height="473"> </td> <td background="slice-2.jpg" height="473">[…]</td> </tr> <tr> <td background="slice-3.jpg" height="388"> </td> </tr> I use tables and spacer images throughout the rest of this design to lay out the various types of content with perfect precision. For example, to add a single-pixel border around my two columns of content, I first apply a blue background to an outer table along with 1px of cellspacing, then simply nest an inner table—this time with a white background—inside it: <table border="0" cellpadding="1" cellspacing="0"> <tr> <td> <table bgcolor="#ffffff" border="0" cellpadding="0" cellspacing="0"> […] </table> </td> </tr> </table> Adding details Tables are fabulous tools for laying out a page, but they’re also useful for implementing details on those pages. I can use a table to add a gradient background, rounded corners, and a shadow to the button which forms my “Buy the DVD” call-to-action. First, I splice my button graphic into three slices; two fixed-width rounded ends, plus a narrow gradient which stretches and makes this button responsive. Then, I add those images as backgrounds and use spacers to perfectly size my button: <table border="0" cellpadding="0" cellspacing="0"> <tr> <td background="btn-1.jpg" border="0" height="48" width="30"><img src="spacer.gif" width="30" height="1"></td> <td background="btn-2.jpg" border="0" height="48"> <center> <a href="" target="_blank"><b>Buy the DVD</b></a> </center> </td> <td background="btn-3.jpg" border="0" height="48" width="30"><img src="spacer.gif" width="30" height="1"></td> </tr> </table> I use those same elements to add details to headlines and lists too. Adding a “bullet” to each item in a list needs only two additional table cells, a circular graphic, and a spacer: <table border="0" cellpadding="0" cellspacing="0"> <tr> <td width="10"><img src="li.gif" border="0" width="8" height="8"> </td> <td><img src="spacer.gif" width="10" height="1"> </td> <td>Directed by John Hughes</td> </tr> </table> Implementing a typographic hierarchy So far I’ve explained how to use frames, tables, and spacers to develop a layout for my content, but what about styling that content? I use <font> elements to change the typeface from the browser’s default to any font installed on someone’s device: <font face="Arial">Planes, Trains and Automobiles is a comedy film […]</font> To adjust the size of those fonts, I use the size attribute and a value between the smallest (1) and the largest (7) where 3 is the browser’s default. I use a size of 4 for this headline and 2 for the text which follows: <font face="Arial" size="4"><b>Steve Martin</b></font> <font face="Arial" size="2">An American actor, comedian, writer, producer, and musician.</font> When I need to change the typeface, perhaps from a sans-serif like Arial to a serif like Times New Roman, I must change the value of the face attribute on every element on all pages on my website. NB: I use as many <br> elements as needed to create space between headlines and paragraphs. View the final result (and especially the source.) My modern day design for Planes, Trains and Automobiles. I can imagine many people reading this and thinking “This is terrible advice because we don’t develop websites like this in 2018.” That’s true. We have the ability to embed any number of web fonts into our products and websites and have far more control over type features, leading, ligatures, and sizes: font-variant-caps: titling-caps; font-variant-ligatures: common-ligatures; font-variant-numeric: oldstyle-nums; Grid has simplified the implementation of even the most complex compound grid down to just a few lines of CSS: body { display: grid; grid-template-columns: 3fr 1fr 2fr 2fr 1fr 3fr; grid-template-rows: auto; grid-column-gap: 2vw; grid-row-gap: 1vh; } Flexbox has made it easy to develop flexible components such as navigation links: nav ul { display: flex; } nav li { flex: 1; } Just one line of CSS can create multiple columns of fluid type: main { column-width: 12em; } CSS Shapes enable text to flow around irregular shapes including polygons: [src*="main-img"] { float: left; shape-outside: polygon(…); } Today, we wouldn’t dream of using images and a table to add a gradient, rounded corners, and a shadow to a button or link, preferring instead: .btn { background: linear-gradient(#8B1212, #DD3A3C); border-radius: 1em; box-shadow: 0 2px 4px 0 rgba(0,0,0,0.50), inset 0 -1px 1px 0 rgba(0,0,0,0.50); } CSS Custom Properties, feature and media queries, filters, pseudo-elements, and SVG; the list of advances in HTML, CSS, and other technologies goes on. So does our understanding of how best to use them by separating content, structure, presentation, and behaviour. As 2018 draws to a close, we’re certain we know how to design and develop products and websites better than we did at the end of 1998. Strange as it might seem looking back, in 1998 we were also certain our techniques and technologies were the best for the job. That’s why it’s dangerous to believe with absolute certainty that the frameworks and tools we increasingly rely on today—tools like Bootstrap, Bower, and Brunch, Grunt, Gulp, Node, Require, React, and Sass—will be any more relevant in the future than <font> elements, frames, layout tables, and spacer images are today. I have no prediction for what the web will be like twenty years from now. However, I want to believe we’ll build on what we’ve learned during these past two decades about the importance of accessibility, flexibility, and usability, and that the mistakes we made while infatuated by technologies won’t be repeated. Head over to my website if you’d like to read about how I’d implement my design for ‘Planes, Trains and Automobiles’ today. 2018 Andy Clarke andyclarke 2018-12-23T00:00:00+00:00 https://24ways.org/2018/designing-your-site-like-its-1998/ code
249 Fast Autocomplete Search for Your Website Every website deserves a great search engine - but building a search engine can be a lot of work, and hosting it can quickly get expensive. I’m going to build a search engine for 24 ways that’s fast enough to support autocomplete (a.k.a. typeahead) search queries and can be hosted for free. I’ll be using wget, Python, SQLite, Jupyter, sqlite-utils and my open source Datasette tool to build the API backend, and a few dozen lines of modern vanilla JavaScript to build the interface. Try it out here, then read on to see how I built it. First step: crawling the data The first step in building a search engine is to grab a copy of the data that you plan to make searchable. There are plenty of potential ways to do this: you might be able to pull it directly from a database, or extract it using an API. If you don’t have access to the raw data, you can imitate Google and write a crawler to extract the data that you need. I’m going to do exactly that against 24 ways: I’ll build a simple crawler using wget, a command-line tool that features a powerful “recursive” mode that’s ideal for scraping websites. We’ll start at the https://24ways.org/archives/ page, which links to an archived index for every year that 24 ways has been running. Then we’ll tell wget to recursively crawl the website, using the --recursive flag. We don’t want to fetch every single page on the site - we’re only interested in the actual articles. Luckily, 24 ways has nicely designed URLs, so we can tell wget that we only care about pages that start with one of the years it has been running, using the -I argument like this: -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017 We want to be polite, so let’s wait for 2 seconds between each request rather than hammering the site as fast as we can: --wait 2 The first time I ran this, I accidentally downloaded the comments pages as well. We don’t want those, so let’s exclude them from the crawl using -X "/*/*/comments". Finally, it’s useful to be able to run the command multiple times without downloading pages that we have already fetched. We can use the --no-clobber option for this. Tie all of those options together and we get this command: wget --recursive --wait 2 --no-clobber -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017 -X "/*/*/comments" https://24ways.org/archives/ If you leave this running for a few minutes, you’ll end up with a folder structure something like this: $ find 24ways.org 24ways.org 24ways.org/2013 24ways.org/2013/why-bother-with-accessibility 24ways.org/2013/why-bother-with-accessibility/index.html 24ways.org/2013/levelling-up 24ways.org/2013/levelling-up/index.html 24ways.org/2013/project-hubs 24ways.org/2013/project-hubs/index.html 24ways.org/2013/credits-and-recognition 24ways.org/2013/credits-and-recognition/index.html ... As a quick sanity check, let’s count the number of HTML pages we have retrieved: $ find 24ways.org | grep index.html | wc -l 328 There’s one last step! We got everything up to 2017, but we need to fetch the articles for 2018 (so far) as well. They aren’t linked in the /archives/ yet so we need to point our crawler at the site’s front page instead: wget --recursive --wait 2 --no-clobber -I /2018 -X "/*/*/comments" https://24ways.org/ Thanks to --no-clobber, this is safe to run every day in December to pick up any new content. We now have a folder on our computer containing an HTML file for every article that has ever been published on the site! Let’s use them to build ourselves a search index. Building a search index using SQLite There are many tools out there that can be used to build a search engine. You can use an open-source search server like Elasticsearch or Solr, a hosted option like Algolia or Amazon CloudSearch or you can tap into the built-in search features of relational databases like MySQL or PostgreSQL. I’m going to use something that’s less commonly used for web applications but makes for a powerful and extremely inexpensive alternative: SQLite. SQLite is the world’s most widely deployed database, even though many people have never even heard of it. That’s because it’s designed to be used as an embedded database: it’s commonly used by native mobile applications and even runs as part of the default set of apps on the Apple Watch! SQLite has one major limitation: unlike databases like MySQL and PostgreSQL, it isn’t really designed to handle large numbers of concurrent writes. For this reason, most people avoid it for building web applications. This doesn’t matter nearly so much if you are building a search engine for infrequently updated content - say one for a site that only publishes new content on 24 days every year. It turns out SQLite has very powerful full-text search functionality built into the core database - the FTS5 extension. I’ve been doing a lot of work with SQLite recently, and as part of that, I’ve been building a Python utility library to make building new SQLite databases as easy as possible, called sqlite-utils. It’s designed to be used within a Jupyter notebook - an enormously productive way of interacting with Python code that’s similar to the Observable notebooks Natalie described on 24 ways yesterday. If you haven’t used Jupyter before, here’s the fastest way to get up and running with it - assuming you have Python 3 installed on your machine. We can use a Python virtual environment to ensure the software we are installing doesn’t clash with any other installed packages: $ python3 -m venv ./jupyter-venv $ ./jupyter-venv/bin/pip install jupyter # ... lots of installer output # Now lets install some extra packages we will need later $ ./jupyter-venv/bin/pip install beautifulsoup4 sqlite-utils html5lib # And start the notebook web application $ ./jupyter-venv/bin/jupyter-notebook # This will open your browser to Jupyter at http://localhost:8888/ You should now be in the Jupyter web application. Click New -> Python 3 to start a new notebook. A neat thing about Jupyter notebooks is that if you publish them to GitHub (either in a regular repository or as a Gist), it will render them as HTML. This makes them a very powerful way to share annotated code. I’ve published the notebook I used to build the search index on my GitHub account. ​ Here’s the Python code I used to scrape the relevant data from the downloaded HTML files. Check out the notebook for a line-by-line explanation of what’s going on. from pathlib import Path from bs4 import BeautifulSoup as Soup base = Path("/Users/simonw/Dropbox/Development/24ways-search") articles = list(base.glob("*/*/*/*.html")) # articles is now a list of paths that look like this: # PosixPath('...24ways-search/24ways.org/2013/why-bother-with-accessibility/index.html') docs = [] for path in articles: year = str(path.relative_to(base)).split("/")[1] url = 'https://' + str(path.relative_to(base).parent) + '/' soup = Soup(path.open().read(), "html5lib") author = soup.select_one(".c-continue")["title"].split( "More information about" )[1].strip() author_slug = soup.select_one(".c-continue")["href"].split( "/authors/" )[1].split("/")[0] published = soup.select_one(".c-meta time")["datetime"] contents = soup.select_one(".e-content").text.strip() title = soup.find("title").text.split(" ◆")[0] try: topic = soup.select_one( '.c-meta a[href^="/topics/"]' )["href"].split("/topics/")[1].split("/")[0] except TypeError: topic = None docs.append({ "title": title, "contents": contents, "year": year, "author": author, "author_slug": author_slug, "published": published, "url": url, "topic": topic, }) After running this code, I have a list of Python dictionaries representing each of the documents that I want to add to the index. The list looks something like this: [ { "title": "Why Bother with Accessibility?", "contents": "Web accessibility (known in other fields as inclus...", "year": "2013", "author": "Laura Kalbag", "author_slug": "laurakalbag", "published": "2013-12-10T00:00:00+00:00", "url": "https://24ways.org/2013/why-bother-with-accessibility/", "topic": "design" }, { "title": "Levelling Up", "contents": "Hello, 24 ways. Iu2019m Ashley and I sell property ins...", "year": "2013", "author": "Ashley Baxter", "author_slug": "ashleybaxter", "published": "2013-12-06T00:00:00+00:00", "url": "https://24ways.org/2013/levelling-up/", "topic": "business" }, ... My sqlite-utils library has the ability to take a list of objects like this and automatically create a SQLite database table with the right schema to store the data. Here’s how to do that using this list of dictionaries. import sqlite_utils db = sqlite_utils.Database("/tmp/24ways.db") db["articles"].insert_all(docs) That’s all there is to it! The library will create a new database and add a table to it called articles with the necessary columns, then insert all of the documents into that table. (I put the database in /tmp/ for the moment - you can move it to a more sensible location later on.) You can inspect the table using the sqlite3 command-line utility (which comes with OS X) like this: $ sqlite3 /tmp/24ways.db sqlite> .headers on sqlite> .mode column sqlite> select title, author, year from articles; title author year ------------------------------ ------------ ---------- Why Bother with Accessibility? Laura Kalbag 2013 Levelling Up Ashley Baxte 2013 Project Hubs: A Home Base for Brad Frost 2013 Credits and Recognition Geri Coady 2013 Managing a Mind Christopher 2013 Run Ragged Mark Boulton 2013 Get Started With GitHub Pages Anna Debenha 2013 Coding Towards Accessibility Charlie Perr 2013 ... <Ctrl+D to quit> There’s one last step to take in our notebook. We know we want to use SQLite’s full-text search feature, and sqlite-utils has a simple convenience method for enabling it for a specified set of columns in a table. We want to be able to search by the title, author and contents fields, so we call the enable_fts() method like this: db["articles"].enable_fts(["title", "author", "contents"]) Introducing Datasette Datasette is the open-source tool I’ve been building that makes it easy to both explore SQLite databases and publish them to the internet. We’ve been exploring our new SQLite database using the sqlite3 command-line tool. Wouldn’t it be nice if we could use a more human-friendly interface for that? If you don’t want to install Datasette right now, you can visit https://search-24ways.herokuapp.com/ to try it out against the 24 ways search index data. I’ll show you how to deploy Datasette to Heroku like this later in the article. If you want to install Datasette locally, you can reuse the virtual environment we created to play with Jupyter: ./jupyter-venv/bin/pip install datasette This will install Datasette in the ./jupyter-venv/bin/ folder. You can also install it system-wide using regular pip install datasette. Now you can run Datasette against the 24ways.db file we created earlier like so: ./jupyter-venv/bin/datasette /tmp/24ways.db This will start a local webserver running. Visit http://localhost:8001/ to start interacting with the Datasette web application. If you want to try out Datasette without creating your own 24ways.db file you can download the one I created directly from https://search-24ways.herokuapp.com/24ways-ae60295.db Publishing the database to the internet One of the goals of the Datasette project is to make deploying data-backed APIs to the internet as easy as possible. Datasette has a built-in command for this, datasette publish. If you have an account with Heroku or Zeit Now, you can deploy a database to the internet with a single command. Here’s how I deployed https://search-24ways.herokuapp.com/ (running on Heroku’s free tier) using datasette publish: $ ./jupyter-venv/bin/datasette publish heroku /tmp/24ways.db --name search-24ways -----> Python app detected -----> Installing requirements with pip -----> Running post-compile hook -----> Discovering process types Procfile declares types -> web -----> Compressing... Done: 47.1M -----> Launching... Released v8 https://search-24ways.herokuapp.com/ deployed to Heroku If you try this out, you’ll need to pick a different --name, since I’ve already taken search-24ways. You can run this command as many times as you like to deploy updated versions of the underlying database. Searching and faceting Datasette can detect tables with SQLite full-text search configured, and will add a search box directly to the page. Take a look at http://search-24ways.herokuapp.com/24ways-b607e21/articles to see this in action. ​ SQLite search supports wildcards, so if you want autocomplete-style search where you don’t need to enter full words to start getting results you can add a * to the end of your search term. Here’s a search for access* which returns articles on accessibility: http://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=acces%2A A neat feature of Datasette is the ability to calculate facets against your data. Here’s a page showing search results for svg with facet counts calculated against both the year and the topic columns: http://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=svg&_facet=year&_facet=topic Every page visible via Datasette has a corresponding JSON API, which can be accessed using the JSON link on the page - or by adding a .json extension to the URL: http://search-24ways.herokuapp.com/24ways-ae60295/articles.json?_search=acces%2A Better search using custom SQL The search results we get back from ../articles?_search=svg are OK, but the order they are returned in is not ideal - they’re actually being returned in the order they were inserted into the database! You can see why this is happening by clicking the View and edit SQL link on that search results page. This exposes the underlying SQL query, which looks like this: select rowid, * from articles where rowid in ( select rowid from articles_fts where articles_fts match :search ) order by rowid limit 101 We can do better than this by constructing a custom SQL query. Here’s the query we will use instead: select snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet, articles_fts.rank, articles.title, articles.url, articles.author, articles.year from articles join articles_fts on articles.rowid = articles_fts.rowid where articles_fts match :search || "*" order by rank limit 10; You can try this query out directly - since Datasette opens the underling SQLite database in read-only mode and enforces a one second time limit on queries, it’s safe to allow users to provide arbitrary SQL select queries for Datasette to execute. There’s a lot going on here! Let’s break the SQL down line-by-line: select snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet, We’re using snippet(), a built-in SQLite function, to generate a snippet highlighting the words that matched the query. We use two unique strings that I made up to mark the beginning and end of each match - you’ll see why in the JavaScript later on. articles_fts.rank, articles.title, articles.url, articles.author, articles.year These are the other fields we need back - most of them are from the articles table but we retrieve the rank (representing the strength of the search match) from the magical articles_fts table. from articles join articles_fts on articles.rowid = articles_fts.rowid articles is the table containing our data. articles_fts is a magic SQLite virtual table which implements full-text search - we need to join against it to be able to query it. where articles_fts match :search || "*" order by rank limit 10; :search || "*" takes the ?search= argument from the page querystring and adds a * to the end of it, giving us the wildcard search that we want for autocomplete. We then match that against the articles_fts table using the match operator. Finally, we order by rank so that the best matching results are returned at the top - and limit to the first 10 results. How do we turn this into an API? As before, the secret is to add the .json extension. Datasette actually supports multiple shapes of JSON - we’re going to use ?_shape=array to get back a plain array of objects: JSON API call to search for articles matching SVG The HTML version of that page shows the time taken to execute the SQL in the footer. Hitting refresh a few times, I get response times between 2 and 5ms - easily fast enough to power a responsive autocomplete feature. A simple JavaScript autocomplete search interface I considered building this using React or Svelte or another of the myriad of JavaScript framework options available today, but then I remembered that vanilla JavaScript in 2018 is a very productive environment all on its own. We need a few small utility functions: first, a classic debounce function adapted from this one by David Walsh: function debounce(func, wait, immediate) { let timeout; return function() { let context = this, args = arguments; let later = () => { timeout = null; if (!immediate) func.apply(context, args); }; let callNow = immediate && !timeout; clearTimeout(timeout); timeout = setTimeout(later, wait); if (callNow) func.apply(context, args); }; }; We’ll use this to only send fetch() requests a maximum of once every 100ms while the user is typing. Since we’re rendering data that might include HTML tags (24 ways is a site about web development after all), we need an HTML escaping function. I’m amazed that browsers still don’t bundle a default one of these: const htmlEscape = (s) => s.replace( />/g, '&gt;' ).replace( /</g, '&lt;' ).replace( /&/g, '&' ).replace( /"/g, '&quot;' ).replace( /'/g, '&#039;' ); We need some HTML for the search form, and a div in which to render the results: <h1>Autocomplete search</h1> <form> <p><input id="searchbox" type="search" placeholder="Search 24ways" style="width: 60%"></p> </form> <div id="results"></div> And now the autocomplete implementation itself, as a glorious, messy stream-of-consciousness of JavaScript: // Embed the SQL query in a multi-line backtick string: const sql = `select snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet, articles_fts.rank, articles.title, articles.url, articles.author, articles.year from articles join articles_fts on articles.rowid = articles_fts.rowid where articles_fts match :search || "*" order by rank limit 10`; // Grab a reference to the <input type="search"> const searchbox = document.getElementById("searchbox"); // Used to avoid race-conditions: let requestInFlight = null; searchbox.onkeyup = debounce(() => { const q = searchbox.value; // Construct the API URL, using encodeURIComponent() for the parameters const url = ( "https://search-24ways.herokuapp.com/24ways-866073b.json?sql=" + encodeURIComponent(sql) + `&search=${encodeURIComponent(q)}&_shape=array` ); // Unique object used just for race-condition comparison let currentRequest = {}; requestInFlight = currentRequest; fetch(url).then(r => r.json()).then(d => { if (requestInFlight !== currentRequest) { // Avoid race conditions where a slow request returns // after a faster one. return; } let results = d.map(r => ` <div class="result"> <h3><a href="${r.url}">${htmlEscape(r.title)}</a></h3> <p><small>${htmlEscape(r.author)} - ${r.year}</small></p> <p>${highlight(r.snippet)}</p> </div> `).join(""); document.getElementById("results").innerHTML = results; }); }, 100); // debounce every 100ms There’s just one more utility function, used to help construct the HTML results: const highlight = (s) => htmlEscape(s).replace( /b4de2a49c8/g, '<b>' ).replace( /8c94a2ed4b/g, '</b>' ); This is what those unique strings passed to the snippet() function were for. Avoiding race conditions in autocomplete One trick in this code that you may not have seen before is the way race-conditions are handled. Any time you build an autocomplete feature, you have to consider the following case: User types acces Browser sends request A - querying documents matching acces* User continues to type accessibility Browser sends request B - querying documents matching accessibility* Request B returns. It was fast, because there are fewer documents matching the full term The results interface updates with the documents from request B, matching accessibility* Request A returns results (this was the slower of the two requests) The results interface updates with the documents from request A - results matching access* This is a terrible user experience: the user saw their desired results for a brief second, and then had them snatched away and replaced with those results from earlier on. Thankfully there’s an easy way to avoid this. I set up a variable in the outer scope called requestInFlight, initially set to null. Any time I start a new fetch() request, I create a new currentRequest = {} object and assign it to the outer requestInFlight as well. When the fetch() completes, I use requestInFlight !== currentRequest to sanity check that the currentRequest object is strictly identical to the one that was in flight. If a new request has been triggered since we started the current request we can detect that and avoid updating the results. It’s not a lot of code, really And that’s the whole thing! The code is pretty ugly, but when the entire implementation clocks in at fewer than 70 lines of JavaScript, I honestly don’t think it matters. You’re welcome to refactor it as much you like. How good is this search implementation? I’ve been building search engines for a long time using a wide variety of technologies and I’m happy to report that using SQLite in this way is genuinely a really solid option. It scales happily up to hundreds of MBs (or even GBs) of data, and the fact that it’s based on SQL makes it easy and flexible to work with. A surprisingly large number of desktop and mobile applications you use every day implement their search feature on top of SQLite. More importantly though, I hope that this demonstrates that using Datasette for an API means you can build relatively sophisticated API-backed applications with very little backend programming effort. If you’re working with a small-to-medium amount of data that changes infrequently, you may not need a more expensive database. Datasette-powered applications easily fit within the free tier of both Heroku and Zeit Now. For more of my writing on Datasette, check out the datasette tag on my blog. And if you do build something fun with it, please let me know on Twitter. 2018 Simon Willison simonwillison 2018-12-19T00:00:00+00:00 https://24ways.org/2018/fast-autocomplete-search-for-your-website/ code
Powered by Datasette · Query took 0.569ms