{"database": "24ways", "table": "articles", "is_view": false, "human_description_en": "where author = \"Simon Willison\"", "rows": [[168, "Unobtrusively Mapping Microformats with jQuery", "Microformats are everywhere. You can\u2019t shake an electronic stick these days without accidentally poking a microformat-enabled site, and many developers use microformats as a matter of course. And why not? After all, why invent your own class names when you can re-use pre-defined ones that give your site extra functionality for free?\n\nNevertheless, while it\u2019s good to know that users of tools such as Tails and Operator will derive added value from your shiny semantics, it\u2019s nice to be able to reuse that effort in your own code.\n\nWe\u2019re going to build a map of some of my favourite restaurants in Brighton. Fitting with the principles of unobtrusive JavaScript, we\u2019ll start with a semantically marked up list of restaurants, then use JavaScript to add the map, look up the restaurant locations and plot them as markers.\n\nWe\u2019ll be using a couple of powerful tools. The first is jQuery, a JavaScript library that is ideally suited for unobtrusive scripting. jQuery allows us to manipulate elements on the page based on their CSS selector, which makes it easy to extract information from microformats.\n\nThe second is Mapstraction, introduced here by Andrew Turner a few days ago. We\u2019ll be using Google Maps in the background, but Mapstraction makes it easy to change to a different provider if we want to later.\n\nGetting Started\n\nWe\u2019ll start off with a simple collection of microformatted restaurant details, representing my seven favourite restaurants in Brighton. The full, unstyled list can be seen in restaurants-plain.html. 
Each restaurant listing looks like this:\n\n<li class=\"vcard\">\n\t<h3><a class=\"fn org url\" href=\"http://www.riddleandfinns.co.uk/\">Riddle &amp; Finns</a></h3>\n\t<div class=\"adr\">\n\t\t<p class=\"street-address\">12b Meeting House Lane</p>\n\t\t<p><span class=\"locality\">Brighton</span>, <abbr class=\"country-name\" title=\"United Kingdom\">UK</abbr></p>\n\t\t<p class=\"postal-code\">BN1 1HB</p>\n\t</div>\n\t<p>Telephone: <span class=\"tel\">+44 (0)1273 323 008</span></p>\n\t<p>E-mail: <a href=\"mailto:info@riddleandfinns.co.uk\" class=\"email\">info@riddleandfinns.co.uk</a></p>\n</li>\n\nSince we\u2019re dealing with a list of restaurants, each hCard is marked up inside a list item. Each restaurant is an organisation; we signify this by placing the classes fn and org on the element surrounding the restaurant\u2019s name (according to the hCard spec, setting both fn and org to the same value signifies that the hCard represents an organisation rather than a person).\n\nThe address information itself is contained within a div of class adr. Note that the HTML <address> element is not suitable here for two reasons: firstly, it is intended to mark up contact details for the current document rather than generic addresses; secondly, address is an inline element and as such cannot contain the paragraph elements used here for the address information.\n\nA nice thing about microformats is that they provide us with automatic hooks for our styling. For the moment we\u2019ll just tidy up the whitespace a bit; for more advanced style tips consult John Allsopp\u2019s guide from 24 ways 2006.\n\n.vcard p {\n\tmargin: 0;\n}\n.adr {\n\tmargin-bottom: 0.5em;\n}\n\nTo plot the restaurants on a map we\u2019ll need latitude and longitude for each one. We can find this out from their address using geocoding. Most mapping APIs include support for geocoding, which means we can pass the API an address and get back a latitude/longitude point. 
Mapstraction provides an abstraction layer around these APIs which can be included using the following script tag:\n\n<script type=\"text/javascript\" src=\"http://mapstraction.com/src/mapstraction-geocode.js\"></script>\n\nWhile we\u2019re at it, let\u2019s pull in the other external scripts we\u2019ll be using:\n\n<script type=\"text/javascript\" src=\"jquery-1.2.1.js\"></script>\n<script src=\"http://maps.google.com/maps?file=api&v=2&key=YOUR_KEY\" type=\"text/javascript\"></script>\n<script type=\"text/javascript\" src=\"http://mapstraction.com/src/mapstraction.js\"></script>\n<script type=\"text/javascript\" src=\"http://mapstraction.com/src/mapstraction-geocode.js\"></script>\n\nThat\u2019s everything set up: let\u2019s write some JavaScript!\n\nIn jQuery, almost every operation starts with a call to the jQuery function. The function simulates method overloading to behave in different ways depending on the arguments passed to it. When writing unobtrusive JavaScript it\u2019s important to set up code to execute when the page has loaded to the point that the DOM is available to be manipulated. To do this with jQuery, pass a callback function to the jQuery function itself:\n\njQuery(function() {\n\t// This code will be executed when the DOM is ready\n});\n\nInitialising the map\n\nThe first thing we need to do is initialise our map. Mapstraction needs a div with an explicit width, height and ID to show it where to put the map. Our document doesn\u2019t currently include this markup, but we can insert it with a single line of jQuery code:\n\njQuery(function() {\n\t// First create a div to host the map\n\tvar themap = jQuery('<div id=\"themap\"></div>').css({\n\t\t'width': '90%',\n\t\t'height': '400px'\n\t}).insertBefore('ul.restaurants');\n});\n\nWhile this is technically just a single line of JavaScript (with line-breaks added for readability) it\u2019s actually doing quite a lot of work. 
Let\u2019s break it down in to steps:\n\nvar themap = jQuery('<div id=\"themap\"></div>')\n\nHere\u2019s jQuery\u2019s method overloading in action: if you pass it a string that starts with a < it assumes that you wish to create a new HTML element. This provides us with a handy shortcut for the more verbose DOM equivalent:\n\nvar themap = document.createElement('div');\nthemap.id = 'themap';\n\nNext we want to apply some CSS rules to the element. jQuery supports chaining, which means we can continue to call methods on the object returned by jQuery or any of its methods:\n\nvar themap = jQuery('<div id=\"themap\"></div>').css({\n\t'width': '90%',\n\t'height': '400px'\n})\n\nFinally, we need to insert our new HTML element in to the page. jQuery provides a number of methods for element insertion, but in this case we want to position it directly before the <ul> we are using to contain our restaurants. jQuery\u2019s insertBefore() method takes a CSS selector indicating an element already on the page and places the current jQuery selection directly before that element in the DOM.\n\nvar themap = jQuery('<div id=\"themap\"></div>').css({\n\t'width': '90%',\n\t'height': '400px'\n}).insertBefore('ul.restaurants');\n\nFinally, we need to initialise the map itself using Mapstraction. The Mapstraction constructor takes two arguments: the first is the ID of the element used to position the map; the second is the mapping provider to use (in this case google ):\n\n// Initialise the map\nvar mapstraction = new Mapstraction('themap','google');\n\nWe want the map to appear centred on Brighton, so we\u2019ll need to know the correct co-ordinates. 
We can use www.getlatlon.com to find both the co-ordinates and the initial map zoom level.\n\n// Show map centred on Brighton\nmapstraction.setCenterAndZoom(\n\tnew LatLonPoint(50.82423734980143, -0.14007568359375),\n\t15 // Zoom level appropriate for Brighton city centre\n);\n\nWe also want controls on the map to allow the user to zoom in and out and toggle between map and satellite view.\n\nmapstraction.addControls({\n\tzoom: 'large',\n\tmap_type: true\n});\n\nAdding the markers\n\nIt\u2019s finally time to parse some microformats. Since we\u2019re using hCard, the information we want is wrapped in elements with the class vcard. We can use jQuery\u2019s CSS selector support to find them:\n\nvar vcards = jQuery('.vcard');\n\nNow that we\u2019ve found them, we need to create a marker for each one in turn. Rather than using a regular JavaScript for loop, we can instead use jQuery\u2019s each() method to execute a function against each of the hCards.\n\njQuery('.vcard').each(function() {\n\t// Do something with the hCard\n});\n\nWithin the callback function, this is set to the current DOM element (in our case, the list item). If we want to call the magic jQuery methods on it we\u2019ll need to wrap it in another call to jQuery:\n\njQuery('.vcard').each(function() {\n\tvar hcard = jQuery(this);\n});\n\nThe Google maps geocoder seems to work best if you pass it the street address and a postcode. We can extract these using CSS selectors: this time, we\u2019ll use jQuery\u2019s find() method which searches within the current jQuery selection:\n\nvar streetaddress = hcard.find('.street-address').text();\nvar postcode = hcard.find('.postal-code').text();\n\nThe text() method extracts the text contents of the selected node, minus any HTML markup.\n\nWe\u2019ve got the address; now we need to geocode it. Mapstraction\u2019s geocoding API requires us to first construct a MapstractionGeocoder, then use the geocode() method to pass it an address. 
Here\u2019s the code outline:\n\nvar geocoder = new MapstractionGeocoder(onComplete, 'google');\ngeocoder.geocode({'address': 'the address goes here'});\n\nThe onComplete function is executed when the geocoding operation has been completed, and will be passed an object with the resulting point on the map. We just want to create a marker for the point:\n\nvar geocoder = new MapstractionGeocoder(function(result) {\n\tvar marker = new Marker(result.point);\n\tmapstraction.addMarker(marker);\n}, 'google');\n\nFor our purposes, joining the street address and postcode with a comma to create the address should suffice:\n\ngeocoder.geocode({'address': streetaddress + ', ' + postcode});\n\nThere\u2019s one last step: when the marker is clicked, we want to display details of the restaurant. We can do this with an info bubble, which can be configured by passing in a string of HTML. We\u2019ll construct that HTML using jQuery\u2019s html() method on our hcard object, which extracts the HTML contained within that DOM node as a string.\n\nvar marker = new Marker(result.point);\nmarker.setInfoBubble(\n\t'<div class=\"bubble\">' + hcard.html() + '</div>'\n);\nmapstraction.addMarker(marker);\n\nWe\u2019ve wrapped the bubble in a div with class bubble to make it easier to style. 
Google Maps can behave strangely if you don\u2019t provide an explicit width for your info bubbles, so we\u2019ll add that to our CSS now:\n\n.bubble {\n\twidth: 300px;\n}\n\nThat\u2019s everything we need: let\u2019s combine our code together:\n\njQuery(function() {\n\t// First create a div to host the map\n\tvar themap = jQuery('<div id=\"themap\"></div>').css({\n\t\t'width': '90%',\n\t\t'height': '400px'\n\t}).insertBefore('ul.restaurants');\n\t// Now initialise the map\n\tvar mapstraction = new Mapstraction('themap','google');\n\tmapstraction.addControls({\n\t\tzoom: 'large',\n\t\tmap_type: true\n\t});\n\t// Show map centred on Brighton\n\tmapstraction.setCenterAndZoom(\n\t\tnew LatLonPoint(50.82423734980143, -0.14007568359375),\n\t\t15 // Zoom level appropriate for Brighton city centre\n\t);\n\t// Geocode each hcard and add a marker\n\tjQuery('.vcard').each(function() {\n\t\tvar hcard = jQuery(this);\n\t\tvar streetaddress = hcard.find('.street-address').text();\n\t\tvar postcode = hcard.find('.postal-code').text();\n\t\tvar geocoder = new MapstractionGeocoder(function(result) {\n\t\t\tvar marker = new Marker(result.point);\n\t\t\tmarker.setInfoBubble(\n\t\t\t\t'<div class=\"bubble\">' + hcard.html() + '</div>'\n\t\t\t);\n\t\t\tmapstraction.addMarker(marker);\n\t\t}, 'google');\t \n\t\tgeocoder.geocode({'address': streetaddress + ', ' + postcode});\n\t});\n});\n\nHere\u2019s the finished code.\n\nThere\u2019s one last shortcut we can add: jQuery provides the $ symbol as an alias for jQuery. We could just go through our code and replace every call to jQuery() with a call to $(), but this would cause incompatibilities if we ever attempted to use our script on a page that also includes the Prototype library. 
A more robust approach is to start our code with the following:\n\njQuery(function($) {\n\t// Within this function, $ now refers to jQuery\n\t// ...\n});\n\njQuery cleverly passes itself as the first argument to any function registered to the DOM ready event, which means we can assign a local $ variable shortcut without affecting the $ symbol in the global scope. This makes it easy to use jQuery with other libraries.\n\nLimitations of Geocoding\n\nYou may have noticed a discrepancy creep in to the last example: whereas my original list included seven restaurants, the geocoding example only shows five. This is because the Google Maps geocoder incorporates a rate limit: more than five lookups in a second and it starts returning error messages instead of regular results.\n\nIn addition to this problem, geocoding itself is an inexact science: while UK postcodes generally get you down to the correct street, figuring out the exact point on the street from the provided address usually isn\u2019t too accurate (although Google do a pretty good job).\n\nFinally, there\u2019s the performance overhead. We\u2019re making five geocoding requests to Google for every page served, even though the restaurants themselves aren\u2019t likely to change location any time soon. Surely there\u2019s a better way of doing this?\n\nMicroformats to the rescue (again)! The geo microformat suggests simple classes for including latitude and longitude information in a page. 
We can add specific points for each restaurant using the following markup:\n\n<li class=\"vcard\">\n\t<h3 class=\"fn org\">E-Kagen</h3>\n\t<div class=\"adr\">\n\t\t<p class=\"street-address\">22-23 Sydney Street</p>\n\t\t<p><span class=\"locality\">Brighton</span>, <abbr class=\"country-name\" title=\"United Kingdom\">UK</abbr></p>\n\t\t<p class=\"postal-code\">BN1 4EN</p>\n\t</div>\n\t<p>Telephone: <span class=\"tel\">+44 (0)1273 687 068</span></p>\n\t<p class=\"geo\">Lat/Lon: \n\t\t<span class=\"latitude\">50.827917</span>, \n\t\t<span class=\"longitude\">-0.137764</span>\n\t</p>\n</li>\n\nAs before, I used www.getlatlon.com to find the exact locations \u2013 I find satellite view is particularly useful for locating individual buildings.\n\nLatitudes and longitudes are great for machines but not so useful for human beings. We could hide them entirely with display: none, but I prefer to merely de-emphasise them (someone might want them for their GPS unit):\n\n.vcard .geo {\n\tmargin-top: 0.5em;\n\tfont-size: 0.85em;\n\tcolor: #ccc;\n}\n\nIt\u2019s probably a good idea to hide them completely when they\u2019re displayed inside an info bubble:\n\n.bubble .geo {\n\tdisplay: none;\n}\n\nWe can extract the co-ordinates in the same way we extracted the address. Since we\u2019re no longer geocoding anything our code becomes a lot simpler:\n\n$('.vcard').each(function() {\n\tvar hcard = $(this);\n\tvar latitude = hcard.find('.geo .latitude').text();\n\tvar longitude = hcard.find('.geo .longitude').text();\n\tvar marker = new Marker(new LatLonPoint(latitude, longitude));\n\tmarker.setInfoBubble(\n\t\t'<div class=\"bubble\">' + hcard.html() + '</div>'\n\t);\n\tmapstraction.addMarker(marker);\n});\n\nAnd here\u2019s the finished geo example.\n\nFurther reading\n\nWe\u2019ve only scratched the surface of what\u2019s possible with microformats, jQuery (or just regular JavaScript) and a bit of imagination. 
If this example has piqued your interest, the following links should give you some more food for thought.\n\n\n\tThe hCard specification\n\tNotes on parsing hCards\n\tjQuery for JavaScript programmers \u2013 my extended tutorial on jQuery.\n\tDan Webb\u2019s Sumo \u2013 a full JavaScript library for parsing microformats, based around some clever metaprogramming techniques.\n\tJeremy Keith\u2019s Adactio Austin \u2013 the first place I saw using microformats to unobtrusively plot locations on a map. Makes clever use of hEvent as well.", "2007", "Simon Willison", "simonwillison", "2007-12-12T00:00:00+00:00", "https://24ways.org/2007/unobtrusively-mapping-microformats-with-jquery/", "code"], [249, "Fast Autocomplete Search for Your Website", "Every website deserves a great search engine - but building a search engine can be a lot of work, and hosting it can quickly get expensive.\nI\u2019m going to build a search engine for 24 ways that\u2019s fast enough to support autocomplete (a.k.a. typeahead) search queries and can be hosted for free. I\u2019ll be using wget, Python, SQLite, Jupyter, sqlite-utils and my open source Datasette tool to build the API backend, and a few dozen lines of modern vanilla JavaScript to build the interface.\n\nTry it out here, then read on to see how I built it.\nFirst step: crawling the data\nThe first step in building a search engine is to grab a copy of the data that you plan to make searchable.\nThere are plenty of potential ways to do this: you might be able to pull it directly from a database, or extract it using an API. 
If you don\u2019t have access to the raw data, you can imitate Google and write a crawler to extract the data that you need.\nI\u2019m going to do exactly that against 24 ways: I\u2019ll build a simple crawler using wget, a command-line tool that features a powerful \u201crecursive\u201d mode that\u2019s ideal for scraping websites.\nWe\u2019ll start at the https://24ways.org/archives/ page, which links to an archived index for every year that 24 ways has been running.\nThen we\u2019ll tell wget to recursively crawl the website, using the --recursive flag.\nWe don\u2019t want to fetch every single page on the site - we\u2019re only interested in the actual articles. Luckily, 24 ways has nicely designed URLs, so we can tell wget that we only care about pages that start with one of the years it has been running, using the -I argument like this: -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017\nWe want to be polite, so let\u2019s wait for 2 seconds between each request rather than hammering the site as fast as we can: --wait 2\nThe first time I ran this, I accidentally downloaded the comments pages as well. We don\u2019t want those, so let\u2019s exclude them from the crawl using -X \"/*/*/comments\".\nFinally, it\u2019s useful to be able to run the command multiple times without downloading pages that we have already fetched. 
We can use the --no-clobber option for this.\nTie all of those options together and we get this command:\nwget --recursive --wait 2 --no-clobber \n  -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017 \n  -X \"/*/*/comments\" \n  https://24ways.org/archives/ \nIf you leave this running for a few minutes, you\u2019ll end up with a folder structure something like this:\n$ find 24ways.org\n24ways.org\n24ways.org/2013\n24ways.org/2013/why-bother-with-accessibility\n24ways.org/2013/why-bother-with-accessibility/index.html\n24ways.org/2013/levelling-up\n24ways.org/2013/levelling-up/index.html\n24ways.org/2013/project-hubs\n24ways.org/2013/project-hubs/index.html\n24ways.org/2013/credits-and-recognition\n24ways.org/2013/credits-and-recognition/index.html\n...\nAs a quick sanity check, let\u2019s count the number of HTML pages we have retrieved:\n$ find 24ways.org | grep index.html | wc -l\n328\nThere\u2019s one last step! We got everything up to 2017, but we need to fetch the articles for 2018 (so far) as well. They aren\u2019t linked in the /archives/ yet so we need to point our crawler at the site\u2019s front page instead:\nwget --recursive --wait 2 --no-clobber \n  -I /2018 \n  -X \"/*/*/comments\" \n  https://24ways.org/\nThanks to --no-clobber, this is safe to run every day in December to pick up any new content.\nWe now have a folder on our computer containing an HTML file for every article that has ever been published on the site! Let\u2019s use them to build ourselves a search index.\nBuilding a search index using SQLite\nThere are many tools out there that can be used to build a search engine. 
You can use an open-source search server like Elasticsearch or Solr, a hosted option like Algolia or Amazon CloudSearch or you can tap into the built-in search features of relational databases like MySQL or PostgreSQL.\nI\u2019m going to use something that\u2019s less commonly used for web applications but makes for a powerful and extremely inexpensive alternative: SQLite.\nSQLite is the world\u2019s most widely deployed database, even though many people have never even heard of it. That\u2019s because it\u2019s designed to be used as an embedded database: it\u2019s commonly used by native mobile applications and even runs as part of the default set of apps on the Apple Watch!\nSQLite has one major limitation: unlike databases like MySQL and PostgreSQL, it isn\u2019t really designed to handle large numbers of concurrent writes. For this reason, most people avoid it for building web applications.\nThis doesn\u2019t matter nearly so much if you are building a search engine for infrequently updated content - say one for a site that only publishes new content on 24 days every year.\nIt turns out SQLite has very powerful full-text search functionality built into the core database - the FTS5 extension.\nI\u2019ve been doing a lot of work with SQLite recently, and as part of that, I\u2019ve been building a Python utility library to make building new SQLite databases as easy as possible, called sqlite-utils. It\u2019s designed to be used within a Jupyter notebook - an enormously productive way of interacting with Python code that\u2019s similar to the Observable notebooks Natalie described on 24 ways yesterday.\nIf you haven\u2019t used Jupyter before, here\u2019s the fastest way to get up and running with it - assuming you have Python 3 installed on your machine. 
We can use a Python virtual environment to ensure the software we are installing doesn\u2019t clash with any other installed packages:\n$ python3 -m venv ./jupyter-venv\n$ ./jupyter-venv/bin/pip install jupyter\n# ... lots of installer output\n# Now let's install some extra packages we will need later\n$ ./jupyter-venv/bin/pip install beautifulsoup4 sqlite-utils html5lib\n# And start the notebook web application\n$ ./jupyter-venv/bin/jupyter-notebook\n# This will open your browser to Jupyter at http://localhost:8888/\nYou should now be in the Jupyter web application. Click New -> Python 3 to start a new notebook.\nA neat thing about Jupyter notebooks is that if you publish them to GitHub (either in a regular repository or as a Gist), it will render them as HTML. This makes them a very powerful way to share annotated code. I\u2019ve published the notebook I used to build the search index on my GitHub account.\nHere\u2019s the Python code I used to scrape the relevant data from the downloaded HTML files. 
Check out the notebook for a line-by-line explanation of what\u2019s going on.\nfrom pathlib import Path\nfrom bs4 import BeautifulSoup as Soup\nbase = Path(\"/Users/simonw/Dropbox/Development/24ways-search\")\narticles = list(base.glob(\"*/*/*/*.html\"))\n# articles is now a list of paths that look like this:\n# PosixPath('...24ways-search/24ways.org/2013/why-bother-with-accessibility/index.html')\ndocs = []\nfor path in articles:\n    year = str(path.relative_to(base)).split(\"/\")[1]\n    url = 'https://' + str(path.relative_to(base).parent) + '/'\n    soup = Soup(path.open().read(), \"html5lib\")\n    author = soup.select_one(\".c-continue\")[\"title\"].split(\n        \"More information about\"\n    )[1].strip()\n    author_slug = soup.select_one(\".c-continue\")[\"href\"].split(\n        \"/authors/\"\n    )[1].split(\"/\")[0]\n    published = soup.select_one(\".c-meta time\")[\"datetime\"]\n    contents = soup.select_one(\".e-content\").text.strip()\n    title = soup.find(\"title\").text.split(\" \u25c6\")[0]\n    try:\n        topic = soup.select_one(\n            '.c-meta a[href^=\"/topics/\"]'\n        )[\"href\"].split(\"/topics/\")[1].split(\"/\")[0]\n    except TypeError:\n        topic = None\n    docs.append({\n        \"title\": title,\n        \"contents\": contents,\n        \"year\": year,\n        \"author\": author,\n        \"author_slug\": author_slug,\n        \"published\": published,\n        \"url\": url,\n        \"topic\": topic,\n    })\nAfter running this code, I have a list of Python dictionaries representing each of the documents that I want to add to the index. 
The list looks something like this:\n[\n  {\n    \"title\": \"Why Bother with Accessibility?\",\n    \"contents\": \"Web accessibility (known in other fields as inclus...\",\n    \"year\": \"2013\",\n    \"author\": \"Laura Kalbag\",\n    \"author_slug\": \"laurakalbag\",\n    \"published\": \"2013-12-10T00:00:00+00:00\",\n    \"url\": \"https://24ways.org/2013/why-bother-with-accessibility/\",\n    \"topic\": \"design\"\n  },\n  {\n    \"title\": \"Levelling Up\",\n    \"contents\": \"Hello, 24 ways. I\u2019m Ashley and I sell property ins...\",\n    \"year\": \"2013\",\n    \"author\": \"Ashley Baxter\",\n    \"author_slug\": \"ashleybaxter\",\n    \"published\": \"2013-12-06T00:00:00+00:00\",\n    \"url\": \"https://24ways.org/2013/levelling-up/\",\n    \"topic\": \"business\"\n  },\n  ...\nMy sqlite-utils library has the ability to take a list of objects like this and automatically create a SQLite database table with the right schema to store the data. Here\u2019s how to do that using this list of dictionaries.\nimport sqlite_utils\ndb = sqlite_utils.Database(\"/tmp/24ways.db\")\ndb[\"articles\"].insert_all(docs)\nThat\u2019s all there is to it! The library will create a new database and add a table to it called articles with the necessary columns, then insert all of the documents into that table.\n(I put the database in /tmp/ for the moment - you can move it to a more sensible location later on.)\nYou can inspect the table using the sqlite3 command-line utility (which comes with OS X) like this:\n$ sqlite3 /tmp/24ways.db\nsqlite> .headers on\nsqlite> .mode column\nsqlite> select title, author, year from articles;\ntitle                           author        year      \n------------------------------  ------------  ----------\nWhy Bother with Accessibility?  
Laura Kalbag  2013      \nLevelling Up                    Ashley Baxte  2013      \nProject Hubs: A Home Base for   Brad Frost    2013      \nCredits and Recognition         Geri Coady    2013      \nManaging a Mind                 Christopher   2013      \nRun Ragged                      Mark Boulton  2013      \nGet Started With GitHub Pages   Anna Debenha  2013      \nCoding Towards Accessibility    Charlie Perr  2013      \n...\n<Ctrl+D to quit>\nThere\u2019s one last step to take in our notebook. We know we want to use SQLite\u2019s full-text search feature, and sqlite-utils has a simple convenience method for enabling it for a specified set of columns in a table. We want to be able to search by the title, author and contents fields, so we call the enable_fts() method like this:\ndb[\"articles\"].enable_fts([\"title\", \"author\", \"contents\"])\nIntroducing Datasette\nDatasette is the open-source tool I\u2019ve been building that makes it easy to both explore SQLite databases and publish them to the internet.\nWe\u2019ve been exploring our new SQLite database using the sqlite3 command-line tool. Wouldn\u2019t it be nice if we could use a more human-friendly interface for that?\nIf you don\u2019t want to install Datasette right now, you can visit https://search-24ways.herokuapp.com/ to try it out against the 24 ways search index data. I\u2019ll show you how to deploy Datasette to Heroku like this later in the article.\nIf you want to install Datasette locally, you can reuse the virtual environment we created to play with Jupyter:\n./jupyter-venv/bin/pip install datasette\nThis will install Datasette in the ./jupyter-venv/bin/ folder. You can also install it system-wide using regular pip install datasette.\nNow you can run Datasette against the 24ways.db file we created earlier like so:\n./jupyter-venv/bin/datasette /tmp/24ways.db\nThis will start a local webserver running. 
Visit http://localhost:8001/ to start interacting with the Datasette web application.\nIf you want to try out Datasette without creating your own 24ways.db file you can download the one I created directly from https://search-24ways.herokuapp.com/24ways-ae60295.db\nPublishing the database to the internet\nOne of the goals of the Datasette project is to make deploying data-backed APIs to the internet as easy as possible. Datasette has a built-in command for this, datasette publish. If you have an account with Heroku or Zeit Now, you can deploy a database to the internet with a single command. Here\u2019s how I deployed https://search-24ways.herokuapp.com/ (running on Heroku\u2019s free tier) using datasette publish:\n$ ./jupyter-venv/bin/datasette publish heroku /tmp/24ways.db --name search-24ways\n-----> Python app detected\n-----> Installing requirements with pip\n\n-----> Running post-compile hook\n-----> Discovering process types\n       Procfile declares types -> web\n\n-----> Compressing...\n       Done: 47.1M\n-----> Launching...\n       Released v8\n       https://search-24ways.herokuapp.com/ deployed to Heroku\nIf you try this out, you\u2019ll need to pick a different --name, since I\u2019ve already taken search-24ways.\nYou can run this command as many times as you like to deploy updated versions of the underlying database.\nSearching and faceting\nDatasette can detect tables with SQLite full-text search configured, and will add a search box directly to the page. Take a look at http://search-24ways.herokuapp.com/24ways-b607e21/articles to see this in action.\nSQLite search supports wildcards, so if you want autocomplete-style search where you don\u2019t need to enter full words to start getting results you can add a * to the end of your search term. 
Here\u2019s a search for access* which returns articles on accessibility:\nhttp://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=acces%2A\nA neat feature of Datasette is the ability to calculate facets against your data. Here\u2019s a page showing search results for svg with facet counts calculated against both the year and the topic columns:\nhttp://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=svg&_facet=year&_facet=topic\nEvery page visible via Datasette has a corresponding JSON API, which can be accessed using the JSON link on the page - or by adding a .json extension to the URL:\nhttp://search-24ways.herokuapp.com/24ways-ae60295/articles.json?_search=acces%2A\nBetter search using custom SQL\nThe search results we get back from ../articles?_search=svg are OK, but the order they are returned in is not ideal - they\u2019re actually being returned in the order they were inserted into the database! You can see why this is happening by clicking the View and edit SQL link on that search results page.\nThis exposes the underlying SQL query, which looks like this:\nselect rowid, * from articles where rowid in (\n  select rowid from articles_fts where articles_fts match :search\n) order by rowid limit 101\nWe can do better than this by constructing a custom SQL query. Here\u2019s the query we will use instead:\nselect\n  snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,\n  articles_fts.rank, articles.title, articles.url, articles.author, articles.year\nfrom articles\n  join articles_fts on articles.rowid = articles_fts.rowid\nwhere articles_fts match :search || \"*\"\n  order by rank limit 10;\nYou can try this query out directly - since Datasette opens the underlying SQLite database in read-only mode and enforces a one second time limit on queries, it\u2019s safe to allow users to provide arbitrary SQL select queries for Datasette to execute.\nThere\u2019s a lot going on here! 
Let\u2019s break the SQL down line-by-line:\nselect\n  snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,\nWe\u2019re using snippet(), a built-in SQLite function, to generate a snippet highlighting the words that matched the query. We use two unique strings that I made up to mark the beginning and end of each match - you\u2019ll see why in the JavaScript later on.\n  articles_fts.rank, articles.title, articles.url, articles.author, articles.year\nThese are the other fields we need back - most of them are from the articles table but we retrieve the rank (representing the strength of the search match) from the magical articles_fts table.\nfrom articles\n  join articles_fts on articles.rowid = articles_fts.rowid\narticles is the table containing our data. articles_fts is a magic SQLite virtual table which implements full-text search - we need to join against it to be able to query it.\nwhere articles_fts match :search || \"*\"\n  order by rank limit 10;\n:search || \"*\" takes the ?search= argument from the page querystring and adds a * to the end of it, giving us the wildcard search that we want for autocomplete. We then match that against the articles_fts table using the match operator. Finally, we order by rank so that the best matching results are returned at the top - and limit to the first 10 results.\nHow do we turn this into an API? As before, the secret is to add the .json extension. Datasette actually supports multiple shapes of JSON - we\u2019re going to use ?_shape=array to get back a plain array of objects:\nJSON API call to search for articles matching SVG\nThe HTML version of that page shows the time taken to execute the SQL in the footer. 
Hitting refresh a few times, I get response times between 2 and 5ms - easily fast enough to power a responsive autocomplete feature.\nA simple JavaScript autocomplete search interface\nI considered building this using React or Svelte or another of the myriad of JavaScript framework options available today, but then I remembered that vanilla JavaScript in 2018 is a very productive environment all on its own.\nWe need a few small utility functions: first, a classic debounce function adapted from this one by David Walsh:\nfunction debounce(func, wait, immediate) {\n  let timeout;\n  return function() {\n    let context = this, args = arguments;\n    let later = () => {\n      timeout = null;\n      if (!immediate) func.apply(context, args);\n    };\n    let callNow = immediate && !timeout;\n    clearTimeout(timeout);\n    timeout = setTimeout(later, wait);\n    if (callNow) func.apply(context, args);\n  };\n};\nWe\u2019ll use this to only send fetch() requests a maximum of once every 100ms while the user is typing.\nSince we\u2019re rendering data that might include HTML tags (24 ways is a site about web development after all), we need an HTML escaping function. 
I\u2019m amazed that browsers still don\u2019t bundle a default one of these:\n// Escape & first, so the entities added below are not double-escaped\nconst htmlEscape = (s) => s.replace(\n  /&/g, '&amp;'\n).replace(\n  />/g, '&gt;'\n).replace(\n  /</g, '&lt;'\n).replace(\n  /\"/g, '&quot;'\n).replace(\n  /'/g, '&#039;'\n);\nWe need some HTML for the search form, and a div in which to render the results:\n<h1>Autocomplete search</h1>\n<form>\n  <p><input id=\"searchbox\" type=\"search\" placeholder=\"Search 24ways\" style=\"width: 60%\"></p>\n</form>\n<div id=\"results\"></div>\nAnd now the autocomplete implementation itself, as a glorious, messy stream-of-consciousness of JavaScript:\n// Embed the SQL query in a multi-line backtick string:\nconst sql = `select\n  snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,\n  articles_fts.rank, articles.title, articles.url, articles.author, articles.year\nfrom articles\n  join articles_fts on articles.rowid = articles_fts.rowid\nwhere articles_fts match :search || \"*\"\n  order by rank limit 10`;\n\n// Grab a reference to the <input type=\"search\">\nconst searchbox = document.getElementById(\"searchbox\");\n\n// Used to avoid race-conditions:\nlet requestInFlight = null;\n\nsearchbox.onkeyup = debounce(() => {\n  const q = searchbox.value;\n  // Construct the API URL, using encodeURIComponent() for the parameters\n  const url = (\n    \"https://search-24ways.herokuapp.com/24ways-866073b.json?sql=\" +\n    encodeURIComponent(sql) +\n    `&search=${encodeURIComponent(q)}&_shape=array`\n  );\n  // Unique object used just for race-condition comparison\n  let currentRequest = {};\n  requestInFlight = currentRequest;\n  fetch(url).then(r => r.json()).then(d => {\n    if (requestInFlight !== currentRequest) {\n      // Avoid race conditions where a slow request returns\n      // after a faster one.\n      return;\n    }\n    let results = d.map(r => `\n      <div class=\"result\">\n        <h3><a href=\"${r.url}\">${htmlEscape(r.title)}</a></h3>\n        
<p><small>${htmlEscape(r.author)} - ${r.year}</small></p>\n        <p>${highlight(r.snippet)}</p>\n      </div>\n    `).join(\"\");\n    document.getElementById(\"results\").innerHTML = results;\n  });\n}, 100); // debounce every 100ms\nThere\u2019s just one more utility function, used to help construct the HTML results:\nconst highlight = (s) => htmlEscape(s).replace(\n  /b4de2a49c8/g, '<b>'\n).replace(\n  /8c94a2ed4b/g, '</b>'\n);\nThis is what those unique strings passed to the snippet() function were for.\nAvoiding race conditions in autocomplete\nOne trick in this code that you may not have seen before is the way race-conditions are handled. Any time you build an autocomplete feature, you have to consider the following case:\n\nUser types acces\nBrowser sends request A - querying documents matching acces*\nUser continues to type accessibility\nBrowser sends request B - querying documents matching accessibility*\nRequest B returns. It was fast, because there are fewer documents matching the full term\nThe results interface updates with the documents from request B, matching accessibility*\nRequest A returns results (this was the slower of the two requests)\nThe results interface updates with the documents from request A - results matching access*\n\nThis is a terrible user experience: the user saw their desired results for a brief second, and then had them snatched away and replaced with those results from earlier on.\nThankfully there\u2019s an easy way to avoid this. I set up a variable in the outer scope called requestInFlight, initially set to null.\nAny time I start a new fetch() request, I create a new currentRequest = {} object and assign it to the outer requestInFlight as well.\nWhen the fetch() completes, I use requestInFlight !== currentRequest to sanity check that the currentRequest object is strictly identical to the one that was in flight. 
If a new request has been triggered since we started the current request we can detect that and avoid updating the results.\nIt\u2019s not a lot of code, really\nAnd that\u2019s the whole thing! The code is pretty ugly, but when the entire implementation clocks in at fewer than 70 lines of JavaScript, I honestly don\u2019t think it matters. You\u2019re welcome to refactor it as much as you like.\nHow good is this search implementation? I\u2019ve been building search engines for a long time using a wide variety of technologies and I\u2019m happy to report that using SQLite in this way is genuinely a really solid option. It scales happily up to hundreds of MBs (or even GBs) of data, and the fact that it\u2019s based on SQL makes it easy and flexible to work with.\nA surprisingly large number of desktop and mobile applications you use every day implement their search feature on top of SQLite.\nMore importantly though, I hope that this demonstrates that using Datasette for an API means you can build relatively sophisticated API-backed applications with very little backend programming effort. If you\u2019re working with a small-to-medium amount of data that changes infrequently, you may not need a more expensive database. Datasette-powered applications easily fit within the free tier of both Heroku and Zeit Now.\nFor more of my writing on Datasette, check out the datasette tag on my blog. And if you do build something fun with it, please let me know on Twitter.", "2018", "Simon Willison", "simonwillison", "2018-12-19T00:00:00+00:00", "https://24ways.org/2018/fast-autocomplete-search-for-your-website/", "code"], [326, "Don't be eval()", "JavaScript is an interpreted language, and like so many of its peers it includes the all powerful eval() function. eval() takes a string and executes it as if it were regular JavaScript code. It\u2019s incredibly powerful and incredibly easy to abuse in ways that make your code slower and harder to maintain. 
As a general rule, if you\u2019re using eval() there\u2019s probably something wrong with your design.\n\nCommon mistakes\n\nHere\u2019s the classic misuse of eval(). You have a JavaScript object, foo, and you want to access a property on it \u2013 but you don\u2019t know the name of the property until runtime. Here\u2019s how NOT to do it:\n\nvar property = 'bar';\nvar value = eval('foo.' + property);\n\nYes, it will work, but every time that piece of code runs JavaScript will have to kick back into interpreter mode, slowing down your app. It\u2019s also dirt ugly.\n\nHere\u2019s the right way of doing the above:\n\nvar property = 'bar';\nvar value = foo[property];\n\nIn JavaScript, square brackets act as an alternative to lookups using a dot. The only difference is that square bracket syntax expects a string.\n\nSecurity issues\n\nIn any programming language you should be extremely cautious of executing code from an untrusted source. The same is true for JavaScript \u2013 you should be extremely cautious of running eval() against any code that may have been tampered with \u2013 for example, strings taken from the page query string. Executing untrusted code can leave you vulnerable to cross-site scripting attacks.\n\nWhat\u2019s it good for?\n\nSome programmers say that eval() is B.A.D. \u2013 Broken As Designed \u2013 and should be removed from the language. However, there are some places in which it can dramatically simplify your code. A great example is for use with XMLHttpRequest, a component of the set of tools more popularly known as Ajax. XMLHttpRequest lets you make a call back to the server from JavaScript without refreshing the whole page. A simple way of using this is to have the server return JavaScript code which is then passed to eval(). 
Here is a simple function for doing exactly that \u2013 it takes the URL to some JavaScript code (or a server-side script that produces JavaScript) and loads and executes that code using XMLHttpRequest and eval().\n\nfunction evalRequest(url) {\n     var xmlhttp = new XMLHttpRequest();\n     xmlhttp.onreadystatechange = function() {\n          if (xmlhttp.readyState==4 && xmlhttp.status==200) {\n               eval(xmlhttp.responseText);\n          }\n     }\n     xmlhttp.open(\"GET\", url, true);\n     xmlhttp.send(null);\n }\n\nIf you want this to work with Internet Explorer you\u2019ll need to include this compatibility patch.", "2005", "Simon Willison", "simonwillison", "2005-12-07T00:00:00+00:00", "https://24ways.org/2005/dont-be-eval/", "code"]], "truncated": false, "table_rows_count": 336, "filtered_table_rows_count": 3, "expanded_columns": [], "expandable_columns": [], "columns": ["rowid", "title", "contents", "year", "author", "author_slug", "published", "url", "topic"], "primary_keys": [], "units": {}, "query": {"sql": "select rowid, * from articles where \"author\" = :p0 order by rowid limit 101", "params": {"p0": "Simon Willison"}}, "facet_results": {"topic": {"name": "topic", "results": [{"value": "code", "label": "code", "count": 3, "toggle_url": "http://search-24ways.herokuapp.com/24ways-f8f455f/articles.json?_facet=topic&_facet=author_slug&_facet=author&author=Simon+Willison&topic=code", "selected": false}], "truncated": false}, "author_slug": {"name": "author_slug", "results": [{"value": "simonwillison", "label": "simonwillison", "count": 3, "toggle_url": "http://search-24ways.herokuapp.com/24ways-f8f455f/articles.json?_facet=topic&_facet=author_slug&_facet=author&author=Simon+Willison&author_slug=simonwillison", "selected": false}], "truncated": false}, "author": {"name": "author", "results": [{"value": "Simon Willison", "label": "Simon Willison", "count": 3, "toggle_url": 
"http://search-24ways.herokuapp.com/24ways-f8f455f/articles.json?_facet=topic&_facet=author_slug&_facet=author", "selected": true}], "truncated": false}}, "suggested_facets": [], "next": null, "next_url": null, "query_ms": 14.134407043457031}