Practical: Implement an RDBMS Server for Simplepedia
Initial Due Date: 2024-10-31 11:59PM
Final Due Date: 2024-12-09 4:00PM
In this practical, we will re-implement our Simplepedia server with an RDBMS backend.
Learning Goals:
- Implement a NextJS server with a RDBMS backend
- Gain familiarity with RDBMS-backed servers, Knex, SQLite, PostgreSQL and the Objection ORM library
- Implement model validations as an example of aspect-oriented programming (AOP)
- Commit and push your changes to GitHub
- Submit your repository to the Gradescope assignment
Prerequisites
- Create the git repository for your practical by accepting the assignment from GitHub Classroom. This will create a new repository for you with a skeleton application already set up.
- Clone the repository to your computer with đź’» git clone (get the name of the repository from GitHub).
- Open up the package.json file and add your name as the author of the package.
- Install the module dependencies by executing đź’» npm install in the terminal.
Background
We are going to use NextJS API routes to implement a simplified version of the Simplepedia API. Recall the Simplepedia API is:
| Endpoint | Method | Action |
|---|---|---|
| /api/sections | GET | Fetch a JSON array of all sections in sorted order |
| /api/articles | GET | Fetch the entire article collection as an array sorted by title |
| /api/articles?section=:section | GET | Fetch all articles in the corresponding section sorted by title |
| /api/articles | POST | Add a new article to the collection (the new article should be provided as the JSON-encoded request body) |
| /api/articles/:id | GET | Get article with id of :id |
| /api/articles/:id | PUT | Update the article with id of :id (entire updated article, including id, should be provided as the JSON-encoded request body) |
Unlike our previous practical, our server will use an RDBMS backend. For our unit tests we will first use SQLite and then a PostgreSQL database. By using the Knex library, the same code will (generally) work with both. Note that there are some differences among database engines, particularly between SQLite and MySQL/PostgreSQL. In the past we used SQLite for development and PostgreSQL in production (and will get started with that approach here), but there were too many differences. Instead we will use Docker to help us develop and test with PostgreSQL as well.
Setup Knex
Knex is both a library and a command line tool (usable via npx). Install Knex and the database clients (for SQLite 3 and PostgreSQL):
đź’» npm install --save knex sqlite3 pg
Create a file in the root of your practical directory named knexfile.js and paste the following to configure Knex’s database connections. You are configuring a SQLite-based test environment (using a specific seed directory and an in-memory database), a SQLite-based development environment, and a production configuration using PostgreSQL.
Node environments
Node (and other tools in its ecosystem) distinguishes between test, development and production environments, as they often need different configurations. So far, this has largely been transparent to us. The tools we use, e.g., Next and Jest, automatically set the correct environment for us (e.g., setting “development” mode when running npm run dev). Here, we will provide different configurations for Knex depending on the current environment and use the NODE_ENV environment variable to set the current environment (and the process.env.NODE_ENV variable in our code to determine the current environment).
/* eslint-disable import/no-extraneous-dependencies */
const { loadEnvConfig } = require("@next/env");
// Adapted from NextJS knex example
const dev = process.env.NODE_ENV !== "production";
const { DATABASE_URL } = loadEnvConfig("./", dev).combinedEnv;
const defaultSettings = {
migrations: {
directory: "./knex/migrations",
},
seeds: {
directory: "./knex/seeds",
},
};
module.exports = {
test: {
...defaultSettings,
client: "sqlite3",
connection: ":memory:",
useNullAsDefault: true,
seeds: {
directory: "./knex/seeds/test",
},
},
development: {
...defaultSettings,
client: "sqlite3",
connection: {
filename: "./simplepedia.db",
},
useNullAsDefault: true,
},
production: {
...defaultSettings,
client: "pg",
connection: {
connectionString: DATABASE_URL,
ssl: true,
},
},
};
Recall that migrations are how we automatically configure the database. Each migration has two parts, the “up” function which makes the desired changes to the database, e.g. create a table or add a column, and the “down” function which reverts those changes. For SQLite, running the migration will also create the database file if it doesn’t exist.
Create a skeleton for a migration with đź’» npx knex migrate:make articles. This command will create a new folder called knex/migrations, in which you will find your new migration file with articles in the name (note the date in the file name so the current state of the database schema can be tracked). You will need to fill in the exports.up (which should create your table) and exports.down (which should delete or “drop” the table) using the Knex schema builder API.
In the database, you want to create a table named Article with the four columns that correspond to the Article properties. Recall from the programming assignments that the four properties are: id, title, contents, and edited. The id field should be of type increments, an auto-incrementing integer, to ensure that each article has a unique id number (Knex automatically sets the increments column as the primary key, but we will be explicit here as an example). The remaining columns should be string or text as shown below. The difference between string and text is the intended size: string is typically shorter, with a fixed maximum length, while text implies a longer, variable-length, string. The actual implementation depends on the underlying database (e.g., SQLite has only variable-length strings, while PostgreSQL enforces that string columns are limited to 255 characters).
/* eslint-disable func-names */
exports.up = function (knex) {
return knex.schema.createTable("Article", (table) => {
table.increments("id").primary();
table.string("title");
table.text("contents");
table.string("edited");
});
};
exports.down = function (knex) {
return knex.schema.dropTableIfExists("Article");
};
When defining the schema, you can specify additional constraints on the columns (as chained methods). What additional constraints should exist on the articles? Can there be articles with identical titles? Can the edited time be null? Add additional constraints to your schema.
đź‘€ View a more specific version of the schema
Notice that in this expanded schema we are enforcing that all titles must be defined (notNullable) and unique (no duplicate titles), and similarly that the edited time must be defined. The database is the most effective and efficient tool for enforcing these constraints. If we were to enforce uniqueness outside the database we would need to perform another query prior to any update. Instead the database can verify uniqueness as part of the insert or update operation. More generally, the more validations we can build into the schema, the greater confidence we have in the integrity of the data (since it couldn’t be saved in the database without satisfying those constraints). We don’t have to worry, for instance, that we missed checking a validation in an API function somewhere…
table.increments("id").primary();
table.string("title").unique().notNullable();
table.text("contents");
table.string("edited").notNullable();
Once you have completed your migration, run any un-performed migrations (in this case just one) with the following command. Recall from class that migrations are only applied once, that is, if you invoke the following command multiple times, the migration will only actually be performed once. Doing so ensures we don’t corrupt our database. But that also means that if you modify a migration that has already been applied and migrate again, nothing will happen. If we want to reapply a migration we first need to “rollback” with the migrate:rollback command (invoking the down function in our migration script) before re-applying the migration command below. See the additional note in the FAQ below about some subtleties of rollback.
đź’» npx knex migrate:latest
The migration should have created a simplepedia.db file. If you have the sqlite3 client application installed (may depend on your platform) you can view the database directly. If not, skip the following command. Open the database with the SQLite client application via đź’» sqlite3 simplepedia.db (Ctrl+d or the .exit command to exit) and execute .schema at the interpreter prompt to view the tables created by the migration. You should see the Article table you just defined along with the tables that Knex creates to track which migrations have been applied. Here is what I see:
% sqlite3 simplepedia.db
SQLite version 3.40.0 2022-11-16 12:10:08
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE `knex_migrations` (`id` integer not null primary key autoincrement, `name` varchar(255), `batch` integer, `migration_time` datetime);
CREATE TABLE sqlite_sequence(name,seq);
CREATE TABLE `knex_migrations_lock` (`index` integer not null primary key autoincrement, `is_locked` integer);
CREATE TABLE `Article` (`id` integer not null primary key autoincrement, `title` varchar(255) not null, `contents` text, `edited` varchar(255) not null);
CREATE UNIQUE INDEX `article_title_unique` on `Article` (`title`);
sqlite>
Seeding the Article Data
Create the Seed File
Seed files are short scripts that populate the database. Create a skeleton seed file with:
đź’» npx knex seed:make load-articles
The above command should have created a knex/seeds/load-articles.js file. Modify that file to read in the articles from the data/seed.json file and insert them into the database. For simplicity we will use Node’s synchronous file I/O interface. Update the code generated by Knex to the following. Note that we use a slightly different import structure because this code is running outside of our application infrastructure and so is not transpiled to the import style expected by Node.
/* eslint-disable func-names */
/* eslint no-unused-vars: ["error", { "args": "none" }] */
const fs = require("fs");
exports.seed = function (knex) {
const contents = fs.readFileSync("./data/seed.json");
const data = JSON.parse(contents);
// Deletes ALL existing entries
// Use batch insert because we have too many articles for simple insert
return knex("Article").del()
.then(() => knex.batchInsert("Article", data, 100));
};
If you used the schema specified above, the column names match the property names of the objects in the data Array. Those objects can be passed directly to the insert function (recall that the database itself will generate the id property). However, we have too much data to just pass the array to insert; instead we need to use Knex’s batchInsert utility to insert the articles in blocks (in this case, blocks of 100 articles).
Run the Seed
Run the seed with đź’» npx knex seed:run to populate your database. Optionally, if you reopen your database with the SQLite3 command-line tool you can view the articles. Execute the following command in the SQLite interpreter to view 10 articles (the semi-colon is required and the keywords in SQL are not case-sensitive).
select * from Article limit 10;
Here is what I see:
% sqlite3 simplepedia.db
SQLite version 3.40.0 2022-11-16 12:10:08
Enter ".help" for usage hints.
sqlite> select * from Article limit 10;
0|Ikoga|Ikoga is a village in North-West District of Botswana. It is located close to Okavango Delta and has a primary school. The population was 699 in 2001 census.|2016-12-07T23:22:33.357Z
1|543 Charlotte|543 Charlotte is a minor planet orbiting the Sun. It was discovered by Paul Götz on September 11, 1904 in Heidelberg.|2016-12-12T02:46:56.549Z
2|9 (New York City Subway service)|9 was a designation given to several services of the New York City Subway.|2016-12-21T23:52:24.933Z
3|Allercombe|Allercombe is a hamlet in east Devon, England. It lies just south of the A30 road between the villages of Whimple and Aylesbeare.|2016-11-21T04:49:57.991Z
4|Nepenthes kongkandana|Nepenthes kongkandana is a tropical pitcher plant endemic to Songkhla Province in southern Thailand. It is closely related to N. kerrii.|2016-12-27T01:50:39.714Z
5|Alexey Kavokin|Alexey V. Kavokin (born 7 March 1970 in Leningrad) is a Russian and French theoretical physicist and writer.|2016-11-11T02:32:05.226Z
6|Thomas Van Lear|Thomas Van Lear (April 26, 1869 – March 4, 1931) was the Socialist Mayor of Minneapolis from January 1, 1917 to January 6, 1919.|2016-11-29T22:50:11.578Z
7|Bulinus globosus|Bulinus globosus is a species of a tropical freshwater snail with a sinistral shell, an aquatic gastropod mollusk in the family Planorbidae, the ramshorn snails and their allies.|2016-11-02T06:40:19.196Z
8|Love, Illinois|Love is an unincorporated community in Bureau County, Illinois, United States, located on Illinois Route 92, south of Tampico.|2016-12-28T11:11:30.718Z
9|Jesuit High School (Sacramento)|Jesuit High School of Sacramento is a Catholic High School which enrolls about 1,000 young men from throughout greater Sacramento Valley in California.|2016-12-25T03:47:30.404Z
Implement the Server
Connect to the database
Before you can use the database in your API handlers, you need a way to initialize Knex. Create a file knex/knex.js (a file named knex.js in the knex directory) that we can use to create an initialized knex instance. Copy in the following code adapted from the NextJS Knex example to create a knex instance that persists across hot reloads. Note the common pattern in which we use process.env.NODE_ENV to determine the environment (e.g., production, development, etc.) or default to development.
import createKnex from "knex";
import knexConfig from "../knexfile";
/**
* Global is used here to ensure the connection is cached across hot-reloads in development. The approach
* is adapted from: https://www.prisma.io/docs/guides/database/troubleshooting-orm/help-articles/nextjs-prisma-client-dev-practices
*/
const globalForKnex = global;
export const knex =
globalForKnex.knex ||
createKnex(
knexConfig[process.env.NODE_ENV || "development"]
);
if (process.env.NODE_ENV !== "production") globalForKnex.knex = knex;
Serving a single article
In src/pages/api/articles/[id].js we will implement the GET /api/articles/:id endpoint. First, import the knex object from the knex.js file you just created. Note that we use the curly brackets style, or “named” import, because knex is not a default export for that file. ../../../../knex/knex is the relative path to the knex.js file (recall “..” means go up to the parent directory).
import { knex } from "../../../../knex/knex";
Where indicated by the TODO, delete the existing placeholder response code (added to facilitate testing before you have implemented all the routes). As a general note, you can only respond once per handler and so leaving the existing response will generate an error. Then implement code to return a single article with the corresponding id (i.e., with req.query.id). In order to fetch a single article you want to perform a select query with a where clause. The where clause will filter the returned articles to just those that match the id in the URL. The first method extracts just the single result. The Knex query builder creates a Promise, so we make the handler function async and use await.
const article = await knex("Article").where({ id: req.query.id }).first();
if (article) {
res.status(200).json(article);
} else {
  res.status(404).end(`Article with id ${req.query.id} not found`);
}
As we did before, you can test your API implementation using fetch via the browser’s console. Start the development server with 💻 npm run dev, then open the application and the browser’s developer tools. In the console, paste and execute a test fetch command. Hopefully you see the expected article!
fetch("/api/articles/4")
.then(resp => resp.json())
.then(data => { console.log(data); })
.catch((error) => console.log(error));
In the test above, we are implementing minimal error handling. We will try to parse any response as JSON, including any error responses. To get more information about any errors, click over to the Network tab in the developer tools and click on the failing request. You want to view the full Response to see the complete error message. To assist in development, the skeleton reports the raw error message in the API response. In a production setting, however, you would not want to send your internal error messages to external clients and instead would likely use some form of internal error logging.
Serving the sections
In src/pages/api/sections.js we will implement the /api/sections
endpoint. As before, import the knex
object from the knex.js file. Where indicated by the TODO delete the existing placeholder response code and implement code to generate a sorted de-duplicated array of sections (i.e., first letter of the article upper-cased). Here we are going to take advantage of the capabilities of database to implement the de-duplicating and sorting. You can implement this route with:
const sections = await knex("Article")
.select(knex.raw("UPPER(SUBSTRING(title, 1, 1)) AS section"))
.distinct()
.orderBy("section");
res.status(200).json(sections.map(section => section.section));
Here we drop down to “raw” SQL to utilize string functions that are not provided by Knex. As part of the query, we extract and upper-case the first letter of the title as section, then de-duplicate and sort those values. The result is an array of {section: "A"} objects, which we flatten into a single array. Here we take advantage of the capabilities offered by the RDBMS to simplify our application. Pushing the computation into the DB reduces communication (we are not sending the titles just to extract the first letter) and could utilize server-side caching to reduce queries and optimizations within the DB query engine itself to speed up the computation.
Enable testing
The skeleton contains a simple test suite from our previous practical. We need to make a few additions to use it with Knex.
- Import the knex object from the knex/knex.js file as you did in your API file.
- We want to reset the database to a known state before each test so our tests are Independent and Repeatable (recall FIRST). Add the following functions at the beginning of the describe("Simplepedia API") block to rebuild the database before all tests (and clean up afterwards) and re-seed it before every test. The Jest test runner automatically sets NODE_ENV=test; thus Knex will use the test database you defined in knexfile.js (not the development database you just seeded). These functions have a specific structure. If the function provided to beforeAll, etc., returns a promise, Jest will wait until that promise has resolved before proceeding. We use that feature to ensure the database is initialized before the tests begin. Here we use the “implicit return” version of the fat arrow: these functions return the promises created by the various knex operations.

beforeAll(() =>
  // Ensure test database is initialized before any tests
  knex.migrate.rollback().then(() => knex.migrate.latest()),
);
afterAll(() =>
  // Ensure database connection is cleaned up after all tests
  knex.destroy(),
);
beforeEach(() =>
  // Reset contents of the test database
  knex.seed.run(),
);
- Create knex/seeds/test/load-test-articles.js (i.e., create a new directory test in seeds) with the content below. Note that this is similar to your other seed file except that it loads a smaller set of articles and resets the article id counter (to ensure consistent ids across tests).

/* eslint-disable func-names */
/* eslint no-unused-vars: ["error", { "args": "none" }] */
const fs = require("fs");

exports.seed = function (knex) {
  const contents = fs.readFileSync("./data/test-data.json");
  const data = JSON.parse(contents);
  // Deletes ALL existing entries and reset the id count. Then use
  // batch insert because we have too many articles for simple insert.
  return knex("sqlite_sequence")
    .where("name", "=", "Article")
    .update({ seq: 0 })
    .then(() => knex("Article").del())
    .then(() => knex.batchInsert("Article", data, 100));
};
Now run the tests with đź’» npm test. The tests for GETting the sections and a single article should pass, but the others will fail for now.
Note that the tests assume the API works correctly, i.e., they assume the response is JSON. On errors, though, we are sending text that can’t be parsed as JSON. Thus the test errors may appear to be JSON parsing errors, e.g.,
Received promise rejected instead of resolved
Rejected to value: [SyntaxError: Unexpected token 'U', "URL and ob"... is not valid JSON]
obscuring the underlying error message. To display the full error message, try inserting console.log(await res.text()) into the test before parsing the JSON. In this case it is “URL and object does not match”.
Many of the failures are related to validation. You could successfully implement the Simplepedia API just using Knex. However, you would need to duplicate any validation code and use Knex’s “low level” query interface. Instead, we will implement the server using the Objection.js ORM to make our routes simpler and implement our validations in an aspect-oriented approach (AOP).
Switching to the Objection ORM
Creating the Model
Install the package and associated validators: đź’» npm install --save objection ajv-formats. Create a directory models containing a file BaseModel.js. This file will provide a base class for all our other models that incorporates additional validations into our model schema and binds our Objection.js models to the knex database interface (Objection.js wraps around the Knex library).
/* eslint-disable camelcase */
import { Model, AjvValidator } from "objection";
import addFormats from "ajv-formats";
import { knex } from "../knex/knex";
class BaseModel extends Model {
static createValidator() {
return new AjvValidator({
onCreateAjv: (ajv) => {
// Make sure all ajv formats are available in validations
addFormats(ajv);
},
});
}
}
// Bind the Objection model to the Knex connection before use
Model.knex(knex);
export default BaseModel;
Now create models/Article.js file with the following:
/* eslint-disable camelcase */
import { Model } from "objection";
import BaseModel from "./BaseModel";
export default class Article extends BaseModel {
// Table name is the only required property.
static get tableName() {
return "Article";
}
// Objection.js assumes primary key is `id` by default
static get jsonSchema() {
return {
type: "object",
required: ["title"],
properties: {
id: { type: "integer" },
title: { type: "string" },
contents: { type: "string" },
edited: { type: "string" },
},
};
}
}
The jsonSchema function specifies the properties of the model using the JSON Schema specification. With just this schema you can implement additional validation, such as numerical limits, etc., beyond what is possible with just SQL. You can further customize the validation methods if needed.
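For example (an illustrative constraint, not one required by this practical’s tests), you could bound the title length directly in the model schema:

title: { type: "string", minLength: 1, maxLength: 255 },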
Using your newly created model
Return to api/articles/[id].js to replace the Knex implementation with an implementation using Objection.js. Start by importing your newly created Article model. Note that Article is the default export in its file, so we use the import Article from "..." structure (as opposed to the form with the curly brackets). You can now replace your previous Knex-based implementation with Objection.js:
const article = await Article.query()
.findById(req.query.id)
.throwIfNotFound();
res.status(200).json(article);
Notice that we no longer need to specifically handle an invalid id. The .throwIfNotFound() raises an error when no article is found, and that error can be handled by our error handling middleware.
Many of the operations we want to perform can be done exclusively with Objection.js. However, as with sections, it is sometimes helpful to “drop down” to SQL. Specifically, when you implement the /api/articles route (in api/articles/index.js), we can use the Objection.js query builder to optionally add a WHERE clause to the query that selects articles based on the section parameter. Note the whereRaw below that utilizes the same SQL string handling functions as above. The ? implements parameter insertion (the ? is replaced with req.query.section) and automatic escaping that ensures the query (sent by the client) can’t inject additional SQL on our server (something we will talk about later in the semester).
let query = Article.query();
if (req.query.section) {
query = query.whereRaw("UPPER(SUBSTRING(title, 1, 1)) = ?", [
req.query.section,
]);
}
const articles = await query;
res.status(200).json(articles);
Now update the rest of the handlers in api/articles/index.js and api/articles/[id].js to work with your newly created Article model (you don’t need to change the implementation in api/sections.js; it is easier to implement without Objection.js). The relevant Objection.js query methods will likely be insertAndFetch and updateAndFetchById. These will need to be called on a query as shown above (e.g., Article.query().insertAndFetch(...)), not on the raw model (Article). All of your routes should be similar and have just a few lines of code. Those methods will require one or more arguments, e.g., the article you will be inserting into the database.
Unfortunately we can’t just use Objection for validation. The id of the object sent in the PUT request must match the id in the URL. We will need to check that this requirement is satisfied by writing code in our route handler:
const { id, ...updatedArticle } = req.body;
// req.query.id is a string, and so needs to be converted to an integer before comparison
if (id !== parseInt(req.query.id, 10)) {
// Verify id in the url, e.g, /api/articles/10, matches the id the request body
res.status(400).end(`URL and object does not match`);
return;
}
// Update the database ...
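One possible sketch of the remainder of the PUT handler, using the updateAndFetchById method suggested above:

// Update the row and return the updated article (throws if no such id)
const article = await Article.query()
  .updateAndFetchById(id, updatedArticle)
  .throwIfNotFound();
res.status(200).json(article);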
Refining Your Validations
With all the routes in place, most, but not all, of the tests should pass. The remaining failing test should be testing what happens if an article is created without a contents field.
Since the id mismatch validation check depended on both the route and the data, we needed to implement it in the route handler. But this test is just about data integrity, and thus it is best addressed in the Model. Modify the Article model to include a suitable default (thus fixing that issue everywhere an Article instance might be created), making the test pass (i.e., add a default field to the contents property as described in the JSON Schema documentation).
We would like our API model to be even more robust. At present, we allow any string for the edited time, but we should really only allow valid dates. Practice TDD by writing a test for POSTing an article with an invalid date (in the “POST operations” test suite), and then add to the schema to introduce the necessary constraint. Check out the JSON Schema documentation for string formats, specifically “date-time”.
đź‘€ Show an approach...
First create a possible test for an invalid edited time inside the existing describe section for POST requests:
test("Should reject an article with an invalid edited time", async () => {
await testApiHandler({
rejectOnHandlerError: false, // We want to assert on the error
pagesHandler: articlesEndpoint,
test: async ({ fetch }) => {
const res = await fetch({
method: "POST",
headers: {
"content-type": "application/json",
},
body: JSON.stringify({ title: "A title", edited: "4" }),
});
expect(res.ok).toBe(false);
},
});
});
and then add the corresponding constraint in the schema to make that test pass:
edited: { type: "string", format: "date-time" },
Production deployments (and also managing secrets)
While SQLite is a very effective test and development platform (with the caveats mentioned above), it is not well suited for a production environment where we want to support concurrent access, robust backup, and other management features (and where we may not have fast, persistent file storage). Instead we will use PostgreSQL, as provided by https://neon.tech. Neon provides a managed PostgreSQL database accessible from anywhere, with a free tier and no credit card required. To get started:
- Sign up for a free account at https://neon.tech
- Create a new project
- From the Connection Details box, copy the connection string (it should start with “postgresql://”). This string contains all the information needed to connect to your newly created database, including the password.
- Create a file named .env.local in the root directory of your practical with:
DATABASE_URL=postgresql://...
where postgresql://... is the full URL you just copied from Neon (make sure to copy the version with the actual password, not the “*”s shown by default).
Notice that this “env file” is ignored by Git. It is used to store secrets, like your database credentials, which should not be included in your code (and uploaded to sites like GitHub, etc.). Since this file is not included in version control, each developer will need their own copy (i.e., you will have to distribute this file for your projects separately from your project source code). This file is used to set environment variables that can be accessed by your applications. For example, in knexfile.js, the production configuration’s connectionString is set from the DATABASE_URL variable set in .env.local. Any secrets, e.g., API keys for third-party services, should be managed this way. Application platforms typically have ways to securely set environment variables for your applications (which is how you would set the DATABASE_URL, etc. for your project).
We will migrate and seed the database as before, but now we will need to explicitly specify the “production” environment. The default environment is “development”, and so far everything we wanted to do used the “development” environment, so we could largely ignore NODE_ENV. Now we want to make sure we use any production-specific .env files and the Knex production configuration. To do so we set NODE_ENV=production before any knex or other commands. To do this in a consistent and platform-independent way we have added the cross-env package to the application (with đź’» npm install -S cross-env). With cross-env, you can use the following approach to set environment variables on Windows, OSX and Linux.
đź’» npx cross-env NODE_ENV=production npx knex migrate:latest
đź’» npx cross-env NODE_ENV=production npx knex seed:run
Using executables installed by NPM (or why so many `npx` commands)
Why are there so many npx commands above? npx is a program distributed alongside npm that executes the command provided to it from the local NPM packages, e.g., from node_modules/.bin (or by downloading the package). In this context, the first npx is saying use the cross-env program installed as part of this package (i.e., specified in its package.json file). The second is saying use the knex program installed as part of this package. Each whole command above should really be read as 3 commands: 1) execute cross-env ... using the locally installed packages, 2) which executes npx knex ... with NODE_ENV set to production, 3) which finally executes knex migrate:latest (or seed:run) using the locally installed packages (and the environment set by cross-env).
Platform Note
In other examples/documentation/tutorials you might see the following approach for setting the environment. But this approach only works on Linux/OSX, or if you are using BASH on Windows (a less common situation):
đź’» NODE_ENV=production npx knex migrate:latest
đź’» NODE_ENV=production npx knex seed:run
To minimize platform-related issues we use and recommend cross-env. For Windows users, I recommend installing the package globally, independent of any particular project (đź’» npm install -g cross-env), so you can use it everywhere.
You should now be able to see the article data on Neon! You can now build and run the production version of your application (which will use your newly populated database) with the commands below.
đź’» npm run build
đź’» npm run start
Open the URL and test your API and database by creating a new article. Paste the following in the browser console.
fetch("/api/articles", {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({
title: "A new article",
contents: "Article body",
edited: "2016-11-19T22:57:32.639Z",
}),
})
.then(resp => resp.json())
.then(data => console.log(data))
.catch((error) => console.log(error));
You should see a response with the newly assigned id, and can also view your newly inserted article on Neon by executing the following query in the SQL Editor:
SELECT * FROM "Article" WHERE title='A new article';
Using an ORM to facilitate associations
So far there likely doesn’t seem to be much difference between the Objection implementation and the implementation with plain Knex. The differences start to show when we introduce relations (also termed associations) between the models. Let’s imagine we wanted to extend Simplepedia with a feature to explore related articles, that is, define a “many to many” relation between articles. We will do so with a join table in our RDBMS that contains references to two articles.
Start by creating another migration to add the relevant table: đź’» npx knex migrate:make related_articles. Because this migration has a later date than the migration you created above, it will be applied second. Modify the generated file to create the table. Similar to our examples in class, we want to create a new table RelatedArticle with two foreign key references. Since each related combination can only appear once, we will use the two columns together as a composite primary key. Once you have implemented the migration, apply it as you did before.
/* eslint-disable func-names */
exports.up = function(knex) {
return knex.schema.createTable("RelatedArticle", (table) => {
table.integer("articleId").references("id").inTable("Article").onDelete("CASCADE");
table.integer("relatedId").references("id").inTable("Article").onDelete("CASCADE");
table.primary(["articleId", "relatedId"]);
});
};
exports.down = function(knex) {
return knex.schema.dropTableIfExists("RelatedArticle");
};
Next, add the relation to the Article model in models/Article.js. Relations are specified with a static object in the Model class named relationMappings. Each of the different relation types requires different information. Here we specify the name of the relation as the key (the name will be used in our queries), related in this case, and the type and links of the relation.
static relationMappings = {
related: {
relation: Model.ManyToManyRelation,
modelClass: Article, // eslint-disable-line no-use-before-define
join: {
from: "Article.id",
through: {
// RelatedArticle is the join table. These names must match the schema
from: "RelatedArticle.articleId",
to: "RelatedArticle.relatedId",
},
to: "Article.id",
}
}
};
Now that the relation is specified in the model, we can perform “graph” queries to fetch, insert, etc. related articles. In our API we now want to return related articles when we fetch a specific article, that is, the fetched article object should contain an additional property related that is an array of article objects. In keeping with TDD, start by adding the test below (you will need to import Article in the test file before doing so). Here we create a test relation before the test (and clean it up afterwards). This test should be added within the existing top-level describe as it relies on the setup and teardown code you implemented before.
describe("Related articles test", () => {
beforeEach(() =>
// Create test example by setting related article (by id)
Article.relatedQuery("related").for(1).relate(2),
);
afterEach(() =>
// Clear relation join table
knex("RelatedArticle").del(),
);
test("GET /api/articles/[id] should return related articles", async () => {
await testApiHandler({
rejectOnHandlerError: true,
pagesHandler: articleEndpoint,
params: { id: 1 }, // Testing dynamic routes requires params or patcher
test: async ({ fetch }) => {
const res = await fetch();
await expect(res.json()).resolves.toMatchObject({ ...data[0], related: [data[1]]});
},
});
});
});
Add .withGraphFetched("related") to the query builder in your GET /api/articles/[id] handler to also fetch the related models. Try inserting a console.log in your API handler to see the structure of the returned object. At this point your new test (and the rest of the tests) should pass!
This is just the most basic usage of Objection’s relations, but hopefully it gives you a sense of the possibilities and how to get started. Assembling complex objects from across multiple tables is now possible with just small additions to your query!
Developing and testing like production
So far we have used a different environment for testing (SQLite) than we intend to use in production (PostgreSQL). While similar, the tools are not identical (see the FAQ entry below); specifically, SQLite is more permissive than PostgreSQL. In the past, those differences have been a frequent source of trouble. To avoid inconsistencies, to the extent possible, we want to develop and test using the same tools and environment as our production deployment (note that doesn’t mean testing on our production deployment). To that end, we will transition our development and testing to use PostgreSQL. Unlike SQLite, which is a library, PostgreSQL runs as a separate process (server). To make it easier to install the tools and run that process in a cross-platform way, we will do so in a Docker container that we can stop and start as needed.
Prerequisites Note
The following approach requires Docker to be installed. If you haven’t done so, please return to that section of the getting started instructions. If your computer is unable to run Docker, please see the instructor about a loaner machine or possible workarounds.
Development
Setting up PostgreSQL for development requires the following steps:
- Define a Docker compose file docker-compose.dev.yml, with instructions for configuring the container running PostgreSQL
- Add a .env.development.local file specifying the connection string for our development database server (that will run in the Docker container)
- Change the knexfile.js development environment to use PostgreSQL
- Update the package.json file to automatically start (and stop) the PostgreSQL container when we run npm run dev.
Docker Compose
At the top level of your project, create a file docker-compose.dev.yml with the following. This specifies that we want to launch the postgres:16 image (with PostgreSQL 16 pre-installed) with the provided configuration. In short, this specifies that the container should expose port 5432 (the port typically used for PostgreSQL) and store its data in the ./postgres-data directory, within your project, on the local host. That is, although the database is running in an ephemeral container, it is storing its data in a directory on the host machine (mounted into the container). This ensures the data is persistent, even if we recreate the container. Since this directory is managed by PostgreSQL, we don’t want to include it in our git repository. Make sure to add postgres-data to your .gitignore file, if not already present.
services:
db:
image: postgres:16
restart: always
environment:
- POSTGRES_PASSWORD=postgres
- POSTGRES_USER=postgres
- POSTGRES_DB=postgres
ports:
- "5432:5432"
volumes:
- ./postgres-data:/var/lib/postgresql/data
volumes:
postgres-data:
driver: local
Local environment
We previously created an environment file for our production PostgreSQL connection string. We will do the same for our new development setup, using the values we specified in our Docker Compose configuration. Create a file .env.development.local in the top level of your project with the following. As before, the file name should be greyed-out in VSCode, indicating that this file, since it contains secrets, is excluded from the git repository. Unlike .env.local, this file is only used in the development environment (as its name indicates), and it takes priority over the generic environment file.
DATABASE_URL=postgres://postgres:postgres@localhost:5432/postgres
Change knexfile.js
Now that we are going to use PostgreSQL for development (instead of SQLite), we want to change the development configuration in our knexfile.js accordingly. Modify the development section to be:
development: {
...defaultSettings,
client: "pg",
connection: {
connectionString: DATABASE_URL,
}
},
Update package.json to start and stop the database server
In our DevOps approach, we want to automate as much of the operations work as possible. To that end, we want to be able to automatically start and stop the database server when we run the development server. Specifically, we want to start the database server before we run the development server and automatically stop it when we are done. Add or update the following scripts in your package.json file. The new “predev” and “postdev” scripts automatically run before and after the “dev” script, e.g., when you execute npm run dev. These start (“up”) and stop (“down”) the containers specified in the docker-compose.dev.yml file. The change to “dev” inserts a small cross-platform runner script to catch Ctrl+c when running the development server so that the database server shuts down gracefully. This is not a built-in feature, but is enabled by the dev-runner.js script included in the skeleton. This approach ensures our skeleton works on both Windows and OSX/Linux.
"predev": "docker compose -f docker-compose.dev.yml up -d",
"postdev": "docker compose -f docker-compose.dev.yml down",
"dev": "node ./dev-runner.js next dev",
Running the development server
At this point you should be able to run đź’» npm run dev as usual (it may take a while the first time as Docker downloads the necessary images). But now, it will automatically start the database server beforehand and stop it when you press Ctrl+c. Before you test your application, remember that you need to migrate and seed the database. While the development server is running (and thus the database server is running), execute the following (hopefully now familiar) Knex commands in another terminal window.
đź’» npx knex migrate:latest
đź’» npx knex seed:run
These commands will pick up the new “development” configuration to migrate and seed the PostgreSQL database! You should now be able to run the same test commands as described above in the browser console.
Viewing your Postgres databases
Neon’s web application provides helpful graphical tools for querying your database and viewing the resulting data (but is only accessible to the account that owns the database). There are both command line and graphical applications you can install to do the same locally. One tool to try out is the PostgreSQL extension for VSCode. This extension enables you to explore your database schema and easily run queries from within VSCode. Install the extension. While the development server, and thus the database, is running (in its Docker container), create a new connection by clicking on the “+” in the PostgreSQL Explorer tab (a picture of an elephant) or running the palette command PostgreSQL: Add Connection. Enter the information from your .env.development.local file, e.g., the hostname is localhost, the user is postgres, their password is postgres, the port is 5432, you want a standard (non-SSL) connection, and the database is also named postgres. Once connected you can explore the tables and their schema, and by right-clicking on a table run pre-specified or new queries. As an example, right-click on the Article table, select “New Query” and run the following (by right-clicking in the query window and selecting “Run Query”) to produce an ordered list of sections:
SELECT DISTINCT UPPER(SUBSTRING(title, 1, 1)) AS section FROM "Article" ORDER BY section;
You can use this extension with your Neon-hosted database as well. All the relevant information for creating a new connection is in the connection string you copied from the Neon website, which is formatted as:
postgresql://username:password@host/database?sslmode=require
Use the default port of 5432 and select a secure connection (indicated by the sslmode=require at the end of the connection string). You should now be able to similarly explore and query that database as well!
Testing
As noted above, we want to test in an environment that is as similar as possible to our production environment. To do so, we will similarly modify our Knex configuration to use PostgreSQL during testing. Unlike development, we want to create independent (and ephemeral) databases for each test file (recall “FIRST”). To do so, we will use the testcontainers library to automatically launch a PostgreSQL container as part of our test setup.
Setting up PostgreSQL for testing requires the following steps:
- Change the knexfile.js test environment to automatically create PostgreSQL container(s)
- Update the test seed script(s) to produce a consistent configuration for PostgreSQL (not just SQLite)
Change knexfile.js
Install the new development dependencies for automatically creating containers:
đź’» npm install -D testcontainers @testcontainers/postgresql
Now update the “test” section of your knexfile.js with the code below. The Knex connection can be a function, not just a static specification. Here we use this function to automatically create a new, and independent, database server for the tests.
test: {
...defaultSettings,
client: "pg",
connection: async () => {
// Only import testcontainers when running in a test environment
const { PostgreSqlContainer } = await import("@testcontainers/postgresql");
// Create a new container for each connection, i.e., for each test file
// being run in parallel. These containers are automatically cleaned up
// by test containers via its ryuk resource reaper.
const container = await new PostgreSqlContainer("postgres:16").start();
return {
host: container.getHost(),
port: container.getPort(),
database: container.getDatabase(),
user: container.getUsername(),
password: container.getPassword(),
};
},
seeds: {
directory: "./knex/seeds/test",
},
},
The first time we run the tests, PostgreSqlContainer will download the relevant images. That can be slow and lead to test timeouts. To prevent that we will pre-download the images with the following commands (you should only need to do this once). Note, you may want to add the same commands to any GitHub Actions scripts so that your CI tests don’t time out.
đź’» docker pull postgres:16
đź’» docker pull testcontainers/ryuk:0.5.1
Update the seeds
You previously created a knex/seeds/test/load-test-articles.js file. It contained SQLite-specific code for resetting the auto-incrementing id column. We want that seed to work for both SQLite and PostgreSQL. Replace the current return statement with the following. Here we detect the type of database and reset the auto-increment appropriately.
// Deletes ALL existing entries and reset the id count. Then use
// batch insert because we have too many articles for simple insert.
const { client } = knex.client.config;
if (client === "sqlite3") {
return knex("sqlite_sequence")
.where("name", "=", "Article")
.update({ seq: 0 })
.then(() => knex("Article").del())
.then(() => knex.batchInsert("Article", data, 100));
}
if (client === "pg") {
  return knex.raw('ALTER SEQUENCE "Article_id_seq" RESTART WITH 1')
    .then(() => knex("Article").del())
    .then(() => knex.batchInsert("Article", data, 100));
}
return knex("Article").del()
  .then(() => knex.batchInsert("Article", data, 100));
At this point you should be able to run the tests again, e.g., đź’» npm test, but using PostgreSQL (and they should still be passing!). If you keep the Docker dashboard open, you should see the containers get launched (and cleaned up) automatically. If you get test failures due to a timeout error, especially the first time, try running the tests again (the first run can be slow while the images are loading). Alternately, you can increase the timeout for the beforeAll and beforeEach functions by providing a second argument (in milliseconds). For example, to increase the timeout to 20 seconds from the default of 5 seconds:
beforeAll(() => ..., 20000 /* Wait for 20 seconds */);
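For example, applying that to the migration setup shown earlier:

beforeAll(
  () =>
    // Ensure test database is initialized before any tests
    knex.migrate.rollback().then(() => knex.migrate.latest()),
  20000, // Increase the timeout to 20 seconds for (slow) container startup
);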
Finishing up
Commit any changes you may have made and then push your changes to GitHub. You should then submit your repository to Gradescope. Note that due to limitations in Gradescope (your code is already run in a Docker container), it will replace your knexfile.js to test against SQLite.
Grading
Required functionality:
- Reimplement the Simplepedia API with Knex and Objection.js
- Add additional validations and associated test
- Implement related article many-to-many relation
- Pass all tests
- Pass all ESLint checks
Recall that the Practical exercises are evaluated as “Satisfactory/Not yet satisfactory”. Your submission will need to implement all of the required functionality (i.e., pass all the tests) to be Satisfactory (2 points).
RDBMS FAQ
Interactions with the database are a common source of questions during the project. This practical introduces many of the key ideas, but necessarily covers only basic usage. Some frequent questions that have arisen as you move beyond this practical:
I am failing tests related to duplicate titles and other validations
Several of the tests in this practical relate to article validations, e.g., preventing duplicate article titles. Recall from class that we want to implement these validations once, but have them applied everywhere we create new instances of a model (via “aspect-oriented programming”). We can do so with a combination of database schema and schema in the Objection.js models. Check out the “collapsing” links above (“View a more specific version of the schema” and “Show an approach…”), which show more specific database and Objection.js model schemas.
In the specific case of uniqueness, that constraint is best enforced by the database itself (since we need to know all the other titles too). Adding the unique constraint with Knex automatically creates the relevant index in the underlying database to efficiently enforce uniqueness of the specified fields. Thinking through and adding these constraints (to your database/models) up front can help maintain data integrity and prevent future errors related to invalid assumptions, e.g., that all usernames are unique. In your projects, I encourage you to think about what fields must be unique (e.g., users’ e-mails), what fields must be defined, etc.
It worked with SQLite but not PostgreSQL. Why?
SQLite is purposely permissive about types, i.e., if you specify an integer but provide a string, SQLite will happily store the string. This is not true for other database engines, e.g., PostgreSQL. That is why we try to test using the same tools as production.
I was able to seed my database, but later got uniqueness errors when inserting new data…
Do you have the id, or more generally the primary key, specified in the seed for a table with an increments (auto-incrementing) primary key? If so, inserts with the id specified do not trigger the increment; thus the internal counter remains unchanged, e.g., still 0. Subsequent inserts will use that counter value and thus may end up with duplicate ids. Two possible solutions: remove the id from the seed so the counter is incremented during seeding (the recommended approach), or manually set the counter to the appropriate value during migration.
I removed the ids from the seed but now don’t know how to link my data during seeding
Seeding data with correct associations can be tricky. For the reasons in the answer above, it can be hard to pre-specify the ids (and thus the relations) in seeding. Instead we want to mimic the way relationships would be created during regular use. Not all seeding has to be done from JSON files (as we have done so far); sometimes it is easier and more straightforward to implement individual queries. For example, to add a “related article” relationship during seeding with knex, you could add this then to the promise “chain” in the seed function. This queries for the articles by their titles, and then uses the ids to create an entry in the join table. If you wanted to do this on a larger scale, you could create a JSON file describing the relationships via queryable parameters, e.g., titles, then perform the queries and insertions specified by that data.
.then(async () => {
const article1 = await knex("Article").where({ title: "Ohio State Route 761"}).first();
const article2 = await knex("Article").where({ title: "Northern Highway (Victoria)"}).first();
await knex("RelatedArticle").insert({ articleId: article1.id, relatedId: article2.id});
});
This is a problem where the ORM can also help you. You can insert multiple related entries at once via Objection.js’s insertGraph or other similar insertion methods. In that approach you can simultaneously insert related model instances and the correct foreign keys will be set automatically. Alternately you could query for one side of the relation and use that value to insert or relate data via $relatedQuery (check out relation queries for more information). We could perform the same insertion as above using Objection via the $relatedQuery and relate methods (the latter because we are relating existing data, not inserting new data). Note, to import your Objection.js models in your knex seed script you need to enable ES module imports in Node by adding the --esm option to your knex command, e.g., đź’» npx knex seed:run --esm. You will also want to rebind the models to knex to ensure they are using the knex object provided as an argument to the seed function. Do so by importing Model, e.g., import { Model } from "objection";, and setting the knex instance at the beginning of the seed function with Model.knex(knex);.
const fs = require("fs");
// To use import in knex command add the --esm option (to enable ES modules)
import { Model } from "objection";
import Article from "../../models/Article"
exports.seed = function (knex) {
Model.knex(knex);
// Original seed code ...
.then(async () => {
const article1 = await Article.query().where("title", "Ohio State Route 761").first();
const article2 = await Article.query().where("title", "Northern Highway (Victoria)").first();
await article1.$relatedQuery("related").relate(article2);
});
};
I am getting errors about circular/cyclical dependencies in my Objection.js models
When we specify both directions of an association, we can create import loops (also called circular dependencies or circular imports) where model A imports model B, but model B also imports model A (e.g., because they are two sides of a relation). This might manifest as ESLint warnings or errors when you run your code. If you observe this problem, one question to ask is whether you really need both sides of the relation, i.e., do you perform queries in both “directions”? If not, perhaps you can leave the relation and the import out of one of the models. While the Objection.js documentation suggests several approaches if the circular dependencies are unavoidable, they don’t seem to work within Next, or don’t work consistently. One approach that seems to work more robustly is described here and implemented in this example. Instead of a static value, make relationMappings a function, e.g., the example above becomes the following (notice we are creating a function that returns an object).
static relationMappings = () => ({
related: {
relation: Model.ManyToManyRelation,
modelClass: Article,
join: {
from: "Article.id",
through: {
from: "RelatedArticle.articleId",
to: "RelatedArticle.relatedId",
},
to: "Article.id",
}
}
});
I am getting timeout errors in my GitHub actions for tests related to the database
Our approach to testing downloads the relevant Docker images when they are first used. To reduce the likelihood of timeouts you can “pre-pull” the images as we do above, by adding the following as a step in your GitHub Actions file before you run the tests.
- name: Pull test docker images
shell: bash
run: |
for each in {"postgres:16","testcontainers/ryuk:0.5.1"}; do docker pull "$each"; done
I am rolling back my migrations but still not seeing the desired schema
By default knex migrate:rollback only “rolls back” the last batch of migrations, which may not be all the migrations. To roll back all migrations, use the --all option, e.g., knex migrate:rollback --all.
I am seeing “database system is ready to accept connections” when using postgres for tests
The only method I have found to combat this error is to go to the troubleshoot menu in Docker Desktop, clean and purge all data, and restore factory settings.