Automating Book Metadata with Airtable and the Google Books API

I read a lot. Not just mainstream fiction or business books, but also a lot of light novels, web novels, and translated Chinese series that don't show up properly on Goodreads, and often aren't covered by Calibre's metadata sources either. Every tool I tried either had patchy coverage or wanted me to conform to the kind of library it expected me to have.

So I built my own.

The core of it is an Airtable base I use as my personal book library. One record per book, structured the way I want it — authors, publisher, page count, cover image, rating, description, publication date. What I didn’t want to do was fill all of that in manually, every time, for every book. So I built a small Node.js script to do it for me.

How it works

The workflow is simple by design. I add a row to my Airtable base and enter the ISBN. That’s it — nothing else on my end. When I’m ready to sync, I run npm start locally and the script handles the rest.

It pulls all records from a filtered Airtable view called “Books Update” — any record where Status is empty shows up there. For each one, it looks up the ISBN against the Google Books API, maps the response to my Airtable fields, and PATCHes the record in a single API call. Once written, it sets Status to Synced, which removes the record from the view automatically. Clean loop.
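For anyone adapting this, here is a minimal sketch of the Airtable side of that loop. It assumes Node 18+ (global fetch) and uses the standard Airtable REST endpoints for listing records in a view and updating a single record. The base ID, token, and table name ("Books") are hypothetical placeholders. This sketch also puts Status: "Synced" into the same PATCH as the book fields; the post doesn't say whether the original script does it that way.

```javascript
// Hypothetical sketch: list every record in a view, then PATCH one record.
// baseId, token and the table name are placeholders; "Books Update" and the
// Status field are the names used in the post.
const AIRTABLE_API = "https://api.airtable.com/v0";

// Airtable returns at most 100 records per page plus an `offset` token when
// more remain, so keep requesting pages until no offset comes back.
async function listViewRecords({ baseId, table, view, token, fetchFn = globalThis.fetch }) {
  const records = [];
  let offset;
  do {
    const params = new URLSearchParams({ view });
    if (offset) params.set("offset", offset);
    const res = await fetchFn(
      `${AIRTABLE_API}/${baseId}/${encodeURIComponent(table)}?${params}`,
      { headers: { Authorization: `Bearer ${token}` } }
    );
    if (!res.ok) throw new Error(`Airtable list failed: ${res.status}`);
    const page = await res.json();
    records.push(...page.records);
    offset = page.offset; // undefined on the last page
  } while (offset);
  return records;
}

// Write the mapped fields and Status = "Synced" in one request, so the record
// drops out of the filtered view as soon as the update lands.
async function patchRecord({ baseId, table, recordId, fields, token, fetchFn = globalThis.fetch }) {
  const res = await fetchFn(
    `${AIRTABLE_API}/${baseId}/${encodeURIComponent(table)}/${recordId}`,
    {
      method: "PATCH",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify({ fields: { ...fields, Status: "Synced" }, typecast: true }),
    }
  );
  if (!res.ok) throw new Error(`Airtable PATCH failed: ${res.status} ${await res.text()}`);
  return res.json();
}
```

Passing fetchFn in as a parameter is only there so the functions can be exercised without touching the network; in normal use the default global fetch is fine.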

The fields it populates: title, authors, publisher, description, published date, page count, cover image, rating (rounded to Airtable’s 1–5 scale), and a maturity flag.
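The Google Books mapping could look roughly like the sketch below. It assumes the public volumes search endpoint with an isbn: query and reads volumeInfo from the first result (title, authors, publisher, description, publishedDate, pageCount, imageLinks.thumbnail, averageRating, maturityRating). The intermediate object shape and the function names are hypothetical, chosen so that a second source can produce the same shape. Google's averageRating can be a half value such as 3.5, so it is rounded to a whole number and clamped to 1–5.

```javascript
// Hypothetical helper names; the endpoint and volumeInfo fields are Google's.
function googleBooksUrl(isbn) {
  return `https://www.googleapis.com/books/v1/volumes?q=isbn:${encodeURIComponent(isbn)}`;
}

// Round to a whole star and clamp into Airtable's 1-5 rating range.
function toRating(averageRating) {
  if (typeof averageRating !== "number") return undefined;
  return Math.min(5, Math.max(1, Math.round(averageRating)));
}

// Map a Google Books search response to a source-agnostic book object.
// Returns null when there is no match, so the caller can try a fallback.
function mapGoogleBook(apiResponse) {
  const info = apiResponse?.items?.[0]?.volumeInfo;
  if (!info) return null;
  return {
    title: info.title,
    authors: info.authors ?? [],
    publishers: info.publisher ? [info.publisher] : [],
    description: info.description,
    publishedDate: info.publishedDate, // raw: "2023", "2023-05" or "2023-05-15"
    pageCount: info.pageCount,
    coverUrl: info.imageLinks?.thumbnail,
    rating: toRating(info.averageRating),
    mature: info.maturityRating === "MATURE", // Google uses MATURE / NOT_MATURE
  };
}
```

Keeping the date raw at this stage means date normalisation can live in one place and apply to every source.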

The fallback problem

The first thing I ran into was that Google Books simply doesn’t have everything. It’s great for anything published through a major Western imprint, but older titles, non-English books, and a lot of the light novel and translated series I wanted to track either weren’t there or returned garbage results.

So I added Open Library as an automatic fallback. Google Books is tried first. If it returns no result, Open Library gets a shot. Both APIs use ISBNs as the lookup key and both map to the same set of fields — so the script handles both transparently and I don’t need to think about which source is being used.
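The fallback itself can be a small ordered loop over sources. In this sketch each source is a hypothetical { name, lookup } pair where lookup(isbn) resolves to a book object or null. Following the behaviour described above, a source that returns no result hands over to the next one; an exception is left to propagate to the caller's per-record error handling rather than being silently treated as a miss.

```javascript
// Try each source in order; return the first hit plus the name of the
// source that produced it, so the log can show where the data came from.
async function lookupIsbn(isbn, sources) {
  for (const { name, lookup } of sources) {
    const book = await lookup(isbn); // null means "not found here"
    if (book) return { source: name, book };
  }
  return { source: null, book: null }; // no source had this ISBN
}
```

Used as lookupIsbn(isbn, [{ name: "Google Books", lookup: ... }, { name: "Open Library", lookup: ... }]), adding a third source later is just another entry in the array.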

The one meaningful difference: Open Library doesn’t carry rating data, so the rating field stays empty for books that fall through to the fallback. That’s a reasonable tradeoff — better no rating than a wrong one.
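On the Open Library side, a sketch using the Books API with jscmd=data might look like this. That endpoint returns an object keyed by "ISBN:<isbn>", with authors and publishers as arrays of { name } objects and cover URLs under cover.small/medium/large. The function names and output shape are hypothetical, matching the Google sketch so the rest of the script doesn't care which source answered. Rating is left undefined, as described above, and this sketch doesn't attempt a description or maturity flag.

```javascript
// Open Library Books API; the function names here are hypothetical.
function openLibraryUrl(isbn) {
  return `https://openlibrary.org/api/books?bibkeys=ISBN:${encodeURIComponent(isbn)}&format=json&jscmd=data`;
}

// Map the jscmd=data response to the same shape as the Google mapping.
function mapOpenLibraryBook(apiResponse, isbn) {
  const data = apiResponse?.[`ISBN:${isbn}`];
  if (!data) return null; // empty object {} means no match
  return {
    title: data.title,
    authors: (data.authors ?? []).map((a) => a.name),
    publishers: (data.publishers ?? []).map((p) => p.name),
    description: undefined, // not mapped in this sketch
    publishedDate: data.publish_date, // free text, e.g. "March 2001"
    pageCount: data.number_of_pages,
    coverUrl: data.cover?.large ?? data.cover?.medium,
    rating: undefined, // Open Library has no rating data
    mature: undefined, // no maturity information either
  };
}
```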

The Airtable field types problem

This was the part that took more debugging than I expected. Airtable has strict opinions about how different field types accept data — and the API error messages aren’t always helpful about exactly what it’s rejecting.

The main issues:

multipleSelects fields (Author, Publisher) need arrays, not strings. Airtable rejected a single string, and it also rejected any author name that didn't already exist as an option in the field. The fix for the second problem was including "typecast": true in every PATCH request, which tells Airtable to create new select options automatically instead of rejecting the request with a 422 error.

Date fields needed normalisation. Google Books returns dates in at least three formats: "2023", "2023-05", or "2023-05-15". Open Library uses natural language: "March 2001", "2001", "March 15, 2001". Airtable only accepts YYYY-MM-DD. I wrote a formatDate() helper that handles all the variants and normalises them before writing.

Attachment fields (Covers) need to be wrapped as [{ url: "..." }] — an array of objects with a url key, not just the URL string.
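Put together, the array and attachment rules might translate into a small shaping step like the one below. Author, Publisher, and Covers are the field names from the post; Title, Description, Published Date, Pages, Rating, and Mature are hypothetical placeholders for whatever your base calls them. The input is the source-agnostic book object, with the date assumed to be normalised to YYYY-MM-DD already. Empty or missing values are dropped so a partial result leaves those cells alone rather than clearing them.

```javascript
// Shape a normalised book into Airtable field values (names partly hypothetical).
function toAirtableFields(book) {
  const fields = {
    Title: book.title,
    Author: book.authors ?? [], // multipleSelects: array of option names
    Publisher: book.publishers ?? [], // multipleSelects: array of option names
    Description: book.description,
    "Published Date": book.publishedDate, // assumed already YYYY-MM-DD
    Pages: book.pageCount,
    Rating: book.rating,
    Mature: book.mature,
    Covers: book.coverUrl ? [{ url: book.coverUrl }] : undefined, // attachment
  };
  // Drop missing values and empty arrays so PATCH doesn't blank those cells.
  for (const [key, value] of Object.entries(fields)) {
    if (value === undefined || (Array.isArray(value) && value.length === 0)) {
      delete fields[key];
    }
  }
  return fields;
}

// typecast: true lets Airtable create unknown select options instead of a 422.
function buildPatchBody(fields) {
  return { fields, typecast: true };
}
```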

None of these are complicated once you know the rules, but none of them are obvious from the Airtable docs either.
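The date normalisation is the fiddliest of the three. A formatDate() along these lines covers the six shapes listed above; it's a sketch, not necessarily how the original helper works. Two assumptions: a missing month or day defaults to 01 (Airtable needs a full date, so "2001" becomes 2001-01-01), and anything unrecognised returns undefined so the field is simply skipped. It parses with regular expressions rather than new Date() to avoid timezone shifts, and it doesn't validate day ranges.

```javascript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

const pad = (n) => String(n).padStart(2, "0");

function formatDate(raw) {
  if (typeof raw !== "string") return undefined;
  const s = raw.trim();

  // ISO-style from Google Books: "2023", "2023-05", "2023-05-15"
  let m = s.match(/^(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?$/);
  if (m) return `${m[1]}-${pad(m[2] ?? 1)}-${pad(m[3] ?? 1)}`;

  // Natural language from Open Library: "March 2001", "March 15, 2001"
  m = s.match(/^([A-Za-z]+)\.?\s+(?:(\d{1,2}),?\s+)?(\d{4})$/);
  if (m) {
    const word = m[1].toLowerCase();
    // Accept full names or 3+ letter abbreviations ("Mar", "Sept")
    const month = MONTHS.findIndex((name) => word.length >= 3 && name.startsWith(word));
    if (month !== -1) return `${m[3]}-${pad(month + 1)}-${pad(m[2] ?? 1)}`;
  }
  return undefined; // unrecognised: leave the field empty
}
```

For example, formatDate("March 15, 2001") gives "2001-03-15" and formatDate("2023-05") gives "2023-05-01".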

What I ended up with

A script that takes about ten seconds to run, populates every field I care about, and handles errors per-record so one dodgy ISBN doesn’t abort the whole batch. The console output tells me for each book whether it succeeded (✓), failed (✗), or had a warning (⚠), and which data source was used — useful when a book consistently falls through to Open Library.
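The per-record error handling boils down to a try/catch inside the loop rather than around it. In this sketch, processRecord is a hypothetical async function that looks up and writes one record, resolving to { source, warnings } or throwing; the ISBN field name is also an assumption. The ✓/✗/⚠ markers follow the console output described above.

```javascript
// One try/catch per record: a bad ISBN is logged and the batch carries on.
async function syncAll(records, processRecord, log = console.log) {
  const summary = { ok: 0, warned: 0, failed: 0 };
  for (const record of records) {
    const isbn = record.fields?.ISBN ?? "(no ISBN)"; // "ISBN" field name assumed
    try {
      const { source, warnings = [] } = await processRecord(record);
      if (warnings.length > 0) {
        summary.warned++;
        log(`⚠ ${isbn} via ${source}: ${warnings.join("; ")}`);
      } else {
        summary.ok++;
        log(`✓ ${isbn} via ${source}`);
      }
    } catch (err) {
      summary.failed++;
      log(`✗ ${isbn}: ${err.message}`);
    }
  }
  return summary;
}
```

Processing records one at a time is slower than firing requests in parallel, but for a personal library it keeps the script simple and comfortably inside Airtable's per-base rate limit.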

It’s not a product. It’s not something I’m planning to package and distribute. It’s a personal tool that solved a personal problem — and it’s the kind of thing I keep building because the satisfaction of not doing something repetitive manually is genuinely worth an afternoon of work.

The code is on GitHub if you want to take a look or adapt it for your own library.