Data Methodology
Bill Taylor’s Show Archive
Overview
This site is the result of a hybrid human-curated + AI-assisted archival workflow designed to document, verify, and present decades of live-music performance history in a way that is accurate, maintainable, and future-proof.
Rather than relying on a traditional database or CMS, the project intentionally uses simple, durable tools—spreadsheets, static hosting, and lightweight serverless logic—combined with AI to accelerate data gathering, reconciliation, and presentation.
1. Data sources & initial collection
The foundation of the archive is primary and secondary historical data, including:
- Personal show logs and band records
- Archive.org live recordings and show notes
- Setlist.fm entries
- Venue websites and calendars
- Press articles, flyers, and posters
- Personal recollection cross-checked against public records
Role of AI in data gathering
AI was used extensively in the initial discovery and normalization phase, including:
- Locating potential show dates across public sources
- Parsing unstructured text (archive notes, press articles, flyers)
- Proposing likely band lineups from notes and patterns
- Normalizing venue names, cities, and states
- Identifying duplicates and conflicts
Important: AI suggestions were treated as proposals, not facts. Every show entry was reviewed and either confirmed, corrected, or discarded by a human before being considered canonical.
2. Canonical data model (Google Sheets)
All authoritative data for the site lives in a single Google Sheet, which serves as the project’s source of truth.
Each row represents one performance and includes the following fields where available (a schema sketch follows this list):
- Date
- Band
- Venue
- City / State
- Personnel (musicians, separated by |)
- Recording links (Archive.org, etc.)
- Poster / media links
- Notes and annotations
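As an illustration of that row shape, here is a rough TypeScript sketch of one show record as the downstream code might model it. The field names are illustrative assumptions, not the sheet's actual column headers.

```typescript
// Hypothetical shape of one spreadsheet row after parsing.
// Field names are illustrative; the real sheet's columns may differ.
interface ShowRow {
  date: string;             // e.g. "2003-08-15"
  band: string;
  venue: string;
  city: string;
  state: string;
  personnel: string[];      // split from the |-separated Personnel cell
  recordingLinks: string[]; // Archive.org and similar URLs
  mediaLinks: string[];     // posters, flyers, photos
  notes: string;
}

// The Personnel cell arrives as a single string, e.g. "Bill Taylor|Jane Doe".
function parsePersonnel(cell: string): string[] {
  return cell.split("|").map((name) => name.trim()).filter(Boolean);
}
```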
Why a spreadsheet?
- Human-readable and easy to edit
- Version history and change tracking
- No vendor lock-in
- Works equally well for manual edits and AI-assisted batch updates
AI continues to assist with validation, backfills, and anomaly checks—but the spreadsheet remains authoritative.
3. Desired site workflow (design philosophy)
From the beginning, the site was designed around a few guiding principles:
- No manual web edits required — all updates happen in the spreadsheet.
- No fragile admin UI — fewer moving parts means fewer failures.
- Every build is lockable — versions can be frozen, archived, and rebuilt later.
- Transparency over perfection — methodology matters more than claims of completeness.
AI was used collaboratively to propose page structure, define filtering logic (past vs. future shows), shape collaborator logic, and keep features aligned with available data.
4. Site architecture & wiring
The live site is assembled using the following components:
Google Sheets
- Holds all canonical data
- Published as CSV for read-only access
Cloudflare Worker
- Fetches published CSV data
- Parses and transforms data in real time
- Applies logic (date filtering, personnel parsing, counts, rollups)
- Outputs lightweight JSON to the frontend
This approach avoids databases, server maintenance, and authentication layers.
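A minimal sketch of that Worker layer follows, assuming a placeholder published-CSV URL and the column order listed in section 2. The naive comma split ignores quoted fields, which a production version would need to handle.

```typescript
// Cloudflare Worker sketch: fetch the published CSV, transform it, return JSON.
// SHEET_CSV_URL is a placeholder, and the column order is an assumption.
const SHEET_CSV_URL =
  "https://docs.google.com/spreadsheets/d/e/EXAMPLE_ID/pub?output=csv";

export default {
  async fetch(_request: Request): Promise<Response> {
    const csv = await (await fetch(SHEET_CSV_URL)).text();
    const [, ...rows] = csv.trim().split("\n"); // drop the header row

    const shows = rows.map((line) => {
      // Naive split: assumes no commas inside cells; a real parser would handle quoting.
      const [date, band, venue, city, state, personnel = ""] = line.split(",");
      return {
        date,
        band,
        venue,
        city,
        state,
        personnel: personnel.split("|").map((n) => n.trim()).filter(Boolean),
      };
    });

    // Date filtering: split past vs. upcoming relative to today
    // (ISO dates compare correctly as strings).
    const today = new Date().toISOString().slice(0, 10);
    const past = shows.filter((s) => s.date < today);
    const upcoming = shows.filter((s) => s.date >= today);

    return new Response(JSON.stringify({ total: shows.length, past, upcoming }), {
      headers: { "content-type": "application/json" },
    });
  },
};
```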
Static frontend
- HTML / CSS / minimal JavaScript
- Entire site can be rebuilt from the same data source
- Extremely small JS footprint
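To give a sense of what "minimal JavaScript" means in practice, here is a hedged sketch of the frontend's only dynamic step: fetching the Worker's JSON and rendering it into a list. The /api/shows route and the #shows element are assumptions, not the site's actual names.

```typescript
// Frontend sketch: one fetch, one render pass. Route and element id are placeholders.
async function renderShows(): Promise<void> {
  const res = await fetch("/api/shows");          // hypothetical Worker route
  const { past } = await res.json();
  const list = document.getElementById("shows");  // assumed <ul id="shows"> container
  if (!list) return;
  for (const show of past) {
    const item = document.createElement("li");
    item.textContent = `${show.date} - ${show.band} @ ${show.venue}, ${show.city}`;
    list.appendChild(item);
  }
}
renderShows();
```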
Hosting & DNS
- Domain registered via Network Solutions
- DNS and edge logic handled by Cloudflare
- Static site hosted via InfinityFree
AI assisted in debugging hosting/SSL issues, optimizing file size and performance, and designing a fault-tolerant, low-cost deployment model.
5. Press, media, and attribution
Press items and media assets follow the same philosophy:
- Indexed via spreadsheet rows
- Linked to external sources or hosted files
- Attributed with publication, date, and band
AI was used to locate historical press references, summarize long articles, and propose standardized metadata fields. All press content remains attributed to original sources.
6. Ongoing maintenance workflow
- New show happens (or a historical show is discovered)
- Row is added or updated in the Google Sheet
- Personnel field uses |-separated names
- Recording or media links added when available
- Site reflects changes instantly — no rebuild required
AI continues to assist with bulk cleanup, historical backfills, and relationship analysis (collaborators, counts).
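As a sketch of the kind of relationship analysis mentioned above, the snippet below counts how many shows each collaborator appears on, working from the already-split personnel lists; the field name is an assumption.

```typescript
// Collaborator rollup sketch: count appearances per musician across shows.
// The `personnel` field name is illustrative.
function collaboratorCounts(shows: { personnel: string[] }[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const show of shows) {
    for (const name of show.personnel) {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  return counts;
}

// Example: collaborators sorted by number of shared shows.
const topCollaborators = (shows: { personnel: string[] }[]) =>
  Array.from(collaboratorCounts(shows).entries()).sort((a, b) => b[1] - a[1]);
```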
7. Transparency & limitations
Despite best efforts:
- Some early shows lack complete documentation
- Personnel may be incomplete where records no longer exist
- Recording coverage is uneven across eras
Rather than obscuring gaps, the archive acknowledges them. This project prioritizes honesty, traceability, and continuous improvement.
8. Why this matters
This archive is not just a list of shows. It is:
- A longitudinal record of creative collaboration
- A reproducible model for artist-run archives
- Proof that AI can be used as a research assistant—not an authority
Every design and data decision reflects that philosophy.
Last updated: v2.8.10