
Building the API Layer: From Schemas to Live Endpoints

Feb 22, 2026 · 7 min read

The data layer was complete: 866 products across three platforms (Fashionpedia, Met Museum, Smithsonian), enriched with AI-generated metadata, embedded into two Qdrant vector collections, and connected by 7,324 style bridges. The Next.js frontend was built and waiting. But between the data and the UI—nothing. The api/ directory had a set of broken stubs: an __init__.py containing just =, a search.py with syntax errors like query = str instead of query: str, and a main.py importing from paths that didn’t exist.

The job: design and build the full contract between backend and frontend.


Designing 13 schemas

Before writing a single endpoint, I defined every request and response shape as Pydantic v2 schemas. This meant cross-referencing three sources of truth:

  1. TypedDict types in bridge_queries.py (the data layer’s output shape)
  2. TypeScript interfaces in vv-web/src/types/index.ts (the frontend’s expected input)
  3. SQLAlchemy ORM models in database.py (the actual column definitions)

The schemas had to bridge all three: thirteen models across four files.

The JSON-in-TEXT problem

The ORM stores arrays like style_tags and colors as JSON strings in TEXT columns. The API needs actual arrays. A field validator handles the parsing:

@field_validator("style_tags", "colors", mode="before")
@classmethod
def _parse_json_lists(cls, v):
    return _parse_json_list(v)

With model_config = {"from_attributes": True}, a single call like ProductDetail.model_validate(orm_object) handles everything—JSON parsing, type coercion, null defaults. No manual field mapping needed.

Cleaning up dead columns

While cross-referencing the ORM model against actual data, I found five columns with zero rows populated out of 866: color, season, year, pattern, period. Leftovers from early dataset imports that were superseded by enrichment fields. Removed them from every schema before they could become permanent API baggage.

Filter alignment

Caught that SearchFilters (what users send with search requests) and FilterOptions (what the /filters endpoint returns) were misaligned—different field names, different field counts. Aligned them to the same 8 dimensions: era, decade, garment_type, vibe, occasion, fit_style, culture, material. Now the filter dropdown options match exactly what the search accepts.

Schema verification

All 13 schemas import cleanly. ProductDetail.model_validate() works with a live ORM object from Postgres. BridgeResult.model_validate() works with a TypedDict from bridge_queries. 100 unit tests pass (including 23 that were broken by a stale import path from an earlier directory restructure—fixed that too).


Writing the routers

I wrote every router by hand rather than generating them. When I wrote Depends(get_db) for the tenth time, I understood what dependency injection was doing. When I accidentally shadowed an imported function name with an endpoint function name, I understood why Python resolved it the way it did. That kind of understanding matters when you’re debugging at 11pm.

Products router—five endpoints. The first draft had 30 lines of manual field mapping in get_product. Replaced it all with ProductDetail.model_validate(product). One line. That’s what the upfront schema work was for.
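The one-liner works because from_attributes validation reads attributes off any object, so an ORM row needs no special handling. A toy demonstration (field set trimmed, names illustrative):

```python
from types import SimpleNamespace
from pydantic import BaseModel, ConfigDict

class ProductDetail(BaseModel):
    # Two fields stand in for the real schema's many.
    model_config = ConfigDict(from_attributes=True)
    id: int
    title: str

# Any attribute-bearing object validates, which is why an ORM row does too:
row = SimpleNamespace(id=42, title="1920s beaded flapper dress")
detail = ProductDetail.model_validate(row)
```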

Bridges router—four endpoints. Route ordering matters here: /top, /stats, and /between/{a}/{b} must be declared before /{bridge_id}, or FastAPI tries to parse “top” as an integer. Also worth noting that bridge_queries functions use keyword-only arguments (the * separator), so passing arguments positionally raises a TypeError instead of working by accident.

Filters router—one endpoint, eight SELECT DISTINCT queries. Had to cross-reference the ORM model for the correct field names—Product.garment_type not Product.subcategory, Product.fit_style not Product.brand. The ORM model is the source of truth.

Search router—the most complex. Two POST endpoints (text and image), native Qdrant filtering, and a base64 image decoder.
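The base64 decoder is stdlib work; a sketch (function name and exact error handling are hypothetical):

```python
import base64
import binascii

def decode_image_payload(data: str) -> bytes:
    """Hypothetical decoder: accepts raw base64 or a full data URL."""
    if data.startswith("data:") and "," in data:
        # Strip a "data:image/jpeg;base64," style prefix if the frontend sent one
        data = data.split(",", 1)[1]
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 image data") from exc
```

In the endpoint, the ValueError would map to a 422 so the client sees a clear validation failure rather than a 500.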

Native Qdrant filtering

For text search, I wanted filters applied at the vector database level, not in Python after fetching results. Added a query_filter parameter to search_similar() in the vector DB layer, then built a helper that converts the SearchFilters schema into Qdrant’s native Filter format:

from qdrant_client.models import FieldCondition, Filter, MatchValue

def _build_qdrant_filter(filters: SearchFilters | None) -> Filter | None:
    """Convert a SearchFilters schema into Qdrant's native Filter format."""
    if not filters:
        return None
    conditions = []
    for field, value in filters.model_dump(exclude_none=True).items():
        conditions.append(FieldCondition(key=field, match=MatchValue(value=value)))
    return Filter(must=conditions) if conditions else None

This means Qdrant prunes the search space before doing vector comparisons—filtering happens where it’s cheapest.


The payload gap

The most substantive problem I caught was that the two Qdrant collections had different payload shapes. vintage_text had 28 metadata fields (all the enrichment data). vintage_images only had 12—the image embedding script predated the enrichment pipeline, so it didn’t include those fields.

This mattered because image search results would be missing material, vibe, garment_type, and everything else the frontend needs to display. Two options: join with Postgres on every image search, or fix the payloads at the source.

Wrote a backfill script using Qdrant’s set_payload()—it merges new fields into existing points without touching the vectors. CLIP embeddings are computed from pixel data, so they’re independent of metadata. All 866 points updated in seconds. Also fixed the image embedding script so future runs include the full payload by default. Both collections now have identical shapes.


Final assembly

main.py ended up straightforward—all the real complexity lives in the routers and schemas:

app = FastAPI(title="Vintage Vestige API", version="1.0.0")

app.include_router(search.router)
app.include_router(products.router)
app.include_router(filters.router)
app.include_router(bridges.router)

The final count

api/
├── main.py              (22 lines)
├── dependencies.py      (singletons for Qdrant + embeddings)
├── schemas/             (13 Pydantic models)
│   ├── product.py       (ProductSummary, ProductDetail)
│   ├── search.py        (SearchFilters, TextSearchRequest, ImageSearchRequest, SearchResult, SearchResponse)
│   ├── bridge.py        (BridgeResult, BridgeListResponse, BridgeTypeStats, ScoreHistogramBucket, BridgeStats)
│   └── filters.py       (FilterOptions)
└── routers/             (12 endpoints)
    ├── search.py        (POST /search/text, POST /search/image)
    ├── products.py      (5 GET endpoints)
    ├── bridges.py       (4 GET endpoints)
    └── filters.py       (GET /filters)

Takeaways

The shape of your data matters more than the code that moves it. The first half of this work was all schemas—no endpoints, no routes, just defining the contract. Once that was solid, the routers came together quickly, because every question about “what fields do I return?” was already answered.

The payload gap reinforced something related: when two systems (text embeddings and image embeddings) are built by different scripts at different times, their data shapes drift. The fix was trivial, but only because I caught it before building workarounds into the API layer.

Next up: running the full test suite, then connecting the frontend to the live API.

Vintage Vestige — semantic fashion search across museum collections. Built with Python, FastAPI, Pydantic, Qdrant, and PostgreSQL.