Dataface Tasks

Async prefetch with future-based cache for progressive rendering

ID: DFT_CORE-ASYNC_PREFETCH_WITH_FUTURE_BASED_CACHE_FOR_PROGRESSIVE_RENDERING
Status: not_started
Priority: p3
Milestone: m4-v1-0-launch
Owner: sr-engineer-architect

Problem

With synchronous prefetch (from the batch prefetch task), render blocks until ALL queries complete before rendering ANY chart. For a dashboard with 10 queries across 3 sources, the user waits for the slowest source before seeing anything. In the interactive serve path, we could start rendering charts as their data arrives — a chart whose query finished in 200ms shouldn't wait for another chart's 3-second Snowflake query.

Context

Depends on: DFT_CORE-WIRE_UP_BATCH_QUERY_PREFETCH_BEFORE_RENDER (synchronous prefetch must land first)

Key files:

  • dataface/core/execute/executor.py — self._cache: dict[str, list[dict]] is a simple dict. Needs to accept Future objects.
  • dataface/core/serve/server.py — FastAPI serve layer, already async.
  • dataface/core/render/renderer.py — render() and chart rendering call executor.execute_chart() synchronously.

Current cache behavior: execute_query() checks if cache_key in self._cache — a synchronous key lookup. Either the data is there or it isn't; there is no concept of an in-flight query. If prefetch ran asynchronously and render started before it finished, execute_chart() would cache-miss and fire a duplicate query against the source.
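To make the race concrete, here is a minimal sketch of the current synchronous behavior. The class shape, _run_query helper, and method signatures are assumptions for illustration; only the _cache dict and the execute_query() key lookup come from the task description.

```python
class Executor:
    """Sketch of the current executor: a plain dict cache, no in-flight state."""

    def __init__(self):
        self._cache: dict[str, list[dict]] = {}

    def execute_query(self, cache_key: str, sql: str) -> list[dict]:
        # Synchronous lookup: either the rows are cached or they aren't.
        if cache_key in self._cache:
            return self._cache[cache_key]
        # Cache miss. If an async prefetch were still running this same
        # query, this call would not know — it would fire a duplicate.
        rows = self._run_query(sql)
        self._cache[cache_key] = rows
        return rows

    def _run_query(self, sql: str) -> list[dict]:
        # Stand-in for a real source query.
        return [{"sql": sql}]
```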

Possible Solutions

Option A: ThreadPoolExecutor with Future-based cache

Replace self._cache: dict[str, list[dict]] with dict[str, list[dict] | Future[list[dict]]]. Prefetch submits queries to a ThreadPoolExecutor and stores futures in the cache. execute_query() checks whether the cache value is a Future and calls .result() to block until that specific query completes.

  • Uses stdlib concurrent.futures — no new dependencies
  • Minimal change to executor interface — callers don't need to know about async
  • ThreadPoolExecutor naturally handles parallel execution across source profiles
  • Synchronous mode (CLI) just calls .result() immediately or skips futures entirely
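A sketch of Option A, assuming a hypothetical prefetch() entry point and _run_query helper (the real batch-prefetch API lands in the dependency task):

```python
from concurrent.futures import Future, ThreadPoolExecutor


class Executor:
    """Sketch: cache values are materialized rows OR in-flight futures."""

    def __init__(self, max_workers: int = 4):
        self._cache: dict[str, list[dict] | Future] = {}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def prefetch(self, queries: dict[str, str]) -> None:
        # Submit every query up front; store the Future under its cache key.
        for key, sql in queries.items():
            if key not in self._cache:
                self._cache[key] = self._pool.submit(self._run_query, sql)

    def execute_query(self, cache_key: str, sql: str) -> list[dict]:
        value = self._cache.get(cache_key)
        if isinstance(value, Future):
            # Block only on this chart's query; other charts are unaffected.
            rows = value.result()
            self._cache[cache_key] = rows  # swap the future for real data
            return rows
        if value is not None:
            return value
        # No prefetch for this key: fall back to a synchronous query.
        rows = self._run_query(sql)
        self._cache[cache_key] = rows
        return rows

    def _run_query(self, sql: str) -> list[dict]:
        # Stand-in for a real source query.
        return [{"sql": sql}]
```

Because .result() is per-key, a chart whose query finished early renders immediately even while slower queries are still running in the pool.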

Option B: Full asyncio integration

Make execute_query and execute_chart async, use asyncio.Future and await. Render becomes async throughout.

  • More invasive — every call site becomes async
  • Better fit for the serve path (already async) but painful for CLI
  • Would need sync wrappers for non-async callers
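A sketch of what Option B's sync wrapper could look like. AsyncExecutor and execute_query_sync are hypothetical names; asyncio.to_thread is used here as a stand-in for a truly async driver.

```python
import asyncio


class AsyncExecutor:
    """Sketch: fully async executor with a sync wrapper for CLI callers."""

    def __init__(self):
        self._cache: dict[str, list[dict]] = {}

    async def execute_query(self, cache_key: str, sql: str) -> list[dict]:
        if cache_key not in self._cache:
            # Run the blocking driver call off the event loop.
            self._cache[cache_key] = await asyncio.to_thread(self._run_query, sql)
        return self._cache[cache_key]

    def execute_query_sync(self, cache_key: str, sql: str) -> list[dict]:
        # Sync wrapper for the CLI; only valid outside a running event loop.
        return asyncio.run(self.execute_query(cache_key, sql))

    def _run_query(self, sql: str) -> list[dict]:
        # Stand-in for a real source query.
        return [{"sql": sql}]
```

The wrapper is exactly the pain point the bullets above describe: every non-async call site needs one, and it cannot be called from inside an already-running loop.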

Option C: SSE streaming with progressive chart delivery

Don't overlap prefetch with render. Instead, render each chart independently and stream results to the client via Server-Sent Events as each chart completes.

  • Best UX — charts appear progressively
  • Most complex — requires client-side assembly, SSE infrastructure
  • Could combine with Option A for both server-side and client-side progressiveness
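A sketch of the server-side half of Option C: a generator that yields one Server-Sent Event per chart as its query completes. The function name and the run_query callback are hypothetical; in the FastAPI serve layer this generator could be wrapped in a StreamingResponse with media_type="text/event-stream".

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterator


def stream_charts_sse(
    chart_queries: dict[str, str],
    run_query: Callable[[str], list[dict]],
) -> Iterator[str]:
    """Yield one SSE message per chart, in completion order."""
    with ThreadPoolExecutor() as pool:
        futures = {
            pool.submit(run_query, sql): chart_id
            for chart_id, sql in chart_queries.items()
        }
        for fut in as_completed(futures):
            payload = json.dumps({"chart": futures[fut], "rows": fut.result()})
            # SSE wire format: "event:" / "data:" lines, blank-line terminated.
            yield f"event: chart\ndata: {payload}\n\n"
```

The client-side assembly (listening for "chart" events and slotting each payload into the dashboard) is the part the bullets above flag as the main added complexity.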

Plan

  1. Extend executor cache type to dict[str, list[dict] | Future[list[dict]]]
  2. Add execute_face_batch_async() that submits queries to ThreadPoolExecutor, stores futures in cache
  3. Update execute_query() cache lookup: if value is a Future, call .result() to block
  4. Wire into serve path: use async prefetch for web requests, sync prefetch for CLI
  5. Measure: compare time-to-first-chart-rendered with sync vs async prefetch

Implementation Progress

Review Feedback

  • [ ] Review cleared