Indie Devlog #03: Optimizing AI Costs in a Consumer Finance App

Monday, February 16, 2026 · 3 min read

Indie Hacking, AI, Flutter, Product Development

AI features are expensive. When you are building a consumer finance app with a $6.99/month subscription target, every API call eats into a thin margin. Fiscify uses Google Gemini for natural language expense entry, receipt scanning, statement imports, and spending insights — all features that sound like a fast track to bankruptcy if you are not careful about costs.

The revenue math is tight. Apple takes 15-30% of each $6.99 payment; at the 15% rate I net about $5.94, and with a $5.00 gross profit target per subscriber that leaves about $0.94 per subscriber per month for the entire variable stack: AI, hosting, database, email, everything. At the 30% rate the same target cannot be met at all. The AI budget alone has to fit inside that envelope.
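The arithmetic behind that figure can be sketched as below. The function and parameter names are illustrative, and the 15% commission rate is an assumption inferred from the $0.94 result rather than something stated outright.

```dart
/// Monthly variable-cost budget per subscriber, in dollars.
///
/// [price] is the subscription price, [storeCut] the store commission as a
/// fraction (for example 0.15 or 0.30), and [targetGrossProfit] the amount
/// that should remain after all variable costs are paid.
double variableBudget({
  required double price,
  required double storeCut,
  required double targetGrossProfit,
}) {
  // What the developer actually receives after the store takes its share.
  final netRevenue = price * (1 - storeCut);
  // Whatever is left above the profit target can be spent on variable costs.
  return netRevenue - targetGrossProfit;
}
```

With a price of 6.99, a store cut of 0.15 and a target of 5.00, this returns about 0.94. With a cut of 0.30 the result is negative (about -0.11), which is a quick way to see that the $5.00 target only works at the lower commission rate.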

The two-tier model strategy

The most impactful decision was choosing two Gemini tiers and being ruthless about which calls go where:

Feature	Model	Rationale
Natural language entry	`gemini-2.5-flash-lite`	Simple structured JSON extraction
Bulk/batch processing	`gemini-2.5-flash-lite`	Throughput, not intelligence
AI insights	`gemini-2.5-flash-lite`	Periodic batch runs
Receipt scanning	`gemini-2.5-flash`	Vision input is expensive
Statement imports	`gemini-2.5-flash`	Complex multi-chunk PDFs

Flash-Lite costs roughly $0.10 per 1M input tokens and $0.40 per 1M output tokens. Flash is more expensive but necessary when the input includes images or complex financial documents. Compared with sending everything to Flash, routing the highest-volume calls to Flash-Lite saves roughly 60-70% on those calls.
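One way to keep that routing explicit is a single lookup table plus a small cost helper. This is a sketch only: the enum, map, and function names are hypothetical and not taken from the Fiscify codebase; the model IDs come from the table above, and prices are passed in as parameters rather than hardcoded.

```dart
/// AI-backed features. Names are illustrative, not the app's real identifiers.
enum AiFeature {
  naturalLanguageEntry,
  bulkProcessing,
  insights,
  receiptScan,
  statementImport,
}

/// Feature-to-model routing, mirroring the table in this post.
const modelFor = <AiFeature, String>{
  AiFeature.naturalLanguageEntry: 'gemini-2.5-flash-lite',
  AiFeature.bulkProcessing: 'gemini-2.5-flash-lite',
  AiFeature.insights: 'gemini-2.5-flash-lite',
  AiFeature.receiptScan: 'gemini-2.5-flash',
  AiFeature.statementImport: 'gemini-2.5-flash',
};

/// Dollar cost of one call, given token counts and per-million-token prices.
double callCost({
  required int inputTokens,
  required int outputTokens,
  required double inputPricePerM,
  required double outputPricePerM,
}) {
  return inputTokens / 1e6 * inputPricePerM +
      outputTokens / 1e6 * outputPricePerM;
}
```

For example, a call with 500 input tokens and 100 output tokens (made-up sizes) at the Flash-Lite prices quoted above comes to about $0.00009.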

Thinking is expensive — disable it

Gemini supports chain-of-thought reasoning ("thinking"), which burns output tokens fast. For structured JSON generation — parsing "$45 sushi and $3 cab" into [{amount: 45, category: "dining"}, {amount: 3, category: "transport"}] — thinking adds no value. The model either produces valid JSON or it does not. Reasoning about why it chose those categories is wasted spend.

Every structured JSON call sets thinking_budget: 0. This cuts output token consumption by roughly half on extraction endpoints.
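As a rough illustration, a request body for such a call might be built like this. It assumes the Gemini REST API, where the setting is spelled `thinkingConfig.thinkingBudget` in camelCase (the `thinking_budget` spelling above matches the Python SDK). The function name is hypothetical, and the real app may well use an SDK instead of raw JSON.

```dart
import 'dart:convert';

/// Builds a JSON request body for a structured-extraction call with
/// thinking disabled. Field names follow the Gemini REST API; the function
/// itself is an illustrative helper, not code from the app.
String buildExtractionBody(String userText) {
  return jsonEncode({
    'contents': [
      {
        'parts': [
          {'text': userText},
        ],
      },
    ],
    'generationConfig': {
      // Ask for JSON output so the reply can be parsed directly.
      'responseMimeType': 'application/json',
      // A budget of 0 turns off thinking tokens for this call.
      'thinkingConfig': {'thinkingBudget': 0},
    },
  });
}
```

Keeping the thinking setting inside one builder means no extraction endpoint can forget it.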

Receipt images: resize before you send

Receipt scanning was the highest per-call cost in the system. A 12MP phone photo of a grocery receipt contains far more pixels than Gemini needs to read "Total: $47.23".

The solution was aggressive pre-processing on the client. Every receipt image is resized to a long edge of 2048px and compressed as JPEG before the vision API call. This drops the image to roughly 200-400KB from the original 3-5MB, reducing input token count by an order of magnitude.

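import 'dart:io';
import 'dart:math' as math;
import 'dart:typed_data';
import 'dart:ui';

import 'package:flutter/painting.dart' show decodeImageFromList;
// Assumption: package:image does the JPEG encoding here, because dart:ui's
// ImageByteFormat offers only PNG and raw pixel formats, not JPEG.
import 'package:image/image.dart' as img;

Future<Uint8List> prepareReceiptImage(File image) async {
  final decodedImage = await decodeImageFromList(await image.readAsBytes());

  // Scale the long edge down to 2048px, keeping the aspect ratio.
  // Images that are already smaller are left at their original size.
  const maxEdge = 2048;
  final longEdge = math.max(decodedImage.width, decodedImage.height);
  final scale = math.min(1.0, maxEdge / longEdge);
  final targetWidth = (decodedImage.width * scale).round();
  final targetHeight = (decodedImage.height * scale).round();

  // Draw the source image into a smaller canvas with high-quality filtering.
  final recorder = PictureRecorder();
  Canvas(recorder).drawImageRect(
    decodedImage,
    Rect.fromLTWH(0, 0, decodedImage.width.toDouble(), decodedImage.height.toDouble()),
    Rect.fromLTWH(0, 0, targetWidth.toDouble(), targetHeight.toDouble()),
    Paint()..filterQuality = FilterQuality.high,
  );
  final resized = await recorder.endRecording().toImage(targetWidth, targetHeight);

  // Read raw RGBA pixels back and encode them as JPEG at quality 85.
  final rgba = await resized.toByteData(format: ImageByteFormat.rawRgba);
  final frame = img.Image.fromBytes(
    width: targetWidth,
    height: targetHeight,
    bytes: rgba!.buffer,
    numChannels: 4,
  );
  return img.encodeJpg(frame, quality: 85);
}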

At 2048px and JPEG quality 85, the model still reads receipt text with near-100% accuracy.

Fair usage caps as cost control

The server enforces rate limits for burst protection, but the real quota system runs on the client against Supabase. Every subscription plan has defined monthly caps:

Feature	Free	Plus
Natural language entries	100/mo	1,000/mo
Receipt scans	15/mo	500/mo
Statement imports	3/mo	10/mo

The free caps are tight enough to prevent abuse while providing genuine value. The Plus caps are generous — "unlimited*" with an asterisk — but bounded so the worst-case AI spend per subscriber stays predictable.
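The cap comparison itself is simple. The sketch below encodes the table above and checks a usage count against it; the enum and function names are hypothetical, and it deliberately leaves out how the monthly usage count is read from Supabase, since the post does not describe that part.

```dart
/// Subscription plans and capped features (illustrative names).
enum Plan { free, plus }

enum QuotaFeature { naturalLanguageEntry, receiptScan, statementImport }

/// Monthly caps per plan, copied from the table in this post.
const monthlyCaps = <Plan, Map<QuotaFeature, int>>{
  Plan.free: {
    QuotaFeature.naturalLanguageEntry: 100,
    QuotaFeature.receiptScan: 15,
    QuotaFeature.statementImport: 3,
  },
  Plan.plus: {
    QuotaFeature.naturalLanguageEntry: 1000,
    QuotaFeature.receiptScan: 500,
    QuotaFeature.statementImport: 10,
  },
};

/// Returns true if one more use of [feature] stays within the monthly cap.
/// [usedThisMonth] is the number of uses already recorded this month.
bool canUse(Plan plan, QuotaFeature feature, int usedThisMonth) {
  final cap = monthlyCaps[plan]![feature]!;
  return usedThisMonth < cap;
}
```

A free user with 14 receipt scans this month can still make one more; at 15 the check fails and the app can show an upgrade prompt instead of calling the model.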

At maximum fair usage, the estimated AI cost per Plus subscriber lands well under $0.94/month. The largest driver is statement imports with multi-chunk PDFs hitting the more expensive Flash model, followed by receipt scans with vision input.
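The worst-case figure is a sum of cap times cost per call over the capped features. A minimal sketch, assuming per-call cost estimates are supplied by the caller (the post does not publish them, so none are hardcoded here); the function and constant names are hypothetical, and uncapped features such as insights would need their own term.

```dart
/// Plus-plan monthly caps from the table in this post, keyed by feature name.
const plusCaps = <String, int>{
  'naturalLanguageEntry': 1000,
  'receiptScan': 500,
  'statementImport': 10,
};

/// Worst-case monthly AI spend in dollars: every capped feature used right
/// up to its cap. [costPerCall] holds the caller's own per-call estimates.
double worstCaseSpend(Map<String, int> caps, Map<String, double> costPerCall) {
  var total = 0.0;
  caps.forEach((feature, cap) {
    final cost = costPerCall[feature];
    if (cost == null) {
      // Fail loudly rather than silently treating a feature as free.
      throw ArgumentError('No cost estimate for "$feature"');
    }
    total += cap * cost;
  });
  return total;
}
```

Comparing the result against the $0.94 variable budget, minus whatever hosting, database, and email take, shows how much headroom the caps leave.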

What I monitor

I export usage_metadata aggregates per endpoint every month and compare against the model assumptions:

p50/p95 token counts per feature — are my estimates accurate?
Cost per active free user — if this grows materially, tighten the caps.
Model retirement dates — Google retires Gemini versions periodically. The model IDs in the API are configurable, not hardcoded, so swapping a version is a config change.
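For the p50/p95 check, a nearest-rank percentile over the exported token counts is enough. This is a sketch with a hypothetical function name; nearest-rank is one of several percentile definitions, and the export format of the usage_metadata aggregates is not shown in this post.

```dart
/// Nearest-rank percentile of [values], for [p] in the range (0, 100].
/// Returns an element of the list rather than an interpolated value.
int percentile(List<int> values, double p) {
  if (values.isEmpty) {
    throw ArgumentError('values must not be empty');
  }
  if (p <= 0 || p > 100) {
    throw ArgumentError('p must be in (0, 100]');
  }
  // Sort a copy so the caller's list is left untouched.
  final sorted = [...values]..sort();
  // Nearest-rank: the smallest rank covering at least p percent of samples.
  final rank = (p * sorted.length / 100).ceil();
  return sorted[rank - 1];
}
```

Calling `percentile(tokenCounts, 50)` and `percentile(tokenCounts, 95)` per feature gives the two numbers to compare against the estimates in the cost model.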

The quarterly review checklist lives in the repo alongside the unit economics doc. Every three months I reconcile the Supabase quota config with the pricing doc and update the cost model with actual usage data.

The principle

The approach is not unique to AI costs. The same discipline applies to any variable cost in a subscription product: measure before you optimize, set hard caps, and know your unit economics well enough to know when a feature is losing money.

When I added receipt scanning, the unit economics doc told me exactly how many scans a Plus subscriber could use before the feature became unprofitable. That number became the fair usage cap. No guesswork, no surprises on the bill.

For suggestions and queries, just contact me.