How We Built a Sub-50ms DICOM Streaming Pipeline

When we started PrizMed in 2019, the biggest pain point we heard from healthcare providers was simple: "Why does it take 45 seconds to load an MRI study that was acquired 2 minutes ago?" The answer was depressingly mundane — legacy DICOM protocols, synchronous processing pipelines, and storage architectures designed in the 1990s.

The Challenge

Medical imaging data is uniquely demanding. A single CT study can contain 5,000+ slices at 512×512 pixels, totaling several gigabytes of raw DICOM data. Multiply that by hundreds of concurrent studies across a health system, and you're looking at sustained throughput requirements measured in terabytes per hour.
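To make the "several gigabytes" figure concrete, here is a back-of-the-envelope calculation. It assumes 16 bits (2 bytes) per pixel, which is typical for CT but is not stated above, and it ignores DICOM header overhead and any compression. The function name is ours, purely for illustration.

```python
def study_size_bytes(slices: int, rows: int, cols: int, bytes_per_pixel: int = 2) -> int:
    """Uncompressed pixel data size for a study (headers not included)."""
    return slices * rows * cols * bytes_per_pixel


if __name__ == "__main__":
    size = study_size_bytes(slices=5000, rows=512, cols=512)
    # 5000 * 512 * 512 * 2 = 2,621,440,000 bytes
    print(f"{size / 1e9:.2f} GB")  # prints "2.62 GB"
```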

Our goal was ambitious: stream any imaging study from acquisition to viewer in under 50 milliseconds (p95), while maintaining strict HIPAA compliance and zero data loss guarantees.

Architecture Overview

Our streaming pipeline is built on three core principles:

Chunked streaming — We break studies into variable-size chunks (4–8 MB) and stream them through our API using HTTP/2 with multiplexed connections. The /api/v2/imaging/stream endpoint accepts chunked uploads with sequence tracking via X-Stream-Id and X-Chunk-Seq headers.
Edge processing — Rather than routing all data through a central datacenter, we process and cache imaging data at 40+ edge locations. The nearest edge node handles decompression, de-identification, and initial quality checks.
Speculative pre-fetch — Our ML model predicts which studies a radiologist will view next based on worklist patterns, and pre-positions the data at the optimal edge location before it's requested.
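The chunked streaming principle can be sketched roughly as follows. This is a minimal illustration, not our SDK: it uses a fixed 4 MB chunk size instead of variable 4–8 MB chunks, and the header value formats (a UUID for X-Stream-Id, a zero-based integer for X-Chunk-Seq) are assumptions made for the example. The helper names iter_chunks and reassemble are hypothetical.

```python
import uuid
from typing import Dict, Iterable, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # fixed 4 MB for simplicity (assumption)

Chunk = Tuple[Dict[str, str], bytes]


def iter_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Split a payload into chunks, tagging each with stream and sequence headers."""
    stream_id = str(uuid.uuid4())  # one ID shared by every chunk of this stream
    for seq, offset in enumerate(range(0, len(payload), chunk_size)):
        headers = {"X-Stream-Id": stream_id, "X-Chunk-Seq": str(seq)}
        yield headers, payload[offset:offset + chunk_size]


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload from chunks that may have arrived out of order."""
    ordered = sorted(chunks, key=lambda chunk: int(chunk[0]["X-Chunk-Seq"]))
    return b"".join(body for _, body in ordered)


if __name__ == "__main__":
    data = bytes(10 * 1024 * 1024)  # 10 MB of zeros standing in for DICOM data
    chunks = list(iter_chunks(data))
    print(len(chunks))  # 3 chunks: 4 MB + 4 MB + 2 MB
    assert reassemble(reversed(chunks)) == data
```

Carrying the sequence number in a header is what lets the receiver accept chunks in any order, which matters once chunks travel over parallel streams.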

Connection Multiplexing

One of our most impactful optimizations was moving from one-connection-per-study to multiplexed streaming. Our client SDK maintains a pool of 16–32 concurrent HTTP/2 streams per connection, with each stream handling a separate chunk sequence. Because every chunk reuses an already-established connection rather than opening a new one per study, this eliminated the TCP slow-start penalty that plagued our original implementation.

The results were dramatic: upload throughput improved by 13x on high-latency links, and our p50 latency dropped from 180ms to 12ms.
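One way to see why high-latency links benefit most is to count the round trips a cold connection spends in slow start. The model below is idealized and the numbers are assumptions: an initial window of 10 segments, a 1,460-byte segment size, the window doubling every round trip, no packet loss, and no receive-window or bandwidth cap. The 100 ms round-trip time is a hypothetical example, not a measurement.

```python
def slow_start_round_trips(payload_bytes: int, init_segments: int = 10, mss: int = 1460) -> int:
    """Round trips needed to deliver a payload on a cold connection (idealized)."""
    window = init_segments * mss  # bytes that can be sent in the first round trip
    sent = 0
    round_trips = 0
    while sent < payload_bytes:
        sent += window
        window *= 2  # slow start doubles the window each round trip
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    rtt_ms = 100  # hypothetical high-latency link
    trips = slow_start_round_trips(4 * 1024 * 1024)  # one 4 MB chunk
    print(trips, trips * rtt_ms)  # prints "9 900"
```

Under these assumptions, a fresh connection spends about nine round trips, or roughly 900 ms at 100 ms RTT, just delivering its first 4 MB chunk. A connection that is already warm skips most of that ramp-up.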

Results

Today, our pipeline processes over 2.4 billion images annually with a p95 latency of 47ms. During peak hours, we sustain 800+ Gbps of aggregate throughput across our edge network. We have also maintained 99.99% uptime for 34 consecutive months.
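For a sense of scale, the figures above convert into per-second and per-hour terms as follows. This is plain unit arithmetic, assuming a 365-day year and decimal units (1 TB = 10^12 bytes). The function names are ours.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def images_per_second(images_per_year: float) -> float:
    """Average rate, spread evenly over a 365-day year."""
    return images_per_year / SECONDS_PER_YEAR


def gbps_to_tb_per_hour(gbps: float) -> float:
    """Convert gigabits per second to terabytes per hour (decimal units)."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * 3600 / 1e12


if __name__ == "__main__":
    print(f"{images_per_second(2.4e9):.1f} images/s")  # prints "76.1 images/s"
    print(f"{gbps_to_tb_per_hour(800):.0f} TB/h")      # prints "360 TB/h"
```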

If you're building healthcare applications that need fast, reliable access to medical imaging data, check out our API docs or reach out to our team.