
WHY A* BEATS DIJKSTRA ON REAL CITY GRAPHS: LESSONS FROM 302,066 NODES
I benchmarked Dijkstra and A* on the entire road network of Lucknow — 302,066 nodes, 4,132 km². Here's what the data actually showed, and why the Gomti River makes uninformed search almost useless.
Most graph theory textbooks introduce Dijkstra and A* side-by-side and leave it at that. But what actually happens when you run both on a real city — not a toy example, not a 1000-node graph, but an entire metropolitan road network parsed from OpenStreetMap?
I spent several weeks building a C++17 pipeline on the Lucknow district PBF file (bounding box [80.494, 26.595] to [81.284, 27.068]) and running 1,000 randomised origin-destination trials. The result: A* achieved a mean search-space reduction of 78.13% (p < 0.0001). Here's why that number is both expected and surprising.
The Setup
The pipeline uses Libosmium to parse the PBF binary, filters for highway tags, and builds an adjacency list using std::unordered_map<long long, vector<Edge>>. Both algorithms use std::priority_queue for fringe management. The only difference is the heuristic: Dijkstra uses h(n) = 0, A* uses the Haversine formula between node n and the goal.
The Haversine formula gives the great-circle distance between two geographic coordinates. For a city-scale graph it is effectively Euclidean, but the key point is it is admissible — it never overestimates the true cost. That is the requirement for A* to guarantee optimal paths.
The Riverine Bottleneck Paradox
Lucknow has a river cutting through it: the Gomti. When the source is in Trans-Gomti (Indira Nagar, Gomti Nagar) and the target is in Cis-Gomti (Hazratganj, Chowk), Dijkstra expands in a circular wavefront and explores thousands of nodes moving away from the target — because they have lower edge weight than the bridge crossings. Bridge nodes are expensive, so they do not appear in Dijkstra's frontier until it has already explored a massive region.
A* sees this differently. The Haversine heuristic assigns high h(n) to any node moving away from the goal. Bridge nodes become attractor states. The search corridor collapses into a near-straight beam aimed at the nearest bridge. Node exploration drops by over 85% for cross-river journeys.
The Cantonment Regularity Effect
The Lucknow Cantonment is almost a grid. Straight roads, predictable intersections, high geometric regularity. In these regions, A*'s advantage approaches its theoretical maximum because roads follow the Euclidean direction to the goal — minimal heuristic dilution. In contrast, the organic layout of Chowk produces lower gains (~60-65%), because the heuristic can be misleading when a node looks close but is separated by a maze of alleys.
The Numbers
Across 1,000 trials: Dijkstra mean nodes explored was 162,862 and A* was 41,204, for a mean efficiency gain of 78.13%. In over 90% of trials, A* achieved more than 60% reduction. The median trial saw ~80% reduction.
One important caveat: at distances above 50km, a heuristic dilution effect appears. For regional-scale routing, hierarchical graph partitioning (like contraction hierarchies) would be the next step.
What I Would Do Next
The obvious extension is traffic-weighted edges. Right now all edges use physical distance as cost. Real routing uses time — which means congestion data and dynamic weights. Another direction is bidirectional A*, running simultaneous searches from source and goal that meet in the middle. For very long paths, this can halve exploration further. The full paper with all figures and C++17 source is on my GitHub.