% du -h hello hello_trimmed
168K hello
120K hello_trimmed
% du -h hono hono_trimmed
668K hono
296K hono_trimmed
% du -h elysia elysia_trimmed
5.9M elysia
1.3M elysia_trimmed
% du -h h3 h3_trimmed
508K h3
508K h3_trimmed
We can shrink Porffor binary sizes by ~1.5-5x. How? Profile-guided dead code elimination.
Dead code elimination is hard in dynamic languages
import hugeDependency from 'huge-dependency';
function foo(arg) {
  if (arg == null) return true;
  return hugeDependency.foo(arg);
}
const anArgument = // something extremely hard to infer
foo(anArgument);
Statically, we have to presume hugeDependency.foo is used, as we simply cannot know here at compile time. This might mean pulling in, say, a million lines of JS, and as the compiler we can do nothing about it. This sucks. Unless...
Profile-guided DCE
As we cannot know statically, what if we just find out at runtime? Assuming a spherical-cow deterministic program, we can run it once with profiling to collect every function call, delete the unused functions, and compile again. This is similar to profile-guided optimization, but purely for dead code elimination.
By far, the hardest part here is collecting the profile: not the collection itself, but deciding how and what to collect. For a web server, for example, you would have to request every route with every possible signature (parameters, etc.), then use that profile, presuming no differences arise from state or a database or ...
But for relatively deterministic programs, this method of simply eliminating any uncalled function works great. However, there is another method that works for all programs and still provides a benefit.
Profile-guided selective optimization
We can do "DCE-lite" instead: rather than eliminating functions that are uncalled in our profile, we optimize them differently. Known-called functions are optimized with -O3, while unknown functions are optimized with -Oz. This sounds eerily similar to just-in-time compilation!
Using this method, we get roughly half the gain of elimination:
% du -h hono hono_trimmed_safe hono_trimmed
668K hono
476K hono_trimmed_safe
296K hono_trimmed
% du -h elysia elysia_trimmed_safe elysia_trimmed
5.9M elysia
3.3M elysia_trimmed_safe
1.3M elysia_trimmed
Benefits
Not only does this help binary size, but it also slightly improves memory usage and performance!
./hello_trimmed
Latency #1 median 0.177 ms, p90 0.330 ms, p99 0.394 ms
Throughput (req/s) #1 overall 249357.4, p10 232781.0, p1 101905.8
Memory usage #1 peak 2.16 MB, median 2.16 MB
CPU time / req #1 median 3.98 µs, p90 4.00 µs, p99 4.00 µs
Binary size #1 117.4 KB
Startup #1 7.526 ms
./hello
Latency #2 median 0.179 ms, p90 0.333 ms, p99 0.410 ms
Throughput (req/s) #2 overall 245919.9, p10 231727.5, p1 115515.0
Memory usage #2 peak 2.19 MB, median 2.17 MB
CPU time / req #2 median 4.02 µs, p90 4.10 µs, p99 4.15 µs
Binary size #2 165.8 KB
Startup #2 7.602 ms
The only downside, aside from potential crashes in strict (elimination) mode and a potential slowdown in optimization mode, is compilation time. That said, in strict mode the post-elimination compile is greatly improved, as there is much less to compile. In future this could possibly be cached, resulting in much faster compilation than without it.