Porffor now has garbage collection (GC)... somewhat. I propose a likely controversial garbage collector implementation: Wasm builds have no GC and native builds have a tightly integrated tiny GC.
No GC for Wasm
I imagine this will be controversial, so I will preface by saying this is just my opinion: GC in Wasm is generally poor compared to native. Your options for implementing GC in Wasm are:
- Wasm GC: neat but very complex and performance is still not that compelling. It would require rewriting a significant part of Porffor and result in a lot of Wasm runtimes being incompatible (it is practically infeasible to support both here).
- Implementing your own GC: you run into performance (and complexity) issues, mostly because Wasm (rightfully) has tightly managed memory access.
Having a performant and minimal GC simply does not align with Wasm's model, which is fine! There are definitely arguments against this, but from what I know and from my experience, it is generally true.
I propose that in most situations where you compile to Wasm, you do not really need GC: Wasm workloads are typically stateless one-shots which can be re-instantiated or reset between executions. I am sure there are scenarios I am missing, but from what I have seen this is mostly true. You likely would not run long-lived server-side processes in Wasm; you are either using the Wasm as a lambda-like or using native instead.
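To illustrate the "re-instantiate instead of GC" idea, here is a minimal sketch assuming a Node-like host. The module bytes are just a minimal empty Wasm module standing in for a real compiled build; `handleRequest` is a hypothetical name:

```javascript
// Minimal valid Wasm module: magic ("\0asm") + version 1. A real setup
// would load compiled output here instead.
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: \0asm
  0x01, 0x00, 0x00, 0x00, // version 1
]);

async function handleRequest(compiledModule, request) {
  // A fresh instance per invocation: linear memory starts zeroed every
  // time, so garbage never accumulates and no collector is needed.
  const instance = await WebAssembly.instantiate(compiledModule);
  // ... call the module's exported entry point with `request` here ...
  return instance;
}
```

Compiling once and instantiating per request amortizes the compile cost while still resetting all state between executions.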
A GC for native
Porffor's GC is intentionally as simple as possible. Not only does this generally improve performance, it also reduces the attack surface. The GC is ~400 lines of C that replaces Porffor's regular allocator in Porffor's own Wasm -> C compiler. Replacing the allocator means we do not need a rewrite; our Wasm code does not know the difference. It is simple: mark-and-sweep with bump allocation (and free-list reuse).

For now, it is only used for HTTP servers, as it is easier to track roots there: we can GC specifically between requests to avoid GC spikes during requests (so this does not work generally with JS yet, but it is not far off). There is no compacting and there are no generations; we are talking hundreds of KBs here, with other bottlenecks before GC, so they are not needed yet.
Very simplified pseudo-code for those not familiar with GC:
const roots = new Set(); // objects that must survive collection (e.g. in-flight requests)
const allObjects = new Set(); // every tracked allocation

function free(object) {
  // Return the object's memory to the allocator
  allObjects.delete(object);
}

function mark(root) {
  // Mark root and all its references as safe
  if (root.safe) return;
  root.safe = true;
  const references = root.references || [];
  for (const ref of references) mark(ref);
}

function sweep() {
  // Sweep away all non-safe objects
  // (objects are ~anything JS using memory: JS objects, arrays, strings, ...)
  for (const object of allObjects) {
    // Free if not safe, otherwise mark as unsafe for next collection
    if (!object.safe) {
      free(object);
    } else {
      object.safe = false;
    }
  }
}

function onRequest(requestObject, callback) {
  // Don't sweep request object while request is being processed
  roots.add(requestObject);

  // Run user code to process the request and return the response
  const response = userCode(requestObject);
  callback(response);

  // Remove request object from roots
  roots.delete(requestObject);

  // Mark and sweep
  for (const root of roots) mark(root);
  sweep();
}