Clojure and me » 2018

Content-Defined Dependency Shading

unsorted — cgrand, 9 March 2018 @ 14 h 16 min

Shading is the practice of renaming a dependency and embed it in a project to be sure it won’t conflict with another version of itself (it’s a good time to go watch or rewatch Rich Hickey’s Spec-ulation).

For Unrepl we rely on shading extensively as we don’t want the code injected by the client to interfere with running code or even tools running a different strain of Unrepl.

That’s how we ended with the idea of content-defined shading: choose a granularity of shading (e.g. namespace or all eps, or whole project), compute a hash (in our case SHA1, thus we are SHA-ding!) on it and use the hash in the renaming process.

Doing so we end with stable names that don’t depend on date, version or commit.

Comments (7)

Datastructures: stateless transient flags

unsorted — cgrand, 5 March 2018 @ 16 h 07 min

Traditionally transient data structures use a mutable box to determine whether a node can be modified in place or not. Somehow it acts like a transaction: when the mutable box contains a non null then it means the transient is still “open” (editable nodes have not been shared yet os can be modified). Thus each node has a reference to a reference object (when the node was created by a persistent operation the reference to the reference itself is null).

When I was working on confluent map I found another way to track transients node ownership.

The main idea is to put the flag not in the node but in its parent. Since there are one flag per child, better to store them has a bitmap. Space-wise this solution is not greedier than using a reference type: a 32 bit bitmap vs a 32 bit reference (with compressed pointers) and an additional object.

The role of these flags is to tell whether a child is exclusively owned (not shared) by its parent. Now when one traverse a transient data structures from its root, one has just to check that the whole ownership chain is exclusive. When so the node is editable (mutable). No mutability required.

Tangentially related notes:

In confluent map, I doesn’t even need to have a separate bitmap, since I had a case left in the main bitmap.
CHAMP hash maps are no faster than Clojure ones, benchmarks in the paper measure differences in hash algorithms (plain Java vs Clojure Murmur3 strain) not in data structures implementations.
confluent maps sports 3-way merge in time linear with the number of edits (not the actual size).

Comments (4)

Mic check

unsorted — cgrand, @ 13 h 04 min

Long time no post.
In the years since the last post, I’ve worked on several projects, let me introduce some!

First there’s xforms a collection of transducers-related stuff. Xforms is really great if you need to do any kind of aggregation, it also provides transducer versions of some core functions and several new transducing contexts (strings, io). Plus it has optimizations for dealing with key-value pairs without ever allocating a pair object (best case).

Xforms was initially a clojure/jvm lib but Mike Fikes started porting it to cljs, however I was not happy with having to either split the codebase or break the API to solve the “macros-and-code-in-one-file” problem. With his and António’s expert knowledges to guide me I figured out a couple of macros which allows to write cljc code which mixes macros and code, works on clj/jvm, cljs/clj and self-host cljs (yes there are macros to figure out the cljs flavor). This is really useful when porting clj/jvm code to cljs/*. These macros are packaged in a library named Macrovich.

With colleagues at HCA Datalab we worked on Powderkeg (“Keg” for friends) which basically turns Apache Spark in a giant transducing context. Plus it works without any AOT. Start the repl, connect to the cluster, run your transducers on RDDs. Benefits: you can experiment against real data with a tight feedback loop and you can test your computations with no dependencies on Spark.

The part of Keg that makes the REPL-no-AOT experience possible has been repurposed into Portkey which allows to deploy freshly REPLed functions as AWS Lambdas; a sub project is aws-clj-sdk an AWS api generated from the machine-readable services descriptions provided by Amazon (like official SDKs or Python’s Boto); Kimmo Koskinen and Baptiste Dupuch are hard at work on both projects.

Last, there’s Unrepl which aims to provide better REPLs and tooling in general without requiring a single dependency to your project. All that is needed is a plain socket repl and an Unrepl-powered tool (like Spiral for Emacs, Vimpire for VIM, or Unravel for the terminal).

Comments (22)