css-tooling

I’ve been thinking lately: what’s so bad about CSS? Why do engineers shy away from it so much? Why is Tailwind, a tool which in some ways helps you sidestep writing CSS, so popular?

Modern CSS is very powerful. The progressive additions to the language have made it more and more capable of meeting any design, and pulled more and more out of the realm of JavaScript, effectively making User Interface design more declarative and less imperative.

Isolation

I use Tailwind at work, and honestly it is a joy to use. Once it’s hooked up to your design system it lets you build out components and pages quickly, and with consistent styling.

Before we incorporated Tailwind into our stack, we had a large monolith with a tangled web of SCSS files combined by some black magic into a semi-functioning design system. It was a nightmare to work with, and engineers cowered in fear at the prospect of touching any underlying styles.

Tailwind really shone here. It let us iterate on the small piece of the application each team was focused on in isolation. I think that’s the key, the isolation. If you want to paint a div red, you can add a class name. What could be simpler? No worrying about that other rule nestled somewhere in the mountain of SCSS which says all divs should be blue.

Breakpoints

Tailwind standardises breakpoints, and makes them easy to use. In CSS we are stuck with the @media query, which is a bit of a pain to use, especially since you can’t use CSS variables inside the query condition. More on CSS variables shortly.

Theming

Tailwind LSP (when set up and working correctly) is also a great tool. Everyone loves autocomplete! Having the actual values of your design system at your fingertips is a game changer. No more guessing the colour names. No more inconsistency, and hopefully no more hardcoded colours.

But I’ve not seen a good solution for this in vanilla CSS. There is this VS Code plugin which is a good start. You have to define the paths which contain your CSS variables, and it will autocomplete them for you. But this is a solely Visual Studio Code solution.

Outcome

We need better CSS tooling!

We need to find ways of achieving the following in vanilla CSS:

I’m wondering if I can come up with a solution which answers each of the above points in a clean, unified way.

I discussed my ideas around this area of CSS tooling with my team lead @Chris. He was very receptive to the idea.

He expressed that the idea sounded similar in features to the Tailwind CSS LSP which is open source. He suggested that I should take a look at the project to see how they have implemented their features.

I dove into how I’ve been writing vanilla CSS for my personal site, and dang is it nice to be fully free to flex the cascade etc. It needs better guardrails to help improve the authoring experience.

I love it What ya waiting for!

The dreaded naming

Everyone knows that the hardest problem in computer science is naming things.

I played around with a ton of names for this project, but at the moment I’ve settled on Lime.

I like the idea of a fresh, zesty CSS tooling project. And the name is short and snappy, which is always a plus.

I also find the Limes song by Tommy Dockerz hilarious

https://www.youtube.com/watch?v=7QIqU-DoPsM

I have just come across one big drawback. There is no Lime emoji. 🍋 exists, but no lime.

This feels pretty racist towards limes.

The plan

In my mind the project will be made up of 3 pieces

Parsing the CSS

Lightning CSS already has a Rust library for parsing the CSS into an AST

https://lightningcss.dev/docs.html#from-rust

use lightningcss::stylesheet::{
  StyleSheet, ParserOptions
};

// Parse a style sheet from a string.
let mut stylesheet = StyleSheet::parse(
  r#"
  .foo {
    color: red;
  }

  .bar {
    color: red;
  }
  "#,
  ParserOptions::default()
).unwrap();

Determining CSS selector usage

This is the tricky part. I think the best way to do this is to have a regex search of the codebase for class names as a first pass solution.

Getting it nailed perfectly would require a language specific parser. Like one for raw HTML which knows how to pick out class names/attributes, another for JSX, another for Vue etc etc. It doesn’t scale particularly well.

How will Limescale?!
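To make the "first pass" idea concrete, here is a rough std-only sketch that skips the regex crate entirely and just scans for quoted `class=` attributes. The function name `extract_class_names` is made up for this post, and the approach is deliberately naive: it ignores JSX `className`, dynamic values, and would happily match something like `subclass=`.

```rust
/// Collect every class name found inside `class="..."` or `class='...'`
/// attributes. Deliberately naive: no HTML parsing, no JSX `className`.
pub fn extract_class_names(source: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut rest = source;

    while let Some(pos) = rest.find("class=") {
        // Skip past `class=` and look at the character that follows.
        rest = &rest[pos + "class=".len()..];
        let quote = match rest.chars().next() {
            Some(q @ ('"' | '\'')) => q,
            _ => continue, // unquoted or dynamic value: ignored in this sketch
        };
        let value = &rest[1..];
        if let Some(end) = value.find(quote) {
            // A class attribute can hold several space-separated names.
            names.extend(value[..end].split_whitespace().map(str::to_string));
            rest = &value[end + 1..];
        } else {
            break; // unterminated attribute
        }
    }

    names
}
```

Even this tiny version shows why the approach doesn't scale: every templating language would need its own special cases bolted on.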

The interface

Finally the insights need to be exposed as an LSP surface or CLI for use in vim / vscode etc

I’ll likely start with a CLI as that will be simpler to implement and test the idea works, but build it in a way that it can be extended to an LSP later.
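One way this could look, as a sketch only: keep the analysis returning plain data, and make the CLI a thin renderer on top. The `Finding` struct and `render_cli` function are hypothetical names, not something Lime has yet. An LSP server could later map the same `Finding` values onto editor diagnostics without touching the analysis code.

```rust
/// One insight about a selector, independent of how it is displayed.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub filepath: String,
    pub line: usize,
    pub message: String,
}

/// CLI rendering: one `path:line: message` row per finding.
pub fn render_cli(findings: &[Finding]) -> String {
    findings
        .iter()
        .map(|f| format!("{}:{}: {}", f.filepath, f.line, f.message))
        .collect::<Vec<_>>()
        .join("\n")
}
```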

Starting the project with cargo init

Parsing the CSS selectors from a given CSS file was pretty straightforward with LightningCSS. Rust’s strong type system made it easy to explore the parsed AST and extract what I needed.

Crawling @import statements

The first piece of actual functionality I added was the ability to crawl @import statements in CSS files, extracting the selectors from each file and adding them to the main list of selectors.

It was pretty trivial to pick out the CssRule::Import type in a match statement and extract the file path, then std::fs::read_to_string the file contents and recursively follow any further imports.

/// Parse a CSS stylesheet and return a list of selectors
/// The filepath is required to provide context for the root CSS file
pub fn parse(filepath: &str, raw: &str) -> Vec<Selector> {
    // Parse a style sheet from a string.
    let stylesheet = StyleSheet::parse(raw, ParserOptions::default()).unwrap();

    // get selectors from rules
    let CssRuleList(rules) = stylesheet.rules;

    let imports = parse_imports(rules.clone());
    let mut selectors = imports
        .iter()
        // recursively parse the imported files
        .map(|i| parse(i.filepath.as_str(), i.raw.as_str()))
        .flatten()
        .collect::<Vec<Selector>>();

    for rule in rules {
        selectors.extend(parse_selectors_from_rule(filepath, rule));
    }

    selectors
}

Testing the CSS parser

The harder part came in how to easily test the parser, especially when nested import statements were involved.

After some digging into the recommended solution, I found that I could use the tempfile crate to create temporary files and directories for testing. This allowed me to create a test directory with a few CSS files and test the parser against them.

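// tempfile's TempDir::new() takes no arguments, so use Builder to set a prefix.
let tmp_dir = tempfile::Builder::new().prefix("lime").tempdir().unwrap();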
set_current_dir(tmp_dir.path()).unwrap();

let entrypoint_css = r#"
    @import "test2.css";
    @import "test3.css";
    "#;

let test_2_path = tmp_dir.path().join("test2.css");
let test_3_path = tmp_dir.path().join("test3.css");

std::fs::write(test_2_path, ".foo { color: red; } .bar { color: red; }").unwrap();
std::fs::write(
    test_3_path,
    r#"
    .foo { color: red; }
    @media (min-width: 768px) {
        .bar { color: red; }
    }
    "#,
)
.unwrap();
let result = parse("test.css", entrypoint_css);
let selectors: Vec<String> = result.iter().map(|r| r.selector.clone()).collect();
assert_eq!("test2.css", result.first().unwrap().filepath);
assert_eq!([".foo", ".bar", ".foo", ".bar"], selectors.as_slice());

I did keep finding that the tests were failing when I ran them in parallel. I had to add a #[serial] attribute (from the serial_test crate) to the test functions to ensure they ran in sequence.

use serial_test::serial;

#[test]
#[serial]
fn parses_imported_selectors() {
    // test code
}
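A likely cause (my assumption, I haven't confirmed it) is the set_current_dir call: the working directory is shared by the whole process, and cargo test runs tests on parallel threads by default, so two tests can swap the directory out from under each other. A possible alternative to #[serial] is to resolve each import against the directory of the file that imports it. The `resolve_import` helper below is a hypothetical sketch of that idea, not what Lime currently does.

```rust
use std::path::{Path, PathBuf};

/// Resolve an `@import` target against the directory of the importing file,
/// so nothing depends on the process-wide current directory.
pub fn resolve_import(importing_file: &Path, import: &str) -> PathBuf {
    // A bare filename like "test.css" has an empty parent, which joins cleanly.
    let base = importing_file.parent().unwrap_or_else(|| Path::new(""));
    base.join(import)
}
```

With this in place a test could pass the temp directory's entrypoint path straight in and never touch the current directory.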

Thoughts on colocation of tests

Very much a Rust thing, but the colocation of module tests inside the source code module is something which I didn’t think I would like as much as I do. It’s nice to have the tests right there, tied to the implementation, and not in a separate tests directory.

One killer feature though, is the ability to test private functions. I can write tests for the parse_selectors_from_rule function, which is private, and not have to expose it just for testing.

fn parse_selectors_from_rule(filepath: &str, rule: CssRule) -> Vec<Selector> {
    // implementation code
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn extracts_selectors_from_rule() {
        // test code
    }
}

Very cool.

But how to handle the content?

The CSS part is pretty easy to think about, get me the selectors I am currently using and their locations.

But what about the content? How do I know which selectors are being used in which files?

My initial attempt was to use the underlying Rust libraries that power Ripgrep. They’re great libraries, nicely organised and very consumable.

I could use the grep crate to search for the selectors in the content of the files. But I quickly realised that this leaves a lot of processing of the content to me. Like how do I know that a specific string is actually related to the CSS? It’s not a true reflection of the CSS.

Imagine you have a class name of “card” in your CSS. All it would take to break Lime would be for you to write a single comment or function name containing that 4 letter string.
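A tiny sketch of that failure mode, using nothing but std. `naive_hits` is a made-up stand-in for what a plain text search reports; it is not the grep crate's API.

```rust
/// Return the 1-based line numbers on which `needle` appears anywhere,
/// which is roughly what a plain text search reports.
pub fn naive_hits(source: &str, needle: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(index, _)| index + 1)
        .collect()
}
```

Feed it a file with a comment mentioning "the card footer", a function called discard_draft, and one real class="card" attribute, and all three lines come back as hits. Only the last one has anything to do with CSS.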

Treesitter 🌴

From using Neovim I know all about how powerful Treesitter is. It seems to be underpinning every code editor these days! I’ve even written my own grammar for Treesitter in C once upon a time, a fun experience! It’s extensible and powerful.

Fortunately there are Rust bindings for Treesitter, so I can use it to parse the content of the files and get a queryable AST of a source code file, as long as that language has a grammar defined for Treesitter.

My thinking is, if I can write some generic Rust functions which take a Treesitter query and a source code file, then using Treesitter I can extract a list of class names, ids and DOM attributes which might be used as CSS selectors.

The question is, do I pass in the CSS selectors of interest, and use them to focus in on the relevant parts of the AST? Or do I have each process be separate, and then reconcile the two lists at the end?

More experimentation required.
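To picture the second option (two separate passes, reconciled at the end), here is a small sketch using set differences. The `Reconciliation` struct and `reconcile` function are invented for illustration, and the inputs are plain class names rather than full selectors.

```rust
use std::collections::BTreeSet;

/// Result of comparing what the stylesheets define with what the content uses.
#[derive(Debug, PartialEq)]
pub struct Reconciliation {
    /// Defined in CSS but never seen in content.
    pub unused: BTreeSet<String>,
    /// Seen in content but not defined in CSS.
    pub undefined: BTreeSet<String>,
}

pub fn reconcile(css_classes: &[&str], content_classes: &[&str]) -> Reconciliation {
    let defined: BTreeSet<String> = css_classes.iter().map(|c| c.to_string()).collect();
    let used: BTreeSet<String> = content_classes.iter().map(|c| c.to_string()).collect();

    Reconciliation {
        unused: defined.difference(&used).cloned().collect(),
        undefined: used.difference(&defined).cloned().collect(),
    }
}
```

The appeal of this shape is that neither pass needs to know about the other, which would make each one easier to test and cache on its own.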

What has Lisp got to do with CSS?

I’ve been thinking a lot about how to make Lime more extensible, whilst keeping the rich knowledge of the content source code files. Treesitter has a great mechanism for this, by allowing you to query files using a specialised Lisp dialect.

To this end I wanted to experiment with a root /queries/LANG directory in the project, where each language has its own set of queries. I feel like this separation of concerns would allow for Lime to be easily extended to support new languages without needing to change any Rust code at all.

An added benefit should then be, that in userspace, a consumer of Lime could define their own Treesitter queries which will have generic captures that Lime can then use to provide feedback 🧠
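A rough sketch of how the /queries/LANG layout might be resolved at runtime. Everything here is an assumption: the function names, the extension table, and the query file names are placeholders, not a list of what Lime supports.

```rust
use std::path::{Path, PathBuf};

/// Map a content file to the name of its query directory, based on extension.
/// The extension table is illustrative only.
pub fn query_language(content_file: &Path) -> Option<&'static str> {
    match content_file.extension()?.to_str()? {
        "html" | "htm" => Some("html"),
        "jsx" => Some("jsx"),
        "vue" => Some("vue"),
        _ => None,
    }
}

/// Build the path of one query file, e.g. `queries/html/class.scm`.
pub fn query_path(language: &str, query_name: &str) -> PathBuf {
    Path::new("queries")
        .join(language)
        .join(format!("{query_name}.scm"))
}
```

Supporting a new language would then mean adding a directory of .scm files and one line to the table, or no line at all if the table itself moved into configuration.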

Including static files in the Rust binary

How can I consume the static Treesitter query files internal to Lime without needing to read them from the filesystem at runtime?

I immediately discovered a Rust builtin which would allow for reading a specific file at compile time, then including it in the binary.

let bytes = include_bytes!("query.scm");
println!("FILE CONTENTS: {}", String::from_utf8_lossy(bytes));

https://doc.rust-lang.org/std/macro.include_bytes.html

This is a great start, but I need to be a bit clever-er about this, and allow for an entire directory to be included in the binary. I found a crate called include_dir which does exactly this.

https://crates.io/crates/include_dir

An evolution of the include_str!() and include_bytes!() macros for embedding an entire directory tree into your binary.

Perfect. Just what I need!

cargo add include_dir

The API is even cleaner as a bonus

use include_dir::{include_dir, Dir};

static PROJECT_DIR: Dir = include_dir!("$CARGO_MANIFEST_DIR");

// you can retrieve a file by its full path
let lib_rs = PROJECT_DIR.get_file("src/queries/html/id.scm").unwrap();

// you can also inspect the file's contents
let body = lib_rs.contents_utf8().unwrap();
assert!(body.contains("SOME_INTERESTING_STRING"));

// you can search for files (and directories) using glob patterns
#[cfg(feature = "glob")]
{
    let glob = "**/*.scm";
    for entry in PROJECT_DIR.find(glob).unwrap() {
        println!("Found {}", entry.path().display());
    }
}

Globbing

That globbing feature is going to be super useful for Lime. I can now include an entire directory of Treesitter queries in the binary, and use each query file in my Rust functions.

So I turned it on by adding the glob feature to the Cargo.toml

[dependencies]
include_dir = { version = "0.7.3", features = ["glob"] }
...

Thinking through how state will be stored

I’ve been thinking a lot about how to store the state of the CSS selectors and content locations in Lime.

I initially wondered about using the state crate, which looks awesome: you can define a global “holder” for state and update it either in a thread local way, or globally.

static CONFIG: LocalInitCell<Configuration> = LocalInitCell::new();

fn main() {
    CONFIG.set(|| Configuration {
        /* fill in structure at run-time from user input */
    });

    /* at any point later in the program, in any thread */
    let config = CONFIG.get();
}

I still might use it for the initial configuration, as global access would be very handy!
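For comparison, the standard library can cover the simple "set once, read everywhere" case without an extra dependency. This is a sketch using std::sync::OnceLock rather than the state crate, and the `Configuration` field is hypothetical.

```rust
use std::sync::OnceLock;

/// Hypothetical configuration; the real fields are not decided yet.
#[derive(Debug)]
pub struct Configuration {
    pub entrypoint: String,
}

static CONFIG: OnceLock<Configuration> = OnceLock::new();

/// Store the configuration once. Returns the rejected value if already set.
pub fn init_config(config: Configuration) -> Result<(), Configuration> {
    CONFIG.set(config)
}

/// Read the configuration from anywhere, on any thread.
pub fn config() -> Option<&'static Configuration> {
    CONFIG.get()
}
```

Unlike the state crate's thread local cells, this only gives one global value, which is probably all the initial configuration needs.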

However, I need to think deeply about access patterns for this data.

Firstly, I need to be able to hop back and forth between the selector and content which implements it. This sounds an awful lot like an index in a SQL database.

This immediately makes me think SQLite!

Secondly, I need to be able to store the state of the CSS selectors and content locations in a way that is easily queryable in ways I have not yet anticipated. This sounds like a job for a database.

And thirdly, I want to maintain state between runs. A cache on disk. Why implement this myself when SQLite can be that cache for me?
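To show the access pattern I mean by hopping back and forth, here is an in-memory stand-in built from two maps. It is not SQLite and not Lime's schema; `UsageIndex` and its methods are invented for illustration. A database with an index on each column would answer the same two questions, plus persist between runs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory stand-in for a two-way index between selectors and content files.
#[derive(Default)]
pub struct UsageIndex {
    files_by_selector: BTreeMap<String, BTreeSet<String>>,
    selectors_by_file: BTreeMap<String, BTreeSet<String>>,
}

impl UsageIndex {
    /// Record that `selector` is used in `file`, updating both directions.
    pub fn record(&mut self, selector: &str, file: &str) {
        self.files_by_selector
            .entry(selector.to_string())
            .or_default()
            .insert(file.to_string());
        self.selectors_by_file
            .entry(file.to_string())
            .or_default()
            .insert(selector.to_string());
    }

    /// Which files use this selector? Sorted, empty if unknown.
    pub fn files_using(&self, selector: &str) -> Vec<&str> {
        self.files_by_selector
            .get(selector)
            .map(|files| files.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }

    /// Which selectors does this file use? Sorted, empty if unknown.
    pub fn selectors_in(&self, file: &str) -> Vec<&str> {
        self.selectors_by_file
            .get(file)
            .map(|selectors| selectors.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Keeping the two maps in sync by hand is exactly the bookkeeping I would rather hand to a database.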

SQLx

Turns out Rust has a great library focused on async SQL queries called SQLx. And guess what. It supports SQLite!

It’s a great library, but unfortunately suffers from the same issue as other async Rust crates. Decision fatigue.

The first thing you have to do with the library is make a choice. Do you want tokio or async-std? Do you want TLS or not? Which flavour of SQL do you want to use?

# Cargo.toml
[dependencies]
# PICK ONE OF THE FOLLOWING:

# tokio (no TLS)
sqlx = { version = "0.7", features = [ "runtime-tokio" ] }
# tokio + native-tls
sqlx = { version = "0.7", features = [ "runtime-tokio", "tls-native-tls" ] }
# tokio + rustls
sqlx = { version = "0.7", features = [ "runtime-tokio", "tls-rustls" ] }

# async-std (no TLS)
sqlx = { version = "0.7", features = [ "runtime-async-std" ] }
# async-std + native-tls
sqlx = { version = "0.7", features = [ "runtime-async-std", "tls-native-tls" ] }
# async-std + rustls
sqlx = { version = "0.7", features = [ "runtime-async-std", "tls-rustls" ] }

At least I know which database I want! Sign me up for the sqlite feature flag.

I opted for Tokio and no TLS for now.

After install, I did hit a horrible Rust error which I’ve never seen before:

error[E0658]: async fn return type cannot contain a projection or Self that references lifetimes from a parent scope

Scary stuff.

Thank the Rust Gods, this helpful issue on the SQLx repository helped me diagnose and solve the error.

rustup update

I have this function:

async fn parse(filepath: &str, cache: &impl FileActions) -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(&filepath)?;
    let mut hasher = DefaultHasher::new();
    raw.hash(&mut hasher);
    let hashed = hasher.finish().to_string();
    cache.cache_file(filepath, &hashed).await?;

    // Parse a style sheet from a string.
    let stylesheet = StyleSheet::parse(&raw, ParserOptions::default())?;
    // get selectors from rules
    let CssRuleList(rules) = stylesheet.rules;

    for import_url in Parser::parse_imports(rules) {
        // Recursively call `parse` and `await` on a boxed (heap-allocated) future.
        Self::parse(&import_url, cache).await?;
    }

    Ok(())
}

The Self::parse call causes a compile time error because an async function is not allowed to call itself recursively without boxing the resulting future.

Further reading, and this very helpful page of the async Rust book explains why: the compiler turns each async fn into a state machine whose size has to be known at compile time, and a recursive one would have to contain itself, so it has no finite size. Instead, the recursive future needs to be placed on the heap via a Box type 🧠
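To see the boxing rule in isolation, here is a minimal std-only sketch. `count_files`, the in-memory `imports` map and the toy `block_on` executor are all invented for illustration; none of this is Lime's code, and it would loop forever on circular imports.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Count a file plus everything it imports. Returning a boxed future gives
/// the recursive call a known size: one pointer to a heap allocation.
pub fn count_files<'a>(
    imports: &'a HashMap<&'a str, Vec<&'a str>>,
    file: &'a str,
) -> Pin<Box<dyn Future<Output = usize> + 'a>> {
    Box::pin(async move {
        let mut total = 1;
        if let Some(children) = imports.get(file) {
            for &child in children {
                total += count_files(imports, child).await;
            }
        }
        total
    })
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor for the example only; real code would use Tokio.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The important part is the return type: the function is a plain fn that hands back a pinned, boxed future, rather than an async fn that awaits itself directly.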

Surely AI can help here

GPT4 (via Zed’s assistants panel) gave me this monstrosity as a suggestion

fn parse_recursive<'a>(
    import_url: &'a str,
    cache: &'a impl FileActions,
) -> Pin<Box<dyn Future<Output = Result<(), Box<dyn std::error::Error>>> + Send + 'a>> {
    Box::pin(Self::parse(import_url, cache))
}

Very difficult to read.

The explanation was actually pretty well formed

Your parse function is recursively calling itself within an async context, which is problematic for the Rust compiler. Unlike normal, synchronous recursion, which can have its stack space determined at compile time (though still potentially at risk of stack overflow at run time for deep recursion), asynchronous recursion’s stack space cannot be efficiently predicted or managed due to the nature of futures and how they are polled. This is what the error [E0733]: recursion in an async fn requires boxing is warning you about. To resolve this issue, you can make your recursive calls through a heap-allocated Box<dyn Future>, effectively allowing the future to be ‘moved’ onto the heap, thus sidestepping the stack space issue associated with recursive async functions. Below is an adapted version of your parse function that employs boxing to facilitate recursion:

A more readable refactor

I refactored the parse_recursive to follow the recommendations in the async Rust book.

fn parse_recursive<'a>(import_url: &str, cache: &impl FileActions) -> BoxFuture<'a, ()> {
    async move {
        Self::parse(import_url, cache).await;
    }
    .boxed()
}

But this still is not correct. There is an error when passing the cache over the async boundary. BoxFuture requires the future to be Send, so every value it captures has to implement the Send trait. My special cache type does not, and realistically it can’t, as it contains an internal open connection to SQLite.

One option to get around this could be to refactor the cache into some type like a Mutex, something which is happy to be passed over async boundaries. But actually I think the better solution would be to refactor the recursive calls one level higher.

Instead I can gather all of the required import urls and then call the parse function on each of them in parallel. This way I can avoid the recursive call and the need to pass the cache over the async boundary.
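A sketch of the gathering step, assuming a plain queue is enough. `collect_files` is a hypothetical name, and the `imports_of` closure stands in for "read this file and list its @import targets", so the example runs without touching the filesystem or the cache.

```rust
use std::collections::{HashSet, VecDeque};

/// Walk the import graph with an explicit queue instead of recursion.
/// Returns every reachable file once, entrypoint first.
pub fn collect_files<F>(entrypoint: &str, mut imports_of: F) -> Vec<String>
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut ordered = Vec::new();

    seen.insert(entrypoint.to_string());
    queue.push_back(entrypoint.to_string());

    while let Some(file) = queue.pop_front() {
        for import in imports_of(&file) {
            // `insert` returns false for files already queued, which also
            // protects against circular imports.
            if seen.insert(import.clone()) {
                queue.push_back(import);
            }
        }
        ordered.push(file);
    }

    ordered
}
```

Once the flat list exists, each file can be parsed independently, and the cache only needs to be touched from one place.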