Squirrel Strings

Don't you sometimes wonder why a whole class of security vulnerabilities exist in programming languages? Some languages try to fix issues like memory safety vulnerabilities, but one issue that's encountered on The Web™ more often is injection vulnerabilities.

username = request.get("username")
for row in execute(f"SELECT * FROM users WHERE username={username}"):
	frubnicate(row)

The possibility for injection vulnerabilities is everywhere. Most famously SQL injections where string templating or concatenation is used to build SQL commands. It's still very easy to create one of these vulnerabilities because the APIs are perfect for injection.

The only mitigation for this is warnings in the documentation and lets be fair, a lot of people won't read documentation.

This is not only an issue for SQL. There's a lot of possibilities for injection like HTML injection in template engines making it possibly to perform XSS attacks. Or shell injection vulnerabilities in embedded devices.

server = requests.get("dns")
os.system(f'echo "{server}" > /etc/resolv.conf')

For running commands it's safer and recommended to use the APIs that expect a list of arguments instead of a string, bypassing string concatenation. For HTML injection template engines usually protect the application by escaping variables fed into it.

For SQL the solution is parameterized queries to seperate the actual SQL from the data supplied by users. But what if this distinction was a programming language feature instead?

Introducing squirrel strings

So why call it squirrel strings? Because naming things is hard and I thought of this for fixing SQL.

A squirrel string is a string that is made from a string template but not finalized into an actual string. In this example I'll show how something like this would work in Python since there it can fit in quite nicely.

username = request.get("username")
for row in execute(s"SELECT * FROM users WHERE username={username}"):
	frubnicate(row)

This is the exact same example as the worst-case example above but with a single letter changed. The f-string has become an s-string. Now it seems to be doing the exact same thing, but in a secure way. The major difference is hidden here in the execute method.

A hypothetical squirrel string language construct would function the same as a Python f-string but would not return a plain string, instead it returns an object containing the literal part of the template and the data injected.

>>> foo = "World"
>>> demo = s"Hello {foo}!"
>>> demo
<Squirrel "Hello ?!" ["World"]>
>>> demo.template
["Hello ", "!"]
>>> demo.data
["World"]

For this to not to introduce weird edge-cases where the squirrel-string is used after changing the data the squirrel string should evaluate the expressions in the template before storing it in the squirrel object.

>>> foo = s"1+2={1+2}"
>>> foo.template
["1+2=", ""]
>>> foo.data
[3]

An SQL query

So with this language feature it would be way easier to make a safer database API that accepts SQL strings. To ensure injection doesn't happen through regular strings type-checks can be used to check at runtime and type annotations can be used at "compile" time.

def execute(query: Squirrel):
    if not isinstance(query, Squirrel):
        raise SecurityError()
    sql = "?".join(query.template)
    return cur.execute(sql, query.data)
    
execute(s"UPDATE users SET password={password} WHERE id={userid}")
# Becomes UPDATE users SET password=? WHERE id=?

With this in place it would be safe to template SQL strings because the insecure user data never becomes actually part of the query, which I think is a way better solution than making developers use more verbose APIs to ensure the safety.

Another side benefit is that the type of the data placed into the template is preserved so numbers would not be passed as strings into the query. The constant part of the query could also be cached internally by the database API as a prepared statement without the developer having to use another API.

In the real world

So this is all a random shower thought. None of this is actually implemented and I might be missing some obvious flaw here that causes this to not work.

The API for consuming squirrel-strings is something I thought up while writing the post and for certain use-cases a completely different API might be more useful. Maybe have the literal string as "Hello !" and a dict with data and offsets into the template. Maybe the api should stringlify all incoming data anyway.

The examples also use Python here since it's relatively easy to fit this into Python. I don't know of any other programming language that has flags for string literals embedded into the syntax.

But even if this is a perfect idea with no downsides, it wouldn't fix injection vulnerabilities immediately. Languages would have to create some syntax for defining squirrel strings. Libraries need to start accepting squirrel strings and at some point compatability with existing code would need to be broken, otherwise the "legacy" codepath would still be the default insecure one.

But in any case I wanted to put the idea out there.

Update

It was pointed out to me that this feature is almost fully in javascript already. Javascript has Tagged Templates which allow something similar but not as an actual object, the string is instead converted in the arguments for a function.

The neat thing is that does make it possible to implement this in javascript using that language feature:

function squirrel(template, ...args) {
    // This should probably be something more proper but
    // I'm not a JS programmer
    let result = new Object();
    result.template = template;
    result.data = args;
    return result;
}

let name = "unsafe data";
let example = squirrel`SELECT * FROM people WHERE name=${name}`;
console.log(example.template) // ['SELECT * FROM people WHERE name=', '']
console.log(example.data)     // ['unsafe data']

This functionality is in fact used by some Javascript libraries already to run SQL queries.