I am writing a DSL that allows parsing HTML in Kotlin (kotlinlang #language-proposals)

Christian Dräger

04/22/2021, 7:09 PM

I am writing a DSL for parsing HTML in Kotlin.
(example: <https://github.com/skrapeit/skrape.it#parse-and-verify-html-from-string>)
Since `*` is common in this context (e.g. in CSS selectors), it would feel natural if the DSL could use it as well. It would therefore be handy to be able to overload a unaryTimes operator.
Since Kotlin already supports overloading unaryPlus and unaryMinus, why isn't there a unaryTimes?
Here is an example of what it looks like right now:

@Test
fun `can parse html from String`() {
    htmlDocument("""
        <html>
            <body>
                <h1>welcome</h1>
                <div>
                    <p>first p-element</p>
                    <p class="foo">some p-element</p>
                    <p class="foo">last p-element</p>
                </div>
            </body>
        </html>""") {
        h1 {
            findFirst {
                text toBe "welcome"
            }
            p {
                withClass = "foo"
                findSecond {
                    text toBe "some p-element"
                    className  toBe "foo"
                }
            }
            p {
                findAll {
                    text toContain "p-element"
                }
                findLast {
                    text toBe "last p-element"
                }
            }
        }
    }
}

I think this should be possible right now by making `*` an infix operator?

@Test
fun `can parse html from String`() {
    htmlDocument("""
    <html>
        <body>
            <h1>welcome</h1>
            <div>
                <p>first p-element</p>
                <p class="foo">some p-element</p>
                <p class="foo">last p-element</p>
            </div>
        </body>
    </html>""") {
        h1 {
            find 0 {
                text toBe "welcome"
            }
            p {
                withClass = "foo"
                find 1 {
                    text toBe "some p-element"
                    className  toBe "foo"
                }
            }
            p {
                find * {
                    text toContain "p-element"
                }
                findLast {
                    text toBe "last p-element"
                }
            }
        }
    }
}

What would become possible if a unaryTimes operator could be overloaded:

@Test
fun `can parse html from String`() {
    htmlDocument("""
    <html>
        <body>
            <h1>welcome</h1>
            <div>
                <p>first p-element</p>
                <p class="foo">some p-element</p>
                <p class="foo">last p-element</p>
            </div>
        </body>
    </html>""") {
        h1 {
            0 { // already possible by invoke operator
                text toBe "welcome"
            }
            p {
                withClass = "foo"
                1 {
                    text toBe "some p-element"
                    className  toBe "foo"
                }
            }
            p {
                * { // would need unaryTimes
                    text toContain "p-element"
                }
                findLast {
                    text toBe "last p-element"
                }
            }
        }
    }
}
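
A rough sketch of what is expressible today, using a made-up `Elements` scope rather than skrape.it's real API: an `invoke` operator on `Int` gives the `0 { ... }` form, while a named function (here `all`, an assumption chosen purely as a stand-in) covers the case the proposed `* { ... }` would handle.

```kotlin
// Hypothetical miniature scope for illustration; this is NOT skrape.it's API.
class Elements(private val texts: List<String>) {
    // Member extension: lets `0 { ... }` run the block on the element at that index.
    operator fun Int.invoke(block: String.() -> Unit) = texts[this].block()

    // Named stand-in for the proposed `* { ... }` form.
    fun all(block: List<String>.() -> Unit) = texts.block()
}

// Returns the first element and the total element count, using both forms.
fun firstAndCount(texts: List<String>): Pair<String, Int> {
    var first = ""
    var count = 0
    with(Elements(texts)) {
        0 { first = this }      // resolved through Int.invoke
        all { count = size }    // ordinary function call with a trailing lambda
    }
    return first to count
}
```

The named function costs a few more characters than a sigil, but it needs no language change and stays searchable.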

04/22/2021, 7:22 PM

What in the standard library would use unaryTimes? What operation does unary times even represent? How does this lexically interact with the array spread operator? Right now it seems like you just want a particular DSL sigil for your specific use case. As a counter proposal, I would actually rather see unaryPlus and unaryMinus removed from the language. unaryPlus is useless. The stdlib implements it on scalar number types for symmetry with unaryMinus as a no-op function. unaryMinus seems like an attempt to be clever about lexing and general support for negation but introduces warts into the language such that you cannot represent MIN_VALUE of scalar number types as a literal. You can implement a regular `.inv()` function for the same purpose on custom types. Their primary use seems to be DSLs, and I would rather see some kind of DSL-specific escape hatch for opening up the use of wider sigils than adding one more (seemingly) mathematically nonsensical operation solely for DSL usage.
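
On the lexical question: a prefix `*` already has a meaning in Kotlin, namely the spread operator that passes an array to a `vararg` parameter. A minimal sketch of that existing use (the function names are made up for illustration):

```kotlin
// Accepts any number of strings and joins them with commas.
fun joinAll(vararg parts: String): String = parts.joinToString(",")

fun spreadDemo(): String {
    val xs = arrayOf("a", "b")
    // The prefix `*` spreads the array into individual vararg arguments.
    return joinAll(*xs, "c")
}
```

A unaryTimes operator would have to coexist with this syntax wherever a prefix `*` can appear in an argument position.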

Christian Dräger

04/22/2021, 7:26 PM

OK, understood. All your points sound reasonable to me, and I agree this would only be useful for DSLs and doesn't make sense mathematically. Can you explain what you mean by a DSL-specific escape hatch?

04/22/2021, 7:28 PM

Well, right now there's `@DslMarker`, which modifies the language behavior to remove your ability to implicitly reference enclosing `this`'s in the function hierarchy. Perhaps there could be something which further signaled that you were stepping outside the bounds of the normal language so that more characters could be used, like your `*`. Basically something between what we have today and macros, without going full macro.
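
For readers unfamiliar with `@DslMarker`, a small sketch of the behavior being described. The marker name `MarkupDsl` and the `Body`/`Div` classes are hypothetical and exist only for this illustration.

```kotlin
@DslMarker
annotation class MarkupDsl   // hypothetical marker name

@MarkupDsl
class Body {
    val log = mutableListOf<String>()
    fun div(block: Div.() -> Unit) {
        log += "div"
        Div(log).block()
    }
}

@MarkupDsl
class Div(private val log: MutableList<String>) {
    fun p(text: String) { log += "p:$text" }
}

fun body(block: Body.() -> Unit): Body = Body().apply(block)

fun demo(): List<String> = body {
    div {
        p("hello")
        // div { }          // would not compile: Body is an outer implicit receiver here
        this@body.div { }   // still allowed when the outer receiver is named explicitly
    }
}.log
```

The marker only narrows which receivers are implicitly visible; it does not change which characters or operators a DSL can use, which is the gap the message above points at.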

nanodeath

04/22/2021, 8:09 PM

The only place I've seen unaryPlus leveraged is in the kotlinx.html DSL. It's nice and short, but `+"foo"` to append a string does seem bizarre to me. What would replace it?
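
As a sketch of the pattern under discussion: the builder below is a made-up `TextBuilder`, not kotlinx.html's actual classes. It shows how a `unaryPlus` overload on `String` yields the `+"foo"` form, next to a plain named function (`text`, also an assumption) that does the same job.

```kotlin
// Hypothetical builder for illustration only.
class TextBuilder {
    private val sb = StringBuilder()

    // Member extension: `+"foo"` inside the builder appends the string.
    operator fun String.unaryPlus() { sb.append(this) }

    // The same operation as an ordinary named function.
    fun text(s: String) { sb.append(s) }

    fun build(): String = sb.toString()
}

fun render(block: TextBuilder.() -> Unit): String = TextBuilder().apply(block).build()

fun sample(): String = render {
    +"foo"
    text("bar")
}
```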

edrd

04/22/2021, 10:34 PM

punkt also uses `unaryPlus`: https://github.com/pjagielski/punkt

edrd

04/22/2021, 10:36 PM

I think there are some pretty legitimate DSL uses for it. `unaryMinus`, on the other hand, I've never seen (actually I didn't even know it existed).

04/22/2021, 10:37 PM

Nothing would replace it. That's the point. It's a language feature that's too specific for its own good. What would replace it is a general purpose mechanism for DSLs to unlock additional sigils without needlessly complicating the whole language.

edrd

04/22/2021, 10:42 PM

Or maybe people that want the level of flexibility OP suggested can use FIR compiler plugins, when they're available

nanodeath

04/22/2021, 10:53 PM

`unaryMinus` is probably useful for negating `BigDecimal`

04/23/2021, 12:10 AM

We use it on a lot of number-based types. A regular function works just as well, like Duration.negate()

04/23/2021, 12:10 AM

Plus you can't create literals of those custom types making it somewhat hamstrung today
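
To make the comparison concrete, here is a small sketch with a hypothetical `Cents` type that offers negation both as a `unaryMinus` operator and as a regular function, plus the stdlib's `unaryMinus` extension on `BigDecimal`. Note that a `Cents` value still has to be built through its constructor, since there is no literal syntax for custom types.

```kotlin
import java.math.BigDecimal

// Hypothetical number-like type for illustration.
data class Cents(val amount: Long) {
    operator fun unaryMinus() = Cents(-amount)   // enables `-c`
    fun negate() = Cents(-amount)                // plain-function equivalent
}

// Negates the same value both ways so the results can be compared.
fun negatedBoth(value: Long): Pair<Cents, Cents> {
    val c = Cents(value)
    return -c to c.negate()
}

// Uses the Kotlin stdlib's unaryMinus operator extension for BigDecimal.
fun negateDecimal(text: String): BigDecimal = -BigDecimal(text)
```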

vladimirsitnikov

11/24/2021, 7:49 AM

@Christian Dräger, the assertion DSL you propose reads great; however, there's an issue with keeping the assertion code up to date. For instance, suppose there's an application feature that adds an extra HTML tag everywhere. How do you update all the assertions? Suppose you had a lot of `withClass = "foo" and "bar" and "fizz" and "buzz"`, and UI designers removed class `fizz` to optimize CSS. How do you keep the tests up to date? Have you explored something like https://approvaltests.com/ that is tailored to Kotlin? Approvaltests (and other snapshot-like tools) allow automatic re-generation of the "expected" outcomes, so I'm inclined to think that fancy `withClass` assertions are not that suitable for 7+ line HTML outputs :-/ It would be great if the failing test could read the original test code and update it with the actual outputs (see https://github.com/kotest/kotest/issues/395 ).
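
The snapshot idea can be sketched in a few lines. The helper below is a hypothetical illustration of the general technique and is not the ApprovalTests API: it compares the actual output with a stored file and rewrites that file on request.

```kotlin
import java.io.File

// Hypothetical helper: returns true when `actual` matches the stored snapshot.
// When `update` is set, or no snapshot exists yet, the snapshot is (re)written.
fun verifySnapshot(snapshot: File, actual: String, update: Boolean = false): Boolean {
    if (update || !snapshot.exists()) {
        snapshot.writeText(actual)   // regenerate the expected output
        return true
    }
    return snapshot.readText() == actual
}
```

With this shape, a markup change like the removed `fizz` class is handled by rerunning the tests once with `update = true` and reviewing the resulting file diff, instead of editing each assertion by hand.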

Christian Dräger

11/25/2021, 1:35 PM

@vladimirsitnikov very interesting. didn’t know about this. it would be cool if skrapeit assertions would support this. good point
