kazinator 3 days ago

Homoiconic has a pretty clear definition. It was coined by someone in reference to the property of a specific system, many decades ago. That system stored program definitions in the same form that the programmer entered them in (either just the original character-level text, or some tokenized version of it), allowing the definitions to be recalled at runtime and redefined. He turned "same form" into "homoiconic" with the help of Greek/Latin. It's all in the Wikipedia.

Line numbered BASIC is homoiconic: you can edit any line of code and continue the program.

POSIX shell lets functions be redefined. They can be listed with the set command executed without arguments, and copy-pasted.

In Common Lisp, there is a function called ed, support for which is implementation-defined. If support is available, it is supposed to bring up an editor of some kind to allow a function definition to be edited. That is squarely a homoiconic feature.
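
A minimal sketch of what that looks like, assuming an implementation that supports ed on function names (which editor appears, and exactly what it accepts, is implementation-defined):

    ;; Define a function, then ask the implementation to open its
    ;; definition in an editor; on return, the edited definition
    ;; replaces the old one in the running image.
    (defun greet (name)
      (format t "Hello, ~a!~%" name))

    (ed 'greet)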

Without ed support or anything like it, the implementation does not retain definitions in a way that can be edited; i.e., it is not homoiconic. Some Lisps compile everything entered into them; you cannot edit a defun because it has been turned into machine language.

  • gwd 2 days ago

    > Line numbered BASIC is homoiconic: you can edit any line of code and continue the program.

    Oh man, anyone else remember those self-modifying BASIC programs, which would:

    1. Clear the screen

    2. Print a bunch of new BASIC lines on the screen, with a CONTINUE command at the end, thus:

        100 PRINT $NEWVAR
        110 <whatever>
        CONTINUE
    
    3. Position the cursor at the top of the screen

    4. Enable some weird mode where "Enter" was considered to be pressed over and over again

    5. Execute the BREAK command, so that the interpreter would then read the lines just printed?

    I forget the kinds of programs that used this technique, but thinking back now as a professional developer, it seems pretty wild...

    • tolciho 2 days ago

      You can (sort of) do this with a shell script combined with another process that seeks the (shared) file descriptor somewhere else in the file, as the shell is very line oriented. Not very well; it requires that the shell script block or sleep while the other process fiddles with the seek position.

      • gwd a day ago

        In bash you have "eval", which I've used for some monstrosities in the past; but at least it doesn't need to be visible to the user as you're doing it!

  • thaumasiotes 3 days ago

    > He turned "same form" into "homoiconic" with the help of Greek/Latin.

    Well, sort of. Mostly that's just English.

    There's no Latin at all, but hom- [same] and icon [image] are arguably Greek roots. The Latin equivalents would be eadem [same, as in "idempotent"] and imago [image, and the feminine gender of this word explains why we need "eadem" and not "idem"]. I'm not sure how you'd connect those. (And you might have issues turning imago into an adjective, since the obvious choice would be imaginary.)

    However, since icon begins with a vowel, I don't think it's possible for hom- to take the epenthetic -o- that appears when you're connecting two Greek roots that don't have an obvious way to connect. If the word was constructed based on Greek principles, it would be hom(e)iconic. Treating homo- as a prefix that automatically includes a final O is a sign of English; in Greek they're separate things.

    I remember that when there was a scandal around cum-ex financial instruments, a lot of people wanted to say that cum-ex was Latin for "with-without", which it isn't; it's Latin for "with-from". ("Without" in Latin is sine, as compare French sans or Spanish sin.) Cum-ex is English for "with-without", and the same kind of thing is going on with homoiconic.

    • Y_Y 2 days ago

      I'd like to offer some additional amateur translation options for "homoiconic" to Latin. There's already a decent word "conformis" which has the close English counterpart "conformal", but if we're inventing new words, I'd propose "coninstar", as in "con-" meaning "together in/sharing" and "instar" being "representation/form".

      • thaumasiotes 2 days ago

        Con- before vowels is co-; compare cohabit; coincide.

        (Technically, you wouldn't expect an N before vowels anyway because the root word ends in an M, so hypothetically you'd have "cominstar". But since the consonant just disappears before vowels, that's moot. [Though technically technically, disappearing when before vowels is expected of M - this is a feature of Latin pronunciation generally - and not of N.])

        • Y_Y 2 days ago

          I'll plead ignorance here, and ask for clemency on the grounds that modern coinages like "conurbation" may be exempt, and also that there seem to be notable exceptions to this rule, like this example I've thrown together[0]:

          "con"+"iacio" (also "jacio") => "conicio" (also "coicio" also "conjicio")

          (Also "coinstar" is a trademark of those spare change gobblers you find after the register at Walmart.)

          [0] https://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1...

          • thaumasiotes 2 days ago

            > also "jacio"

            It'd be a better example of an exception if it unambiguously started with a vowel. This is sort of the reverse of the case I pointed to above, where "habito" does start with a vowel, or rather it almost does, enough to trigger the same changes.

            https://www.etymonline.com/word/com-

            > Before vowels and aspirates, it is reduced to co-; before -g-, it is assimilated to cog- or con-; before -l-, assimilated to col-; before -r-, assimilated to cor-; before -c-, -d-, -j-, -n-, -q-, -s-, -t-, and -v-, it is assimilated to con-, which was so frequent that it often was used as the normal form.

            I and J aren't different letters in Latin, but they are different kinds of sound, if sometimes only hazily different. Same goes for U and V. By modern convention we have convention and conjecture; the hazy difference seems sufficient to explain why the Romans left us every variety of the compound, from coniicio through conicio to coicio. A naive analysis (the most I can really do) would say that coniicio comes from someone who sees iacio as starting with a consonant, coicio comes from someone who doesn't, and conicio is a reduced form of coniicio.

          • lupire 2 days ago

            And Google's etymology feature says that con- and -ation are English, while -urb- is Latin.

            https://www.google.com/search?q=conurbation

            • thaumasiotes a day ago

              That's not the most objective decision in the world. If we're describing "conurbation" in specific, why not call -urb- English too, taken from the common English word urban? Urban ultimately draws from Latin urbs, but so does con- draw from com- and -ation draw from -io(n).

              (In Latin, there are plenty of words ending in -atio(n); however, within the language this is not a single unit, it's a sequence of part of the verb stem plus two separate morphemes -a-t-io(n). The -at- marks the passive participial form of an a-stem verb; compare faction (zero-stem), inhibition (e-stem).)

  • ValentinA23 2 days ago

    >stored program definitions in the same form that the programmer entered them in

    >allowing the definitions to be recalled at runtime and redefined

    >Some Lisps compile everything entered into them; you cannot edit a defun because it has been turned into machine language.

    Ability to recall and redefine definitions at runtime, even when the language is compiled, is orthogonal to homoiconicity. Ruby can do this (interpreted). Clojure too (compiled). To do so, they don't store the program as text; they store source locations (file://...:line:col) and read the files from the disk (or jar). In fact, any programming language that does source-mapping and has eval() is inches away from being able to do this. This was the case for Ruby and was made possible by the pry REPL library [1]. And then there are tools like javassist [2] that allow you to edit compiled code to some extent using a limited form of the language.

    Note that in the case of lisps, this is entirely orthogonal to macros (the source is passed as arguments to macros in the form of an AST/list rather than a pointer into a file), which is where homoiconicity shines. Storing code in the same format it is written in (strings) doesn't alleviate the headache of processing it when you want to do meta programming.

    Additionally, macros allow you to do structured meta programming: macros are guaranteed to only impact code they enclose. Compare this with redefinitions that are visible to the whole code base. This is like global vs local variables: macros don't redefine code, they transform it.
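
    A rough Common Lisp sketch of both points (the macro name is made up for illustration): the enclosed forms arrive as lists rather than strings, and only those forms get transformed:

        ;; The body arrives as a list of forms; rewriting it is ordinary
        ;; list processing, with no string handling involved.
        (defmacro with-tracing (&body body)
          `(progn
             ,@(mapcar (lambda (form)
                         `(progn (format t "running: ~s~%" ',form) ,form))
                       body)))

        ;; (with-tracing (+ 1 2) (* 3 4)) prints each form before running it;
        ;; code outside the macro call is left untouched.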

    [1] https://github.com/pry/pry#edit-methods

    [2] https://www.javassist.org/tutorial/tutorial2.html#before

    • marcosdumay 2 days ago

      > they store source locations (file://...:line:col) and read the files from the disk

      That's also known as "storing the program as text".

      But yeah, macros are related to another kind of homoiconicity, where the interpreted bytecode is written using the same symbols as your program data.

      You can have both of those (source = bytecode) and (bytecode = data structures) only one of them or neither.

  • coldtea 2 days ago

    >In Common Lisp, there is a function called ed, support for which is implementation-defined. If support is available, it is supposed to bring up an editor of some kind to allow a function definition to be edited. That is squarely a homoiconic feature.

    It's enough that the language stores the current source code and can reload it. So hot code swapping/reloading is enough; homoiconicity isn't needed - which makes it not so squarely a homoiconic feature.

  • ggm 3 days ago

    I think this comment reinforced my sense that the author wanted to drive to a destination and didn't want to divert down a road of "why LISP homoiconicity is different from eval()", which I think was... lazy.

    The idea has merit. Having the REPL deal with the parsed structure of data, so that taking parsed data and presenting it as code has a lower barrier to affecting the current run state than eval() does, is pretty big.

    I'd say eval() isn't self-modifying: you can't come out the other side of eval() with your own future execution state changed. As I understand it, the homoiconic features of LISP mean you can.

  • samth 2 days ago

    Notably, the definition given in the Wikipedia entry referencing TRAC means that "homoiconic" is a property of an _implementation_, not of a language. This would mean that Lisp, a programming language, could not properly be described as homoiconic, since it admits multiple implementations, including those that do not have this property (e.g., SBCL rather clearly doesn't).

galaxyLogic 3 days ago

If I understand the gist of this article it goes like ...

1. Scanner divides source-code-string into ordered chunks each with some identifying information, what is the type and content of each chunk.

2. The next stage better NOT be a "Parser" but a "Reader" which assembles the chunks into a well-formed tree-structure thus recognizing which chunks belong together in the branches of such trees.

3. Parser then assigns "meaning" to the nodes and branches of the tree produced by Reader, by visiting them. "Meaning" basically means (!) what kind of calculation will be performed on some nodes of the tree.

4. It is beneficial if the programming language has primitives for accessing the output of the reader, so it can have macros that morph the reader-produced tree so it can ask the parser to do its job on such a re-morphed tree.

Did I get it close?

  • Joker_vD 2 days ago

    > 2. The next stage better NOT be a "Parser" but a "Reader" which assembles the chunks into a well-formed tree-structure thus recognizing which chunks belong together in the branches of such trees.

    > 3. Parser then assigns "meaning" to the nodes and branches of the tree produced by Reader, by visiting them. "Meaning" basically means (!) what kind of calculation will be performed on some nodes of the tree.

    So, an "AST builder" that is followed by a "semantic pass". That's... how most of the compilers have been structured, at least conceptually, since their invention. In particularly memory-starved environments those passes were actually separate programs, launched sequentially; most famously the ancient IBM FORTRAN compilers were structured like this (they couldn't manage fit both the program being compiled and the whole compiler into the core; so they've split the compiler into 60-something pieces).

    • indigo945 2 days ago

      It helps to read the article... the author was not introducing this as a novel concept, but elaborating on how this is a better mental model for how an interpreter or compiler works. It's not Tokenize -> Parse, it's Tokenize -> Read -> Parse.

      The article discusses this particularly with regard to the meme of LISPs being "homoiconic". The author elaborates that the difference between LISPs and other programming languages actually lies not in "homoiconicity" (a Javascript string can contain a program, and you can run `eval` on it, hence Javascript is "homoiconic"), but in which step of the parsing pipeline they let you access: with Javascript, it's before Tokenization happens; with LISPs, it's after Reading has happened, before the actual Parse step.
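
      A small Common Lisp illustration of "access after Reading" (the variable name is just for the example): READ turns text into a nested list before any parsing or evaluation happens, so ordinary list operations work on code:

          ;; READ-FROM-STRING produces list structure, not a string.
          (defparameter *form* (read-from-string "(if (> x 0) :pos :neg)"))
          (first *form*)    ; => IF
          (second *form*)   ; => (> X 0)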

      • Joker_vD 2 days ago

        I've actually read the article, thank you; the author also argues that this "bicameral" style is what allows one to have useful tooling, since it can now consume a tree-like AST instead of plain strings. Unfortunately, that is not a unique advantage of "languages with bicameral syntax", although the author appears (?) to believe it to be so. IDEs have been dealing with ASTs since long before LSP was introduced, although indeed, this has only been seriously explored since the late nineties or so, I believe.

        So here is a problem with the article: the author believes that what he calls "bicamerality" is unique to LISPs, and that it also requires some S-expr/JSON/XML-like syntax. But that's not true, is it? Java, too, has a tree-like AST which can be (very) easily produced (especially when you don't care about semantic passes such as resolving imports and binding name mentions to their definitions, etc.), and it has decidedly non-LISP-like syntax.

        And no, I also don't believe the author actually cares all that much about the reader/parser/eval being available inside the language itself: in fact, the article is structured in a way that mildly argues against having this requirement for a language to be said to have "bicameral syntax".

        • indigo945 2 days ago

          > So here is a problem with the article: the author believes that what he calls "bicamerality" is unique to LISPs, and that it also requires some S-expr/JSON/XML-like syntax.

          I didn't find that assumption anywhere in the article. My reading is that all interpreters and compilers, for any language, are built to implement two non-intersecting sets of requirements, namely to "read" the language (build an AST) and to "parse" the language (check if the AST is semantically meaningful). Therefore, all language implementations require Tokenization, Reading and Parsing steps, but not all interpreters and compilers are structured in a way that cleanly separates the latter two of these three sets of concerns (or "chambers"), and (therefore) not all languages give the programmer access to the results of the intermediate steps. Java obviously has an AST, but a Java program, unlike a LISP program, can't use macros to modify its own AST. The programmer has no access to what the compiler "read" and can't modify it.
          • Joker_vD 2 days ago

            Mmmm. This article is like one of those duck-rabbit pictures, isn't it? With a slight mental effort, you can read it one way, or another way.

            So, here are some excerpts:

                These advantages ("It’s a lot easier to support matching, indentation, coloring, and so on", and "tools hit the trifecta of: correct, useful, and relatively easy") are offset by one drawback: some people just don’t like them. It feels constraining to some to always write programs in terms of trees, rather than more free-form syntax.
            
                Still, what people are willing to embrace for writing data seems to irk them when writing programs, leading to the long-standing hatred for Lispy syntaxes.
            
                But, you argue, “Now I have a bicameral syntax! Nobody will want to program in it!” And that may be true. But I want you to consider the following perspective.
            
                [...] a bicameral syntax that is a very nice target for programs that need to generate programs in your language. This is no longer a new idea, so you don’t have to feel radical: formats like SMT-LIB and WebAssembly text format are s-expressions for a reason.
            
            The last three paragraphs play upon each other: people hate Lispy syntax; people dislike bicameral syntaxes; S-expressions are bicameral syntax.

            And notice that nothing in those excerpts, and nothing in the text surrounding them (sections 4 to 7), really refers to the ability to access the program's syntax from inside the program itself. In fact, sections 1 and 2 argue that such an ability is not really all that important and is not what makes LISPs LISPs. Then what does? The article goes on about "bicamerality" (the explicit distinction between the reader and the parser) but never again mentions the ability of the program to modify its own syntax, or eval.

            I can't help but make the tacit deduction that those never-again-mentioned things are not part of "bicamerality". You, perhaps, instead take those things as an implicit, never-going-out-of-sight context that is always implied to be important, so they are never mentioned again because enough has already been said about them, but they are still a crucial part of "bicamerality".

            It's a duck-rabbit article. We both perceive it very differently; perhaps in reality it's just an amalgam of ideas that, when mixed together in writing, lacks coherent meaning?

            • indigo945 a day ago

              Yes, I understand your meaning now (and no longer understand the article's, which indeed seems to quack like a rabbit).

    • skrishnamurthi 2 days ago

      No, this isn't what the article says. I have not bothered saying anything about the "semantic pass", which is downstream from getting an AST. What the article talks about is not what "ancient IBM FORTRAN compilers" did.

    • aidenn0 2 days ago

      The output of the Lisp reader is not an AST. It is completely unaware of many syntactic rules of the language, and it carries no context. The equivalent in a C-like language would be a stage that quite willingly generates a tree for the following:

        void foo(int int) {
          else {
            x = 3;
          }
        }
      
      Which most compilers will never construct a tree for, despite it following some unifying rules for the structure of code in a C-like language (braces and parentheses are balanced, each statement has a semicolon after it, &c.).
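
      A rough Lisp-side counterpart (a nonsense form, made up for illustration): the reader builds the tree without complaint, and rejecting it is a later stage's job:

        (read-from-string "(defun 42 (x) (else (setf x 3)))")
        ;; => (DEFUN 42 (X) (ELSE (SETF X 3)))
        ;; The reader only enforces token syntax and balanced parentheses;
        ;; it neither knows nor cares that DEFUN needs a symbol as its name.
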
  • skrishnamurthi 2 days ago

    Author here. Yes, very close. #4 is a bit strong: there is value to doing this even if you don't have macros, for instance because of other benefits (e.g., decent support from editors). But of course it also makes macros relatively easy and very powerful.

    • galaxyLogic 2 days ago

      And what about homoiconicity in Lisp vs. other languages? In Lisp it means that programs are "lists" and so is "data". Programs in Lisp are not just strings, as in most other languages; they are "nested lists". Lisps let us write programs as lists, and store data as lists. JavaScript only allows us to write programs as (structureless) strings.

      Of course that is well-known, but I think it is a big deal that you have such homoiconicity in Lisp but not in most other languages. Prolog maybe?

codeflo 2 days ago

It seems that the Rust macro system is inspired by a similar idea: In the first step (the "reader" in this article's terminology), the source is converted into something called a token tree.

A token tree is not a full parse tree with resolved operator precedence and whatnot. It only has child nodes for bracket pairs ((), [] and {}) and their contents, in part to determine where the macro call ends. Otherwise, it's a flat list of tokens that the macro (what this article would call the "parser") can interpret in any way it wants.

  • wruza 2 days ago

    Sounds like Rust did to macros what I wanted long ago in C (and everyone frowned upon me for that). Lisps and sexprs aren’t exclusive to this. You can “load” the code into a var and modify it through regular data processing and then feed it to an executor. You just need language designers to implement that. This entire lisp homoiconicity religion has bugged me since forever. It’s just the read-eval part of a loop, which never had a requirement for everything to be represented as a Cons.

  • samth 2 days ago

    Indeed, the Rust macro system was designed by people who had worked on the Racket macro system previously.

  • moomin 2 days ago

    I think you’re right. What LISP really brought to the party was a very simple token structure. This made it pretty easy to express manipulations of that structure and hence create whatever macros you like.

    This is instantly useful to the compiler writer because most of “LISP” is built upon more basic primitives. The disadvantage is the Jeff Goldblum “You scientists” meme.

kibwen 3 days ago

I liked the first half of the article, but I'm not sure I got anything from the second half. As the author notes, in order to be useful a definition must exclude something, and the "bicameral" distinction doesn't seem to exclude anything; even Python eventually gets parsed into a tree. Conceptually splitting out "parsing" into "tree validation" and "syntax validation" is slightly interesting (although isn't this now a tricameral system?), but in practice it just seems like a simple aid to constructing DSLs.

> These advantages are offset by one drawback: some people just don’t like them. It feels constraining to some to always write programs in terms of trees, rather than more free-form syntax.

I think this is misdiagnosing why many people are averse to Lisp. It's not that I don't like writing trees; I love trees for representing data. But I don't think that thinking of code as data is as intuitive or useful as Lisp users want me to think it is, despite how obviously powerful the notion is.

  • Y_Y 2 days ago

    I also struggled with the "bicameral" definition. The best I could come up with is that because e.g. Scheme represents code and data in the same way (isn't there a word for this?), it's possible to represent and manipulate (semantically) invalid code. This is because the semantics are handled in the other "chamber". The example given was `(lambda 1)`, which is a perfectly good sexp, but will error if you eval it.

    This could be contrasted with C where code (maybe more precisely program logic) is opaque (modulo preprocessor) and can only be represented by function pointers (unless you're doing shellcode). Here the chamber that does the parsing from text (if we don't look inside GCC) also does semantic "checking" and so while valid functions can be represented within C (via the memory contents at the function pointer), the unchecked AST or some partial program is not represented.

    I've tried not to give too many parentheticals above, but I'm not sure the concept holds water if you play tricks. Any Turing machine can represent any program, presumably in a way that admits cutting it up into atoms and rearranging to an arbitrary (potentially invalid) form. I'd be surprised if this hasn't been discussed in more detail somewhere in the literature.

  • chubot 2 days ago

    It excludes languages that build a single AST directly from tokens. I am pretty sure Clang is like this, and probably v8. (They don't have structured macros, so it's not observable by users.)

    As opposed to building first an untyped CST (concrete syntax tree), and then transforming that into a typed AST.

    CPython does exactly this, but it has no macro stage either, so it's not exposed to users. (Python/ast.c is the CST -> AST transformation. It transforms an untyped tree to a typed tree.)

    So the key reason it matters is that it's a place to insert the macro stage.

    ---

    I agree that the word "bicameral" is confusing people, but it basically means "reader --> parser" as opposed to just "parser".

    The analogies in the article are very clear to me -- in this world, JSON and XML parsers are "readers", but they are NOT "parsers"! (and yes that probably confuses many people, some new words could be necessary)

    The JSON Schema or XML Schema would be closer to the parser -- it determines whether you have a "for loop" or "if statement", or an "employee" and "job title", etc.

    Another clarifying comment - https://lobste.rs/s/ici6ek/bicameral_not_homoiconic#c_bmx0vf

    • chubot 2 days ago

      I'll also argue that the ideas in this post absolutely matter in practice.

      For example, Github Actions uses YAML as its Reader / S-expression / CST layer.

      And then it has a separate "parser", for say "if" nodes, and then another parser for the string value of those "if" nodes.

      https://docs.github.com/en/actions/writing-workflows/workflo...

          if: ${{ ! startsWith(github.ref, 'refs/tags/') }}
      
          if: github.repository == 'octo-org/octo-repo-prod'
      
      This fact is poorly exposed to users:

      > You must always use the ${{ }} expression syntax or escape with '', "", or () when the expression starts with !, since ! is reserved notation in YAML format.

      So I feel that they could have done a better job with language design by taking some lessons from the past.

      Gitlab has the same kind of hacky language on top of YAML, as far as I remember.

Karellen 2 days ago

I thought part of the beauty of homoiconicity, which doesn't seem to be mentioned here, is not just that it's natural to interpret tokens as code, but that it's possible to interpret the code of the program that's currently running as tokens, and manipulate them as you would any other data in the program?

  • tines 2 days ago

    Yeah, exactly. The whole point is macros and metaprogramming!

zzo38computer 3 days ago

It is not only Lisp. PostScript is also homoiconic; tokens have values like any other values (and procedures are just executable arrays (executing an array involves executing each element of that array in sequence), which can be manipulated like any other arrays). The {} block in PostScript is a single token that contains other tokens; the value of the token is an executable array whose elements are the values of the tokens that it contains.

Strings don't make it "homoiconic" in the usual way, I think; so, JavaScript does not count.

  • ashton314 3 days ago

    You might be interested in what the author has to say about weak vs strong homoiconicity then…

    • lmm 2 days ago

      The author doesn't go far enough; eval operating on strings is still very weak (unless your language is something like BrainFuck that really doesn't have a more structured representation available). The point is exposing the structured form that the language implementation runs as datastructures within the language - and not as some second-class reflection API, but directly as they are. You want to be able to capture something like an AST representation (not necessarily literally an AST), manipulate it, and then run it.

      I think "Bicameral" isn't really a great way to capture this, because there are often multiple layers of parsing/lexing/compilation/interpretation and you might want to hook in at multiple of them (e.g. in lisps you may have both reader macros that operate at a low-level stage and higher-level macros that operate after parsing). And of course it's a spectrum, but essentially the more the language exposes itself as a set of compositional libraries rather than just being a monolithic service.

clausecker 2 days ago

Another language with this property is FORTH, which has many surprising similarities with LISP. I like to call it “LISP, but the other way round.” It uses RPN instead of PN, stacks/arrays instead of lists, and is procedural instead of functional.

  • obijohn 2 days ago

    I was thinking about this reading the article. In fact, I’ve recently seen Lisp implemented in Forth[0] and Forth implemented in Lisp[1]. In both cases, the implementations are decently complete and surprisingly efficient (i.e. not “toy” interpreters).

    I think this is due to a significant property shared by both languages: the parser’s primary role is distinguishing between numbers and anything that’s not a number. No need to worry about operator precedence, keywords, or building complex syntax trees. Tokens are numbers and “not-numbers”, and that’s it.

    In Forth, a “not-number” is a Word, and in Lisp a Symbol, both of which can be variables or functions. The only difference between the two is that Forth checks for Word definitions first, and Lisp checks for numbers first. If you wanted to redefine 4 to 5 for some reason, Forth’s got your back, but Lisp will save you ;).
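
    A tiny Common Lisp sketch of the "numbers first" side of that split:

        (read-from-string "42")     ; => 42   (a number; not redefinable)
        (read-from-string "foo")    ; => FOO  (a symbol, resolved via an environment)

    A Forth interpreter runs the check the other way around: it consults the dictionary first, and only tries to convert unrecognized words to numbers.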

    A Forth Dictionary is very similar to a Lisp Environment; they both serve as a lookup table for definitions, and they both allow the programmer (or program!) to redefine words/symbols.

    They also both have REPLs, which facilitate a much more dynamic development cycle than in most other languages.

    I could go on, but on a fundamental level the similarities are striking (at least to me, anyway). It’s an interesting rabbit hole to explore, with lots of “drink me” bottles laying around. It’s fun here.

    [0] https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/co...

    [1] https://github.com/gmpalter/cl-forth

ValentinA23 2 days ago

As a long time lisper I don't think homoiconicity is that relevant, at least when comparing lisps with other programming language. What I miss when writing C++ is the incremental compilation model of lisps, and in particular the ability to have compile time data drive code generation.

Homoiconicity is more useful when comparing lisps IMO, and pondering on how they could be improved. To me, homoiconicity is a constant struggle and should be appreciated in degrees because homoiconicity is about immediacy.

A lisp that doesn't allow you to embed data along with code, JSON/Javascript style, is less homoiconic than a language that does, and it's more about what the core library allows than the language itself. For instance, I'd say Clojure is more homoiconic than Scheme because it allows you to embed hashmaps in your code natively, whereas in Scheme you only have `(make-hash-table)` without the corresponding reader macro. Similarly, a lisp without syntax quote would be less homoiconic than one that has it.

This is why I say it's about immediacy. When you don't have to deal with hashmaps, or templated s-exprs in terms of the process that builds them, the mediation layer disappears.

Things I'd like to be more immediate in Clojure:

- keeping track of whitespace within s-exprs. Useful when you want to print code as it is indented in the source file. There's a library for that (rewrite-clj), but it isn't integrated into the reader+compiler pipeline, so it's a bit of a headache, as you have to read code from files, which implies bridging the gap between the compilation pipeline and this library on your own.

- accessing semantic info within macros. Which functions use which variables. Which variables are global vs local (in particular when lexically shadowed), which variables are closed over by which lambdas, etc. To do this you have to use clojure.core.analyzer, which is very complex and poorly documented: not immediate enough.

wduquette 2 days ago

TCL is exactly "strongly homoiconic" in the OP's sense; one does metaprogramming by creating and evaluating strings in some desired context. It's an advanced technique, but works quite well in practice. Many years ago I wrote a complete object system for TCL, SNIT, that executes a type definition script (using TCL syntax); this produces a new TCL script that actually implements the type; and then executes this new script. It's been used in commercial products.

TCL is not "bicameral" in the OP's sense, but that doesn't seem to stop anyone from doing metaprogramming.

  • cmacleod4 2 days ago

    I would argue that Tcl is almost "bicameral" in the OP's sense. The application of the "dodekalogue" rules - https://wiki.tcl-lang.org/page/Dodekalogue - largely corresponds to the "Reader". It goes further in that it also specifies substitution and evaluation rules, but it is similar in that it only applies a few basic structural rules, and knows nothing about the specifics of individual commands.

    Tcl's equivalent of the "Parser" is built-in to each command, which decides whether to interpret its arguments as data, code, option flags, etc..

    I suspect this division of responsibilities is very helpful for metaprogramming techniques.

    • wduquette 2 days ago

      This is true. In Lisp terms every TCL command is effectively a special form, and can do whatever it pleases with its arguments.

      On the other hand, TCL provides much less support for building up the string to be evaluated if it's more complex than a single command; and even for a single command it can be tricky.

djaouen 3 days ago

How one could have spent any time at all studying Lisp starting in the 80s (!) and not understand what the word "homoiconic" means is baffling to me!

  • kazinator 3 days ago

    The term homoiconic does not come from the Lisp culture. I think it might have been in the 1990s that it came into use as a way of describing a property of languages in the Lisp family, using a different definition from the original homoiconic, and it might have been introduced by outsiders.

    Using Google Books search, we can identify that a 1996 book called Advanced Programming Language Design by Raphael A. Finkel uses the word in this new way, claiming that TCL and Lisp are homoiconic.

    The word returns to flatlining towards the end of the 1990s, and then surges after 2000.

    • mikelevins 2 days ago

      I feel like use of the term "homoiconic" is misguided. It seems like an attempt to turn an incidental attribute of some Lisps into a sort of Platonic ideal. I don't think that's helpful.

      I think the property being discussed is more understandable if you just describe it simply: in some Lisps (notably Common Lisp and its direct ancestors) source code is not made of text strings; it's made of symbolic expressions consisting of cons cells and atoms.

      The text that you see in "foo.lisp" isn't Lisp source code; it's a serialization of Lisp source code. You could serialize it differently to get a different text file, but the reader would turn it into the same source code. The actual source code is distinct from any specific text serialization of it.

      We write programs in the form of text serialization because the reader will convert it for us, and because it's easier and more rewarding to write good and comfortable text editors than to write good and comfortable s-expression editors.

      There are of course text editors and addons that attempt to make text editing act more like s-expression editing, but I don't know of many actual s-expression editors. The canonical one, I suppose, is Interlisp's DEdit, which operates on actual s-expression data structures in memory.

      From this point of view, what people mean by "homoiconic" is just that source code is all made of convenient arrangements of standard data structures defined by the language that can be conveniently operated on by standard functions defined by the language.

      Or, to put it another way, "homoiconic" basically means "convenient", and "non-homoiconic" means "inconvenient".

      That's all there is to it, really, but it has far-reaching consequences. In a Lisp designed this way, basic manipulation of source code is trivially easy to do with operations that are all provided for you in advance by the language itself. That makes all sorts of code-processing tools exceptionally easy to write.
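
      A small sketch of what that looks like in practice (the example definition is made up): the standard list functions already operate on source code as the reader produces it:

          (defparameter *code* '(defun square (x) (* x x)))

          (second *code*)                       ; => SQUARE
          (subst 'cube 'square *code*)          ; => (DEFUN CUBE (X) (* X X))
          (eval (subst 'cube 'square *code*))   ; install the renamed function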

      That's not true in most languages. Take C, for example: sure, a C compiler parses text and turns it into an abstract syntax tree before processing it further in order to eventually yield executable machine code. Is all of that machinery part of the language definition? Can you count on those APIs and data structures to be exposed and documented by any arbitrary C compiler?

      No.

      In that sense, any programming language could be made "homoiconic" if enough people wanted it. They manifestly don't, because most languages aren't.

      But some programmers prefer working with a language implementation that makes it so very easy to manipulate code. So that's what we use.

      It's not some Platonic ideal of language design, but it doesn't need to be. It's a pragmatic design decision made by certain implementors in a certain lineage, and it has consequences that a certain fraction of programmers find congenial. Congenial enough that it makes some of us prefer to work with languages and implementations that work that way.

      • 082349872349872 2 days ago

        Nice description; it makes me wonder if there are any languages in which code and data have different serialisations, but these are isomorphic in the sense that code and data can be turned into each other losslessly? (we ought to be able to round trip between the two: code->data->code and data->code->data ought to produce equivalent structures to what they started from)

        • kazinator 2 days ago

          What would prevent them from being used the "wrong" way around: code being written with a data notation and vice versa? When the system prints data, it could choose one or the other, based on some educated guess as to whether it is code or data.

          • 082349872349872 a day ago

            nothing — I was assuming the conversions would need to be explicit.

            Do you have a candidate in mind?

  • samth a day ago

    Perhaps in that case you should supply the definition that the author ought to have known.

peanut-walrus 2 days ago

> Data are data, but programs—entities that we can run—seem to be a separate thing.

Is this a view some people actually hold? Would be interesting to see some argumentation why someone would think this is the case.

  • zokier 2 days ago

    Harvard architecture is a thing. If you cannot access or manipulate the program in any way, then it's not really meaningful to call it data, even if it is stored as bytes somewhere.

aidenn0 2 days ago

All I want for Christmas is the ability to redefine the CL scanner.

Seriously; if we could redefine the CL scanner, then e.g. package-local-nicknames could be a library instead of having to have been reimplemented in every single CL implementation.

acka 2 days ago

"We started with Lisp, so let’s go back there. What is Lisp? Lisp is a feeling, an emotion, a sentiment; Lisp is a vibe; Lisp is the dew on morning grass, it’s the scent of pine wafting on a breeze, it’s the sound of a cricket ball on a bat, it’s the…oh, wait, where was I. Sorry."

Leaving this here, with the deepest respect.

Eternal Flame - Julia Ecklar https://www.youtube.com/watch?v=u-7qFAuFGao

nycticorax a day ago

Thoroughly enjoyed this!

Minor typo: "an syntax" in the second-to-last paragraph should be "a syntax".

svilen_dobrev 2 days ago

I have been using Python as a syntax "carrier" for many domain languages/DSLs, re-purposing what constructs like class:.., with..:, certain func-calls, etc. mean within that. Works wonders... though one has to be careful, as it may not look like Python at all :/