Language Design Principles
andychu edited this page Nov 22, 2021 · 95 revisions
- The Common Subset Principle -- OSH is very compatible, but it might not run every last bash script. However, in those cases, you should be able to make small modifications so that your script runs under both OSH and bash (sometimes these changes may improve clarity). In general, OSH shouldn't introduce incompatible semantics for the same syntax.
- Example: The meaning of `()` in `declare -A assoc=()` is changed to obey the common subset principle. It means empty assoc array rather than empty indexed array because the context is clear, and because in bash `declare -A dict` means something different.
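A minimal sketch of the `declare -A assoc=()` case, written for bash 4 or later. The variable names are made up for illustration, and the claim that the same lines run unchanged under OSH is an assumption based on the principle above rather than something verified here.

```bash
# Requires bash 4+ (associative arrays).
declare -A assoc=()          # empty associative array, not an indexed one

before=${#assoc[@]}          # number of entries before any assignment

assoc['lang']='osh'          # string key, only valid for an associative array
after=${#assoc[@]}           # number of entries after one assignment

kind=$(declare -p assoc)     # shows the -A attribute that bash recorded
echo "before=$before after=$after"
echo "$kind"
```

Quoting the key (`assoc['lang']`) is one of the small clarity-improving changes that keeps a script inside the common subset.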
- Static Parsing
- Dynamic Parsing (parsing at runtime) Confuses Code and Data.
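The simplest way to see runtime parsing confuse code and data is `eval`. The sketch below is only an illustration with a made-up `data` string; it is not taken from the Oil docs, and shell has other dynamic-parsing sites that it does not cover.

```bash
# A string that is supposed to be plain data.
data='hello; echo INJECTED'

# Dynamic parsing: eval re-parses the string at runtime,
# so everything after ';' runs as a second command.
dynamic=$(eval "echo $data")

# Static parsing: the quoted expansion is only ever an argument.
static=$(echo "$data")

printf 'dynamic:\n%s\n' "$dynamic"
printf 'static:\n%s\n' "$static"
```

In the first case the parser sees the data; in the second it never does.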
- Consider Interactions Between Language Features (bash doesn't do this, e.g. extended globs)
- Minimize the combined OSH+Oil language size to the degree possible.
- Often we do have duplicate functionality in OSH and Oil (like arithmetic), but it has to be significantly better.
- This partly explains why we keep OSH string literals in Oil, and why bash `declare -i/-a/-A` isn't supported in OSH
- It also explains some constraints on the syntax, i.e. that we only have a `ShCommand` lexer mode, and no `OilCommand` lexer mode
- Oil should be familiar to Python and JavaScript users. Common features like assignment should behave similarly.
- This principle has "leaked" into OSH when omitting `declare -i`. Also to some degree our reluctance to implement `$a == ${a[0]}` is shaped by this.
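For readers who haven't met these two bash behaviors, here is a short sketch of what they do in bash itself. Variable names are illustrative only.

```bash
# declare -i turns every later assignment into arithmetic evaluation.
declare -i n
n='2 + 3'            # the string is evaluated, so n holds 5

s='2 + 3'            # an ordinary variable keeps the string as-is

# An array name without a subscript silently means element 0.
arr=(first second)
implicit=$arr        # same as ${arr[0]}

echo "n=$n s=$s implicit=$implicit"
```

Neither behavior matches what assignment or a bare variable reference does in Python or JavaScript, which is the point of the principle above.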
- Every Feature Should Have Predictable, Linear Performance (extended globs break this rule with backtracking, so they're in OSH but not Oil)
- Syntax and Semantics Should Correspond
- The same semantics should use the same syntax
- Different semantics should use different syntax
- e.g. discussion in The Five Meanings of #
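As a rough illustration from plain POSIX shell (not taken from the linked discussion, which is about Oil), the single character `#` already carries several unrelated meanings depending on where it appears:

```bash
set -- a b c
count=$#              # '$#': number of positional parameters

path='src/main.c'
length=${#path}       # '${#var}': length of the string
base=${path#src/}     # '${var#pattern}': strip a leading pattern

# ...and at the start of a word, '#' begins a comment like this one.
echo "count=$count length=$length base=$base"
```

Four different semantics share one piece of syntax, which is the situation the two rules above are meant to avoid.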
- Shell has "topped out" in terms of its syntax. It's too elaborate and unfamiliar. We won't add more syntax that looks like `${x@P}`, `${x^^}`, `cat <<< 'hi'`, or `exec 2>&-`.
- The common behavior should be the default behavior. The short thing should be the right thing.
- For example, simple word evaluation makes it so that you can use `$var` instead of `"$var"`. That's almost always what you want.
- `read -r` should have been the default in bash: it inhibits backslash processing, which most people don't intend when they use `read`.
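Both defaults can be observed in any POSIX shell. The sketch below shows the existing shell behavior only, not Oil's; `count_args` is a hypothetical helper defined just for this example.

```bash
count_args() { echo $#; }     # hypothetical helper: prints how many args it got

var='my file.txt'
unquoted=$(count_args $var)   # default behavior: the value is split on the space
quoted=$(count_args "$var")   # what you almost always want: one argument

# Without -r, read treats backslash as an escape character and drops it.
read plain <<'EOF'
C:\temp\new
EOF

# With -r, the line is read as written.
read -r raw <<'EOF'
C:\temp\new
EOF

echo "unquoted=$unquoted quoted=$quoted"
echo "plain=$plain"
echo "raw=$raw"
```

In both cases the shorter spelling is the surprising one, and the longer spelling is the one people actually mean.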
- There Should Only Be One Kind of Expression
- Shell has 3 to 4 recursive expression languages: arith, bool, word. And bash has regexes.
- In contrast, Oil has just one expression language. Note that eggexes are "first class".
- Exception: Globs are still a separate expression language. (But in Oil, they're unchanged and compatible. And they don't have recursive structure, unlike extended globs.)
- Avoid inventing new syntax. Most of Oil should look familiar to programmers and shell users.
- `@` has precedent in Perl, PowerShell, etc.
- the expression syntax comes from Python, JavaScript, etc.
- However, a corollary of the principle above is: If Oil has completely new semantics, then inventing a new syntax is justified.
- See Oil Language Influences
- Minimize the use of global options (`shopt`)
- Oil started out with many such options, but I eliminated them over time because it got unwieldy to explain and document.
- There are still many of them and they should be used sparingly. But note that the `strict_` ones don't really have any cost, because they abort your program on disallowed behavior. They don't silently change the semantics.
- Rationale: Global state makes code harder to read. It's a "hidden mode".
- They should mostly be hidden under groups like `oil:all`
- Counterexample: `simple_word_eval` is probably the most important one that silently changes behavior, and I think it's justified in that case.
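To make the "hidden mode" concern concrete, here is a small sketch. It uses the POSIX option `set -f` rather than a `shopt` option so that it runs in any POSIX shell; that substitution is mine, and `first_match` is a hypothetical helper. The same reasoning applies to any global option that silently changes semantics.

```bash
workdir=$(mktemp -d)
: > "$workdir/notes.txt"

# Hypothetical helper: the body never changes between the two calls below.
first_match() {
  set -- "$workdir"/*.txt
  echo "${1##*/}"
}

globbing_on=$(first_match)    # the glob expands to the file name

set -f                        # global option: disable pathname expansion
globbing_off=$(first_match)   # same code, now the pattern stays literal
set +f

rm -rf "$workdir"
echo "on=$globbing_on off=$globbing_off"
```

Nothing at the call site tells the reader which of the two behaviors is in effect.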
- Arrays are first class
- In particular, no silent splitting and joining, as happens with unquoted substitutions, `$@`, `echo` and `eval`, etc.
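The splitting and joining mentioned above can be seen with the positional parameters, which are the closest thing POSIX shell has to an array. This is a sketch of existing shell behavior only; `count_args` is a hypothetical helper.

```bash
count_args() { echo $#; }       # hypothetical helper: prints how many args it got

set -- 'one two' 'three'        # a two-element "array"

preserved=$(count_args "$@")    # element boundaries kept
split=$(count_args $@)          # silently split on whitespace
joined=$(count_args "$*")       # silently joined into one string

echo "preserved=$preserved split=$split joined=$joined"
```

Only the first form passes the two elements through unchanged.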
- You should be able to express arbitrary byte strings. Everything should be "8-bit clean" by default.
- UTF-8 is an optional (but common) layer on top. (Ditto for other encodings.)
- You should be able to use existing Unix tools with new protocols. (e.g. `grep` still works with lines of QSN. In contrast, the `\0`-delimited format of `find -print0` doesn't work with `grep`.)
(referring to: CSTR Proposal and TSV2 Proposal. And the deferred Shellac Protocol Proposal, and Coprocess Protocol Proposal)
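A rough sketch of why a line-oriented encoding keeps `grep` usable. The two records below are written by hand in what I assume QSN-style quoting looks like (single quotes, with the embedded newline escaped as `\n`). The file names are made up, and the exact escaping rules should be checked against the QSN spec.

```bash
# Two records, one per line. The second file name contains a newline,
# but the encoding keeps it on a single line.
records=$(cat <<'EOF'
'plain.txt'
'with\nnewline.txt'
EOF
)

# Ordinary line-based tools still see exactly one record per line.
total=$(printf '%s\n' "$records" | grep -c '')
matches=$(printf '%s\n' "$records" | grep -c 'newline')

echo "total=$total matches=$matches"
```

With raw newlines in the data, a line-based tool would see the second name as two separate records.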