e11. types, sets, and functions

the foundations of a new semantics

Sep 05, 2025

To summarize, our intuitive semantics consists of three kinds of things: properties, events, and entities. Events and properties need to be combined with entities, and vice versa, to make a propositional statement.

We also saw in episode five that events and entities exist in both type and token formats. One huge bonus of event semantics as opposed to Montague grammar was that it introduced the event variable, a way to represent event tokens. Extending that, I wrote that all tokens are variables, and that all types are functions. But this idea needs a bit more nuance. Today we’re going to bring in some tools from computer science and discrete math to describe our conceptual intuitions more concretely.

We’re going to forget about properties for the time being because they are weird, but they will make a big comeback a few episodes from now!

type theory

All functions can be characterized by what kinds of things they take as inputs, and what kinds of things they produce as outputs. These ‘kinds’ are called types, and semanticists notate them like so:

λx.doctor(x): <n,t>

This means that the function doctor() takes an argument of type n (an entity) as input, and produces something of type t (a truth value) as output. The type of the function itself is therefore defined as <n,t>.

How does doctor() know whether to output true or false based on its input? We saw before that doctor() returns true if its input is a doctor, and false otherwise, but how do we represent this formally?

functions in programming languages

In programming languages, functions also have specific types they can take as inputs, and specific types they produce as outputs. We can abstract function definitions in lambda calculus and in programming languages as:

λinput.output: <inputType,outputType>

function (inputType -> outputType): 
  return output;

For example, here are three different ways of writing the same function:

y = 2x

λx.2x: <int, int>

doubler (int -> int):
  return 2 * int;

This function basically says ‘take as input an integer x, and return as output the integer 2x.’
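For anyone who wants to run this, here is a minimal sketch of the same function in Python. Python is my choice for illustration only; the pseudocode above isn't tied to any real language. The type hints play the role of <int, int>.

```python
def doubler(x: int) -> int:
    """Take an integer x as input and return the integer 2x."""
    return 2 * x


print(doubler(4))  # 8
```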

We could also write a function that takes as input a list of characters, and returns true if ‘s’ is in that list:

checkerForS (char[] -> t):
  if 's' is in char[], return true;
  else, return false;

Note that char[] here is a type meaning ‘a list of characters.’ More generally, putting [] after a type means ‘a list of things of that type.’ Lists are ordered and can contain duplicates, so there could be more than one ‘s’ in that list. In contrast, sets are unordered and cannot contain duplicates, and are written with curly braces instead: {}.
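Here is a rough Python sketch of checkerForS, along with the list/set difference. Python has no separate character type, so I'm assuming single-character strings stand in for char, and the name checker_for_s is just my Python-style spelling of the function above.

```python
def checker_for_s(chars: list[str]) -> bool:
    """Return True if 's' is in the list of characters, False otherwise."""
    return "s" in chars


letters_list = ["s", "e", "t", "s"]  # a list: ordered, duplicates are kept
letters_set = set(letters_list)      # a set: unordered, duplicates are dropped

print(checker_for_s(letters_list))          # True
print(len(letters_list), len(letters_set))  # 4 3
```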

set theory

We can represent semantic functions like doctor() as functions which check if a given entity token is in a set of entity tokens:[1]

doctor (n -> t):
  if n is in doctor{}, return true;
  else, return false;

We can in fact represent all entity-types in this way:

entityType (n -> t):
  if n ∈ entityType{}, return true;
  else, return false;

This is our way of describing functions in terms of what actually happens in our brains computationally, and a key insight into how we represent types as opposed to tokens. Tokens are specific things, and types are groups of things. Types are sets. Events work the same way as entities here:

eventType (v -> t):
  if v ∈ eventType{}, return true;
  else, return false;

Tokens are variables, types are sets, and functions (so far) are the things that check if certain variables are in certain sets.
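To make that last sentence concrete, here is a small Python sketch. It assumes entity tokens can be stood in for by plain strings, and make_checker is a hypothetical helper name of mine, not part of the theory. It builds a membership-checking function out of any set.

```python
from typing import Callable, FrozenSet

Entity = str  # assumption: a string stands in for an entity token


def make_checker(members: FrozenSet[Entity]) -> Callable[[Entity], bool]:
    """Turn a set of tokens into a function that checks membership in it."""
    def checker(token: Entity) -> bool:
        return token in members
    return checker


doctor_set = frozenset({"Joel", "Deven"})  # illustrative members only
doctor = make_checker(doctor_set)

print(doctor("Joel"))    # True
print(doctor("Marion"))  # False
```

The same helper would work unchanged for a set of event tokens, which is the sense in which events work the same way as entities.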

IMPORTANT NOTATION CHANGE:
We are now going to stop using the word ‘type’ to mean the opposite of ‘token’. That’s way too confusing. Instead we are going to use the word ‘set’ for this, and use the word ‘type’ to describe any kind of set or token. n{}, n, v{}, and v are all types, but only n{} and v{} are sets (n and v are tokens instead).

our new types

From now on, we’ll represent types (again, as in ‘the type of thing functions can take’) as the following:

n: specific entity
v: specific event
t: truth value (true or false)
n{}: set of entities (like “dog” or “doctor”)
v{}: set of events (like “kill” or “drink”)
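As a loose illustration, these five types could be written down as Python type aliases. The names N, V, T, NSet, and VSet are my own labels for n, v, t, n{}, and v{}, and using strings for tokens is an assumption made purely so the example runs.

```python
from typing import FrozenSet, NewType

N = NewType("N", str)  # n: a specific entity
V = NewType("V", str)  # v: a specific event
T = bool               # t: truth value (true or false)
NSet = FrozenSet[N]    # n{}: set of entities
VSet = FrozenSet[V]    # v{}: set of events

joel = N("Joel")
doctor: NSet = frozenset({joel})


def is_member(token: N, entity_set: NSet) -> T:
    """A function of type n -> t, once a particular n{} has been chosen."""
    return token in entity_set


print(is_member(joel, doctor))  # True
```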

But how is this all useful? Why does this solve the problems with Montague grammar and neo-Davidsonian event semantics?

See you next time, when we'll talk once again about the lexicon.

note that this is *a* theory. not the most well researched theory or the most accurate theory. just the one david came up with in his backyard for these reasons. david might sound confident but he is not.

[1] Here is some clarification on why we do it this way and not in other, less intuitive ways.

I could imagine three possible sort-of intuitive ways that our brain could check whether a given entity is a doctor or not.

(7) Wherever our brain represents the concept doctor, there is a set of properties that define what it is to be a doctor. To check if an entity is a doctor or not, we check if the entity has each of these properties. If it has all of them (or maybe, most of them), it’s a doctor! Otherwise, it’s not.
doctor: {whiteCoat, hasMD, ...}

(8) Wherever our brain represents the concept doctor, there is a set of entities that are doctors. To check if an entity is a doctor or not, we search through this set for the given entity. If it’s there, it’s a doctor! Otherwise, it’s not.
doctor: {Joel, Deven, Austin, Nathan, ...}

(9) Wherever our brain represents each entity token, there is a set of entity types that this token belongs to. To check if an entity is a doctor or not, we search through this set. If we find a pointer to the entity-type ‘doctor’, then it’s a doctor! Otherwise, it’s not.
Joel: {doctor, grandpa, human, ...}

The first thing to note is that (7) is computationally much more expensive than the other two, and we’ll rule it out for this reason. That’s because for each property, we’d then have to go and figure out whether the entity has that property, and so on. It’s just punting the question from “is this entity a doctor” to “does this entity have an MD.” Either way, if you follow the trail long enough, you’ll eventually end up with a representation like the one in (8) or (9).

Choosing between (8) and (9) is harder. To start with, imagine that we have a table where each row is an entity token, and each column is an entity type:

         doctor grandpa human teacher ...
Joel:   {1      1       1     0       ...}
Marion: {0      0       1     1       ...}
...

Each cell is filled with 1 (true) or 0 (false), depending on whether the row’s entity is of the column’s entity type. (8) represents the set of rows that are true for any given column, and (9) represents the set of columns that are true for any given row.

So my brain checks if Joel is a doctor by finding the cell that corresponds to the row Joel and the column doctor in our table. To do this we can either use the set doctor{}, which contains all the entity tokens (rows) with a ‘1’ in the doctor column, or the set Joel{}, which contains all the entity types (columns) with a ‘1’ in the Joel row. The first method is (8); the second is (9).
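Here is a small Python sketch of that table, using only the two rows and four columns shown above. The helper names column_set and row_set are hypothetical. It shows that the (8)-style and (9)-style lookups read the same cell from two directions.

```python
entity_types = ["doctor", "grandpa", "human", "teacher"]
table = {
    "Joel":   [1, 1, 1, 0],
    "Marion": [0, 0, 1, 1],
}


def column_set(entity_type):
    """(8)-style view: the set of rows with a 1 in the given column."""
    col = entity_types.index(entity_type)
    return {name for name, row in table.items() if row[col] == 1}


def row_set(entity):
    """(9)-style view: the set of columns with a 1 in the given row."""
    return {etype for etype, cell in zip(entity_types, table[entity]) if cell == 1}


print(column_set("doctor"))     # {'Joel'}
print(sorted(row_set("Joel")))  # ['doctor', 'grandpa', 'human']
# Both views give the same answer to "is Joel a doctor?"
print("Joel" in column_set("doctor"), "doctor" in row_set("Joel"))  # True True
```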

Thus, we could theoretically define a function to check whether a given entity is a doctor in one of two ways:

(10) check if Joel is in doctor{}, as in (8):

doctor (n -> t):
  if n ∈ doctor{}, return true;
  else, return false;

(11) check if doctor{} is in Joel, as in (9):

doctor (n -> t):
  if *doctor{} ∈ n{}, return true;
  else, return false;

Unfortunately, although these processes are equivalent in what they do, (11) doesn’t work as a function definition, because our given input is an entity token, not a set of entity types. We can’t input n and then use n{}. Joel isn’t a set—it’s a token.
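The mismatch is easier to see with explicit signatures. In this rough Python sketch (strings stand in for tokens and for pointers to entity types, and both function names are made up for illustration), the first definition really does take a token, while the second only works if it is handed a set of entity types.

```python
DOCTOR = frozenset({"Joel"})                            # doctor{}: a set of entity tokens
JOEL_TYPES = frozenset({"doctor", "grandpa", "human"})  # the set of entity types from (9)


def doctor_by_set(token: str) -> bool:
    """First definition: the input is a token, so the signature is n -> t."""
    return token in DOCTOR


def doctor_by_token_types(types_of_token: frozenset) -> bool:
    """Second definition, (11): the input must be a set of entity types,
    so the signature is no longer n -> t."""
    return "doctor" in types_of_token


print(doctor_by_set("Joel"))              # True
print(doctor_by_token_types(JOEL_TYPES))  # True
```

Both calls return the same answer, but only the first one can be called with the bare token.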

The TLDR of this footnote is: entity types are sets of entity tokens.

backyard biolinguistics
