Giuseppe Castiglione, aka Caesar, is a research developer and also our in-house encyclopedia and tango instructor, which probably sounds like a strange combination but will all make perfect sense by the end of this interview.

Besides distracting everyone with discourse-level explanations about literally everything that has ever come up in human-on-human conversation, Giuseppe has two main projects. The first is understanding causality in the news, a problem that corresponds to understanding the structure of international events and identifying how they feed into one another. He can’t really talk about the second because it’s top secret, a professional restriction that has nearly killed him on four separate occasions from the unnatural physical effort required to not talk about something interesting.

Your area of academic study was physics. When did you know you wanted to be a scientist?

As far as I can pin back the origin of this stuff, it’s dinosaurs – but not a passive interest in dinosaurs. My earliest memories are of dinosaurs. That’s all I wanted to talk about. Then when I was around 5, I remember realizing, “You know, I’m never going to see one… unless I build a time machine.”

So… did you?

I have time machine blueprints from when I was six. They’re really shitty. They’re the perfect intersection between someone who thinks they know things and someone who believes the science in the first Superman movie is accurate. But they made me want to start learning about physics and I eventually discovered that physics is awesome. Almost more awesome than dinosaurs.

What made you abandon your time machine track, then, and focus on machine learning?

In university, my ambition was to be a physicist and AI was my side thing. Now it’s completely reversed. Once upon a time, to do “science” people would sit down and try to answer questions by arguing back and forth philosophically. That’s a fancy way of saying people would throw around possible explanations and critique the best stories. But they were just that: stories. Then they figured out that instead of arguing, they could run experiments to choose the correct explanations. Suddenly, there was direction. Science boomed. Now it’s becoming progressively more difficult to design experiments in physics – they’re billion-dollar undertakings, requiring massive international cooperation. With AI, however, it’s the Wild West again. It’s what the 1920s were for quantum mechanics. There’s so much untapped potential. It’s so easy to design experiments: you just need a commercial laptop and some familiarity with basic math. And there’s so much data out there you can play with. The conditions are stacked for innovation and that’s why the field is exploding right now.

How does someone make a transition from traditional physics into AI at a level where you’re already skilled enough to innovate?

The mathematics are essentially identical. In fact, the math by which the neural networks function is much closer to the way electrons work in crystals than the way neurons work in the brain. The main algorithm we use to train neural networks – backpropagation – isn’t really biologically plausible. The things you use in relativity/quantum physics are linear algebra, differential geometry, and probability theory. With these subfields, it’s very simple to go directly into AI because it uses many of the same concepts. The only difference is you need to have sufficient computer programming skills so you can work with data, which you’ll probably have from lab work anyway.

Is this the field you want to dedicate your career to?

The way I consider it, AI is going to be incorporated into everything, like programming is today. It’s just another way computers can help us automate certain tasks, so it’ll probably follow me around. The thing I find the most interesting, though, as a scientific problem, is pattern formation. In popular discussion there’s a general understanding that patterns are everywhere, whether we’re talking the stripes on a zebra or harmony in music. What machine learning does is it gives you a way to be more formal and abstract about what constitutes a pattern. It reduces most questions about patterns to studying the geometry of different spaces. So for music, you start envisioning a space that holds all possible sequences of notes. Then you use data and probability theory to start building a map of this space, to see which sequences are allowed, to fill in the blanks, or to generate new sequences that your model has never seen before. In a way, the machine can actually imagine.

Honestly, this whole field is about self-organizing systems, and I think that’s one of the more productive veins of scientific investigation.
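
[Ed. Note: For readers who want to see the "map of sequence space" idea in miniature, here is a rough sketch. It is not Giuseppe's method, just one of the simplest possible stand-ins: a first-order Markov chain that counts which note follows which, then samples a new sequence from those counts. The toy melodies and the function names `fit_transitions` and `generate` are invented for illustration.]

```python
import random
from collections import defaultdict


def fit_transitions(melodies):
    """Count how often each note is followed by each other note."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            counts[current][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample a sequence, drawing each next note in proportion to its count."""
    sequence = [start]
    while len(sequence) < length:
        followers = counts.get(sequence[-1])
        if not followers:  # no observed continuation for this note
            break
        notes = list(followers)
        weights = [followers[n] for n in notes]
        sequence.append(rng.choices(notes, weights=weights, k=1)[0])
    return sequence


# Hypothetical toy melodies, made up purely for this example.
TOY_MELODIES = [
    ["C", "D", "E", "C"],
    ["E", "F", "G", "E"],
    ["C", "E", "G", "C"],
]

model = fit_transitions(TOY_MELODIES)
new_melody = generate(model, start="C", length=8, rng=random.Random(0))
print(new_melody)
```

Every step in the generated melody is a transition that appeared somewhere in the training data, but the sequence as a whole can be one the model never saw. That is a very small version of "filling in the blanks"; the models used in practice are far richer than a table of counts.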

Based on what the network can accomplish right now, what would be your dream problem to solve?

I think one of the more challenging problems to address is scale in Natural Language Processing. Most of our algorithms are built around studying the meanings of words, ways to generate and analyze sentences, or modelling topics of documents. But you can’t, for instance, feed Hamlet into a network and have it generate a cohesive essay. Or upload a bunch of textbooks and have it solve problems. Could you imagine how R&D would flourish with that? It’d be like Wolfram Alpha on steroids. Like having your own personal Jarvis.

Is that what you’re working on at Borealis AI?

Yes, my area of focus is NLP and, as I mentioned before, how language organizes itself. How do you actually organize words into sentences? Is a sentence just a collection of words that are ordered or is there something more to it? In AI, you have these things called Word Embeddings which are very good at describing the meaning of words. But if you want to actually generate sentences, you need other models with opaque hidden layers where you don’t really see what’s going on. They’re not interpretable.

Part of what I’m doing is establishing a baseline to understand the computational challenges associated with producing sentences, so we can see how these black box models scale the problem. At the same time, the idea is that there are some strong analogies between the ways words organize themselves in sentences, and the way events organize themselves in history. To test that, we have a massive corpus of news data that goes back to the 1970s, and we’ll be building different tools to try and understand historical shifts from the perspective of dynamical systems theory.
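
[Ed. Note: A quick illustration of the word embeddings mentioned above, under heavy assumptions. An embedding represents each word as a vector of numbers, and words with related meanings tend to end up pointing in similar directions. The three-dimensional vectors below are hand-picked for this example and do not come from any trained model; real embeddings typically have hundreds of dimensions. The helper names `cosine_similarity` and `most_similar` are ours.]

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical hand-made vectors, for illustration only.
TOY_EMBEDDINGS = {
    "tango": [0.9, 0.1, 0.0],
    "waltz": [0.8, 0.2, 0.1],
    "electron": [0.0, 0.1, 0.9],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest in direction to `word`."""
    others = (w for w in embeddings if w != word)
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


print(most_similar("tango", TOY_EMBEDDINGS))
```

The geometry here is fully visible, which is the contrast Giuseppe draws: comparing word vectors is easy to inspect, while the sentence-generating models built on top of them are much harder to interpret.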

While this is fascinating, let’s talk about tango, which you practice daily in the office [Ed. Note: There is physical video evidence]. When did that passion emerge?

I have a bit of an internal need to learn everything. Walking around U of T one day, I saw a poster for drop-in tango sessions. I went with a friend and it was outstanding how mathematical [tango] is. The thing to understand about tango is it’s an improvised dance. You go to an event, walk into a dimly lit room, and for a moment you lock eyes with a stranger, signalling with a nod of the head that you intend to dance. Then, on the spot, to a song you may have never heard before, the two of you essentially construct a full sequence, mindful of the fact that others are doing the same around you, and you move together as one. That level of connection is only made possible by maintaining precise geometric proportions between you and your partner. And that’s not even touching on how the music itself affects the geometry of the sequence.

What do tango and dinosaurs have in common?

Is this a set-up for you to tell me a joke or do I need to reply with something witty?

[Laughs] No, they’re the two driving passions of your life so far. What link do they share, that both have so singularly captivated your mind when you seem to have an interest in everything in the world?

I suppose, looking back, what impressed me the most about dinosaurs were their scale and variation. As much as we think the world belongs to humans, dinosaurs were the great survivors. They had the world from 230 to 65 million years ago. We’ve barely had it for one million. In a way, life is this problem and it’s solved over many generations because the environment is harsh and different adaptations are created. A lot of that is the same with tango – there are so many ways to build different things with so few parts. If there’s a common theme, it’s the thrill of building things, of creating, and the thrill of having a bunch of simple rules that, by virtue of their creative combination, obtain a form unto themselves which hides very simple dynamics. There’s a part of me, being the artist, which takes a look at this grand scale and is awed by its magnificence. And then the scientist in me is thrilled by the process of deconstruction, of looking at the simple parts, and the chance to assemble those parts into things never seen before.