Monday, June 17, 2024

DEIJ, SRA, and Me

Lyle Spencer and a Box of SRA Reading Cards


I just finished applying for a Spencer Foundation Grant.  Whew!

I didn't know much about the Spencer Foundation, so I read up on it, its history, and its benefactor, Lyle Spencer.

Lyle Spencer (1911-1968) graduated from the University of Washington, earning undergraduate and master's degrees in Sociology. (His father was the President of UW from 1927 to 1933.) In 1938, while a graduate student at the University of Chicago, he founded Science Research Associates (SRA), a Chicago-based publisher of educational materials and classroom reading-comprehension products. In 1962, he founded the Spencer Foundation (largely funded by SRA) to support education research, and in 1964 he sold SRA to IBM. He continued as CEO of SRA until 1968, but he directed much of his effort and fortune to the Spencer Foundation until his death that year.


The Spencer Foundation continues to thrive today. Its mission, in essence, is to support education research through diversity, equity, inclusion, and justice (DEIJ), seemingly the scourge of MAGA and the platform of progressives.


If you went to elementary school when I did, you may recall learning to read and write using SRA (the product of Science Research Associates). If you don't, SRA was a big box of cards, each with a short story and questions at the end to test what you had read. The stories were arranged in color-coded sections that grew progressively more advanced as you worked through them.


It was all self-paced. Each student started at the level appropriate for them, progressed as fast as they wanted, and the materials were accessible to everyone. If you squint a little, it was a reading and writing education program based on DEI, though it wasn't called that.


For me, it was an effective way to learn to read and write. I often wondered why, in the second grade, I was judged a slow reader but excellent at math. I didn't get placed in the "advanced" reading track, and I sometimes wondered if there was a racial/ethnic stereotype in play. Regardless, it was SRA that allowed me to start at an appropriate level ("color"), work hard at my own pace, and "scratch and claw" my way into the top reading group by making steady progress through the SRA system. I thank my 4th grade teacher, Kay Brebner, for recognizing my progress and promoting me.


SRA was instrumental in teaching me to read and write. If I get the Spencer (or even if I don't), it is no coincidence that I owe much to SRA and Lyle Spencer. My personal experience is a testament to Science Research Associates and Mr. Spencer. Further, whether you criticize or applaud DEIJ, my experience with SRA, a self-paced reading system available to everyone and accessible at any entry point, is an example of DEIJ in practice.


Thank you Mr. Spencer. (Fingers crossed that I get my grant!)





Finally, I give another hat tip to the University of Washington, of which I am an alumnus!


Tuesday, May 7, 2024

AI & The Right to Learn on an Open Internet


Last week, the Common Crawl Foundation and Jeff Jarvis hosted an important conversation: AI & The Right to Learn on an Open Internet. However, the conversation has a much broader scope -- who (or what) has the right to read anything, and what can they do with that information?

tl;dr: Everyone and everything has the right to read anything and everything, including computers. Everyone and everything has the right to learn. However, the right to communicate what you've read or learned -- including synthesizing new ideas or content, with or without attribution or compensation -- is in question.

The current hand-wringing concerns the intellectual property rights of information on the Internet -- how should content producers be compensated for their work, and what attributions must be made? This is an old question (at least a few decades old). With generative AI and Large Language Models, the question becomes murkier -- how should attribution (and compensation) be given for "new" work synthesized from previous work or content crawled from the Internet? Last Tuesday's conference (April 30, 2024) focused mostly on this.

But the question is much larger than content crawled from the Internet. For all the books ever published, all the music and movies ever created, and indeed content of any kind, the question of who should be compensated for what work under what kind of derivation is unanswered.

Kudos to the Common Crawl Foundation and Jeff Jarvis for addressing this broader question in addition to the specific question of information on the Internet. In particular, with Generative AI and Large Language Models, there are no obvious answers for how to handle content (including text, audio, and video) generated, learned, aggregated, and synthesized by machines.

As pointed out, our current legal system is not up to speed on this question. The music industry has grappled with it for years, without a satisfactory solution. Ed Sheeran, Led Zeppelin, Robin Thicke, and even Taylor Swift have been near the center of the controversy over who owns the rights to music and derivative works. The labor disputes resulting in strikes by the Writers Guild of America and SAG-AFTRA show what is at stake for Hollywood (though this was not directly addressed by the conference).

"Fair use" doesn't seem to be an adequate "tool" for the current use of content by Artificial Intelligence (or "derivative" content generation, AI or otherwise). Copyright, trademarks, patents, and trade secrets provide some guidance but seem inadequate to address current and emerging content, driven largely by AI but by technology in general. Ultimately, the courts need to get involved, but it will be a long slog before it all gets sorted out. Perhaps the music industry provides some hints -- there are detailed rules and payment agreements on how copyright holders of music shall be compensated. However, even this is far from settled, as disputes are working their way through the courts. Further, there is no guidance on how music created by Gen AI should be dealt with.

Ultimately, the issues are financial. It was posited that we should get the ethical and moral issues sorted out first, and the monetary solutions will follow. Not surprisingly, this did not get much traction.

Right to Be Removed

One side topic of particular interest to me was the discussion of the "right to be removed." Once content is crawled from the open Internet and indexed into an archive, what are a publisher's rights to be removed? At one level, once the information is "out there," can it ever really be "removed"? The Common Crawl data has been downloaded by many. Versions and backups have been squirreled away. How is the data to be removed from all these copies?

Common Crawl is making efforts to remove content from the archive to satisfy practical requests, say, removing links to child pornography. However, this obviously does not affect the copies that have already been distributed. And, even within the Common Crawl repository, I think these "removals" are just an update or branch from a given crawl. Think of the Common Crawl as a git repo -- all the versions are still there, even if there is an update to a branch.
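The git analogy can be sketched as a toy append-only snapshot store (a hypothetical illustration, not Common Crawl's actual storage): a "removal" just commits a new snapshot without the record, and every earlier snapshot, including the "removed" record, survives in history.

```python
# Toy append-only snapshot store illustrating why a "removal" is really
# just a new version: old snapshots are never mutated or deleted.
# (Hypothetical sketch; not how Common Crawl actually stores data.)

class SnapshotStore:
    def __init__(self):
        self.snapshots = []  # list of immutable snapshots, oldest first

    def commit(self, records):
        """Record a full snapshot; earlier snapshots are never touched."""
        self.snapshots.append(dict(records))

    def remove(self, key):
        """'Remove' a record by committing a new snapshot without it."""
        tip = dict(self.snapshots[-1])
        tip.pop(key, None)
        self.commit(tip)

    def latest(self):
        return self.snapshots[-1]

store = SnapshotStore()
store.commit({"page-a": "...", "page-b": "..."})
store.remove("page-b")

print("page-b" in store.latest())      # False: gone from the tip
print("page-b" in store.snapshots[0])  # True: still in history
```

Just as with git, the "removed" page is absent only from the newest snapshot; anyone holding (or walking back to) an earlier snapshot still has it.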

Right to Read and Learn; Freedom of Speech; Intellectual Property Limits

As Americans, we have a basic right to read anything. Nothing should stop us from learning. Further, the First Amendment gives us broad rights to speak about nearly anything. There are some restrictions on what we do with what we synthesize from what we read and learn -- that's what Intellectual Property law is all about. The new question is whether machines have the same rights, and whether they are somehow similarly restricted. This was the issue discussed at the conference. With the onslaught of Generative AI and Large Language Models, the machines are spewing arguably new ideas synthesized from old ones. What should prevent the machines from reading, learning, and synthesizing?

Return to Xanadu

I've long been fascinated by the promise of Ted Nelson's Xanadu. While wildly impractical and unimplementable, especially when it was envisioned in the 1960s, its 17 principles are largely coming true. In particular, in the context of attributing source material for content synthesized by Gen AI systems, the "transclusions" and links envisioned by Xanadu seem relevant. The World Wide Web is often described as the simplified, practical implementation of Xanadu, even though Ted Nelson himself rejects this. I'd say Generative AI, transformers, and Large Language Models take us one step closer to Xanadu. And an answer to the hand-wringing over how content creators (especially journalists) can be compensated for their work is found in the links/transclusions of Xanadu.

Hat tip to Rich Skrenta (Executive Director of Common Crawl) for bringing "Computer Lib/Dream Machines" (basically Ted Nelson and Xanadu) into the discussion.

Props and Acknowledgements

Thank you, Rich Skrenta and Jeff Jarvis, for envisioning and hosting this event; it is an important topic. But all of the Common Crawl team were instrumental in making it happen. Joy Jing did a great job coordinating. Gil Elbaz is the founder of Common Crawl. Amazon AWS hosts the Common Crawl data (for free, I think). Mike Masnick was particularly insightful. Kearney, which hosted and sponsored much of the event, was particularly (and surprisingly) engaged in the issues of the conference. It was great meeting the team from Tola Capital, sponsors of the event.


Sunday, April 28, 2024

Exceeding System Design Scale Limits: Why the 737 MAX Is Failing


(I’m flying on a 737-8 MAX today. I’m sitting 2 rows in front of the infamous door plug in row 26L. I reflect on the 737 MAX and its problems.)

An important engineering concept is "system design scale": when designing a system, how much can you scale it up before the architecture is no longer suitable for performance, reliability, or economics?


In software, I generally design for 10x, maybe 100x, scale up/out. So, if you design for, say, 100 CPU cores, can you push the system to 1,000 cores? If you look at how Google or Facebook has scaled, you will see complete rethinks/redesigns of their technology stacks as they have grown. For example, I once wrote about how Facebook dramatically and aggressively scaled up image serving: from buying a third-party proprietary solution (NetApp), to homegrown systems, to Haystack, to whatever they are doing now.


Google scaled from a single machine at Stanford, to many machines in the CS department, to off-the-shelf racks, to custom racks, to completely rewiring data centers, from TPUs to dynamic optical interconnects, to support AI. I hear their Advanced Development work is even more extraordinary. Go Hank Levy? 😀


This is a long, roundabout way to get to my point: what's with the Boeing 737 MAX? Sure, the usual criticism is warranted: Boeing has lost its way as it transformed from an engineering culture to a business culture.


But I think (without talking to anyone at Boeing for direct evidence) that there is an issue with system design scale. The latest incarnation of the 737 had a given design architecture. When Boeing wanted to address the business threat of Airbus, it decided to scale up the 737 to build a bigger plane: the MAX. It stretched the design, and the scale-up went past the "system design scale" limit.


Now, I don’t know what the limits of scale for airplanes are. Certainly not 10x-100x like software; that might mean 737s with, say, 20,000 seats. Maybe to address the Airbus threat, it only took a bigger engine or a few more rows. But it was past the design scale limit.


Bigger engines, more rows, new range limits, and probably hundreds of (small) design changes were needed to meet the "stretch" goal. Instead of thinking systemically from first principles to build a plane, Boeing made a bunch of tweaks to the original design. In software parlance, "design" (and implementation) modifications became "hacks" to make the system work. Coherent architectural principles were tossed aside. The system is held together by "gum and baling wire."


MCAS and the door plug were artifacts/symptoms of the problem. Procedural and manufacturing hacks, in addition to product deficiencies, followed: for example, pilot training and certification, QA oversight controls, and assembly issues (bolts on doors). It’s the classic "putting your finger in the dam to stop leaks" problem. Problems will continue to surface because the MAX was implemented beyond its design scale limits.


So what can be done? Maybe Boeing can continue to hack away and plug all the problems, hopefully without catastrophic failures (crashes). Perhaps a new design with a different design scale is in the works. However, I don’t think Boeing can back away from the MAX and wait for a new plane. But if the planes prove unsafe, would the FAA ground them? It seems Boeing must stay the course, continue to hack away, and work with regulators to keep the MAX in the air.


I hope Boeing finds its footing and emerges as a better, stronger, and safer company. Ironically, maybe it’s Marketing and PR that will be needed to save the company, even after they fix their engineering and manufacturing issues. I’m rooting for them. Boeing has been a pillar of success in Seattle and a key contributor to what makes Seattle great today. It set a culture and built an ecosystem that has grounded the greater Seattle area for nearly a century. I personally owe much to Boeing: my dad worked there for 30+ years. Go Boeing!


Some side observations: 

“Move Fast and Break Things”

Facebook/Meta has been criticized for its ethos of "Move fast and break things." It’s an important value of many startups, not just Facebook. It works when you are the small underdog and not deploying mission-critical apps (e.g., where lives are at stake), but not when you are big, as Facebook is now. In the early days, no one died when posting, "I had a great hamburger for lunch." If I squint, I could say Boeing was "moving fast and breaking things" with the MAX. That was not appropriate: failure ("breaking things") put lives at risk, and Boeing is a large company.


787 Dreamliner Problems Were Different

The 787's problems are a different issue. The 787 was designed to be built using a loosely coupled, distributed system in which subcomponents could be built by independent manufacturers. Final assembly (integration and systems engineering) would be done by Boeing, which had to write specifications for the independent suppliers to implement. This was a complex process with supply chain, vendor/partner management, manufacturing, and integration issues.


These design principles are often used in software. Loose coupling, strong specifications, separating interface from implementation, interoperability of subcomponents, and so on have been used to great success; the Internet was built on these principles. But they did not work for the 787 Dreamliner. Boeing had to pull back from many of its partners and bring things back "in-house." In general, what works for software might not work for many complex physical-world systems.
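In software terms, the interface/implementation split looks something like the sketch below: the integrator publishes a specification as an abstract interface, independent suppliers implement it, and integration code depends only on the spec. (All the names here are made up for illustration; nothing from Boeing's actual process.)

```python
# Minimal sketch of "separating interface from implementation":
# an abstract specification, two independent implementations, and
# integration code that depends only on the interface.
# Hypothetical names; purely illustrative.

from abc import ABC, abstractmethod

class LandingGear(ABC):
    """The 'specification' the integrator hands to suppliers."""
    @abstractmethod
    def deploy(self) -> str: ...

class SupplierAGear(LandingGear):
    """One supplier's independent implementation of the spec."""
    def deploy(self) -> str:
        return "hydraulic gear deployed"

class SupplierBGear(LandingGear):
    """A second supplier's implementation, interchangeable with the first."""
    def deploy(self) -> str:
        return "electric gear deployed"

def preflight_check(gear: LandingGear) -> str:
    # Integration code is written against the interface, not the supplier.
    return gear.deploy()

print(preflight_check(SupplierAGear()))  # hydraulic gear deployed
print(preflight_check(SupplierBGear()))  # electric gear deployed
```

In software, swapping suppliers is as easy as passing a different object; the post's point is that in a physical supply chain, conformance to the "interface" is far harder to verify and enforce.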


Afternote: I just read this article: apparently, the 737 is also plagued by the distributed, outsourced production system, and Boeing is trying to reel it in.