

Perceptron: AI bias can arise from annotation instructions



Research in the field of machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron (previously Deep Science), aims to collect some of the most relevant recent discoveries and papers — particularly in, but not limited to, artificial intelligence — and explain why they matter.

This week in AI, a new study reveals how bias, a common problem in AI systems, can start with the instructions given to the people recruited to annotate data from which AI systems learn to make predictions. The coauthors find that annotators pick up on patterns in the instructions, which condition them to contribute annotations that then become over-represented in the data, biasing the AI system toward these annotations.

Many AI systems today “learn” to make sense of images, videos, text, and audio from examples that have been labeled by annotators. The labels enable the systems to extrapolate the relationships between the examples (e.g., the link between the caption “kitchen sink” and a photo of a kitchen sink) to data the systems haven’t seen before (e.g., photos of kitchen sinks that weren’t included in the data used to “teach” the model).

This works remarkably well. But annotation is an imperfect approach — annotators bring biases to the table that can bleed into the trained system. For example, studies have shown that the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on the labels to see AAVE as disproportionately toxic.

As it turns out, annotators’ predispositions might not be solely to blame for the presence of bias in training labels. In a preprint study out of Arizona State University and the Allen Institute for AI, researchers investigated whether a source of bias might lie in the instructions written by data set creators to serve as guides for annotators. Such instructions typically include a short description of the task (e.g. “Label all birds in these photos”) along with several examples.

Image Credits: Parmar et al.

The researchers looked at 14 different “benchmark” data sets used to measure the performance of natural language processing systems, or AI systems that can classify, summarize, translate, and otherwise analyze or manipulate text. In studying the task instructions provided to annotators that worked on the data sets, they found evidence that the instructions influenced the annotators to follow specific patterns, which then propagated to the data sets. For example, over half of the annotations in Quoref, a data set designed to test the ability of AI systems to understand when two or more expressions refer to the same person (or thing), start with the phrase “What is the name,” a phrase present in a third of the instructions for the data set.
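A pattern like the Quoref one can be checked with a simple prefix count. The following is a minimal sketch, not the researchers' method: the function name `prefix_share` and the sample questions are hypothetical and made up for illustration, not drawn from Quoref.

```python
def prefix_share(annotations, phrase):
    """Return the fraction of annotations that begin with `phrase` (case-insensitive)."""
    if not annotations:
        return 0.0
    phrase = phrase.lower()
    hits = sum(1 for a in annotations if a.strip().lower().startswith(phrase))
    return hits / len(annotations)


# Hypothetical sample questions, invented for this sketch (not real Quoref data).
sample = [
    "What is the name of the person who wrote the letter?",
    "What is the name of the city the band toured first?",
    "Who founded the company?",
    "What is the name of the ship that sank?",
]

share = prefix_share(sample, "What is the name")
print(f"{share:.0%} of sample annotations start with the phrase")
```

Running the same count over a data set's annotations and over its instruction examples would show whether a phrase common in the instructions is also over-represented in the annotations.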

The phenomenon, which the researchers call “instruction bias,” is particularly troubling because it suggests that systems trained on biased instruction/annotation data might not perform as well as initially thought. Indeed, the coauthors found that instruction bias overestimates the performance of systems and that these systems often fail to generalize beyond instruction patterns.

The silver lining is that large systems, like OpenAI’s GPT-3, were found to be generally less sensitive to instruction bias. But the research serves as a reminder that AI systems, like people, are susceptible to developing biases from sources that aren’t always obvious. The intractable challenge is discovering these sources and mitigating the downstream impact.

In a less sobering paper, scientists hailing from Switzerland concluded that facial recognition systems aren’t easily fooled by realistic AI-edited faces. “Morphing attacks,” as they’re called, involve the use of AI to modify the photo on an ID, passport, or other form of identity document for the purposes of bypassing security systems. The coauthors created “morphs” using AI (Nvidia’s StyleGAN 2) and tested them against four state-of-the-art facial recognition systems. The morphs didn’t pose a significant threat, they claimed, despite their true-to-life appearance.

Elsewhere in the computer vision domain, researchers at Meta developed an AI “assistant” that can remember the characteristics of a room, including the location and context of objects, to answer questions. Detailed in a preprint paper, the work is likely a part of Meta’s Project Nazare initiative to develop augmented reality glasses that leverage AI to analyze their surroundings.

Image Credits: Meta

The researchers’ system, which is designed to be used on any body-worn device equipped with a camera, analyzes footage to construct “semantically rich and efficient scene memories” that “encode spatio-temporal information about objects.” The system remembers where objects are and when they appeared in the video footage, and it grounds its answers to a user’s questions about the objects in that memory. For example, when asked “Where did you last see my keys?,” the system can indicate that the keys were on a side table in the living room that morning.

Meta, which reportedly plans to release fully-featured AR glasses in 2024, telegraphed its plans for “egocentric” AI last October with the launch of Ego4D, a long-term “egocentric perception” AI research project. The company said at the time that the goal was to teach AI systems to — among other tasks — understand social cues, how an AR device wearer’s actions might affect their surroundings, and how hands interact with objects.

From language and augmented reality to physical phenomena: an AI model has been useful in an MIT study of waves — how they break and when. While it seems a little arcane, the truth is wave models are needed both for building structures in and near the water, and for modeling how the ocean interacts with the atmosphere in climate models.

Image Credits: MIT

Normally waves are roughly simulated by a set of equations, but the researchers trained a machine learning model on hundreds of wave instances in a 40-foot tank of water filled with sensors. By observing the waves and making predictions based on empirical evidence, then comparing that to the theoretical models, the AI aided in showing where the models fell short.

A startup is being born out of research at EPFL, where Thibault Asselborn’s PhD thesis on handwriting analysis has turned into a full-blown educational app. Using algorithms he designed, the app (called School Rebound) can identify habits and corrective measures with just 30 seconds of a kid writing on an iPad with a stylus. These are presented to the kid in the form of games that help them write more clearly by reinforcing good habits.

“Our scientific model and rigor are important, and are what set us apart from other existing applications,” said Asselborn in a news release. “We’ve gotten letters from teachers who’ve seen their students improve leaps and bounds. Some students even come before class to practice.”

Image Credits: Duke University

Another new finding in elementary schools has to do with identifying hearing problems during routine screenings. These screenings, which some readers may remember, often use a device called a tympanometer, which must be operated by trained audiologists. If one is not available, say in an isolated school district, kids with hearing problems may never get the help they need in time.

Samantha Robler and Susan Emmett at Duke decided to build a tympanometer that essentially operates itself, sending data to a smartphone app where it is interpreted by an AI model. Anything worrying will be flagged and the child can receive further screening. It’s not a replacement for an expert, but it’s a lot better than nothing and may help identify hearing problems much earlier in places without the proper resources.


Tesla more than tripled its Austin gigafactory workforce in 2022



Tesla’s 2,500-acre manufacturing hub in Austin, Texas more than tripled its workforce last year, according to the company’s annual compliance report filed with county officials. Bloomberg first reported on the news.

The report filed with Travis County’s Economic Development Program shows that Tesla increased its Austin workforce from just 3,523 contingent and permanent employees in 2021 to 12,277 by the end of 2022. Bloomberg reports that just over half of Tesla’s workers reside in the county, with the average full-time employee earning a salary of at least $47,147. Outside of Tesla’s factory, the average salary of an Austin worker is $68,060, according to data from ZipRecruiter.
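As a quick check on the "more than tripled" figure, the reported headcounts can be compared directly. This is only a back-of-the-envelope sketch using the two numbers quoted above; the variable names are illustrative.

```python
# Headcounts as reported in the article (contingent and permanent employees).
workforce_2021 = 3_523
workforce_2022 = 12_277

added = workforce_2022 - workforce_2021           # net employees added in 2022
growth_factor = workforce_2022 / workforce_2021   # ratio of 2022 to 2021 headcount

print(f"Employees added: {added:,}")
print(f"Growth factor: {growth_factor:.2f}x")
```

The ratio comes out to roughly 3.5, which is consistent with the headline.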

TechCrunch was unable to acquire a copy of the report, so it’s not clear if those workers are all full-time. If they are, Tesla has hired far more full-time employees than it is contracted to. According to the agreement between Tesla and Travis County, the company is obligated to create 5,001 new full-time jobs over the next four years.

The contract also states that Tesla must invest about $1.1 billion in the county over the next five years. Tesla’s compliance report shows that the automaker last year invested $5.81 billion in Gigafactory Texas, which officially launched a year ago at a “Cyber Rodeo” event. In January, Tesla notified regulators that it plans to invest another $770 million into an expansion of the factory to include a battery cell testing site and cathode and drive unit manufacturing site. With that investment will come more jobs.

Tesla’s choice to move its headquarters to Texas and build a gigafactory there has helped the state lead the nation in job growth. The automaker builds its Model Y crossover there and plans to build its Cybertruck in Texas, as well. Giga Texas will also be a model for sustainable manufacturing, CEO Elon Musk has said. Last year, Tesla completed the first phase of what will become “the largest rooftop solar installation in the world,” according to the report, per Bloomberg. Tesla has begun the second phase of installation, and there are already reports that the rooftop can be seen from space. The goal is to generate 27 megawatts of power.

Musk has also promised to turn the site into an “ecological paradise,” complete with a boardwalk and a hiking/biking trail that will open to the public. There haven’t been many updates on that front, and locals have been concerned that the site is actually more of an environmental nightmare that has led to noise and water pollution. The site, located at the intersection of State Highway 130 and Harold Green Road, east of Austin, is along the Colorado River and could create a climate catastrophe if the river overflows.

The site of Tesla’s gigafactory has also historically been the home of low-income households and has a large population of Spanish-speaking residents. It’s not clear if the jobs at the factory reflect the demographic population of the community in which it resides.



Launch startup Stoke Space rolls out software tool for complex hardware development



Stoke Space, a company that’s developing a fully reusable rocket, has unveiled a new tool to let hardware companies track the design, testing and integration of parts. The new tool, Fusion, is targeting an unsexy but essential aspect of the hardware workflow.

It’s a solution born out of “ubiquitous pain in the industry,” Stoke CEO Andy Lapsa said in a recent interview. The current parts tracking status quo is marked by cumbersome, balkanized solutions built on piles of paperwork and spreadsheets. Many of the existing tools are not optimized “for boots on the ground,” but for finance or procurement teams, or even the C-suite, Lapsa explained.

In contrast, Fusion is designed to optimize simple inventory transactions and parts organization, and it will continue to track parts through their lifespan: as they are built into larger assemblies and go through testing. In an extreme example, such as hardware failures, Fusion will help teams connect anomalous data to the exact serial numbers of the parts involved.

Image credit: Stoke Space

“If you think about aerospace in general, there’s a need and a desire to be able to understand the part pedigree of every single part number and serial number that’s in an assembly,” Lapsa said. “So not only do you understand the configuration, you understand the history of all of those parts dating back to forever.”

While Lapsa clarified that Fusion is the result of an organic in-house need for better parts management – designing a fully reusable rocket is complicated, after all – turning it into a sellable product was a decision that the Stoke team made early on. It’s a notable example of a rocket startup generating pathways for revenue while its vehicle is still under development.

Fusion is particularly relevant to startups. Many existing tools are designed for production runs, not the fast-moving research and development environment that many hardware startups find themselves in, Lapsa added. In these environments, speed and accuracy are paramount.

Brent Bradbury, Stoke’s head of software, echoed these comments.

“The parts are changing, the people are changing, the processes are changing,” he said. “This lets us capture all that as it happens without a whole lot of extra work.”



Amid a boom in AI accelerators, a UC Berkeley-focused outfit, House Fund, swings open its doors



Companies at the forefront of AI would naturally like to stay at the forefront, so it’s no surprise they want to stay close to smaller startups that are putting some of their newest advancements to work.

Last month, for example, Neo, a startup accelerator founded by Silicon Valley investor Ali Partovi, announced that OpenAI and Microsoft have offered to provide free software and advice to companies in a new track focused on artificial intelligence.

Now, another Bay Area outfit — House Fund, which invests in startups with ties to UC Berkeley — says it is launching an AI accelerator and that, similarly, OpenAI, Microsoft, Databricks, and Google’s Gradient Ventures are offering participating startups free and early access to tech from their companies, along with mentorship from top AI founders and executives at these companies.

We talked with House Fund founder Jeremy Fiance over the weekend to get a bit more color about the program, which will replace a broader-based accelerator program House Fund has run and whose alums include an additive manufacturing software company, Dyndrite, and the managed app development platform Chowbotics, whose most recent round in January brought the company’s total funding to more than $60 million.

For founders interested in learning more, the new AI accelerator program runs for two months, kicking off in early July and ending in early September. Six or so companies will be accepted, with the early application deadline coming up next week on April 13th. (The final application deadline is on June 1.) As for the time commitment involved across those two months, every startup could have a different experience, says Fiance. “We’re there when you need us, and we’re good at staying out of the way.”

There will be the requisite kickoff retreat to launch the program and help founders get to know one another. Candidates who are accepted will also have access to some of UC Berkeley’s renowned AI professors, including Michael Jordan, Ion Stoica, and Trevor Darrell. And they can opt into dinners and events in collaboration with these various constituents.

As for some of the financial dynamics, every startup that goes through the program will receive a $1 million investment on a $10 million post-money SAFE note. Importantly, too, as with the House Fund’s venture dollars, its AI accelerator is seeking startups that have at least one Berkeley-affiliated founder on the co-founding team. That includes alumni, faculty, PhDs, postdocs, staff, students, dropouts, and other affiliates.
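For a rough sense of what those terms imply, the stake can be estimated by dividing the investment by the post-money figure. This is a simplified sketch that assumes the $10 million figure acts as the post-money valuation cap at conversion and ignores later dilution and any other SAFE terms; the function name `safe_ownership` is hypothetical.

```python
def safe_ownership(investment, post_money_valuation):
    """Approximate ownership fraction implied by a post-money SAFE.

    Assumes conversion at the stated post-money figure and ignores
    subsequent dilution, discounts, and option pool changes.
    """
    if post_money_valuation <= 0:
        raise ValueError("post_money_valuation must be positive")
    return investment / post_money_valuation


# Terms quoted in the article: $1 million on a $10 million post-money SAFE.
stake = safe_ownership(1_000_000, 10_000_000)
print(f"Implied stake: {stake:.0%}")
```

Under those assumptions, each startup would be giving up about a tenth of the company.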

There is no demo day. Instead, says Fiance, founders will receive “directed, personal introductions” to the VCs who best fit with their startups.

Given the buzz over AI, the new program could supercharge House Fund, the venture organization, which is already growing fast. Fiance launched it in 2016 with just $6 million and it now manages $300 million in assets, including on behalf of Berkeley Endowment Management Company and the University of California.

At the same time, the competition out there is fierce and growing more so by the day.

Though OpenAI has offered to partner with House Fund, for example, the San Francisco-based company announced its own accelerator back in November. Called Converge, the cohort was to be made up of 10 or so founders who would each receive $1 million from the OpenAI Startup Fund, along with admission to five weeks of office hours, workshops and other events.

Y Combinator, the biggest accelerator in the world, is also oozing with AI startups right now, all of them part of a winter class that will be talking directly with investors this week via demo days that are taking place tomorrow, April 5th, and on Thursday.
