How We Score 1.98 Million Products

A transparent look at the data sources, scoring algorithms, and validation processes behind our two-score system for measuring food processing and nutrition.

15 min read | Methodology | Data Science

Overview and Mission

Ultra Processed Food List exists to answer a simple question: how processed is the food you eat? We apply algorithmic scoring to every branded food product in the USDA's Branded Food Products database, transforming raw ingredient lists and nutrition facts into two actionable numbers -- a Processing Score and a Nutrition Score.

Our database currently contains 1.98 million products from 36,000+ brands, spanning 13 consumer-friendly categories. Every product with usable ingredient and nutrition data receives both scores, calculated through a deterministic pipeline that treats each item identically regardless of brand, price, or marketing claims.

Transparency is central to our mission. This page documents exactly how scores are calculated, what data sources we use, where the system has known limitations, and how we validate results. No black boxes, no proprietary algorithms hidden behind vague descriptions -- just open methodology you can evaluate for yourself.

  • 1.98M Products Analyzed
  • 36K+ Brands Covered
  • 13 Food Categories
  • 2 Independent Scores

Why two scores? Processing and nutrition measure fundamentally different things. A food can be heavily processed yet nutritionally dense, or minimally processed yet nutritionally poor. Collapsing both dimensions into a single number loses information consumers need.

Data Sources

Every score in our system traces back to publicly available data from the United States Department of Agriculture. We do not rely on proprietary datasets, manufacturer partnerships, or user submissions.

Primary Source

USDA FoodData Central

The Branded Food Products database is our primary source. Manufacturers and retailers submit detailed product information including complete ingredient lists, nutrition facts panels, UPC barcodes, brand names, and product categories.

1.98 million branded products

What We Have Access To

  • Full ingredient lists (ordered by weight)
  • Complete nutrition facts (calories, macros, vitamins, minerals)
  • UPC/barcode identifiers
  • Brand names and product descriptions
  • Serving size information

Data Freshness

The USDA dataset receives continuous updates from manufacturers. We run our complete processing pipeline quarterly to incorporate new products, updated formulations, and corrected entries.

Between major updates, our database reflects a point-in-time snapshot. Product formulations do change -- always verify current ingredients on the physical label.

Data coverage note: Approximately 93% of products have complete macronutrient data (calories, protein, fat, carbohydrates). Vitamin and mineral data is available for a smaller subset. Image coverage stands at approximately 25% through UPC-based matching. Products without ingredient data cannot receive a Processing Score and are excluded from scoring.

The Two-Score System

We deliberately separated processing measurement from nutrition measurement. Each score captures a distinct dimension of food quality, and the interplay between them reveals patterns that neither score could surface alone.

Processing Score (1-32+, lower is better)

Measures how far a food has been transformed from its natural state through industrial processing. It is based on ingredient count and on the presence of artificial additives, preservatives, and manufacturing indicators.

Key inputs: ingredient count, artificial ingredients, HFCS, preservatives, hydrogenated oils, modified starches, chemical additives

Nutrition Score (0-10, higher is better)

Evaluates the nutritional profile based on the presence of beneficial nutrients and the absence of harmful ones. Starts at a baseline and adjusts based on positive and negative factors.

Key inputs: protein, fiber, added sugars, sodium, saturated fat, trans fat, fermented dairy content

The two scores are calculated independently:

Processing Score = base_score(ingredient_count) + sum(penalties)
Nutrition Score  = baseline + positive_factors - negative_factors

A product receives BOTH scores -- they are never combined into one number.

For detailed breakdowns of each scoring system, see our Processing Score Guide and Nutrition Score Guide.

Processing Score Deep Dive

The Processing Score is built from two components: a base score determined by the number of ingredients, and a set of additive penalties triggered by the presence of specific processing markers. The final score is the sum of both.

Base Scores by Ingredient Count

Ingredient Count | Base Score | Description
1 ingredient | 1.0 | Single-ingredient foods (olive oil, plain rice, raw honey)
2-3 ingredients | 1.5 | Simple combinations (salted butter, nut butter with salt)
4-6 ingredients | 3.5 | Basic processed foods (simple bread, canned soup)
7-10 ingredients | 5.0 | Standard processed foods (crackers, sauces, dressings)
11-15 ingredients | 6.5 | Complex processed foods (frozen meals, protein bars)
16+ ingredients | 8.0+ | Heavily formulated products (score continues to scale with count)
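
The base-score table amounts to a bracket lookup. The Python sketch below is illustrative only. The function name `base_score` mirrors the formula shown earlier on this page, but the rule for 16 or more ingredients is an assumption: the page says only that the score "continues to scale with count", so the slope of 0.25 points per extra ingredient is a placeholder, chosen because it yields 8.5 for 18 ingredients, the figure used in the example calculation on this page.

```python
# (upper bound of bracket, base score), taken from the table above
BRACKETS = [(1, 1.0), (3, 1.5), (6, 3.5), (10, 5.0), (15, 6.5)]

# ASSUMPTION: the real scaling rule for 16+ ingredients is not documented here.
ASSUMED_SLOPE_ABOVE_16 = 0.25


def base_score(ingredient_count: int) -> float:
    """Return the base Processing Score for a given ingredient count."""
    if ingredient_count < 1:
        raise ValueError("a scored product needs at least one ingredient")
    for upper_bound, score in BRACKETS:
        if ingredient_count <= upper_bound:
            return score
    # 16 ingredients maps to 8.0; each extra ingredient adds the assumed slope.
    return 8.0 + ASSUMED_SLOPE_ABOVE_16 * (ingredient_count - 16)
```

Keeping the brackets in a list rather than a chain of `if` statements makes the thresholds easy to compare against the published table.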

Processing Penalties

When our algorithm detects specific processing markers in the ingredient list, additional penalty points are added to the base score. A single product can trigger multiple penalties.

Processing Marker | Penalty | Why It Matters
Artificial ingredients | +2.0 | Artificial colors, flavors, or sweeteners indicate industrial formulation not replicable in a home kitchen
High fructose corn syrup (HFCS) | +1.5 | Industrially produced sweetener requiring enzymatic conversion of corn starch
Hydrogenated oils | +1.5 | Industrial hydrogenation extends shelf life; partial hydrogenation also creates trans fats
BHA/BHT preservatives | +1.5 | Synthetic antioxidants used to prevent rancidity in processed foods
Modified ingredients | +0.8 | Modified starches, protein isolates, and chemically altered food components
Other preservatives | +0.6 | Sodium benzoate, potassium sorbate, calcium propionate, and similar shelf-life extenders
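
One plausible way to detect these markers is keyword matching against the ingredient text. The sketch below assumes that approach. The marker names and regular expressions are hypothetical stand-ins and are not our production rules, which this page does not spell out.

```python
import re

# ASSUMPTION: illustrative patterns only; real detection rules are more extensive.
MARKER_PATTERNS = {
    "artificial_ingredients": r"\bartificial\b",
    "hfcs": r"\bhigh fructose corn syrup\b",
    "hydrogenated_oils": r"\bhydrogenated\b",
    "bha_bht": r"\b(bha|bht)\b",
    "modified_ingredients": r"\b(modified|isolates?)\b",
    "other_preservatives": r"\b(sodium benzoate|potassium sorbate|calcium propionate)\b",
}


def detect_markers(ingredients_text: str) -> set:
    """Return the names of all processing markers found in an ingredient list."""
    text = ingredients_text.lower()
    return {
        name
        for name, pattern in MARKER_PATTERNS.items()
        if re.search(pattern, text)
    }


cookie_markers = detect_markers(
    "Enriched flour, sugar, high fructose corn syrup, modified corn starch, "
    "artificial flavors, sodium benzoate (preservative)"
)
```

Each marker is reported at most once, however many times its pattern matches. Plain keyword matching also has an obvious weakness: a phrase such as "no artificial flavors" would still trigger the artificial-ingredients marker, so a real detector needs more context than this sketch uses.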

Example Calculation

Example: Packaged chocolate chip cookies
Base score (18 ingredients): 8.5
+ High fructose corn syrup: +1.5
+ Artificial flavors: +2.0
+ Preservatives (sodium benzoate): +0.6
+ Modified corn starch: +0.8
Final Processing Score: 13.4 (Level 4 -- Ultra-Processed)
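
The arithmetic in this example can be checked with a few lines of Python. The penalty values come from the table above; the dictionary keys and the function name `processing_score` are illustrative names, not identifiers from our codebase.

```python
from typing import Iterable

# Penalty points per processing marker, as listed in the penalties table.
PENALTIES = {
    "artificial_ingredients": 2.0,
    "hfcs": 1.5,
    "hydrogenated_oils": 1.5,
    "bha_bht": 1.5,
    "modified_ingredients": 0.8,
    "other_preservatives": 0.6,
}


def processing_score(base: float, markers: Iterable[str]) -> float:
    """Add the penalty for each distinct detected marker to the base score."""
    total = base + sum(PENALTIES[marker] for marker in set(markers))
    # Round to one decimal place to avoid floating-point noise in the sum.
    return round(total, 1)


cookies = processing_score(
    8.5,
    ["hfcs", "artificial_ingredients", "other_preservatives", "modified_ingredients"],
)
print(cookies)
```

This sketch treats each marker as applying at most once per product, which matches the example above; whether repeated occurrences of a marker stack is not stated on this page.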

Four Processing Levels

Final Processing Scores are mapped to four levels that provide plain-language context. The thresholds and current distribution across our database:

Level | Score Range | Label | % of Products | Typical Products
Level 1 | ≤ 2.5 | Minimally Processed | ~9.2% | Single-ingredient foods, plain nuts, pure oils
Level 2 | > 2.5 to 5.0 | Processed | ~31.1% | Simple bread, canned vegetables, cheese, yogurt
Level 3 | > 5.0 to 8.0 | Highly Processed | ~18.4% | Frozen meals, flavored yogurts, granola bars
Level 4 | > 8.0 | Ultra-Processed | ~41.3% | Snack cakes, soft drinks, candy, instant meals
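
The level thresholds translate directly into a lookup. In this short sketch the function name `processing_level` is illustrative; the cut-offs and labels are the ones in the table.

```python
# (inclusive upper threshold, level number, label), from the levels table
LEVELS = [
    (2.5, 1, "Minimally Processed"),
    (5.0, 2, "Processed"),
    (8.0, 3, "Highly Processed"),
]


def processing_level(score: float) -> tuple:
    """Map a final Processing Score to its (level, label) pair."""
    for threshold, level, label in LEVELS:
        if score <= threshold:
            return level, label
    # Anything above 8.0 is Level 4.
    return 4, "Ultra-Processed"
```

For instance, a score of 13.4 maps to `(4, "Ultra-Processed")`, while a score of exactly 8.0 still falls in Level 3 because the thresholds are inclusive.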

Nutrition Score Deep Dive

The Nutrition Score evaluates what a food provides nutritionally, independent of how it was manufactured. Scores range from 0 to 10, with higher scores indicating more nutritional value. The database average is 4.82.

Positive Factors (Increase Score)

Factor | Max Bonus
Protein content | +3.0
Fiber content | +2.0
Fermented dairy | +1.0

Negative Factors (Decrease Score)

Factor | Max Penalty
Added sugars | -3.0
Sodium | -2.0
Saturated fat | -2.0
Trans fat | -1.5

Example: Greek yogurt with fruit
Baseline score: 5.0
+ Protein (high): +2.5
+ Fermented dairy: +1.0
- Added sugars (moderate): -1.5
Final Nutrition Score: 7.0 / 10

Score ranges:

  • 0-3: Low Nutrition. High sugar, sodium, or fat with little protein or fiber
  • 4-6: Moderate Nutrition. Mix of positive and negative factors; most products land here
  • 7-10: Strong Nutrition. High protein/fiber, low sugar/sodium/fat
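
The Nutrition Score arithmetic can be sketched the same way. The caps below come from the two factor tables, and the 5.0 baseline is the one used in the Greek yogurt example. Two things are assumptions: clamping the result to the 0-10 range (the caps alone would allow totals from -3.5 to 11.0), and taking the per-factor adjustments as ready-made inputs, since this page does not describe how nutrient amounts are converted into bonus or penalty points.

```python
MAX_BONUS = {"protein": 3.0, "fiber": 2.0, "fermented_dairy": 1.0}
MAX_PENALTY = {
    "added_sugars": 3.0,
    "sodium": 2.0,
    "saturated_fat": 2.0,
    "trans_fat": 1.5,
}


def nutrition_score(bonuses: dict, penalties: dict, baseline: float = 5.0) -> float:
    """Combine capped bonuses and penalties, then clamp to the 0-10 scale."""
    gained = sum(min(points, MAX_BONUS[name]) for name, points in bonuses.items())
    lost = sum(min(points, MAX_PENALTY[name]) for name, points in penalties.items())
    # ASSUMPTION: out-of-range totals are clamped rather than rescaled.
    return max(0.0, min(10.0, baseline + gained - lost))


yogurt = nutrition_score(
    bonuses={"protein": 2.5, "fermented_dairy": 1.0},
    penalties={"added_sugars": 1.5},
)
print(yogurt)
```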

How Our System Relates to NOVA

The NOVA food classification system is the most widely cited academic framework for categorizing food by processing level. Our system was informed by NOVA principles but designed to address specific limitations of categorical classification.

Aspect | NOVA System | Our System
Scale type | 4 categorical groups | Continuous score (1–32+)
Within-group distinction | None -- all Group 4 foods are equivalent | Full gradation (8.5 vs 22.0 are visibly different)
Nutrition assessment | Not included | Separate 0–10 Nutrition Score
Assignment method | Manual classification by trained researchers | Algorithmic analysis of ingredient lists
Scale | Typically applied to hundreds or thousands of products in studies | Applied to 1.98 million products automatically

Where the Systems Agree

  • Single-ingredient whole foods consistently receive the lowest processing classifications in both systems
  • Products with artificial additives, HFCS, and hydrogenated oils are flagged as highly processed by both systems
  • The same ingredient markers (emulsifiers, protein isolates, artificial sweeteners) drive classification in both frameworks

Where They Differ

  • NOVA treats all ultra-processed foods equally; our system distinguishes a PS of 8.5 from a PS of 25.0
  • NOVA does not account for nutritional content; we provide a separate Nutrition Score for that dimension
  • NOVA classification requires expert judgment for edge cases; our algorithm applies the same rules to every product deterministically

Our position: NOVA is a valuable research framework and our system is complementary to it, not a replacement. We use a continuous score because consumers benefit from knowing that a product scoring 9.0 is meaningfully less processed than one scoring 22.0, even though both would fall into NOVA Group 4. For a full explanation of NOVA, see our NOVA Food Classification Guide.

Limitations and Edge Cases

No scoring system is perfect. We believe transparency about limitations is more valuable than projecting false precision. Here are the known areas where our methodology has weaknesses or produces counterintuitive results.

Products That Score Unexpectedly High

Fortified health foods

Protein powders, meal replacement shakes, and fortified cereals often contain protein isolates, emulsifiers, and synthetic vitamins that trigger processing penalties -- even though these products are marketed as healthy. The Nutrition Score helps balance this by reflecting actual nutrient content.

Products with long but simple ingredient lists

A trail mix with 15 types of nuts, seeds, and dried fruits will receive a higher base score than a product with 5 ingredients, even though each individual ingredient is minimally processed. The base score reflects ingredient complexity, which is an imperfect proxy for processing level.

Products That Score Unexpectedly Low

Simple but nutritionally poor foods

A candy made from just sugar, corn syrup, and food coloring will score lower on processing than a complex whole-grain bread with 12 identifiable ingredients. Fewer ingredients means a lower base score, even when the product provides little nutritional value.

Incomplete ingredient data

Some products in the USDA database have abbreviated or incomplete ingredient lists. When ingredients are missing from the data, our algorithm cannot detect processing markers it cannot see. These products may receive artificially low scores.

Known Data Gaps

  • ~7%: Products missing nutrition data
  • ~75%: Products without matched images
  • Varies: Products with incomplete ingredient lists

Categories that are harder to classify: Supplements, baby food, and ethnic/specialty foods present particular challenges. Supplements often contain dozens of synthetic vitamins and minerals that inflate processing scores even when the base product is simple. Baby food formulations vary widely. Ethnic and specialty foods may use traditional ingredients that our detection algorithm does not yet recognize as processing markers (or incorrectly flags as such). We continue to refine these categories with each pipeline update.

Update Process and Data Quality

Our data pipeline is a 13-step incremental process. Each step runs independently, validates its output, and can be rolled back without affecting other steps. This architecture allows us to update individual components (such as ingredient normalization) without reprocessing the entire database.

The 13-Step Pipeline

1. Core Product Data: Load 1.98M products from raw USDA data
2. Brand Normalization: Clean and standardize 36K+ brand names
3. Category Mapping: Map 120+ variations to 13 categories (99.7% coverage)
4. SEO URLs: Generate human-readable product URLs
5. Ingredients Processing: Parse and normalize ingredient lists
6. Processing Scores: Calculate base scores and apply penalties
7. Nutrition Data: Extract macros, vitamins, minerals (93% coverage)
8. Nutrition Scores: Calculate 0-10 scores with positive/negative factors
9. Image Matching: Match product images via UPC barcodes (~25% coverage)
10. Dietary Attributes: Detect organic, gluten-free, vegan, allergens
11. Affiliate Links: Match products to retail purchase options
12. Score Summaries: Generate neutral, factual score descriptions
13. Ingredient Normalization: Map 412K unique ingredients into standardized categories

Validation at Every Step

Row Count Verification

Each step confirms that expected row counts are maintained and no products are silently dropped from the pipeline.

Distribution Checks

Score distributions are compared against expected ranges. If the share of Level 4 products suddenly drops below 30% or rises above 50%, the step is flagged for manual review.
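
A distribution check of this kind can be very small. The sketch below uses the 30% and 50% bounds quoted above; the function names and the idea of passing a plain list of level numbers are illustrative assumptions, not a description of our actual validation code.

```python
from collections import Counter


def level4_share(levels: list) -> float:
    """Return the fraction of products assigned to Level 4."""
    if not levels:
        raise ValueError("cannot check the distribution of an empty result set")
    return Counter(levels)[4] / len(levels)


def needs_manual_review(levels: list, low: float = 0.30, high: float = 0.50) -> bool:
    """Flag the step when the Level 4 share leaves the expected band."""
    share = level4_share(levels)
    return share < low or share > high


# Toy sample shaped roughly like the published distribution (41% Level 4).
typical = [4] * 41 + [2] * 31 + [3] * 18 + [1] * 10
# A suspicious run in which most Level 4 products have disappeared.
suspicious = [4] * 20 + [2] * 50 + [3] * 20 + [1] * 10

print(needs_manual_review(typical), needs_manual_review(suspicious))
```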

Rollback Capability

Each step operates on a working copy. If validation fails, the step is rolled back and the previous state is preserved without affecting downstream data.
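
The working-copy pattern can be illustrated in a few lines. This is a simplified in-memory sketch under stated assumptions: `run_step`, `drop_last_product`, `add_flag`, and `same_row_count` are hypothetical names, and a real pipeline would work on database tables rather than Python dictionaries.

```python
import copy


def run_step(state: dict, step, validate) -> tuple:
    """Run a step on a deep copy; keep the result only if validation passes."""
    working = copy.deepcopy(state)
    result = step(working)
    if validate(state, result):
        return result, True
    # Validation failed: discard the working copy and keep the previous state.
    return state, False


def same_row_count(before: dict, after: dict) -> bool:
    """Row count verification: no products may be silently dropped."""
    return len(before["products"]) == len(after["products"])


def drop_last_product(state: dict) -> dict:
    """A deliberately faulty step that loses a row."""
    state["products"].pop()
    return state


def add_flag(state: dict) -> dict:
    """A well-behaved step that only annotates existing rows."""
    for product in state["products"]:
        product["checked"] = True
    return state


original = {"products": [{"id": 1}, {"id": 2}]}
after_bad, bad_ok = run_step(original, drop_last_product, same_row_count)
after_good, good_ok = run_step(original, add_flag, same_row_count)
```

Because each step mutates only the copy, the faulty step leaves `original` untouched, which is the property the rollback guarantee depends on.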

Processing time: The full 13-step pipeline takes approximately 4 hours to complete, with each step using no more than 4GB of memory. This incremental approach replaced an earlier monolithic pipeline that required 12GB+ of RAM and offered no per-step rollback. The production database backup is preserved separately and is never modified by the pipeline.

How to Use These Scores

Scores are tools for comparison, not absolute judgments. They are most useful when comparing similar products within the same category or when evaluating the overall composition of your grocery basket.

Effective Uses

  • Compare brands within a category: If you are choosing between three pasta sauces, search our database to find which has the lowest Processing Score
  • Spot hidden processing: Two products may look similar on the front label, but their ingredient lists -- and therefore their scores -- can differ significantly
  • Explore categories and brands: See which food categories and brands trend toward lower processing
  • Read both scores together: A product with PS 12.0 / NS 7.5 tells a different story than PS 12.0 / NS 1.5

What Scores Cannot Tell You

  • Whether a food is "good" or "bad": Scores are descriptive, not prescriptive. Dietary context, individual health needs, portion sizes, and overall eating patterns matter enormously
  • Real-time formulation changes: Manufacturers update recipes regularly. Our scores reflect the most recent USDA data, which may lag behind current labels
  • Allergen safety: While we detect some allergen indicators, always read the actual product label for allergen warnings
  • Health outcomes: We do not make health claims. Processing level correlates with certain dietary patterns but does not determine individual health outcomes

Frequently Asked Questions

Why do some healthy foods score poorly?

Our Processing Score measures the degree of industrial transformation, not nutritional value. A fortified protein bar may contain beneficial nutrients like protein, fiber, and vitamins, but if it also contains protein isolates, emulsifiers, artificial sweeteners, and stabilizers, its Processing Score will reflect that industrial complexity. This is exactly why we use two separate scores: a product can have a high Processing Score (indicating heavy industrial processing) while still earning a strong Nutrition Score (indicating meaningful nutrient content). The two scores together give a more complete picture than either one alone.

How often is the data updated?

Our database is built on the USDA FoodData Central Branded Food Products dataset, which receives updates from manufacturers throughout the year. We process major data refreshes on a quarterly basis, running our complete 13-step incremental pipeline to incorporate new products, updated ingredient lists, and revised nutrition facts. Between major refreshes, we apply targeted corrections when significant errors are identified. Each update goes through validation at every step before reaching production.

Why don't you use the NOVA system directly?

NOVA is an excellent categorical framework and our system is informed by its principles. However, NOVA assigns all ultra-processed foods to a single group (Group 4), which means a lightly sweetened yogurt with one emulsifier and a heavily processed snack cake with 30 additives receive the same classification. Our continuous Processing Score (1 to 32+) preserves the gradations within each category, letting consumers see that a product scoring 8.5 is meaningfully different from one scoring 22.0. We also pair the Processing Score with a separate Nutrition Score, which NOVA does not address.

Can a product have a high Processing Score but good Nutrition Score?

Yes, and this happens more often than you might expect. Fortified breakfast cereals, protein powders, and meal replacement shakes frequently score above 8.0 on the Processing Score (ultra-processed) while earning 6.0 or higher on the Nutrition Score thanks to added protein, fiber, vitamins, and minerals. Conversely, a product like cotton candy might score moderately on processing (fewer ingredients) but receive a very low Nutrition Score due to being almost entirely sugar. The two-score system captures these nuances that a single score would miss.

Where does your data come from?

Our primary data source is the USDA FoodData Central Branded Food Products database, which contains detailed information submitted by food manufacturers and retailers. This includes ingredient lists, nutrition facts panels, UPC barcodes, brand names, and product categories. We supplement this with UPC-based image matching and affiliate link data from retail sources; neither affects the scores. All 1.98 million products in our database originate from the USDA dataset, which is, to our knowledge, the most comprehensive publicly available source of branded food product information in the United States.

Disclaimer: All tools and data visualizations are provided for educational and informational purposes only. They are not intended as health, medical, or dietary advice. Product formulations change frequently — always check the actual label for current ingredients and nutrition facts before making purchasing decisions. Consult healthcare professionals for personalized dietary guidance.