Analysis

A sustainability team calls about a soil carbon baseline, usually because a Scope 3 removal report or a supplier programme needs a defensible starting number. The second question is almost always the same: how many samples will we need? It is a reasonable question. It is also the wrong one.

Sample size is a consequence, not a starting point. It is the output of four upstream decisions: the scale of inference (field vs. project vs. programme), the variance of the stock across that scale, the minimum detectable difference you care about, and the confidence you need. Lock those down before the budget conversation, not after. Campaigns that skip this step arrive in year five with data that cannot resolve the effect everyone hoped it would show, a pattern now empirically documented across dozens of commercial cropland fields [1].

Scale of inference: the decision that reshapes the budget

Before any formula, fix the scale. A baseline report for a single supplier farm is a field-scale precision problem: you want a tight confidence interval on the mean stock now. A 500-farm sourcing programme claiming Scope 3 removals is a project-scale change-detection problem: you want to distinguish a sequestration signal from natural noise across many fields over time. The two problems have different formulas, different data requirements, and very different price tags.

Bradford and colleagues tested this empirically on 45 commercial cropland fields and found that individual-field detection of meaningful accrual rates (around 3 tonnes C per hectare over ten years) is unreliable even at sampling densities of 1.2 hectares per sample: marked apparent gains and losses appear where no real change exists [1]. At project scale, by contrast, 30 paired fields at the same density produced accurate mean estimates of stock change about 80 percent of the time. Potash et al. (2025) made the economic case that sampling only around 10 percent of fields in a large project, with paired resampling, delivers a competitive return on MRV investment within five years [2]. The takeaway is stark: if your design assumes every field must individually prove its own sequestration, it is the wrong design.

Three numbers that drive sample size

Once the scale is fixed, three statistical inputs drive the calculation. None is optional, and none can be guessed without cost.

Spatial variance (σ²)

Soil carbon is heterogeneous at every scale. In temperate croplands, Poeplau and colleagues documented plot-scale mean absolute errors of 5 to 8 tonnes C per hectare (7 to 9 percent of stock) even when resampling the same profile with a minimal positional shift, and found that spatial dependence is weak, which limits how much stratification can help in relatively uniform fields [3]. In tropical perennials, variance is typically higher. For cocoa in Ghana and Côte d'Ivoire, published standard deviations for 0–30 cm SOC stocks cluster around 10 to 14 tonnes C per hectare [4], driven by hillslope position, shade architecture, and chronosequence age.

In design, σ² is either taken from comparable published work on similar soil and management (the tropical SOC literature is finally rich enough for cocoa, coffee, and oil palm to do this honestly) or estimated from a pilot of 20 to 30 cores. Do not skip the pilot: the final n is proportional to σ², so it grows with the square of σ, and a 20 percent underestimate of σ leaves the campaign roughly a third short of the cores it needs.
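As a rough illustration of how strongly the pilot estimate drives the budget, the sketch below evaluates the baseline precision formula n = (z · σ / ε)² for σ values spanning the 10 to 14 tonnes C per hectare range quoted above. The function name `baseline_cores` and the ±3 tonnes C per hectare target are assumptions made for the example, and the calculation relies on a normal approximation with no finite-population correction.

```python
from math import ceil
from statistics import NormalDist


def baseline_cores(sigma, half_width, confidence=0.95):
    """Cores needed for a confidence interval of +/- half_width on the mean.

    Implements n = (z * sigma / half_width)^2 under a normal approximation.
    `baseline_cores` is a hypothetical helper name, not part of any protocol.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / half_width) ** 2)


# sigma in t C/ha; the 3 t C/ha half-width is an illustrative assumption.
for sigma in (10, 11, 14):
    print(f"sigma = {sigma:>2} t C/ha -> {baseline_cores(sigma, 3)} cores")
```

Under these assumptions, moving σ from 10 to 14 roughly doubles the core count (43 versus 84), which is why a guessed σ is an expensive shortcut.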

Minimum detectable difference (MDD)

The MDD is the smallest change the campaign must distinguish from noise. For a baseline it is the precision of the stock estimate itself (for instance ±3 tonnes C per hectare at 95 percent confidence). For change detection it is the smallest annual or cumulative change that matters for the claim (for instance 0.5 tonnes C per hectare per year accumulated over five years). Halving the MDD quadruples the sample count; it is the most punishing relationship in the design.

A warning follows. Credible MDDs for cocoa and coffee agroforestry on a 5 to 10 year horizon are in the range 2 to 5 tonnes C per hectare total, consistent with meta-analytic accrual rates of 0.2 to 0.3 tonnes per hectare per year for temperate agroforestry. Designs targeting less than 1 tonne per hectare of cumulative change are not detecting management effects; they are detecting rainfall. The 2023 Australian grazing-credit issuance illustrates this exact failure mode: roughly 250,000 tonnes of credits were issued largely because of a Decile-10 wet year, not because of management change [5].

Confidence (α) and power (1 − β)

α is the tolerance for a false positive; β for a false negative. Conventional values are α = 0.05 and power = 0.8. Two formulas are in use, and they are not interchangeable. For a baseline precision design, the relevant form is n ≈ (z_α/2 · σ / ε)², where ε is the half-width of the desired confidence interval. For a paired change-detection design across two time points, the relevant form is the two-sample power formula below. The original framing of this article mixed the two; both are shown here with their appropriate use cases.

Baseline precision: n = (z_α/2 · σ / ε)²
Change detection: n = 2 σ² × (z_α/2 + z_β)² / MDD²

For the change-detection form with α = 0.05 and power = 0.8, (z_α/2 + z_β)² ≈ 7.85. In either case, variance and MDD dominate everything, and they trade off against each other in real, calculable ways.
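The two forms can be put side by side in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, n in the change-detection form is read as cores per sampling round (the usual reading of a two-sample formula), and the two rounds are treated as independent, as the 2σ² term implies. Paired resampling with positively correlated cores would need fewer.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_two_sided(alpha):
    """Critical value z_(alpha/2) for a two-sided test or interval."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2)


def z_power(power):
    """z_beta, the normal quantile matching the requested power (1 - beta)."""
    return _STD_NORMAL.inv_cdf(power)


def n_baseline(sigma, epsilon, alpha=0.05):
    """Baseline precision: n = (z_(alpha/2) * sigma / epsilon)^2."""
    return ceil((z_two_sided(alpha) * sigma / epsilon) ** 2)


def n_change(sigma, mdd, alpha=0.05, power=0.80):
    """Change detection: n = 2 sigma^2 (z_(alpha/2) + z_beta)^2 / MDD^2."""
    multiplier = (z_two_sided(alpha) + z_power(power)) ** 2
    return ceil(2 * sigma ** 2 * multiplier / mdd ** 2)


multiplier = (z_two_sided(0.05) + z_power(0.80)) ** 2
print(round(multiplier, 2))  # 7.85

# Same sigma (11 t C/ha) and same 3 t C/ha target fed to both forms.
print(n_baseline(11, 3), n_change(11, 3))
```

Feeding identical inputs to both functions shows why the forms are not interchangeable: the change-detection requirement comes out about four times the baseline one.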

Stratification: the variance multiplier that usually (but not always) pays for itself

Stratified sampling with Neyman allocation typically reduces sample counts by 20 to 60 percent for the same precision in tropical agroforestry and mixed systems, but the gain depends on whether the strata actually capture variance. In relatively homogeneous temperate fields, Poeplau and colleagues found the gain to be modest [3]. In cocoa landscapes where topography, shade architecture, and chronosequence drive large differences in SOC, stratification is transformative: Adiyah et al. (2022, 2023) show significant hillslope-position effects and up to fivefold differences in SOC change across age classes [4].

A defensible stratification combines three layers: remote sensing (canopy and bare-soil signals from Sentinel-2, with Sentinel-1 backscatter where moisture or structural variance matters), topography and drainage from a DEM, and a reconstructed management history of the preceding decade, now routinely recoverable from grower records or time-series imagery. Two cautions. First, stratum boundaries are themselves a source of variance when the covariates are noisy; inspect stratum means from the pilot before locking allocation. Second, Verra's VM0042 v2.2, ICVCM-endorsed under the Core Carbon Principles in October 2025, sets a minimum of 15 samples per stratum regardless of what Neyman would allocate, and does not yet accept digital soil mapping as an approved CCP method. If the project intends to credit under VM0042 or the forthcoming EU CRCF methodologies, align the stratification with those constraints from day one.

A worked example, grounded in published cocoa data

Consider a 500-hectare cocoa plantation in southern Côte d'Ivoire. The baseline target is ±3 tonnes C per hectare at 95 percent confidence on a mean 0–30 cm SOC stock of roughly 45 tonnes C per hectare, in line with Adiyah et al. (2022) for Ashanti hillslope cocoa [4]. Pilot sampling (25 cores) yields σ ≈ 11 tonnes C per hectare, consistent with the published literature rather than an assumed value.

Unstratified n ≈ (1.96 × 11 / 3)² ≈ 52 cores

Stratified into three management zones (young replant, mature stand, recent renovation) with within-stratum σ of 7, 8, and 9 tonnes per hectare, Neyman allocation reduces the requirement to roughly 28 to 30 cores, a ~45 percent saving. If VM0042 v2.2's minimum of 15 samples per stratum applies, the count climbs back to 45, essentially erasing the stratification gain at this farm size. This is a real-world tension practitioners should confront explicitly rather than paper over.
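The stratified figures above can be checked with a short Neyman-allocation sketch. The article does not give the area share of each zone, so the equal weights used here are an assumption, as are the function name and the choice to round each stratum up to a whole core. No finite-population correction is applied.

```python
from math import ceil
from statistics import NormalDist


def neyman_allocation(weights, sigmas, half_width, confidence=0.95, floor=0):
    """Cores per stratum under Neyman allocation for a target CI half-width.

    Total n = (z * sum(W_h * sigma_h) / half_width)^2, shared out in
    proportion to W_h * sigma_h. `floor` imposes a per-stratum minimum.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    weighted = [w * s for w, s in zip(weights, sigmas)]
    total_weighted = sum(weighted)
    n_total = (z * total_weighted / half_width) ** 2
    return [max(floor, ceil(n_total * ws / total_weighted)) for ws in weighted]


weights = [1 / 3, 1 / 3, 1 / 3]  # ASSUMED equal area shares (not in the source)
sigmas = [7, 8, 9]               # within-stratum sigma, t C/ha, from the text

free = neyman_allocation(weights, sigmas, half_width=3)
floored = neyman_allocation(weights, sigmas, half_width=3, floor=15)
print(free, sum(free))        # [8, 10, 11] 29
print(floored, sum(floored))  # [15, 15, 15] 45
```

With equal weights the unconstrained allocation lands at 29 cores, inside the 28 to 30 range quoted above, and the 15-per-stratum floor lifts it to 45. Unequal zone areas would shift the split, so the weights deserve the same scrutiny as σ.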

For change detection at 0.5 tonnes C per hectare per year over five years, the economically viable structure is project-level paired resampling across 10 to 30 linked farms, not a heroic effort on one [2]. Bradford's results make the point explicit: insisting on per-farm detection with 60 to 100 paired cores per time point will still produce unreliable per-farm estimates, whereas the same budget redistributed across 30 farms at project scale produces defensible mean accrual [1]. This is the shift in framing the 2023–2025 literature has forced on MRV design.
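A back-of-envelope inversion of the change-detection formula shows why 60 to 100 cores on one farm fall short. The sketch below is not a reproduction of Bradford's analysis: it reuses σ = 11 tonnes C per hectare from the worked example, treats the two sampling rounds as independent, and uses hypothetical function names.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _multiplier(alpha=0.05, power=0.80):
    """(z_(alpha/2) + z_beta)^2, about 7.85 at the conventional settings."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def detectable_change(sigma, n, alpha=0.05, power=0.80):
    """Smallest change (t C/ha) detectable with n cores per sampling round."""
    return sqrt(2 * sigma ** 2 * _multiplier(alpha, power) / n)


def cores_for_change(sigma, mdd, alpha=0.05, power=0.80):
    """Cores per sampling round needed to detect a change of size mdd."""
    return ceil(2 * sigma ** 2 * _multiplier(alpha, power) / mdd ** 2)


target = 0.5 * 5  # t C/ha: 0.5 t C/ha/yr accumulated over five years
sigma = 11.0      # t C/ha, pilot value from the worked example (assumed to apply)

for n in (60, 100):
    print(n, "cores ->", round(detectable_change(sigma, n), 1), "t C/ha detectable")
print("cores needed for", target, "t C/ha:", cores_for_change(sigma, target))
```

Under these assumptions, 100 cores per round resolve a change of about 4.4 tonnes C per hectare, well above the 2.5 tonne target, and hitting the target on a single farm would take roughly 300 cores per round.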

Two measurement-stack upgrades that reshape the economics

First, sample to 0–60 cm in at least three increments (0–10, 10–30, 30–60). Adiyah et al. (2023) show that in cocoa agroforestry older than about 18 years, SOC gains at 20–60 cm exceed those in the topsoil and are invisible to a 0–30 cm design [4]. Raffeld et al. (2024) generalise the warning across cropland and perennial systems [6]. For deep-rooted perennials, subsoil sampling is no longer optional.

Second, use Equivalent Soil Mass (ESM), not fixed depth, for stock-change calculations. Fowler et al. (2023) and Raffeld et al. (2024) document errors of 15 to 100 percent in stock-change estimates under fixed-depth accounting when bulk density shifts, which it does systematically under reduced tillage or organic amendment [6, 7]. Verra's forthcoming VM0042 v3.0 Soil Sampling and Analysis Handbook (in public consultation through March 2026) makes ESM the default. A 2026 article (or project) that ignores ESM is a credibility flag.

Interactive panel · Fixed depth vs. equivalent soil mass (ESM)

After five years of reduced tillage and cover cropping, topsoil bulk density drops and the soil “fluffs up.” A fixed 0–30 cm sample captures less soil mass at t₁ than at t₀, and silently under-reports the stock gain. ESM accounting tracks a constant soil mass and corrects the bias.

Example setting: change in bulk density, t₀ → t₁, of −0.10 g/cm³
Fixed-depth (0–30 cm), what the protocol reports: +3.00 Mg/ha
ESM (true stock change), constant-mass accounting: +5.40 Mg/ha
Fixed-depth bias vs. ESM truth: under-reports by 44%
ESM adds 2.4 cm of equivalent depth to compensate for the bulk-density loss

A baseline BD of 1.35 g/cm³ and 1.5 %C are typical of a temperate cropland topsoil. Reduced tillage with cover crops routinely drops BD by 0.10–0.15 g/cm³ over five years, directly in the range where fixed-depth accounting fails by 30–70%. Fowler et al. (2023) and Raffeld et al. (2024) document errors of 15–100%.
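The arithmetic behind the panel can be sketched as follows. The baseline values (BD 1.35 g/cm³, 1.5 %C) and the 0.10 g/cm³ drop come from the text; the two carbon concentrations at t₁ (1.7 %C in the topsoil, 0.8 %C in the layer just below 30 cm) are hypothetical values chosen so that the output matches the panel's figures, since the source does not state them. The sketch also assumes the extra layer has the same bulk density as the t₁ topsoil, which is a simplification.

```python
def stock(bulk_density, thickness_cm, carbon_pct):
    """SOC stock of one layer in Mg C/ha.

    bulk_density in g/cm3, thickness in cm, carbon in percent by mass;
    the unit conversions cancel to a factor of 1.
    """
    return bulk_density * thickness_cm * carbon_pct


def esm_comparison(bd0, c0, bd1, c1, c1_below, depth_cm=30.0):
    """Return (fixed-depth change, ESM change, extra depth in cm).

    Assumes bulk density fell (bd1 <= bd0) and that the extra layer
    sampled at t1 has bulk density bd1.
    """
    stock0 = stock(bd0, depth_cm, c0)
    fixed1 = stock(bd1, depth_cm, c1)
    # Extra depth needed at t1 to reach the soil mass sampled at t0.
    extra_cm = bd0 * depth_cm / bd1 - depth_cm
    esm1 = fixed1 + stock(bd1, extra_cm, c1_below)
    return fixed1 - stock0, esm1 - stock0, extra_cm


# c1=1.7 and c1_below=0.8 are HYPOTHETICAL, picked to match the panel.
fixed, esm, extra = esm_comparison(bd0=1.35, c0=1.5, bd1=1.25, c1=1.7, c1_below=0.8)
bias_pct = 100 * (1 - fixed / esm)
print(f"fixed-depth: {fixed:+.2f} Mg/ha, ESM: {esm:+.2f} Mg/ha")
print(f"extra depth: {extra:.1f} cm, fixed-depth under-reports by {bias_pct:.0f}%")
```

The size of the bias depends heavily on how much carbon sits just below the fixed sampling depth, which is one more reason to core deeper than 30 cm.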

Field note · Where sample-size calculations go wrong

Targeting field-scale detection when the signal only exists at project scale. Empirical work shows individual-field change detection is unreliable; project-level paired designs are the viable structure.
Underestimating σ. Homogeneous-looking cocoa fields routinely hide σ ≥ 10 tonnes per hectare once slope position and management age are unfolded.
Conflating composites with independent cores. Composites reduce analytical cost but destroy the variance structure needed for inference. Use them within a stratum, never as a substitute for replication.
Confusing precision design with change-detection design. The formulas differ and the sample sizes differ by a factor of two or more.
Skipping subsoil. In deep-rooted perennials the 30–60 cm layer is where much of the long-term signal lives.
Ignoring ESM. A stock-change claim based on fixed-depth sampling after any management change affecting bulk density will not survive 2026-era verification.
Sampling during anomalous years without a counterfactual. Rainfall anomalies can generate apparent SOC gains several times larger than realistic management-driven rates.

Counterfactuals and climate confounding: why controls are non-negotiable

A change-detection design without a counterfactual does not measure management; it measures weather. Wet years can shift measured SOC by 2 to 8 tonnes per hectare in a single season, three to ten times realistic management-driven accrual [5]. Any credible change-detection campaign therefore needs either a co-located control (paired fields under baseline management, sampled on the same dates) or an explicit climate normalisation using a biogeochemical model calibrated locally. A third-party verifier in 2026 will ask which one the project used. “Neither” is not an answer that survives.
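For the co-located control option, the simplest estimator is a difference-in-differences: subtract the change seen in the control fields from the change seen in the managed fields. The sketch below uses invented field-mean stocks purely to show the arithmetic; the function name and every number in it are hypothetical, and a real analysis would also carry the uncertainty of each mean through to the result.

```python
from statistics import mean


def difference_in_differences(treated_t0, treated_t1, control_t0, control_t1):
    """Management effect = treated change minus control change (t C/ha)."""
    treated_change = mean(treated_t1) - mean(treated_t0)
    control_change = mean(control_t1) - mean(control_t0)
    return treated_change - control_change


# HYPOTHETICAL field-mean SOC stocks (t C/ha), sampled on the same dates.
treated_t0 = [44.0, 46.0, 45.0]
treated_t1 = [49.0, 51.0, 50.0]
control_t0 = [45.0, 44.0, 46.0]
control_t1 = [48.0, 47.0, 49.0]

raw_gain = mean(treated_t1) - mean(treated_t0)
effect = difference_in_differences(treated_t0, treated_t1, control_t0, control_t1)
print(f"raw gain on managed fields: {raw_gain:.1f} t C/ha")
print(f"gain attributable to management: {effect:.1f} t C/ha")
```

In this invented case, more than half of the apparent gain on the managed fields also shows up on the controls, so only the remainder can be attributed to management.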

What the 2026 MRV landscape actually requires

Three frameworks now frame any serious SOC campaign. Verra VM0042 v2.2, ICVCM-endorsed under the Core Carbon Principles in October 2025, specifies stratification rules, minimum samples per stratum, uncertainty deductions, and ESM-based accounting; v3.0 with the Soil Sampling and Analysis Handbook is expected during 2026. EU Regulation 2024/3012 (CRCF) entered into force in December 2024; carbon-farming certification methodologies arrive through 2026, with the QU.A.L.ITY criteria (quantification, additionality, long-term storage, sustainability) shaping how EU-facing claims will be audited. The GHG Protocol Land Sector and Removals Standard takes effect 1 January 2027, requires quantitative uncertainty reporting, and mandates empirical (measurement-based) data for productive agricultural removals in Scope 3 inventories [8]. SBTi FLAG aligns with these.

Design the campaign to the standard the buyer will be audited against, not the one most familiar to the soil lab.

The punchline

The right sample size is whatever makes the claim survive the scrutiny it actually faces. For a baseline on a single cocoa farm with realistic σ, that is typically 30 to 60 cores with proper stratification, 0–60 cm depth, and ESM accounting. For a credible multi-year sequestration claim, it is a project-level paired design across many farms, not a heroic effort on one. The proximal-sensing revolution of the past four years (mid-infrared and VisNIR with regional or global calibrations [9]) has lowered per-sample cost by roughly an order of magnitude, which should be spent on more samples, deeper profiles, and more farms, not on hitting the same old number for less money.

Key takeaways

Fix the scale of inference first. Field-level change detection is unreliable for realistic sequestration rates; project-level paired designs are the viable structure.
Sample size is the output of scale, variance, minimum detectable difference, and confidence. Pilot-based σ estimation is non-negotiable.
Stratification typically cuts sample counts by 20 to 60 percent, but the gain is smaller in homogeneous fields, and the VM0042 15-per-stratum floor can erase it on small farms.
Sample to 0–60 cm and use ESM. Fixed-depth 0–30 cm designs miss the subsoil signal and are no longer verification-proof.
A change-detection campaign without a counterfactual or climate normalisation measures rainfall, not management.
Design to the standard the buyer will be audited against: VM0042 v2.2/3.0, EU CRCF, and the GHG Protocol Land Sector and Removals Standard now govern what defensible looks like.

References

1. Bradford, M.A. et al. (2023). Testing the feasibility of quantifying change in agricultural soil carbon stocks through empirical sampling. Geoderma, 440, 116719.
2. Potash, E. et al. (2025). Measure-and-remeasure as an economically feasible approach to crediting soil organic carbon at scale. Environmental Research Letters, 20(2), 024021.
3. Poeplau, C. et al. (2022). Plot-scale variability of organic carbon in temperate agricultural soils: implications for soil monitoring. Journal of Plant Nutrition and Soil Science, 185(4), 466–477.
4. Adiyah, F. et al. (2022, 2023). Effects of land-use change and topography on SOC stocks on Acrisol catenas in Ghanaian cocoa systems, CATENA 217, 106446; and Soil organic carbon changes under selected agroforestry cocoa systems in Ghana, Geoderma Regional 35, e00715.
5. Mitchell, E. et al. (2024). Making soil carbon credits work for climate change mitigation. Carbon Management, 15(1), 2430780.
6. Raffeld, A.M. et al. (2024). The importance of accounting method and sampling depth to estimate changes in soil carbon stocks. Carbon Balance and Management, 19, 1.
7. Fowler, A.F. et al. (2023). A simple soil mass correction for a more accurate determination of soil carbon stock changes. Scientific Reports, 13, 2401.
8. GHG Protocol. (2024). Land Sector and Removals Standard. World Resources Institute and WBCSD. Effective 1 January 2027. Cited alongside Verra VM0042 v2.2 (ICVCM-endorsed Oct. 2025) and EU Regulation 2024/3012 (CRCF, in force Dec. 2024).
9. Hutengs, C. et al. (2024). Enhanced VNIR and MIR proximal sensing of soil organic matter via machine learning ensembles and external parameter orthogonalization. Geoderma, 441, 116753. See also Greenberg, I. et al. (2022), Geoderma 409, 115614, and Hong, Y. et al. (2024), CATENA 235, 107667.

How Large Should Your Soil Carbon Sampling Campaign Be?