GEDI L4A Footprint-Level AGBD

GEDI L4A data searching, downloading, and pre-processing
Author

Xiuyu Cao

Published

January 9, 2026

Modified

March 3, 2026

Packages
from os import path, listdir
import datetime as dt
import pandas as pd
import geopandas as gpd
import numpy as np
import earthaccess
from shapely.geometry import MultiPolygon, Polygon, box
from shapely.ops import orient
import h5py
from glob import glob
import contextily as ctx

1 Introduction

Global Ecosystem Dynamics Investigation (GEDI) is a high-resolution laser ranging instrument aboard the International Space Station (ISS) that observes Earth's forests and topography. GEDI data products are assigned different levels, which indicate the amount of processing the data has undergone after collection (see Table 1).

The GEDI Level 4A (L4A) data product provides footprint-level estimates of forest structure and aboveground biomass density (AGBD). Footprints are available starting 2019-04-18 and cover latitudes from 52 degrees North to 52 degrees South. They are ~25 m in diameter and spaced approximately every 60 m along-track. Each GEDI L4A granule (file) contains footprints from a specific orbit segment and time period.

In this blog post, I will download and subset GEDI L4A data within a defined study area. The code is mostly copied and adapted from the GitHub repository gedi_tutorials.

2 Search GEDI L4A data in the defined area

Tool functions
def convert_umm_geometry(gpoly):
    """Convert UMM geometry to multipolygons."""
    multipolygons = []
    for gl in gpoly:
        ltln = gl["Boundary"]["Points"]
        points = [(p["Longitude"], p["Latitude"]) for p in ltln]
        multipolygons.append(Polygon(points))
    return MultiPolygon(multipolygons)


def convert_list_gdf(datag):
    """Convert a list of earthaccess granules to a GeoDataFrame."""
    df = pd.json_normalize([vars(granule)['render_dict'] for granule in datag])  # convert granule metadata to a dataframe
    df.columns=df.columns.str.split('.').str[-1]  # simplify column names
    df["geometry"] = df["GPolygons"].apply(convert_umm_geometry)  # get multipolygon geometries
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")

In this section, I will search for and retrieve GEDI L4A granules that intersect with my defined study area. The area of interest can be specified either as a bounding box or a polygon.
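For the polygon option, earthaccess accepts a `polygon` argument as a list of (lon, lat) tuples; NASA's CMR expects the exterior ring in counterclockwise order, which shapely's `orient` enforces. A minimal sketch (the rectangle here is a hypothetical stand-in for a real study-area polygon, and the commented-out search call mirrors the bounding-box search used in this post):

```python
from shapely.geometry import Polygon
from shapely.ops import orient

# Hypothetical rectangular study area as (lon, lat) pairs
aoi = Polygon([(-62.0, -4.2), (-60.8, -4.2), (-60.8, -2.6), (-62.0, -2.6)])
aoi_ccw = orient(aoi, sign=1.0)  # sign=1.0 -> counterclockwise exterior
coords = list(aoi_ccw.exterior.coords)

# The coordinate list could then be passed to the search, e.g.:
# granules = earthaccess.search_data(
#     count=-1,
#     polygon=coords,
#     temporal=(start_date, end_date),
#     doi='10.3334/ORNLDAAC/2056',
# )
print(len(coords), coords[0] == coords[-1])  # ring is closed: last point repeats the first
```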

2.1 Using a bounding box

In this example, I will search the data using a bounding box in Brazil, chosen to match the spatial coverage of a subset of the ESA BIOMASS Level 1 Cal/Val data. The bounding box spans 62°W to 60.8°W longitude and 4.2°S to 2.6°S latitude.

# Define bounding box and temporal range
bbox = (-62, -4.2, -60.8, -2.6)
start_date = dt.datetime(2024, 6, 1)
end_date = dt.datetime(2024, 10, 1)

# Convert bbox to geodataframe
b = list(bbox)
gdf_boundary = gpd.GeoDataFrame(crs='epsg:4326', geometry=[box(b[0], b[1], b[2], b[3])])
# Search data
granules = earthaccess.search_data(
    count = -1,  # get all granules
    bounding_box = bbox,
    temporal = (start_date, end_date),
    doi = '10.3334/ORNLDAAC/2056'
)

print(f"Found {len(granules)} granules")
granules[0]
Found 17 granules

Data: GEDI04_A_2024158222446_O31065_04_T08222_02_004_01_V002.h5

Size: 123.62 MB

Cloud Hosted: True

Visualize the bounding box and granules after converting the granule metadata from JSON to a GeoPandas GeoDataFrame.

Tip: When visualizing via .explore(), a warning might appear: “Make this Notebook Trusted to load map: File -> Trust Notebook”. To enable the interactive map, trust the notebook by running jupyter trust your_notebook.ipynb in the terminal (reference).

Visualization
# Get geodataframe and leave only 2 columns
gdf_granules = convert_list_gdf(granules)[['GranuleUR', 'geometry']]

# Visualize
m = gdf_granules.explore(color='green', fillcolor='green', location=[(b[1]+b[3])/2, (b[0]+b[2])/2], zoom_start=8)
gdf_boundary.explore(m=m, color='red', fill=False)
Make this Notebook Trusted to load map: File -> Trust Notebook

3 Download and explore the data

3.1 Download GEDI L4A data

Use earthaccess to download all granules that intersect the study area.

Download data
# Works if EDL credentials have already been persisted to a .netrc file
auth = earthaccess.login(strategy="netrc") 
if not auth.authenticated:
    # ask for EDL credentials and persist them in a .netrc file
    auth = earthaccess.login(strategy="interactive", persist=True)

# Download data
downloaded_files = earthaccess.download(granules, local_path="../data/raw/full_orbits_brazil")

for file in downloaded_files:
    print(file)
../data/raw/full_orbits_brazil/GEDI04_A_2024158222446_O31065_04_T08222_02_004_01_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024163090813_O31134_01_T01727_02_004_01_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024167072950_O31195_01_T09454_02_004_01_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024181133245_O31416_04_T03035_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024185115749_O31477_04_T04764_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024189102239_O31538_04_T02224_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024193084707_O31599_04_T02530_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024197071112_O31660_04_T01413_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024197193400_O31668_01_T08230_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024201175727_O31729_01_T07266_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024205162022_O31790_01_T02186_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024215235340_O31950_04_T03188_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024219221820_O32011_04_T03494_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024223204226_O32072_04_T11068_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024227190550_O32133_04_T04412_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024232055057_O32202_01_T07572_02_004_02_V002.h5
../data/raw/full_orbits_brazil/GEDI04_A_2024270022944_O32789_04_T00189_02_004_01_V002.h5

3.2 Explore the downloaded data structure

As mentioned before, each GEDI L4A granule contains footprints from a specific orbit segment and time period. The structure of each granule is shown below. Each file stores information in a METADATA group, several BEAMXXXX groups, and an ANCILLARY group containing three datasets.

Each BEAMXXXX group contains the AGBD prediction, associated uncertainty metrics, quality flags, and model inputs, including the scaled and transformed GEDI02_A RH metrics and other waveform information for the selected algorithm setting group. The subgroups (e.g., agbd_prediction, geolocation) include information for each algorithm setting group (i.e., 1, 2, 3, 4, 5, 6, and 10). Check the user guide on the data's main page for more details.
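Datasets inside those subgroups carry an _aN suffix per algorithm setting group, while selected_algorithm records which group produced the top-level agbd value. A toy sketch of how the two relate (synthetic arrays, not real GEDI data; in practice the top-level agbd dataset already stores the selected value, so this is purely illustrative):

```python
import numpy as np

# Toy per-shot arrays mimicking the _aN naming in agbd_prediction
beam = {
    'selected_algorithm': np.array([2, 10, 1]),
    'agbd_a1':  np.array([50.0, 51.0, 52.0]),
    'agbd_a2':  np.array([60.0, 61.0, 62.0]),
    'agbd_a10': np.array([70.0, 71.0, 72.0]),
}

# For each shot, read the AGBD from the algorithm setting group
# that was selected for that shot
agbd_selected = np.array([
    beam[f'agbd_a{a}'][i] for i, a in enumerate(beam['selected_algorithm'])
])
print(agbd_selected)  # [60. 71. 52.]
```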

Some variables that can be used for preliminary quality filtering (Holcomb et al., 2023; de Conto et al., 2024):

  • algorithm_run_flag = 1 - the waveform has sufficient fidelity for AGBD estimation.
  • l4_quality_flag = 1 and l2_quality_flag = 1 - neither the underlying L2A shot nor the L4A shot is marked as low quality.
  • degrade_flag = 0 - the shot is not marked as degraded, i.e., it was not acquired during a period of detected sensor problems, including periods in which GEDI was suffering from poor geolocation accuracy (error >> 10.2 m).
  • sensitivity > 0.95 - the maximum canopy cover that the waveform can penetrate given its SNR; use a higher threshold (e.g., 0.98) in the tropics.
  • land_cover_data/landsat_water_persistence < 10 - the percent of UMD GLAD Landsat observations between 2018 and 2019 classified as surface water (> 80 represents permanent water; < 10, permanent land).
  • land_cover_data/urban_proportion < 50 - the percentage of land area within a focal window around each shot that is urban land cover, derived from the 12 m resolution DLR TanDEM-X Global Urban Footprint product.
  • land_cover_data/pft_class - plant functional type (PFT) on the GEDI 1 km EASE 2.0 grid, derived from the MODIS MCD12Q1 V006 product; values follow the Land Cover Type 5 classification scheme (e.g., PFTs with tree cover: 1, 2, 3, 4, 5, 6, 11).
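These criteria combine into a single boolean mask. A minimal sketch on a toy shot table (the column names match the variables above, but the values are made up, and the land_cover_data variables are flattened into plain columns for illustration):

```python
import pandas as pd

# Toy shot table; values are illustrative, not real GEDI data
shots = pd.DataFrame({
    'agbd': [120.5, 88.2, 310.0, 45.1],
    'algorithm_run_flag': [1, 1, 0, 1],
    'l2_quality_flag': [1, 1, 1, 0],
    'l4_quality_flag': [1, 1, 1, 1],
    'degrade_flag': [0, 0, 0, 0],
    'sensitivity': [0.99, 0.93, 0.99, 0.99],
    'landsat_water_persistence': [0, 2, 0, 0],
    'urban_proportion': [0, 0, 0, 0],
    'pft_class': [2, 4, 2, 2],
})

# Keep only shots passing every quality criterion listed above
good = shots[
    (shots['algorithm_run_flag'] == 1)
    & (shots['l2_quality_flag'] == 1)
    & (shots['l4_quality_flag'] == 1)
    & (shots['degrade_flag'] == 0)
    & (shots['sensitivity'] > 0.95)
    & (shots['landsat_water_persistence'] < 10)
    & (shots['urban_proportion'] < 50)
    & (shots['pft_class'].isin([1, 2, 3, 4, 5, 6, 11]))
]
print(len(good))  # 1 -- only the first toy shot passes every criterion
```

Here the second shot fails the sensitivity threshold, the third the algorithm_run_flag check, and the fourth the l2_quality_flag check.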
Explore data structure
indir = '../data/raw/full_orbits_brazil'
downloaded_files = glob(path.join(indir, '*.h5'))

print("<---------- Structure of Each Downloaded File ---------->")
for file in downloaded_files:
    file_name = path.basename(file)
    file_dat = h5py.File(file, 'r')
    print(f"{file_name}: {file_dat.keys()}")

test_file = h5py.File(downloaded_files[0], 'r')

print("\n<---------------- Variables in METADATA ----------------->")
print(test_file.get('METADATA').keys())

print("\n<---------------- Variables in ANCILLARY ---------------->")
print(test_file.get('ANCILLARY').keys())

print("\n<---------------- Variables in BEAM0110 ----------------->")
print(test_file.get('BEAM0110').keys())

print("\n\n<------------- Group Variables in BEAM0110 -------------->")
for key, value in test_file['BEAM0110'].items():
    if isinstance(value, h5py.Group):
        print(f"\n-------------> Group: {key}")
        print(value.keys())
<---------- Structure of Each Downloaded File ---------->
GEDI04_A_2024223204226_O32072_04_T11068_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024158222446_O31065_04_T08222_02_004_01_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'METADATA']>
GEDI04_A_2024167072950_O31195_01_T09454_02_004_01_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024197071112_O31660_04_T01413_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024163090813_O31134_01_T01727_02_004_01_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'METADATA']>
GEDI04_A_2024181133245_O31416_04_T03035_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024201175727_O31729_01_T07266_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024219221820_O32011_04_T03494_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024189102239_O31538_04_T02224_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024227190550_O32133_04_T04412_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024197193400_O31668_01_T08230_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024270022944_O32789_04_T00189_02_004_01_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024232055057_O32202_01_T07572_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024185115749_O31477_04_T04764_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024215235340_O31950_04_T03188_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024205162022_O31790_01_T02186_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>
GEDI04_A_2024193084707_O31599_04_T02530_02_004_02_V002.h5: <KeysViewHDF5 ['ANCILLARY', 'BEAM0000', 'BEAM0001', 'BEAM0010', 'BEAM0011', 'BEAM0101', 'BEAM0110', 'BEAM1000', 'BEAM1011', 'METADATA']>

<---------------- Variables in METADATA ----------------->
<KeysViewHDF5 ['DatasetIdentification']>

<---------------- Variables in ANCILLARY ---------------->
<KeysViewHDF5 ['model_data', 'pft_lut', 'region_lut']>

<---------------- Variables in BEAM0110 ----------------->
<KeysViewHDF5 ['agbd', 'agbd_pi_lower', 'agbd_pi_upper', 'agbd_prediction', 'agbd_se', 'agbd_t', 'agbd_t_se', 'algorithm_run_flag', 'beam', 'channel', 'degrade_flag', 'delta_time', 'elev_lowestmode', 'geolocation', 'l2_quality_flag', 'l4_quality_flag', 'land_cover_data', 'lat_lowestmode', 'lon_lowestmode', 'master_frac', 'master_int', 'predict_stratum', 'predictor_limit_flag', 'response_limit_flag', 'selected_algorithm', 'selected_mode', 'selected_mode_flag', 'sensitivity', 'shot_number', 'solar_elevation', 'surface_flag', 'xvar']>


<------------- Group Variables in BEAM0110 -------------->

-------------> Group: agbd_prediction
<KeysViewHDF5 ['agbd_a1', 'agbd_a10', 'agbd_a2', 'agbd_a3', 'agbd_a4', 'agbd_a5', 'agbd_a6', 'agbd_pi_lower_a1', 'agbd_pi_lower_a10', 'agbd_pi_lower_a2', 'agbd_pi_lower_a3', 'agbd_pi_lower_a4', 'agbd_pi_lower_a5', 'agbd_pi_lower_a6', 'agbd_pi_upper_a1', 'agbd_pi_upper_a10', 'agbd_pi_upper_a2', 'agbd_pi_upper_a3', 'agbd_pi_upper_a4', 'agbd_pi_upper_a5', 'agbd_pi_upper_a6', 'agbd_se_a1', 'agbd_se_a10', 'agbd_se_a2', 'agbd_se_a3', 'agbd_se_a4', 'agbd_se_a5', 'agbd_se_a6', 'agbd_t_a1', 'agbd_t_a10', 'agbd_t_a2', 'agbd_t_a3', 'agbd_t_a4', 'agbd_t_a5', 'agbd_t_a6', 'agbd_t_pi_lower_a1', 'agbd_t_pi_lower_a10', 'agbd_t_pi_lower_a2', 'agbd_t_pi_lower_a3', 'agbd_t_pi_lower_a4', 'agbd_t_pi_lower_a5', 'agbd_t_pi_lower_a6', 'agbd_t_pi_upper_a1', 'agbd_t_pi_upper_a10', 'agbd_t_pi_upper_a2', 'agbd_t_pi_upper_a3', 'agbd_t_pi_upper_a4', 'agbd_t_pi_upper_a5', 'agbd_t_pi_upper_a6', 'agbd_t_se_a1', 'agbd_t_se_a10', 'agbd_t_se_a2', 'agbd_t_se_a3', 'agbd_t_se_a4', 'agbd_t_se_a5', 'agbd_t_se_a6', 'algorithm_run_flag_a1', 'algorithm_run_flag_a10', 'algorithm_run_flag_a2', 'algorithm_run_flag_a3', 'algorithm_run_flag_a4', 'algorithm_run_flag_a5', 'algorithm_run_flag_a6', 'l2_quality_flag_a1', 'l2_quality_flag_a10', 'l2_quality_flag_a2', 'l2_quality_flag_a3', 'l2_quality_flag_a4', 'l2_quality_flag_a5', 'l2_quality_flag_a6', 'l4_quality_flag_a1', 'l4_quality_flag_a10', 'l4_quality_flag_a2', 'l4_quality_flag_a3', 'l4_quality_flag_a4', 'l4_quality_flag_a5', 'l4_quality_flag_a6', 'predictor_limit_flag_a1', 'predictor_limit_flag_a10', 'predictor_limit_flag_a2', 'predictor_limit_flag_a3', 'predictor_limit_flag_a4', 'predictor_limit_flag_a5', 'predictor_limit_flag_a6', 'response_limit_flag_a1', 'response_limit_flag_a10', 'response_limit_flag_a2', 'response_limit_flag_a3', 'response_limit_flag_a4', 'response_limit_flag_a5', 'response_limit_flag_a6', 'selected_mode_a1', 'selected_mode_a10', 'selected_mode_a2', 'selected_mode_a3', 'selected_mode_a4', 'selected_mode_a5', 'selected_mode_a6', 
'selected_mode_flag_a1', 'selected_mode_flag_a10', 'selected_mode_flag_a2', 'selected_mode_flag_a3', 'selected_mode_flag_a4', 'selected_mode_flag_a5', 'selected_mode_flag_a6', 'shot_number', 'xvar_a1', 'xvar_a10', 'xvar_a2', 'xvar_a3', 'xvar_a4', 'xvar_a5', 'xvar_a6']>

-------------> Group: geolocation
<KeysViewHDF5 ['elev_lowestmode_a1', 'elev_lowestmode_a10', 'elev_lowestmode_a2', 'elev_lowestmode_a3', 'elev_lowestmode_a4', 'elev_lowestmode_a5', 'elev_lowestmode_a6', 'lat_lowestmode_a1', 'lat_lowestmode_a10', 'lat_lowestmode_a2', 'lat_lowestmode_a3', 'lat_lowestmode_a4', 'lat_lowestmode_a5', 'lat_lowestmode_a6', 'lon_lowestmode_a1', 'lon_lowestmode_a10', 'lon_lowestmode_a2', 'lon_lowestmode_a3', 'lon_lowestmode_a4', 'lon_lowestmode_a5', 'lon_lowestmode_a6', 'sensitivity_a1', 'sensitivity_a10', 'sensitivity_a2', 'sensitivity_a3', 'sensitivity_a4', 'sensitivity_a5', 'sensitivity_a6', 'shot_number', 'stale_return_flag']>

-------------> Group: land_cover_data
<KeysViewHDF5 ['landsat_treecover', 'landsat_water_persistence', 'leaf_off_doy', 'leaf_off_flag', 'leaf_on_cycle', 'leaf_on_doy', 'pft_class', 'region_class', 'shot_number', 'urban_focal_window_size', 'urban_proportion']>

4 Subsetting the data

4.1 Subsetting one of the granules

Now that all the granules that intersect the area of interest are downloaded, we can subset the files to retrieve only the footprints that fall within the bounding box. In the above list of variables in BEAMXXXX, lat_lowestmode and lon_lowestmode record the latitude and longitude of the ground location of each GEDI shot. Let's plot all the beams on a map.

First, I convert one of the granules to a GeoDataFrame, test_file_gdf, containing beam names, coordinates, and geometry. This particular granule recorded a total of 490,127 shots.

Turn one granule into a GeoDataFrame
test_file = h5py.File(downloaded_files[4], 'r')

lat_all = []
lon_all = []
beam_all = []
for var in list(test_file.keys()):
    if var.startswith('BEAM'):
        beam_dat = test_file.get(var)
        beam_lat = beam_dat.get('lat_lowestmode')[:]
        beam_lon = beam_dat.get('lon_lowestmode')[:]
        beam_n = beam_lat.shape[0]  # number of shots in the beam group
        lat_all.extend(beam_lat.tolist())
        lon_all.extend(beam_lon.tolist())
        beam_all.extend(np.repeat(str(var), beam_n).tolist())  # beam name per shot

test_file_df = pd.DataFrame({
    'beam': beam_all,
    'lat': lat_all,
    'lon': lon_all})
test_file_gdf = gpd.GeoDataFrame(
    test_file_df, crs="EPSG:4326",
    geometry=gpd.points_from_xy(test_file_df.lon, test_file_df.lat))
test_file_gdf
beam lat lon geometry
0 BEAM0000 0.486341 -64.828217 POINT (-64.82822 0.48634)
1 BEAM0000 0.485921 -64.827921 POINT (-64.82792 0.48592)
2 BEAM0000 0.485467 -64.827608 POINT (-64.82761 0.48547)
3 BEAM0000 0.485046 -64.827312 POINT (-64.82731 0.48505)
4 BEAM0000 0.484626 -64.827015 POINT (-64.82701 0.48463)
... ... ... ... ...
490122 BEAM1011 -24.686229 -45.413423 POINT (-45.41342 -24.68623)
490123 BEAM1011 -24.686620 -45.413058 POINT (-45.41306 -24.68662)
490124 BEAM1011 -24.687012 -45.412694 POINT (-45.41269 -24.68701)
490125 BEAM1011 -24.687403 -45.412329 POINT (-45.41233 -24.6874)
490126 BEAM1011 -24.687794 -45.411965 POINT (-45.41196 -24.68779)

490127 rows × 4 columns

Visualize after clipping to the bounding box
# Subset
test_file_gdf_bbox = test_file_gdf[test_file_gdf['geometry'].within(gdf_boundary.geometry[0])]

# Visualize
m = test_file_gdf_bbox.explore(column='beam', legend=True, location=[(b[1]+b[3])/1.8, (b[0]+b[2])/2], zoom_start=9)
gdf_boundary.explore(m=m, color='red', fill=False)

The above map shows the GEDI L4A shots from all beams in the granule. GEDI footprints are spaced ~60 m along the track and ~600 m across the track.

4.2 Subsetting all downloaded files

I now loop over each downloaded file and write a clipped version into a new directory, subset_brazil.

Subset and save all granules
indir = '../data/raw/full_orbits_brazil'
outdir = '../data/output/subset_brazil'
# Set variables to skip when copying data to output file.
# skip_vars = []
skip_vars = ['agbd_prediction',  # ancillary information and AGBD preds for all algorithm selection groups (i.e., 1, 2, 3, 4, 5, 6, and 10)
             'geolocation'  # geo data for all algorithm selection groups (i.e., 1, 2, 3, 4, 5, 6, and 10)
             ]

input_granules = glob(path.join(indir, 'GEDI04_A*.h5'))

for infile in input_granules:
    infilename, ext = path.splitext(path.basename(infile))
    outfilename = f"{infilename}_sub{ext}"
    outfile = path.join(outdir, outfilename)

    if not path.isfile(outfile):  # Check if file already exists
        inhf = h5py.File(infile, 'r')
        outhf = h5py.File(outfile, 'w')

        # Copy ANCILLARY and METADATA groups
        inhf.copy('ANCILLARY', outhf)
        inhf.copy('METADATA', outhf)

        # Loop through BEAMXXXX groups
        for var in list(inhf.keys()):
            if var.startswith('BEAM'):
                beam_dat = inhf.get(var)
                beam_lat = beam_dat.get('lat_lowestmode')[:]
                beam_lon = beam_dat.get('lon_lowestmode')[:]
                i = np.arange(0, len(beam_lat), 1)  # index
                beam_df = pd.DataFrame({'lat': beam_lat, 'lon': beam_lon, 'index_ori': i})
                beam_gdf = gpd.GeoDataFrame(beam_df, crs="EPSG:4326", geometry=gpd.points_from_xy(beam_df.lon, beam_df.lat))
                beam_gdf_clipped = beam_gdf[beam_gdf['geometry'].within(gdf_boundary.geometry[0])]
                i2copy = beam_gdf_clipped.index_ori

                # Copy this BEAM group to output file
                for key, value in beam_dat.items():
                    if key in skip_vars:
                        continue
                    if isinstance(value, h5py.Group):
                        for subkey, subvalue in value.items():
                            if subkey in skip_vars:
                                continue
                            group_path = subvalue.parent.name
                            outgroup = outhf.require_group(group_path)  # create empty group if not exists
                            dataset_path = group_path + '/' + subkey
                            dataset_clipped = subvalue[:][i2copy]
                            outhf.create_dataset(dataset_path, data=dataset_clipped)
                            for attr in subvalue.attrs.keys():  # copy metadata
                                outhf[dataset_path].attrs[attr] = subvalue.attrs[attr]
                    else:
                        group_path = value.parent.name
                        outgroup = outhf.require_group(group_path)  # create empty group if not exists
                        dataset_path = group_path + '/' + key
                        dataset_clipped = value[:][i2copy]
                        outhf.create_dataset(dataset_path, data=dataset_clipped)
                        for attr in value.attrs.keys():  # copy metadata
                            outhf[dataset_path].attrs[attr] = value.attrs[attr]
        
        inhf.close()
        outhf.close()
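The clipping above relies on NumPy fancy indexing: each dataset is read fully into memory with `[:]` and then row-selected with the retained indices, which works the same way for 1-D and 2-D datasets. A minimal illustration of the pattern with toy arrays (not a real granule; here the indices come from np.where rather than the index_ori column):

```python
import numpy as np

# Stand-in arrays (toy values)
lat = np.array([-4.5, -3.0, -2.8, -1.0])   # like lat_lowestmode
agbd = np.array([10.0, 20.0, 30.0, 40.0])  # a 1-D dataset
xvar = np.arange(8.0).reshape(4, 2)        # a 2-D dataset

# Indices of shots inside the latitude band, mirroring i2copy above
i2copy = np.where((lat > -4.2) & (lat < -2.6))[0]

print(agbd[i2copy])  # [20. 30.] -- rows 1 and 2 survive
print(xvar[i2copy])  # the same rows selected from the 2-D dataset
```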
Visualize subsetted AGBD footprints
# Read data and convert to GeoDataFrame
outdir = '../data/output/subset_brazil'
outfiles = glob(path.join(outdir, 'GEDI04_A*.h5'))

lat = []
lon = []
agbd = []
for file in outfiles:
    hf = h5py.File(file, 'r')
    for var in list(hf.keys()):
        if var.startswith('BEAM'):
            beam_dat = hf[var]
            lat.extend(beam_dat.get('lat_lowestmode')[:].tolist())
            lon.extend(beam_dat.get('lon_lowestmode')[:].tolist())
            agbd.extend(beam_dat.get('agbd')[:].tolist())
    hf.close()

df = pd.DataFrame({'agbd': agbd, 'lat': lat, 'lon': lon})
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=gpd.points_from_xy(df.lon, df.lat))

# Visualize (exclude -9999 agbd value data points)
ax = gdf[gdf['agbd'] != -9999].to_crs(epsg=3857).plot(column='agbd', vmin=0, vmax=800, alpha=0.1,
                                                      linewidth=0, legend=True, figsize=(22, 7))
ctx.add_basemap(ax)

4.3 Save the subsetted data to different formats

The subsetted output HDF5 files can be converted to different formats (e.g., GeoJSON, CSV, ESRI Shapefile) for convenient further analyses.

Store all subsetted data into a single DataFrame
outdir = '../data/output/subset_brazil'
outfiles = glob(path.join(outdir, 'GEDI04_A*.h5'))

subset_df = pd.DataFrame()
for file in outfiles:
    hf = h5py.File(file, 'r')
    for var in list(hf.keys()):
        if var.startswith('BEAM'):
            col_names = []
            col_val = []
            beam_dat = hf[var]
            for key, value in beam_dat.items():
                if isinstance(value, h5py.Group):
                    for subkey, subvalue in value.items():
                        if (subkey != 'shot_number'):
                            if (subkey.startswith('xvar')):  # xvar datasets are 2-D (one column per predictor)
                                for r in range(4):
                                    col_names.append(subkey + '_' + str(r+1))
                                    col_val.append(subvalue[:, r].tolist())
                            else:
                                col_names.append(subkey)
                                col_val.append(subvalue[:].tolist())
                else:
                    if (key.startswith('xvar')):  # xvar datasets are 2-D (one column per predictor)
                        for r in range(4):
                            col_names.append(key + '_' + str(r+1))
                            col_val.append(value[:, r].tolist())
                    else:
                        col_names.append(key)
                        col_val.append(value[:].tolist())

            # Create a dataframe
            beam_df = pd.DataFrame(map(list, zip(*col_val)), columns=col_names)
            # Insert BEAM names
            beam_df.insert(0, 'BEAM', np.repeat(str(var), len(beam_df.index)).tolist())

            # Append to the subset_df dataframe
            if len(beam_df.index) > 0:
                subset_df = pd.concat([subset_df, beam_df])

    hf.close()

# Set 'shot_number' as the dataframe index; shot_number values are unique
res = subset_df.set_index('shot_number')
print(res.index.is_unique)
res.head()
True
BEAM agbd agbd_pi_lower agbd_pi_upper agbd_se agbd_t agbd_t_se algorithm_run_flag beam channel ... selected_algorithm selected_mode selected_mode_flag sensitivity solar_elevation surface_flag xvar_1 xvar_2 xvar_3 xvar_4
shot_number
320720000400286365 BEAM0000 93.128365 13.674403 243.444717 13.088727 9.178169 3.440835 1 0 0 ... 10 7 3 0.974751 4.283905 1 10.594338 10.984990 0.0 0.0
320720000400286366 BEAM0000 121.130402 25.540184 287.577850 13.087777 10.467468 3.440710 1 0 0 ... 10 5 3 0.957877 4.283472 1 10.699397 11.073260 0.0 0.0
320720000400286367 BEAM0000 100.411980 16.548042 255.143982 13.089769 9.530328 3.440972 1 0 0 ... 10 7 3 0.974387 4.283041 1 10.645313 10.986933 0.0 0.0
320720000400286368 BEAM0000 129.080185 29.263245 299.744812 13.086012 10.805500 3.440478 1 0 0 ... 10 7 3 0.954970 4.282610 1 10.694391 11.128792 0.0 0.0
320720000400286369 BEAM0000 57.110344 2.570586 182.546387 13.094980 7.187411 3.441656 1 0 0 ... 10 4 3 0.953118 4.282178 1 10.489714 10.791389 0.0 0.0

5 rows × 42 columns

Write to CSV
res.to_csv('../data/output/subset_brazil.csv')
Write to shapefile
res_gdf = gpd.GeoDataFrame(res, crs='EPSG:4326', geometry=gpd.points_from_xy(res.lon_lowestmode, res.lat_lowestmode))
for col in res_gdf.columns:
    if res_gdf[col].dtype == 'object':
        res_gdf[col] = res_gdf[col].astype(str)

res_gdf.to_file('../data/output/subset_brazil.shp')
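GeoJSON, also mentioned above, avoids the shapefile's 10-character field-name limit. A minimal sketch with toy data and a temporary output path (res_gdf itself could be written the same way):

```python
import json
import os
import tempfile

import geopandas as gpd
from shapely.geometry import Point

# Toy stand-in for res_gdf (values are illustrative, not real GEDI shots)
gdf = gpd.GeoDataFrame(
    {'agbd': [93.1, 121.1], 'l4_quality_flag': [1, 1]},
    geometry=[Point(-61.5, -3.1), Point(-61.4, -3.2)],
    crs='EPSG:4326',
)

# Write to GeoJSON; full column names are preserved
out = os.path.join(tempfile.mkdtemp(), 'subset_brazil.geojson')
gdf.to_file(out, driver='GeoJSON')

with open(out) as f:
    feats = json.load(f)['features']
print(len(feats), feats[0]['properties']['l4_quality_flag'])
```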

Appendix A: GEDI Data Products

Table 1: GEDI Data Products
ID   Product                                    Type       Format
L1B  Geolocated Waveform                        Footprint  HDF5
L2A  Elevation and Height Metrics               Footprint  HDF5
L2B  Canopy Cover and Vertical Profile Metrics  Footprint  HDF5
L3   Land Surface Metrics                       Gridded    COG
L4A  Aboveground Biomass Density                Footprint  HDF5
L4B  Aboveground Biomass Density                Gridded    COG
L4C  Waveform Structural Complexity Index       Footprint  HDF5