DuckDB.ExtensionKit Complete Guide: Building Native DuckDB Extensions in C#

Master DuckDB.ExtensionKit — build native DuckDB extensions in C# with .NET Native AOT. From scalar to table functions, create extensible data pipelines from scratch.

DuckDB’s extension mechanism has long been the cornerstone of its flexibility. Now, C# developers can finally write high-performance native DuckDB extensions in their familiar language.

DuckDB has always been renowned for its powerful extension mechanism — through dynamic extension loading, you can add support for new file formats, custom types, and scalar/table functions without modifying the core engine. A significant portion of DuckDB’s own functionality is implemented as extensions, including Parquet reading, JSON parsing, HTTP filesystem access, and spatial data processing. However, extension development has historically been the domain of C/C++ and Rust developers, leaving .NET developers on the sidelines.

That’s changing now. DuckDB.ExtensionKit is an experimental project that enables .NET/C# developers to write native DuckDB extensions in pure C#, compiling to runtime-free binaries via .NET Native AOT. This comprehensive guide dives deep into ExtensionKit’s technical principles, walks you through building a practical JWT parsing extension from scratch, and explores real-world business applications and monetization strategies.

DuckDB’s Extension Architecture: A Full Overview

Before diving into ExtensionKit, let’s understand DuckDB’s extension ecosystem. DuckDB organizes extensions across three distinct layers:

Layer 1: Core Extensions. These are extensions developed alongside the DuckDB engine itself, including data import extensions (Parquet, JSON, HTTPFS) and functional extensions (such as spatial and full-text search). They use the C++ API and are tightly coupled to DuckDB’s internal APIs.

Layer 2: Community Extensions. Maintained by community members, these third-party extensions cover a wide range of use cases and integrations. One example is the crypto extension, which adds cryptographic functions.

Layer 3: C Extension API. This is DuckDB’s stable, backward-compatible C interface designed specifically for external extension development. It allows extensions to remain compatible across different DuckDB versions and can be used from C, C++, Rust, and other languages. However, even with the C API, developers still face manual memory management and substantial boilerplate code.

ExtensionKit aims to carve out a new path for the .NET ecosystem within this extension framework.

Why Does the .NET Ecosystem Need ExtensionKit?

For developers familiar with the .NET stack, facing DuckDB’s extension development has presented several significant barriers:

First, the C++ API has an extremely steep learning curve. It requires a deep understanding of DuckDB’s internal data structures, including Vector, DataChunk, and Expression objects. Because these internal APIs are not stable, any DuckDB upgrade may force you to adapt and recompile your extension code.

Second, while the C Extension API is stable, the developer experience is poor. You must manually handle memory allocation, type conversion, and error propagation — low-level details that C# developers are accustomed to having abstracted away by the garbage collector and rich type system.

Third, Rust supports the C Extension API but has its own steep learning curve, and the Rust ecosystem isn’t widely adopted in .NET-centric enterprise environments.

ExtensionKit addresses all these pain points. Built on top of the C Extension API, it leverages C#’s type safety and source generators to make extension development feel like writing a regular C# library. Developers don’t need to worry about low-level memory management or complex type conversions — ExtensionKit automatically generates the glue code.

Deep Dive: Core Technical Principles

ExtensionKit’s power rests on three key technologies. Understanding these principles will help you use the toolkit more effectively.

1. Direct Mapping of C Function Pointers

DuckDB’s C Extension API is essentially a massive struct called duckdb_ext_api_v1, containing over 100 function pointer fields that cover database connections, type definitions, function registration, vector operations, and more.

ExtensionKit mirrors this struct in C#, mapping each C function pointer to a C# unmanaged function pointer (delegate* unmanaged[Cdecl]<...>). This approach offers several advantages:

  • Low overhead: Calls go directly through the function pointer, with no delegate allocation and no marshaling stubs when the arguments are blittable
  • Type safety: The C# compiler checks parameter types and return types at compile time
  • Near-native performance: Call cost is close to that of native C code, which suits high-throughput data processing
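
The mapping idea can be sketched in a few lines. This is only an illustration under stated assumptions: DemoApi and its two fields are hypothetical stand-ins and do not reproduce the layout of duckdb_ext_api_v1, and the callbacks are implemented in C# so the sketch runs on its own. It needs AllowUnsafeBlocks enabled in the project.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical two-entry API table. It stands in for a C struct of function
// pointers; it is NOT the layout of duckdb_ext_api_v1.
[StructLayout(LayoutKind.Sequential)]
public unsafe struct DemoApi
{
    public delegate* unmanaged[Cdecl]<int, int, int> Add;
    public delegate* unmanaged[Cdecl]<int, int> Negate;
}

public static unsafe class FunctionPointerDemo
{
    // In a real extension the host fills the table with its own C functions.
    // Here two C# callbacks play that role so the sketch is self-contained.
    [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) })]
    private static int NativeAdd(int a, int b) => a + b;

    [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) })]
    private static int NativeNegate(int x) => -x;

    // Builds the table by taking the address of each callback.
    public static DemoApi CreateApi()
    {
        DemoApi api;
        api.Add = &NativeAdd;
        api.Negate = &NativeNegate;
        return api;
    }

    // Calls go straight through the pointers; no delegate objects are allocated.
    public static int AddThenNegate(DemoApi api, int a, int b) =>
        api.Negate(api.Add(a, b));
}
```

Calling FunctionPointerDemo.AddThenNegate(FunctionPointerDemo.CreateApi(), 2, 3) invokes both pointers in turn. The compiler rejects a call whose argument types do not match the pointer signature, which is the compile-time checking described above.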

2. Automation via Source Generators

Traditional C extension templates use macros to generate entry points and initialization code — effective but hard to read and debug. ExtensionKit uses .NET Source Generators to emit this boilerplate at compile time.

When you mark your extension class with the [DuckDBExtension] attribute, the source generator performs the following tasks:

  1. Generates native entry point functions: Following DuckDB’s naming convention (<extension_name>_init_c_api), ensuring DuckDB can correctly locate the initialization function when loading the extension.
  2. Generates extension registration code: Automatically iterates through your registered scalar and table functions, calling the corresponding C APIs for registration.
  3. Generates parameter binding glue code: Maps C# method parameters to DuckDB’s Vector data structures, handling type conversion and null propagation automatically.
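
To make the third task more concrete, here is a rough sketch of what row-wise binding with null propagation amounts to. It is not the code ExtensionKit generates: ScalarGlueSketch and Apply are hypothetical names, and plain arrays of nullable strings stand in for DuckDB vectors and their validity masks.

```csharp
using System;

public static class ScalarGlueSketch
{
    // Applies a two-argument scalar function to every row of two input columns.
    // Arrays of string? stand in for DuckDB vectors; a null element models a
    // NULL row. All names here are hypothetical.
    public static string?[] Apply(
        string?[] left,
        string?[] right,
        Func<string, string, string?> function)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Input columns must have the same row count.");

        var result = new string?[left.Length];
        for (int row = 0; row < left.Length; row++)
        {
            string? a = left[row];
            string? b = right[row];

            // Null propagation: if any argument is NULL, the user function is
            // skipped and the output row stays NULL.
            result[row] = (a is null || b is null) ? null : function(a, b);
        }
        return result;
    }
}
```

With inputs { "a", null } and { "b", "c" } and a function that concatenates its arguments, the first output row is "ab" and the second stays null, so the user-written method never sees a null argument.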

3. The Critical Role of Native AOT Compilation

This is ExtensionKit’s most innovative aspect. Traditional .NET applications rely on the CLR, but DuckDB extensions must be purely native code at load time.

Through .NET Native AOT (Ahead-Of-Time) compilation, your C# project compiles to a native binary that does not depend on an installed .NET runtime. The small amount of runtime support it still needs, such as the garbage collector, is linked into the binary itself. The compilation process includes:

  • Static analysis: The compiler analyzes all code paths to determine which types and methods need to be included
  • No JIT at run time: IL is compiled to machine code at build time, so nothing is JIT-compiled when the extension loads
  • Trimming: Unused code is stripped away, reducing the final binary size

From DuckDB’s perspective, an ExtensionKit-built extension is indistinguishable from one written in C/C++ — it’s simply a standard shared library file (.so, .dll, or .dylib) that loads and runs via the LOAD command.
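
A shared library built with Native AOT exposes C symbols through the UnmanagedCallersOnly attribute and its EntryPoint property. The sketch below shows the general shape such an export could take. It is not ExtensionKit’s generated code: ExportSketch is a hypothetical name, the symbol name simply follows the <extension_name>_init_c_api convention, and treating both parameters as opaque pointers is an assumption made for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ExportSketch
{
    // Managed initialization logic, kept separate so it can be called and
    // tested from C#. This placeholder only checks that both handles are set.
    public static bool Initialize(IntPtr info, IntPtr access)
    {
        return info != IntPtr.Zero && access != IntPtr.Zero;
    }

    // Under Native AOT, EntryPoint turns this method into an exported C symbol.
    // The parameter list is an assumption; both arguments are opaque pointers here.
    // byte models the C bool return value because bool is not blittable.
    [UnmanagedCallersOnly(EntryPoint = "jwt_extension_init_c_api")]
    public static byte InitCApi(IntPtr info, IntPtr access)
    {
        return Initialize(info, access) ? (byte)1 : (byte)0;
    }
}
```

A method marked UnmanagedCallersOnly cannot be called from managed code, which is why the logic sits in a separate Initialize method. Only the native host calls the exported symbol.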

Complete Hands-On: Building a JWT Parsing Extension

Let’s walk through a complete example. We’ll build an extension called jwt_extension that provides two functions:

  • extract_claim_from_jwt(jwt_text, claim_name) — a scalar function that extracts a specific claim from a JWT
  • extract_claims_from_jwt(jwt_text) — a table function that extracts all claims from a JWT, returning a key-value table

Step 1: Create the .NET Project

dotnet new classlib -n JwtDuckDBExtension
cd JwtDuckDBExtension
dotnet add package DuckDB.ExtensionKit

Step 2: Define the Extension Class

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;
using DuckDB.ExtensionKit;

[DuckDBExtension("jwt_extension", "1.0.0", "JWT token parsing extension")]
public static partial class JwtExtension
{
    private static void RegisterFunctions(DuckDBConnection connection)
    {
        // Scalar function: extract a specific claim from a JWT
        connection.RegisterScalarFunction<string, string, string?>
            ("extract_claim_from_jwt", ExtractClaimFromJwt);

        // Table function: extract all claims, returning a key-value table
        connection.RegisterTableFunction
            ("extract_claims_from_jwt",
             (string jwt) => ExtractClaimsFromJwt(jwt),
             c => new { claim_name = c.Key, claim_value = c.Value });
    }

    private static string? ExtractClaimFromJwt(string jwt, string claimName)
    {
        using var doc = ParsePayload(jwt);
        foreach (var prop in doc.RootElement.EnumerateObject())
        {
            if (prop.Name == claimName)
                return ClaimToText(prop.Value);
        }
        return null; // Claim not present: the SQL result is NULL
    }

    private static IEnumerable<KeyValuePair<string, string?>> ExtractClaimsFromJwt(string jwt)
    {
        using var doc = ParsePayload(jwt);
        foreach (var prop in doc.RootElement.EnumerateObject())
        {
            yield return new KeyValuePair<string, string?>(
                prop.Name, ClaimToText(prop.Value));
        }
    }

    // Splits the token, Base64URL-decodes the payload segment, and parses it as JSON.
    // The signature is not verified; this extension only reads claims.
    private static JsonDocument ParsePayload(string jwt)
    {
        var parts = jwt.Split('.');
        if (parts.Length != 3)
            throw new ArgumentException("Invalid JWT format: expected 3 parts");

        var payload = Encoding.UTF8.GetString(
            Convert.FromBase64String(ToPaddedBase64(parts[1])));
        return JsonDocument.Parse(payload);
    }

    // GetString() throws for non-string JSON values (for example the numeric
    // iat claim), so those are returned as their raw JSON text instead.
    private static string? ClaimToText(JsonElement value) => value.ValueKind switch
    {
        JsonValueKind.String => value.GetString(),
        JsonValueKind.Null => null,
        _ => value.GetRawText(),
    };

    // Converts Base64URL to standard Base64: restores '+' and '/', then re-adds padding.
    private static string ToPaddedBase64(string input)
    {
        var s = input.Replace('-', '+').Replace('_', '/');
        int pad = s.Length % 4;
        if (pad == 0) return s;
        return s + new string('=', 4 - pad);
    }
}
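
The payload decoding step can also be tried on its own, without DuckDB or ExtensionKit. The following is a minimal standalone sketch: JwtPayloadSketch and its members are hypothetical names, and BuildDemoToken produces an unsigned token purely so the example needs no external input. It shows Base64URL handling (the '-' and '_' alphabet and the missing padding) and how non-string claims can be returned as text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class JwtPayloadSketch
{
    // Encodes bytes as Base64URL: '+' becomes '-', '/' becomes '_', padding is removed.
    public static string Base64UrlEncode(byte[] bytes) =>
        Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');

    // Reverses Base64UrlEncode: restores the standard alphabet and the '=' padding.
    public static byte[] Base64UrlDecode(string input)
    {
        string s = input.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
            case 1: throw new FormatException("Invalid Base64URL length.");
        }
        return Convert.FromBase64String(s);
    }

    // Returns every top-level claim as text. String claims are unquoted;
    // other JSON values (numbers, booleans, objects) keep their raw JSON text.
    public static Dictionary<string, string?> ReadClaims(string jwt)
    {
        string[] parts = jwt.Split('.');
        if (parts.Length != 3)
            throw new ArgumentException("Invalid JWT format: expected 3 parts");

        string json = Encoding.UTF8.GetString(Base64UrlDecode(parts[1]));
        using JsonDocument doc = JsonDocument.Parse(json);

        var claims = new Dictionary<string, string?>();
        foreach (JsonProperty prop in doc.RootElement.EnumerateObject())
        {
            claims[prop.Name] = prop.Value.ValueKind switch
            {
                JsonValueKind.String => prop.Value.GetString(),
                JsonValueKind.Null => null,
                _ => prop.Value.GetRawText(),
            };
        }
        return claims;
    }

    // Builds an unsigned demo token (empty signature segment) from a JSON payload.
    public static string BuildDemoToken(string payloadJson) =>
        Base64UrlEncode(Encoding.UTF8.GetBytes("{\"alg\":\"none\"}")) + "." +
        Base64UrlEncode(Encoding.UTF8.GetBytes(payloadJson)) + ".";
}
```

Passing the result of BuildDemoToken for a payload such as {"sub":"1234567890","name":"John Doe","iat":1516239022} to ReadClaims yields the name claim as a plain string and the numeric iat claim as its JSON text. Calling GetString() directly on a numeric claim would throw instead.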

Step 3: Publish as a Native AOT Extension

dotnet publish -c Release -r linux-x64 -p:PublishAot=true

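After successful compilation, you’ll find the native library in the bin/Release/netX.0/linux-x64/publish/ directory: a .so file on Linux, a .dll on Windows, or a .dylib on macOS. The runtime identifier segment of the path (linux-x64 here) matches the -r value passed to dotnet publish.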

Step 4: Use in DuckDB

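-- Load the extension. Locally built extensions are unsigned, so start DuckDB
-- with the -unsigned flag (or enable allow_unsigned_extensions) first.
LOAD './JwtDuckDBExtension.so';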

-- Use scalar function to extract a single claim
SELECT extract_claim_from_jwt(
    'eyJhbG...sw5c',
    'name'
) AS username;
-- Output: John Doe

-- Use table function to extract all claims
SELECT * FROM extract_claims_from_jwt(
    'eyJhbG...sw5c'
);

Table function output:

| claim_name | claim_value |
|---|---|
| sub | 1234567890 |
| name | John Doe |
| iat | 1516239022 |

Real-World Application Scenarios

This JWT extension has multiple practical applications:

  1. Log Analysis: Extract user identity information from request logs containing JWTs for access pattern analysis
  2. Security Auditing: Batch-analyze JWT tokens in your system to detect expired tokens, anomalous signatures, and security vulnerabilities
  3. Data Pipelines: Parse identity tokens during ETL processes and filter data access permissions based on user roles
  4. API Gateways: Combine with other extensions to implement claim-based fine-grained data access controls

Comprehensive Comparison with Traditional Extension Development

| Feature | C++ API | C Extension API | DuckDB.ExtensionKit |
|---|---|---|---|
| Stability | ❌ Changes per version | ✅ Backward compatible | ✅ Based on C API |
| Build Dependency | Full engine build | SDK only | .NET SDK only |
| Memory Management | Manual | Manual | Automatic (GC + AOT) |
| Type Safety | ⚠️ Partial | ❌ None | ✅ Full C# type system |
| Boilerplate Code | Lots (macros) | Moderate | Minimal (source gen) |
| Cross-Language | C/C++/Rust | C/C++/Rust | C#/.NET |
| Runtime Dependency | None | None | None (Native AOT) |
| Learning Curve | Steep | Moderate | Gentle (.NET devs) |
| Dev Efficiency | Low | Medium | High |
| Debugging Experience | Difficult | Moderate | Excellent (VS/VSCode) |

Use Cases and Monetization Strategies

Detailed Application Scenarios

Enterprise Data Pipelines: In .NET-dominated enterprise environments, ExtensionKit allows data engineers to write custom data transformation functions in familiar C#, seamlessly integrating into DuckDB analysis workflows. For example, you can write extensions specifically for processing internal corporate data formats or implementing industry-specific calculation logic.

Industry-Specific Function Libraries: Financial institutions can embed proprietary risk scoring and assessment calculations as DuckDB extensions; healthcare organizations can embed ICD coding transformations and clinical data standardization functions. These extensions can be sold as SaaS services, charged by usage or license.

Data Compliance and Auditing: Write encryption, data masking, and classification functions as extensions, providing data governance services for enterprises. Especially under regulations like GDPR and HIPAA, companies need frequent data masking and classification — all of which can be packaged as DuckDB extensions.

Third-Party System Integration: Build connector extensions for specific SaaS platforms (Salesforce, SAP, Oracle), reducing customers’ integration costs and operational complexity.

Revenue Streams and Income Estimates

| Revenue Stream | Strategy | Estimated Income |
|---|---|---|
| Paid Extension Packs | Develop industry-specific function sets (financial risk control, medical coding, logistics tracking) as commercial extensions with annual licensing | $500-$5,000/year |
| Data Consulting Services | Custom DuckDB extension solutions for enterprises, including performance tuning, data pipeline architecture, and training | $300-$1,500/day |
| Training Courses | “.NET Data Engineer” series courses covering ExtensionKit + DuckDB hands-on projects | $20-$70/student |
| SaaS Products | Build analytics SaaS on ExtensionKit: automated reporting, anomaly detection, real-time dashboard services | $50-$500/month subscription |
| Open Source Sponsorship | Open-source general-purpose extensions, earn via GitHub Sponsors and enterprise sponsorships | $50-$500/month |

For individual developers and small teams, we recommend the following phased approach:

  1. Phase 1 (1-2 months): Choose a niche but high-frequency scenario (such as parsing a specific data format), develop a high-quality free extension, build community reputation and gather user feedback.
  2. Phase 2 (3-6 months): Expand functionality based on user feedback, launch a paid premium version with technical support and customization services.
  3. Phase 3 (6-12 months): Integrate extensions into a complete analytics product, forming a SaaS service or enterprise-grade solution.

Limitations and Future Outlook

Despite its promising prospects, ExtensionKit currently has several limitations to be aware of:

  • Experimental Stage: APIs are evolving rapidly; breaking changes may appear in any version
  • Platform Constraints: Each extension must be compiled separately for specific platforms, increasing distribution and maintenance costs
  • Feature Coverage: Not all DuckDB extension features are exposed yet; custom types and aggregate function support is limited
  • Community Size: Compared to C/C++ and Rust, .NET community participation in the DuckDB extension ecosystem remains relatively low

However, as the .NET ecosystem continues maturing and the DuckDB community invests further, ExtensionKit is poised to become the go-to tool for .NET developers participating in the DuckDB extension ecosystem. Particularly with .NET’s ongoing improvements in cross-platform Native AOT, extension distribution and maintenance costs will continue to decrease.

For teams already invested in .NET technology stacks, ExtensionKit offers a unique competitive advantage — the ability to write high-performance data analysis extensions in a familiar language without learning C++ or Rust. This is a technical direction worth exploring early.

[Figure: DuckDB.ExtensionKit architecture overview]


⚠️ This article is based on DuckDB’s official blog post. DuckDB.ExtensionKit is currently experimental and APIs may change. Please consult the latest official documentation and GitHub repository before using in production environments.


⚠️ This site is an independent community project, not affiliated with, endorsed by, or sponsored by the DuckDB Foundation or official DuckDB project.

"DuckDB" is a registered trademark of the DuckDB Foundation. This site uses the name solely for factual description purposes.

All content is for educational and community promotion purposes only and does not constitute any commercial service.