Overview

Custom extractors let you define your own AI extraction tasks to pull specific data points from contracts. Unlike the built-in extractions (parties, dates, royalties, etc.), custom extractors are tailored to your organization’s unique needs.

Accessing Custom Extractors

  1. Open your project’s Drive
  2. Click the More Options menu (⋮) in the toolbar
  3. Select Manage Extractors

Creating an Extractor

1. Create New Extractor

Click the + button to create a new extractor. Enter a name that describes what you’re extracting (e.g., “Contract Code”, “Signing Tool”).

2. Write Description

Describe what you want to extract in plain language. Be specific about:
  • What data to look for
  • Where it typically appears in contracts
  • How to handle edge cases

3. Define Output

Specify the output format using a simple schema. For example:
- log_code: [extracted value]

4. Choose Context

Select what the extractor should analyze:
  • File title and content - Uses both filename and contract text
  • File title only - Only examines the filename
  • File content only - Only examines the contract text

5. Provide Examples

The system automatically selects your 3 most recent contracts. For each contract, provide the expected extraction result. These examples train the extractor.

6. Start Testing

Review your configuration and click Start to begin testing. Testing takes approximately 15 minutes.

7. Run Extractor

After testing completes, run the extractor against your contracts. It processes all existing contracts in the project and then runs automatically on every newly uploaded contract.

Writing Good Descriptions

The description is the most important part of your extractor. Write clear, specific instructions.

Example: Contract Code

Task: Extract the log code from the start of the file name.

Instructions:
- The code always appears at the start of the file name and continues until just before the year.
- If a code is found, return only the part before the year.
- If no valid code is found, return nothing.
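
Before entering expected results in the Examples step, it can help to pin the rule down mechanically. The Python sketch below is one possible reading of the description above, not how the platform performs extraction. The function name `expected_log_code`, the year pattern (a standalone four-digit number starting with 19 or 20), and the `GENERIC_PREFIXES` set are all assumptions made for illustration.

```python
import re

# Assumption: a "year" is a standalone four-digit number starting with 19 or 20.
YEAR = re.compile(r"(?<![A-Za-z0-9])(?:19|20)\d{2}(?![A-Za-z0-9])")

# Assumption: prefixes that are ordinary words rather than codes.
# The description itself does not define what makes a code "valid".
GENERIC_PREFIXES = {"contract", "agreement", "license"}


def expected_log_code(filename):
    """Return the part of the file name before the year, or None."""
    match = YEAR.search(filename)
    if match is None:
        return None  # no year, so there is nothing to anchor the code to
    code = filename[:match.start()].strip(" _-")
    if not code or code.lower() in GENERIC_PREFIXES:
        return None  # empty or generic prefix: treat as "no valid code"
    return code


for name in ["PARDO_2024_Recording_Agreement.pdf", "Recording_Agreement.pdf"]:
    print(name, "->", expected_log_code(name))
```

If expressing the rule requires something like `GENERIC_PREFIXES`, that is a sign the description should state explicitly what counts as a valid code.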

Example: Signing Tool

Task: Extract the method used to sign the contract.

Instructions:
- The signing tool is usually mentioned in the page header, footer or near the signature block of the contract.
- If the signing tool is mentioned, return the name of the signing tool.
- If the signing tool is not mentioned, check for physical signatures.
- If physical signatures are mentioned, return "Physical signature".
- If neither is mentioned, return nothing.
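
The instructions above form a fallback chain: tool name first, physical signature second, nothing last. A minimal Python sketch of that ordering follows. It is only an illustration for working out expected results by hand, not the platform's extraction logic, and the names `KNOWN_TOOLS`, `PHYSICAL_HINTS`, and `expected_signing_tool` are hypothetical, as are the sample tool names and phrases.

```python
# Assumption: tool names to look for; replace with the tools your organization uses.
KNOWN_TOOLS = ["DocuSign", "Adobe Sign"]

# Assumption: phrases that indicate a handwritten signature.
PHYSICAL_HINTS = ["wet ink", "signed by hand", "handwritten signature"]


def expected_signing_tool(contract_text):
    """Apply the description's rules in order and return the expected result."""
    lowered = contract_text.lower()
    # Rule 1: a named signing tool takes priority.
    for tool in KNOWN_TOOLS:
        if tool.lower() in lowered:
            return tool
    # Rule 2: otherwise fall back to physical signatures.
    if any(hint in lowered for hint in PHYSICAL_HINTS):
        return "Physical signature"
    # Rule 3: neither is mentioned.
    return None
```

Writing the rules out this way makes the order of precedence explicit, which is the part a plain-language description most often leaves ambiguous.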

Description Best Practices

Do | Don’t
Be specific about where data appears | Use vague instructions like “find the code”
Explain edge cases and fallbacks | Assume the AI knows your conventions
Describe the expected format | Leave output format ambiguous
Include inline examples (see below) | Rely solely on the contract-based examples

Providing Examples

Good examples are critical for training accurate extractors. There are two ways to provide examples:

Examples Step (Contract-Based)

The system automatically selects a few of your recent contracts for the Examples step. For each contract, provide the expected extraction result. These real-world examples help the system understand your specific contract formats.

Tips for contract-based examples:
  • Review the auto-selected contracts and verify they’re representative
  • Be consistent with your output format across all examples
  • Double-check that your expected outputs are accurate

Examples in the Description

You can include simple input/output examples directly in your description text. This is especially useful for showing the expected format and handling of edge cases.

Task: Extract the contract reference code.

Instructions:
- The code appears at the start of the file name before the year.
- Return only the code portion.

Examples:
- "PARDO_2024_Recording_Agreement.pdf" → log_code: "PARDO"
- "ABC123_2023_License.pdf" → log_code: "ABC123"  
- "Contract_2024.pdf" → (no code found, return nothing)

Best practices for inline examples:
  • Include 2-4 examples covering common cases
  • Show at least one edge case (e.g., missing data, unusual format)
  • Use the exact output format you defined
  • Keep examples simple and representative
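
These checks can be run over a draft description before pasting it in. The sketch below is a hypothetical helper, not a platform feature. It assumes example lines follow the shape used above (`- "<file name>" → <field>: "<value>"`) and treats any other line containing an arrow as a free-text edge case. The names `EXAMPLE_LINE` and `check_examples` are made up for illustration.

```python
import re

# Assumed shape of a regular example line: - "<file name>" → <field>: "<value>"
EXAMPLE_LINE = re.compile(r'^-\s*"[^"]+"\s*→\s*(?P<field>\w+):\s*".*"$')


def check_examples(lines, expected_field):
    """Return (lines using the wrong field name, number of edge-case lines)."""
    mismatched, edge_cases = [], 0
    for raw in lines:
        line = raw.strip()
        if "→" not in line:
            continue  # not an example line
        match = EXAMPLE_LINE.match(line)
        if match is None:
            edge_cases += 1  # free-text outcome such as "return nothing"
        elif match.group("field") != expected_field:
            mismatched.append(line)
    return mismatched, edge_cases
```

An empty first result means every example uses the field name from your output schema. A second result of zero means no edge case is shown yet.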

Managing Extractors

The top bar of the Manage Extractors dialog shows all your extractors as tabs, displaying each extractor’s name and status. From here you can:
  • Switch between extractors - Click any tab to view/edit that extractor
  • Create new extractors - Click the + button to add a new extractor
  • Delete extractors - Use the menu on any extractor tab to remove it

Extractor Statuses

Status | Description
Draft | Extractor is being configured (description, examples, testing)
Active | Extractor is trained and ready to run

Using Extraction Results

Once extractors have run, you can use the results throughout the platform:
  • Contract details - View extraction results in the Extractors section of any contract
  • Custom views - Add extractor columns to your Drive views to see results at a glance across all contracts
  • Exports - Include extractor results in your data exports for reporting and analysis
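
Exported results can also be used to spot-check an extractor, for instance by measuring how often it returned a value. The Python sketch below assumes the export is a CSV file with one row per contract and one column per extractor. The platform's actual export format and column names may differ, and `fill_rate`, `file_name`, and `log_code` are hypothetical names.

```python
import csv
import io


def fill_rate(csv_text, column):
    """Return the fraction of rows where the given column is non-empty."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if (row.get(column) or "").strip())
    return filled / len(rows)


# Hypothetical export: two contracts, one with no extracted code.
sample = (
    "file_name,log_code\n"
    "PARDO_2024_Recording_Agreement.pdf,PARDO\n"
    "Contract_2024.pdf,\n"
)
print(fill_rate(sample, "log_code"))  # 0.5
```

A low fill rate is not necessarily an error, since some contracts legitimately have nothing to extract, but an unexpectedly low one suggests the description needs refining.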

Limits

  • Maximum 5 extractors per project
  • Testing takes approximately 15 minutes per extractor

Best Practices

  • Begin with a straightforward extraction task. Once you understand how the system works, tackle more complex extractions.
  • The more specific your description, the better the results. Include exact formats, locations in documents, and handling for edge cases.
  • If results aren’t accurate, refine your description rather than creating a new extractor. Small wording changes can significantly improve accuracy.
  • If data is in the filename, use “File title only” for faster, more accurate extraction. Use “File content only” when filenames aren’t relevant.
  • Your data is never used to train external AI models. All testing is performed using an internal system specific to your project.