Data Products
Private engineering artifacts sourced for LLM fine-tuning, evals, and benchmarks. Each artifact type comes with full metadata, clear licensing, and optional cross-artifact linking.
Codebases
Full private repositories with commit history, branch structure, CI configs, and dependency graphs. These are production codebases from real engineering teams, not toy projects or tutorial repos.
Good for
- Code generation fine-tuning
- Code review model training
- Repository-level reasoning benchmarks
- Multi-file edit evaluation
Sample metadata fields delivered
Example use cases
- Train models to understand cross-file dependencies and project structure.
- Build evals that test whether a model can reason about real build systems.
- Fine-tune on commit diffs paired with PR descriptions for code review tasks.
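As a rough illustration of the last use case, the sketch below pairs commit diffs with PR descriptions and writes them out as JSON Lines. The input shapes and field names (`description`, `commit_shas`, and the `ReviewExample` record) are assumptions made for this example, not the delivered schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ReviewExample:
    """One training record: a PR description paired with a commit diff."""
    pr_description: str
    commit_sha: str
    diff: str


def build_review_examples(pull_requests, commits):
    """Join PRs to their commits by SHA and emit one record per commit.

    Both input shapes are hypothetical:
      pull_requests: list of {"description": str, "commit_shas": [str, ...]}
      commits:       dict mapping sha -> unified diff text
    """
    examples = []
    for pr in pull_requests:
        for sha in pr["commit_shas"]:
            diff = commits.get(sha)
            if diff is None:
                continue  # skip commits that are missing from the export
            examples.append(ReviewExample(pr["description"], sha, diff))
    return examples


def to_jsonl(examples):
    """Serialize records as JSON Lines, one example per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)
```

Keeping one record per commit, rather than one per PR, is a design choice for this sketch; squashing all diffs of a PR into a single record would work equally well for some review tasks.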
JIRA Exports
Complete project management exports including epics, stories, subtasks, sprint data, comments, status transitions, and custom fields. Each export preserves the full relationship graph between issues.
Good for
- Task decomposition training
- Project planning model evaluation
- Requirements-to-code linking
- Sprint velocity prediction
Sample metadata fields delivered
Example use cases
- Train models to break down feature requests into well-structured subtasks.
- Evaluate whether models can infer priority and effort from issue descriptions.
- Link tickets to corresponding commits for end-to-end traceability datasets.
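One common way to link tickets to commits is to look for issue keys in commit messages. The sketch below assumes the conventional `PROJ-123` key format and a hypothetical commit record shape (`sha`, `message`); teams that link work through branch names or PR metadata would need a different join.

```python
import re
from collections import defaultdict

# Conventional issue key: uppercase project key, hyphen, number (e.g. PAY-12).
# Projects with customized key formats would need a different pattern.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def link_commits_to_issues(commits, known_keys):
    """Map each known issue key to the SHAs of commits that mention it.

    commits:    list of {"sha": str, "message": str}  (hypothetical shape)
    known_keys: set of issue keys present in the ticket export
    """
    links = defaultdict(list)
    for commit in commits:
        # Deduplicate so a key mentioned twice in one message links once.
        for key in set(ISSUE_KEY.findall(commit["message"])):
            # Filtering against known keys drops look-alikes such as "UTF-8".
            if key in known_keys:
                links[key].append(commit["sha"])
    return dict(links)
```

Checking matches against the set of exported issue keys matters in practice, because strings like `UTF-8` or `SHA-256` fit the same pattern.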
Communication Threads
Engineering channel exports with threaded conversations, reactions, file attachments, and channel metadata. Covers incident response, design discussions, debugging sessions, and code reviews.
Good for
- Conversational reasoning fine-tuning
- Technical Q&A benchmarks
- Incident triage model training
- Knowledge retrieval evaluation
Sample metadata fields delivered
Example use cases
- Fine-tune models on how engineers actually discuss and debug problems.
- Build retrieval benchmarks over real internal knowledge bases.
- Train incident classification models on real Slack-based triage flows.
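Before threaded conversations can be used for fine-tuning, flat message exports usually have to be regrouped into threads. The sketch below assumes Slack-style messages where every message has a `ts` timestamp string and replies carry the parent's timestamp in `thread_ts`; exports from other tools may use different fields.

```python
from collections import defaultdict


def _ts_key(message):
    """Sort key for a 'seconds.microseconds' timestamp string.

    Compared as integers to avoid float rounding; assumes the fractional
    part has a fixed width, as in Slack-style timestamps.
    """
    seconds, _, fraction = message["ts"].partition(".")
    return (int(seconds), int(fraction or 0))


def group_threads(messages):
    """Group messages by thread root and order each thread chronologically.

    A message without thread_ts is treated as the root of its own
    single-message thread.
    """
    threads = defaultdict(list)
    for msg in messages:
        root = msg.get("thread_ts", msg["ts"])
        threads[root].append(msg)
    for thread in threads.values():
        thread.sort(key=_ts_key)
    return dict(threads)
```

Each resulting thread is an ordered list of messages that can be turned into a single multi-turn training example.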
Figma Files
Design files with component hierarchies, variant structures, design tokens, and page layouts. Includes both the visual assets and the underlying structural data from the Figma API.
Good for
- Design-to-code model training
- UI understanding benchmarks
- Component extraction evaluation
- Multi-modal model fine-tuning
Sample metadata fields delivered
Example use cases
- Train models to generate frontend code from design specifications.
- Evaluate whether models can identify reusable components across screens.
- Build multi-modal datasets pairing visual layouts with structural metadata.
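For component extraction, the structural data can be walked as a tree. The sketch below assumes the file JSON follows the node layout of the Figma REST API, where each node has `id`, `name`, `type`, and an optional `children` list; the function name and the path format are choices made for this example.

```python
def find_components(node, path=()):
    """Yield (path, node_id) for every component node in a Figma node tree.

    Matches nodes whose type is COMPONENT or COMPONENT_SET and keeps
    descending, so variants inside a component set are reported as well.
    """
    here = path + (node.get("name", ""),)
    if node.get("type") in {"COMPONENT", "COMPONENT_SET"}:
        yield "/".join(here), node["id"]
    for child in node.get("children", []):
        yield from find_components(child, here)
```

Called on the top-level document node, this yields entries such as `("Document/Page 1/Button", "1:2")`, which can serve as labels when pairing rendered frames with their structural metadata.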
BRDs & PRDs
Business and product requirements documents including feature specs, acceptance criteria, user stories, wireframe references, and stakeholder sign-offs. Real documents from shipped products.
Good for
- Requirements analysis training
- Spec-to-code pipeline evaluation
- Ambiguity detection benchmarks
- Product reasoning fine-tuning
Sample metadata fields delivered
Example use cases
- Train models to generate implementation plans from product requirements.
- Evaluate whether models can identify gaps and ambiguities in specs.
- Build datasets linking requirements to the code that implements them.
Postmortems
Incident postmortems with timelines, root cause analysis, contributing factors, remediation steps, and action items. Sourced from real production incidents across different infrastructure stacks.
Good for
- Root cause analysis training
- Incident response evaluation
- SRE reasoning benchmarks
- Failure pattern classification
Sample metadata fields delivered
Example use cases
- Train models to identify root causes from incident descriptions and logs.
- Evaluate whether models can suggest effective remediation steps.
- Build classification datasets for failure modes across infrastructure types.
Need a custom dataset?
We can source specific artifact combinations, domains, or tech stacks. Tell us what you need.
Get in Touch