Open-Source Model Usage, Data Sources, and Academic Integrity
Principle: Using open-source models as components is standard academic practice, comparable to building on published libraries and citing them. Our original contribution is the architecture: how signals are fused, how memory is built, how conflicts are detected, and how analysis becomes video prompts. Each model is a replaceable component, so any one of them can be swapped without changing the system design.
Note on SER model: The audeering wav2vec2 emotion model is released under CC BY-NC-SA 4.0, which permits academic research but not commercial use. It is the only component in our pipeline with a non-commercial license. For commercial deployment, it can be replaced with an Apache-licensed alternative (e.g., emotion2vec) without affecting the system architecture.
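The swap described above can be sketched as a small license-aware selection step. This is only an illustration, not the project's actual code: the `SpeechEmotionModel` interface, the `StubSER` placeholder, the `select_backend` helper, and the license identifier strings are all hypothetical names introduced here, and the stub returns a fixed result instead of running a real model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol, Sequence

# License identifiers treated as non-commercial (illustrative set).
NON_COMMERCIAL_LICENSES = {"CC-BY-NC-SA-4.0"}


class SpeechEmotionModel(Protocol):
    """Hypothetical interface that any SER backend would satisfy."""

    name: str
    license: str

    def predict(self, audio: Sequence[float], sample_rate: int) -> Dict[str, float]:
        ...


@dataclass
class StubSER:
    """Placeholder backend; a real one would wrap the actual model."""

    name: str
    license: str

    def predict(self, audio: Sequence[float], sample_rate: int) -> Dict[str, float]:
        # Fixed output so the sketch runs without model weights.
        return {"neutral": 1.0}


def select_backend(
    backends: Iterable[SpeechEmotionModel], commercial: bool
) -> SpeechEmotionModel:
    """Return the first backend whose license fits the deployment mode."""
    for backend in backends:
        if not commercial or backend.license not in NON_COMMERCIAL_LICENSES:
            return backend
    raise LookupError("no backend with a license suitable for commercial use")


research_model = StubSER("audeering-wav2vec2", "CC-BY-NC-SA-4.0")
# License string follows the description in the note above.
commercial_model = StubSER("emotion2vec", "Apache-2.0")
candidates = [research_model, commercial_model]

print(select_backend(candidates, commercial=False).name)
print(select_backend(candidates, commercial=True).name)
```

Because the rest of the pipeline would depend only on the `predict` signature in this sketch, replacing the backend changes one entry in `candidates` and nothing downstream.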
All datasets were obtained through legitimate academic channels. MELD is publicly available (HuggingFace). SMIC-HS, 4DME, and MMEW are standard micro-expression research datasets used under their research terms. GWTW (1939) is in the public domain. Each dataset's terms were reviewed individually; two agreements from other providers were rejected because their terms were overly restrictive (they required a transfer of intellectual property).
Our Original Contribution
The following components are 100% original code, not derived from any external model or library:
Multimodal fusion layer — weighted voting with conflict detection, FER degradation tiers
Three-layer memory system — STM → Candidate Buffer → LTM with LSTM surprise promotion
Emotion State Machine — NATURAL/GRADUAL/TRIGGERED/SUDDEN with personality modulation
Scoring engine — 6-dimension white-box formulas with circumplex distance metrics
Prompt engine — 6-dimension pipeline-driven prompt generation with zero fabrication policy
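To make the first item in the list above more concrete, here is a minimal sketch of weighted voting with conflict detection. It is an assumption-laden illustration rather than the project's fusion layer: the `Vote` tuple, the `fuse` function, the example weights, and the `conflict_margin` threshold of 0.15 are hypothetical, and FER degradation tiers are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Vote(NamedTuple):
    """One modality's opinion (hypothetical structure)."""

    modality: str      # e.g. "face", "voice", "text"
    label: str         # predicted emotion label
    confidence: float  # model confidence in [0, 1]
    weight: float      # trust assigned to this modality


def fuse(
    votes: List[Vote], conflict_margin: float = 0.15
) -> Tuple[str, bool, Dict[str, float]]:
    """Return (winning label, conflict flag, normalized score per label)."""
    if not votes:
        raise ValueError("at least one vote is required")

    # Each vote contributes weight * confidence to its label.
    scores: Dict[str, float] = defaultdict(float)
    for vote in votes:
        scores[vote.label] += vote.weight * vote.confidence

    total = sum(scores.values())
    if total <= 0:
        raise ValueError("votes carry no weight")

    # Normalize so the shares sum to 1, then rank from highest to lowest.
    shares = {label: score / total for label, score in scores.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

    top_label, top_share = ranked[0]
    runner_up_share = ranked[1][1] if len(ranked) > 1 else 0.0

    # Flag a conflict when the top two labels are too close to call.
    conflict = (top_share - runner_up_share) < conflict_margin
    return top_label, conflict, shares


agreeing = [
    Vote("face", "happy", 0.9, 0.40),
    Vote("voice", "sad", 0.8, 0.35),
    Vote("text", "happy", 0.6, 0.25),
]
label, conflict, shares = fuse(agreeing)
print(label, conflict)

disagreeing = [
    Vote("face", "happy", 0.8, 0.5),
    Vote("voice", "angry", 0.8, 0.5),
]
print(fuse(disagreeing)[1])
```

In the first call two modalities back the same label, so it wins by a clear margin. In the second the two labels tie and the conflict flag is raised, which is the kind of signal a downstream stage could use instead of silently picking a winner.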
Note on LTX-Video training data: Unlike many AI video models trained on internet data scraped without permission, LTX-Video comes from Lightricks (Israel), which obtained its training data through formal licensing agreements with authorized content providers. This is stated explicitly on the company's official license page. This means:
The model was trained on legally licensed data, not scraped/pirated content
Videos generated by LTX-Video should therefore not be subject to copyright claims from training-data owners
Academic use is fully permitted under both the Apache 2.0 license (code) and the Community License (weights)
Commercial use is free for organizations with less than $10M in annual revenue
This legal clarity is a key reason we chose LTX-Video over alternatives with ambiguous data sourcing