Inkuntri
Chinese Research, tools & pedagogy

Creating Domain-Specific Chinese Glossaries From Source Texts

This article shows how to build Chinese glossaries for specific domains by extracting terms from real documents, defining them from context, and organizing them for reuse.

Published April 29, 2026 Chinese

Why this article matters

A domain glossary is not a vocabulary dump. It is a tool for reading a real set of texts: contracts, product pages, hospital forms, app policies, research papers, standards, job ads, menus, or transcripts. The goal is reuse, not collection.

Glossary field map

| Field | Why it matters |
| --- | --- |
| Term | Chinese form, including variants/abbreviations. |
| Pinyin | Only when pronunciation matters or is uncertain. |
| Definition | Plain-language meaning in this domain. |
| Source sentence | Prevents false abstraction. |
| Domain | Legal, medical, UI, education, logistics, etc. |
| Register | Official, technical, colloquial, marketing, platform, etc. |
| Related terms | Builds word families and process maps. |
| Translation | Context-specific, not universal. |
| Confidence | Verified, tentative, needs expert review. |
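
As a minimal sketch, the field map above can be held in a small Python record. The class name `GlossaryEntry`, the attribute names, and the validation rules are assumptions made for illustration, and the source sentence in the example is made up, not taken from a real document.

```python
from dataclasses import dataclass, field
from typing import Optional

# Confidence labels taken from the field map; the tuple name is an assumption.
CONFIDENCE_LEVELS = ("verified", "tentative", "needs expert review")


@dataclass
class GlossaryEntry:
    """One glossary row, mirroring the field map (hypothetical structure)."""
    term: str
    definition: str
    source_sentence: str
    domain: str
    register: str
    translation: str
    confidence: str = "tentative"
    pinyin: Optional[str] = None  # only when pronunciation matters
    variants: list[str] = field(default_factory=list)
    related_terms: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject labels outside the agreed confidence scale.
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence label: {self.confidence!r}")
        # An entry without a source sentence cannot be verified later.
        if not self.source_sentence.strip():
            raise ValueError("source_sentence must not be empty")


entry = GlossaryEntry(
    term="备案",
    definition="Filing/recordation with an authority; on apps and websites it appears as ICP备案.",
    source_sentence="本网站已完成ICP备案。",  # made-up illustrative sentence
    domain="UI",
    register="official",
    translation="filing (context-specific)",
    related_terms=["ICP备案"],
)
print(entry.term, entry.confidence)
```

Making the source sentence mandatory and the confidence label a closed set is one possible design choice; a spreadsheet with the same columns would serve equally well.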

The article

Domain vocabulary is learned best from source texts. If you need to read Chinese job ads, build a job-ad glossary from job ads. If you need Chinese medical forms, build from intake forms and hospital pages. If you need AI infrastructure Chinese, build from product docs, policy texts, and technical explainers. A general word list will not show genre behavior.

Choose sources before choosing terms. A glossary built from random web snippets will be uneven. Better source sets include five product pages from the same category, ten public notices from the same agency type, three contracts of the same genre, or one technical standard plus related manuals. The source set defines the domain.

Term extraction should be selective. Keep repeated terms, technical labels, official field names, unclear compounds, key collocations, abbreviations, and translation-risk terms. Do not include every unknown word. If a word is general and easy to understand from context, it may not belong in the domain glossary.
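
One way to check the "repeated terms" criterion is to count how many source texts contain each candidate. The sketch below assumes you have already picked candidate terms by hand; the function names, the `min_sources` threshold, and the three short sample texts are all illustrative and made up. It uses plain substring matching, so 退货 is also counted inside 退货退款.

```python
from collections import Counter


def document_frequency(candidates, sources):
    """Count in how many source texts each candidate term appears."""
    counts = Counter()
    for text in sources:
        for term in candidates:
            if term in text:  # plain substring match, no word segmentation
                counts[term] += 1
    return counts


def repeated_terms(candidates, sources, min_sources=2):
    """Return candidates found in at least `min_sources` texts, in input order."""
    counts = document_frequency(candidates, sources)
    return [term for term in candidates if counts[term] >= min_sources]


# Made-up e-commerce snippets standing in for a real source set.
sources = [
    "支持七天无理由退货，退款将原路返回。",
    "售后服务：退货退款请联系客服。",
    "本店赠送运费险，退货更省心。",
]
candidates = ["退货", "退款", "运费险", "客服"]

print(document_frequency(candidates, sources))
print(repeated_terms(candidates, sources))
```

Frequency is only one signal. A term that appears once, such as 运费险 here, may still belong in the glossary if it is a technical label or carries translation risk.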

Definitions should come from context first. Suppose your term is 备案. In one domain it may mean filing/recordation; in another it may be regulatory registration; in an app context it might appear as ICP备案. The glossary entry should say where and how the term functions. A single English gloss is not enough.

Related terms are the hidden power of the glossary. For e-commerce, 售后 connects to 退款, 退货, 换货, 仅退款, 退货退款, 运费险. For legal documents, 义务 connects to 责任, 权利, 违约, 赔偿, 承担. For cloud computing, 部署 connects to 配置, 实例, 镜像, 容器, 监控. The learner should see systems, not isolated labels.
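
The related-term idea can be sketched as a small undirected graph, so that looking up any term returns the whole family it belongs to. The function names below are hypothetical; the term lists are the three examples from the paragraph above.

```python
from collections import defaultdict, deque


def build_term_graph(related):
    """Turn {term: [related terms]} into an undirected adjacency map."""
    graph = defaultdict(set)
    for term, neighbours in related.items():
        for other in neighbours:
            graph[term].add(other)
            graph[other].add(term)  # links work in both directions
    return dict(graph)


def term_family(graph, start):
    """Collect every term reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


related = {
    "售后": ["退款", "退货", "换货", "仅退款", "退货退款", "运费险"],
    "义务": ["责任", "权利", "违约", "赔偿", "承担"],
    "部署": ["配置", "实例", "镜像", "容器", "监控"],
}
graph = build_term_graph(related)

# Starting from 退货 reaches the whole after-sales family, not just 售后.
print(sorted(term_family(graph, "退货")))
```

Because the three domains share no terms, each family stays separate; a shared term would join two families, which can be a useful hint that an entry needs splitting by domain.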

A glossary also needs maintenance. Merge duplicates. Mark outdated terms. Split broad terms into domain-specific entries. Add source examples. Flag region-specific or organization-specific usage. Delete low-value entries that never recur.
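
Two of these maintenance steps, merging duplicates and setting aside entries that never recur, can be sketched as follows. The dictionary keys (`term`, `sources`, `hits`), the `min_hits` threshold, and the sample rows are assumptions for illustration. Normalization uses Unicode NFKC, which folds full-width Latin letters into ASCII; it does not reconcile simplified and traditional characters.

```python
import unicodedata


def normalize_term(term):
    """Fold full-width/compatibility forms and trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", term).strip()


def merge_duplicates(entries):
    """Merge entries whose normalized terms match, pooling sources and hits."""
    merged = {}
    for entry in entries:
        key = normalize_term(entry["term"])
        if key not in merged:
            merged[key] = {"term": key,
                           "sources": list(entry["sources"]),
                           "hits": entry["hits"]}
            continue
        for source in entry["sources"]:
            if source not in merged[key]["sources"]:
                merged[key]["sources"].append(source)
        merged[key]["hits"] += entry["hits"]
    return list(merged.values())


def split_low_value(entries, min_hits=2):
    """Separate entries worth keeping from candidates for the archive."""
    keep = [e for e in entries if e["hits"] >= min_hits]
    archive = [e for e in entries if e["hits"] < min_hits]
    return keep, archive


# Hypothetical rows; the second term uses full-width letters and a trailing space.
entries = [
    {"term": "ICP备案", "sources": ["doc1"], "hits": 3},
    {"term": "ＩＣＰ备案 ", "sources": ["doc2"], "hits": 1},
    {"term": "温馨提示", "sources": ["doc1"], "hits": 1},
]
merged = merge_duplicates(entries)
keep, archive = split_low_value(merged)
print([e["term"] for e in keep], [e["term"] for e in archive])
```

Archiving rather than deleting outright leaves room to restore a term if it starts recurring in new sources.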

Extraction decision tree

Keep a term if it meets at least one condition:

  1. It appears repeatedly in the source set.
  2. It labels a field, role, procedure, status, or document section.
  3. It has a domain-specific meaning not obvious from general Chinese.
  4. It is a collocation needed for reading the genre.
  5. It creates translation risk.
  6. It belongs to a process map.
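
The six conditions above can be written as a simple checklist function. This is only a sketch: the `Candidate` class and its flag names are hypothetical, and each flag is still a human judgment made while reading the sources, not something the code detects.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate term with one yes/no flag per condition in the list."""
    term: str
    repeated: bool = False                 # 1. appears repeatedly
    labels_structure: bool = False         # 2. field, role, procedure, status, section
    domain_specific_meaning: bool = False  # 3. meaning not obvious from general Chinese
    needed_collocation: bool = False       # 4. collocation needed for the genre
    translation_risk: bool = False         # 5. creates translation risk
    in_process_map: bool = False           # 6. belongs to a process map


def reasons_to_keep(candidate):
    """List the conditions a candidate satisfies, in the order given above."""
    checks = [
        ("repeated", candidate.repeated),
        ("labels structure", candidate.labels_structure),
        ("domain-specific meaning", candidate.domain_specific_meaning),
        ("needed collocation", candidate.needed_collocation),
        ("translation risk", candidate.translation_risk),
        ("process map", candidate.in_process_map),
    ]
    return [name for name, passed in checks if passed]


def keep_term(candidate):
    """Keep a term if it meets at least one condition."""
    return bool(reasons_to_keep(candidate))


privacy_term = Candidate("出境", domain_specific_meaning=True, translation_risk=True)
general_word = Candidate("方便")

print(privacy_term.term, keep_term(privacy_term), reasons_to_keep(privacy_term))
print(general_word.term, keep_term(general_word))
```

Recording the reasons, not just the yes/no result, makes it easier to revisit a decision during later maintenance.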

Worked example: app privacy glossary

| Term | Domain definition | Related terms |
| --- | --- | --- |
| 个人信息 | Information relating to identifiable individuals | 敏感个人信息, 处理, 收集 |
| 处理 | Collection/use/storage/sharing/deletion etc. in privacy context | 收集, 使用, 共享, 删除 |
| 授权 | User permission/authorization | 同意, 拒绝, 撤回 |
| 出境 | Cross-border transfer/export of personal information | 境外接收方, 安全评估 |

Learner traps and repairs

| Trap | Why it hurts | Better habit |
| --- | --- | --- |
| Adding every unknown word | Glossary becomes unusable. | Extract terms with domain value. |
| Using universal translations | Domain meanings vary. | Define in source context. |
| No source sentence | Later you cannot verify usage. | Save one real sentence. |
| No confidence label | Tentative guesses look final. | Mark verified/tentative/expert-needed. |
| Ignoring related terms | You learn labels but not systems. | Build process maps. |

Practice protocol

Choose a source set of 5–10 texts in one domain. Extract 30 candidate terms. Cut to 15. For each, add source sentence, domain definition, related terms, and confidence. Review the glossary by reading a new text in the same domain.

Additional practice and repair

Glossary diagnostics

| Bad glossary habit | Why it fails | Repair |
| --- | --- | --- |
| Adding every unknown word | Glossary becomes unreadable. | Add repeated, domain-critical, or translation-risk terms. |
| One English equivalent only | Domain terms rarely map one-to-one. | Include definition, source sentence, and confidence. |
| No source trace | Terms lose register and authority. | Save source title, URL/file, date, and document type. |
| Mixing domains | Meanings blur across law, medicine, tech, policy. | Tag domain and subdomain. |
| Never retiring entries | Low-value terms clutter review. | Merge, archive, or delete. |

Entry template upgrade

| Field | Requirement |
| --- | --- |
| Term | Chinese form; include variants/abbreviations if relevant. |
| Pinyin | Useful for oral review, not a substitute for examples. |
| Plain definition | Define in the project’s context. |
| Source sentence | Preserve enough context to prove usage. |
| Domain/register | Legal, medical, AI, e-commerce, public notice, etc. |
| Related terms | Opposites, parent category, near-synonyms, abbreviations. |
| Translation | Provisional, final, or do-not-translate. |
| Confidence | Low/medium/high with reason. |

Before/after repair set

| Weak entry | Strong entry |
| --- | --- |
| 算力 = computing power | 算力: computing capacity in AI/cloud policy/product texts; collocates 算力基础设施, 算力需求, 算力中心. Confidence medium; verify by source. |
| 风控 = risk control | 风控: fintech/platform risk-control system/process; not generic “be careful.” Related 反洗钱, 实名认证, 可疑交易. |
| 结项 = finish | 结项: formal grant/project closure after review/acceptance, not ordinary ending. |

A glossary-building tool, whether a spreadsheet or a custom script, should support duplicate detection, source snippets, confidence levels, domain tags, variant forms, and export to reading tools or flashcards. Add a “translation risk” flag for terms that look easy but are domain-specific.
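
As a rough sketch of the export step, the snippet below writes entries to CSV, a format most flashcard and spreadsheet tools can import, and turns the translation-risk flag into an explicit column. The column names and the `export_flashcards` function are assumptions; the three entries are condensed from the repair set above, and the risk flags on them are illustrative judgments.

```python
import csv
import io

# Hypothetical column layout for the exported file.
FIELDS = ["term", "definition", "domain", "confidence", "translation_risk"]


def export_flashcards(entries, out):
    """Write entries as CSV rows; missing fields are left blank."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for entry in entries:
        row = dict(entry)
        # Spell the flag out so it survives import into other tools.
        row["translation_risk"] = "yes" if entry.get("translation_risk") else "no"
        writer.writerow(row)


entries = [
    {"term": "算力", "definition": "computing capacity in AI/cloud policy and product texts",
     "domain": "AI", "confidence": "medium"},
    {"term": "风控", "definition": "fintech/platform risk-control system or process",
     "domain": "fintech", "confidence": "medium", "translation_risk": True},
    {"term": "结项", "definition": "formal grant/project closure after review and acceptance",
     "domain": "research", "confidence": "medium", "translation_risk": True},
]

buffer = io.StringIO()  # in-memory stand-in for a real file
export_flashcards(entries, buffer)
print(buffer.getvalue())
```

To write a real file instead, open it with `newline=""` and an explicit `encoding="utf-8"`, as the `csv` module documentation recommends, and pass the file object in place of `buffer`.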

Practice visualization

Build a glossary-builder template with import, extraction, tagging, definition, source sentence, related-term graph, review, and export stages. Include warning flags for terms with high translation risk.

Check your workflow against established translation and terminology-management practice in your domain. Glossary building improves reading; it does not confer professional qualification in a domain.
