Inkuntri
Japanese Research, tools & pedagogy

How to Use Japanese Corpora Without Mistaking Frequency for Importance

Use Japanese corpora responsibly by distinguishing frequency, distribution, register, genre, collocation, learner usefulness, and curricular importance.

Published April 23, 2026 Japanese

Core examples: 頻度, コーパス, 用例, 共起, ジャンル, レジスター, 表記揺れ, 活用形, 機能語, 新聞, 会話, 専門用語.

Frequency is evidence, not a curriculum

A corpus says a word appears 50,000 times. Another word appears 300 times. Which should you learn first?

The naive answer is: learn the frequent word.

The serious answer is: it depends.

A highly frequent function word may already be known. A low-frequency technical term may be vital if you are reading immigration forms, lab manuals, or medical labels. A word may be frequent only in newspapers, not conversation. Another may appear often but in fixed phrases. A form may be rare but important for recognition.

The key principle is:

Corpus frequency helps you ask better questions. It does not decide learning priority by itself.

コーパス

コーパス means corpus: a structured collection of texts or speech data used for language analysis.

Japanese corpora may include:

  • newspaper text,
  • spoken conversation,
  • books,
  • web text,
  • legal documents,
  • academic writing,
  • subtitles,
  • learner language,
  • balanced samples across genres.

Learner action: always ask what texts are inside the corpus.

頻度

頻度 means frequency.

Related:

出現頻度 occurrence frequency

高頻度語 high-frequency word

低頻度語 low-frequency word

Frequency counts how often something occurs in the corpus. It does not automatically tell you usefulness.

Example problem: する will be extremely frequent. That does not mean you should make one flashcard for する and stop studying Japanese.
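A minimal sketch of what a frequency count actually measures, assuming the text has already been split into tokens. The token list and the helper name per_million are made up for illustration and do not come from any real corpus or tool.

```python
from collections import Counter


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


# Hypothetical, already-tokenized mini corpus (not real corpus data).
tokens = ["勉強", "する", "。", "申請", "を", "する", "。",
          "施行", "日", "を", "確認", "する", "。"]

counts = Counter(tokens)

# する and the full stop dominate the list, yet the ranking says
# nothing about which item is worth studying next.
for word, n in counts.most_common(3):
    print(word, n, round(per_million(n, len(tokens))))
```

Normalizing to a per-million rate makes counts from corpora of different sizes comparable. It still reports only how often something occurs, not how useful it is.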

用例

用例 means usage example.

A corpus is valuable because it gives 用例: real contexts where a word or phrase occurs.

Learner action: inspect examples before trusting a dictionary gloss.

A word’s frequency is less useful than seeing:

  • what particles it takes,
  • what nouns it modifies,
  • what genre it appears in,
  • whether it is spoken/written,
  • whether it is technical or casual,
  • what verbs it combines with.

共起

共起 means co-occurrence/collocation.

Examples:

強い雨 heavy rain

高い可能性 high probability / strong likelihood

対策を講じる take measures

申請を行う submit/make an application

Collocation tells you what words naturally appear together.

Learner action: learn words with common neighbors.
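One way to find common neighbors is to count the tokens that fall within a small window around a target word. The sketch below assumes sentences are already tokenized. The sample sentences, the function name neighbors, and the window size of 2 are illustrative choices, not the method of any particular corpus tool.

```python
from collections import Counter


def neighbors(sentences, node, window=2):
    """Count tokens that occur within `window` positions of `node`."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != node:
                continue
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[sent[j]] += 1
    return counts


# Hypothetical pre-tokenized sentences.
sentences = [
    ["対策", "を", "講じる"],
    ["早急", "に", "対策", "を", "講じる"],
    ["対策", "を", "検討", "する"],
]

partners = neighbors(sentences, "対策")
print(partners.most_common(2))
```

Raw neighbor counts favor particles such as を, so real collocation tools usually add an association measure on top. The counts here are only a first look.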

ジャンル and レジスター

ジャンル means genre: the type of source text.

レジスター means register: the formality and social setting of the language.

A word may be frequent in one genre but awkward in another.

Examples:

| Word/phrase | Likely genre/register |
| --- | --- |
| 申し上げます | formal/business/service |
| めっちゃ | casual/spoken/media |
| 施策 | government/policy |
| やばい | casual/reaction |
| だと考えられる | academic/formal |
| お控えください | public-service/signage |

Learner action: tag corpus examples by genre.
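Tagging can be as simple as storing a genre label next to each example and looking at the proportions. This is a sketch under the assumption that you label examples by hand. The hits and the genre names are invented for illustration.

```python
from collections import Counter


def genre_profile(hits, expression):
    """Return the share of an expression's examples found in each genre."""
    by_genre = Counter(genre for expr, genre in hits if expr == expression)
    total = sum(by_genre.values())
    if total == 0:
        return {}
    return {genre: n / total for genre, n in by_genre.items()}


# Hypothetical hand-tagged examples: (expression, genre label).
hits = [
    ("お控えください", "signage"),
    ("お控えください", "signage"),
    ("お控えください", "signage"),
    ("お控えください", "conversation"),
    ("めっちゃ", "conversation"),
    ("めっちゃ", "conversation"),
]

profile = genre_profile(hits, "お控えください")
print(profile)
```

A profile dominated by one genre suggests the expression is safer to recognize than to produce outside that genre.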

表記揺れ

表記揺れ means spelling/orthographic variation.

Examples:

取り扱い / 取扱い / 取扱 handling

問い合わせ / お問い合わせ / 問合せ inquiry

コンピューター / コンピュータ computer

If you search only one spelling, you may miss examples.

Learner action: search variants.
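When you search variants in raw text yourself, shorter spellings can hide inside longer ones (取扱 is a substring of 取扱い). The sketch below shows one way to handle that with a regular expression. The sample text and the function name count_variants are assumptions for illustration.

```python
import re
from collections import Counter


def count_variants(text: str, variants: list[str]) -> Counter:
    """Count each spelling variant once per occurrence."""
    # Longest variants first: otherwise the regex would report
    # 取扱い as the shorter 取扱.
    ordered = sorted(variants, key=len, reverse=True)
    pattern = "|".join(re.escape(v) for v in ordered)
    return Counter(re.findall(pattern, text))


# Hypothetical sample text containing each spelling once.
text = "商品の取り扱いについて。取扱い説明書を読む。取扱注意。"
variants = ["取扱", "取扱い", "取り扱い"]

hits = count_variants(text, variants)
total = sum(hits.values())

# Adding up separate substring counts double-counts 取扱 inside 取扱い.
naive_total = sum(text.count(v) for v in variants)
print(total, naive_total)
```

Searching only one spelling undercounts, and adding up naive substring counts overcounts. Either error distorts the frequency you then try to interpret.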

活用形

活用形 means inflected/conjugated form.

A corpus search for 食べる may miss 食べた, 食べない, 食べられる, and 食べている, depending on the tool.

Learner action: know whether the corpus searches surface forms or lemmas.
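The difference is easy to see with a toy example. The pairs below are hypothetical, hand-written (surface form, lemma) annotations. They are simplified to whole words, whereas real morphological analyzers usually split these forms into smaller units.

```python
from collections import Counter

# Hypothetical annotations as (surface form, lemma) pairs.
# Simplified: each inflected form is treated as a single token.
tokens = [
    ("食べる", "食べる"),
    ("食べた", "食べる"),
    ("食べない", "食べる"),
    ("食べられる", "食べる"),
    ("食べている", "食べる"),
    ("飲んだ", "飲む"),
]

surface_counts = Counter(surface for surface, _ in tokens)
lemma_counts = Counter(lemma for _, lemma in tokens)

# A surface search sees only the dictionary form;
# a lemma search groups all five forms together.
print(surface_counts["食べる"], lemma_counts["食べる"])
```

If a tool reports a surprisingly low count for a common verb, a surface-form search is a likely explanation.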

機能語

機能語 means function word: particles, auxiliaries, connectors, grammatical items.

Examples:

は が に て こと もの よう

Function words are very frequent, but their importance lies in grammar and discourse, not lexical meaning.

Learner action: frequent function words require pattern study, not one-word memorization.

新聞, 会話, 専門用語

新聞 means newspaper.

会話 means conversation.

専門用語 means technical/specialist term.

A word can rank high in newspapers and low in conversation. A specialist term can be low-frequency overall but essential in its domain.

Example: 施行 (enforcement, the coming into force of a law) may be common in legal and government prose but rare in everyday conversation.

Learner action: interpret frequency in light of your learning goal.

Dispersion

A word can be frequent because it appears everywhere, or because one text repeats it constantly.

Example:

  • Word A appears once in 1,000 different texts.
  • Word B appears 1,000 times in one legal document.

Both may show 1,000 occurrences. They are not equally general.

Learner action: check distribution/dispersion if the corpus provides it.
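If the corpus does not report dispersion, two rough indicators can be computed from per-text counts: how many texts contain the word, and what share of all occurrences comes from the single heaviest text. This sketch uses the Word A and Word B figures from the example above. The function name and the choice of these two indicators are assumptions, not a standard dispersion statistic.

```python
def dispersion(counts_per_text):
    """Return (total occurrences, texts containing the word, largest share)."""
    total = sum(counts_per_text)
    text_range = sum(1 for c in counts_per_text if c > 0)
    max_share = max(counts_per_text) / total if total else 0.0
    return total, text_range, max_share


# Word A: once in each of 1,000 texts.
word_a = [1] * 1000
# Word B: 1,000 times in one text, absent from 999 others.
word_b = [1000] + [0] * 999

print(dispersion(word_a))
print(dispersion(word_b))
```

Both words have the same total, but Word A appears in every text while Word B depends entirely on one document.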

Frequency traps

| Trap | Explanation |
| --- | --- |
| corpus mismatch | corpus genre differs from your goal |
| function-word overload | high frequency but hard grammar |
| proper-name inflation | names or places appear often due to news topic |
| topic burst | one event causes temporary frequency |
| domain blindness | rare overall but crucial in target domain |
| surface-form miss | inflected forms not counted together |
| spelling variation | search misses variants |
| production mistake | recognizing a word does not mean you should use it |

Learner priority is goal-dependent

A travel learner needs:

  • station signs,
  • menus,
  • hotel phrases,
  • weather/disaster notices,
  • polite requests.

A researcher needs:

  • academic connectors,
  • citation language,
  • argument verbs,
  • field terminology.

A resident needs:

  • municipal notices,
  • medical forms,
  • school communication,
  • banking and insurance terms.

A corpus cannot choose this for you.

Example bank walkthrough

頻度

Frequency.

Learner action: evidence, not curriculum.

コーパス

Corpus.

Learner action: check source composition.

用例

Usage example.

Learner action: inspect context.

共起

Co-occurrence/collocation.

Learner action: learn natural word partners.

ジャンル

Genre.

Learner action: source type matters.

レジスター

Register.

Learner action: formality and social setting.

表記揺れ

Spelling variation.

Learner action: search variants.

活用形

Inflected form.

Learner action: lemma versus surface search.

機能語

Function word.

Learner action: grammar pattern study.

新聞

Newspaper.

Learner action: news-register bias.

会話

Conversation.

Learner action: spoken-language source.

専門用語

Technical term.

Learner action: domain importance.

Corpus workflow

When using a Japanese corpus:

  1. Define your question.
  2. Check corpus source/genre.
  3. Search base form and variants.
  4. Check inflected forms if needed.
  5. Inspect examples, not just counts.
  6. Look for collocations.
  7. Check distribution across texts/genres.
  8. Identify register.
  9. Compare with dictionary definitions.
  10. Decide: learn for production, recognition, domain glossary, or defer.

Frequency interpretation table

Corpus frequency needs interpretation.

| Corpus result | Possible meaning | What to check |
| --- | --- | --- |
| very frequent | core word or function word | known already? grammar-heavy? |
| frequent in one genre | genre-specific term | distribution |
| rare overall | maybe unimportant or domain-critical | learner goal |
| burst frequency | tied to recent event/topic | time period |
| many spelling forms | 表記揺れ | variant search |
| many inflections | 活用形 issue | lemma search |
| common collocation | phrase worth learning | 共起 examples |
| mostly proper nouns | topic/name effect | source context |

Frequency is a clue, not a command.

Usefulness score

Before adding a corpus result to your study list, rate it on whether it:

  1. appears in your target genre,
  2. appears across multiple sources,
  3. has useful collocations,
  4. fills a known comprehension gap,
  5. is needed for active production,
  6. is domain-critical even if rare,
  7. has clear register.

A low-frequency immigration, medical, or safety term may beat a high-frequency word you already understand.
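The checklist can be kept as a simple record per candidate word. The sketch below counts how many of the seven criteria hold. Equal weighting, the field names, and the two sample ratings are assumptions for illustration; you may reasonably weight domain-critical items more heavily.

```python
from dataclasses import dataclass, fields


@dataclass
class Usefulness:
    """One yes/no flag per criterion in the checklist."""
    in_target_genre: bool = False
    across_sources: bool = False
    useful_collocations: bool = False
    fills_gap: bool = False
    needed_for_production: bool = False
    domain_critical: bool = False
    clear_register: bool = False

    def score(self) -> int:
        # Equal weights are an assumption, not a rule from the article.
        return sum(getattr(self, f.name) for f in fields(self))


# Hypothetical ratings for two candidates.
rare_domain_term = Usefulness(in_target_genre=True, fills_gap=True,
                              domain_critical=True, clear_register=True)
frequent_known_word = Usefulness(across_sources=True, clear_register=True)

print(rare_domain_term.score(), frequent_known_word.score())
```

Frequency is deliberately absent from the record: it is one input to your judgment about these criteria, not a criterion of its own.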

Corpus question discipline

Good corpus questions:

  • What words co-occur with 申請?
  • Is お控えください public-sign language or ordinary conversation?
  • Does this expression appear in news, blogs, or official pages?

Weak corpus question:

Should I learn this word?

The corpus gives evidence. The learner chooses priority.

A strong tool built around these ideas would keep learners from overreading frequency.

Suggested functions:

  1. Frequency display.
  2. Genre distribution panel.
  3. Collocation list.
  4. Example sentence viewer.
  5. Spelling-variant search.
  6. Register labels.
  7. Learner-priority decision field.

Final rule

Japanese corpus work is powerful when it is disciplined.

頻度 tells how often. 用例 shows how. 共起 shows with what. ジャンル and レジスター show where. 表記揺れ and 活用形 prevent search errors. 専門用語 reminds you that importance is not only frequency.

Use corpora to investigate Japanese, not to outsource judgment.
