Designing a Korean Error Corpus From Your Own Mistakes

The reader can turn personal Korean mistakes into a searchable error corpus that reveals patterns and supports targeted improvement.

Published May 21, 2026

Why this matters

Most learners waste their mistakes.

A tutor corrects a sentence. A language partner rewrites a message. A teacher marks an essay. A speech-recognition tool fails to understand a phrase. The learner nods, feels briefly enlightened, and then the correction vanishes into a chat log, notebook margin, or memory fog.

The mistake was useful for ten seconds. Then it stopped being data.

A personal Korean error corpus changes that. It is a structured record of your recurring mistakes: wrong sentence, corrected sentence, context, error type, explanation, date, and follow-up practice. The goal is not to collect shame. The goal is to detect patterns.

One particle error may be random. Twenty particle errors sorted by context reveal a grammar problem. One awkward email sentence is annoying. Ten awkward email sentences tagged by register reveal a workplace Korean problem. One misheard phrase is normal. Fifty misheard phrases categorized by sound change reveal a listening problem.

What counts as an error?

An error is not only a red-marked grammar mistake. For serious Korean study, include several types.

Error type	Korean label	Example problem
Particle	조사 오류	은/는 vs 이/가, 을/를 omission, 에 vs 에서
Ending	어미 오류	wrong connective, stiff final ending, mismatched politeness
Word choice	단어 선택	using 연령 where 나이 fits, or 고치다 where 수정하다 fits
Spacing	띄어쓰기	할수 있다 instead of 할 수 있다
Honorifics	높임법	subject honorific missing or overused
Speech level	말높임	switching 해요체 and 합쇼체 inconsistently
Pronunciation	발음	liaison, tensification, batchim, vowel contrast
Collocation	연어	grammatical but unnatural word pairing
Word order	어순	English-shaped sequencing
Omitted information	정보 누락	sentence assumes context the listener does not have
Register	사용역	casual word in formal document, bureaucratic word in friendly chat
Pragmatics	화용	sentence is grammatically correct but socially wrong

A personal corpus should preserve why the sentence failed, not merely what the correct sentence was.

The minimum viable corpus

Start with a spreadsheet. Do not overengineer the first version.

Use these fields:

Field	Purpose
ID	Unique number for the error
Date	When it occurred
Source	Tutor, teacher, chat, essay, speaking, self-recording, corpus check
Context	What you were trying to say and to whom
Wrong sentence	Your original Korean
Corrected sentence	Better Korean
Error type	Particle, ending, word choice, etc.
Explanation	Why the correction matters
Severity	Low, medium, high
Recurrence	First time, repeated, fossilized
Follow-up example	A new sentence using the repaired pattern
Review date	When to revisit it
Status	Open, practiced, improved, stable

A simple entry might look like this:

Field	Entry
Wrong sentence	저는 어제 친구에 만났어요.
Corrected sentence	저는 어제 친구를 만났어요.
Error type	Particle
Explanation	만나다 takes the person met as an object, so 친구를 is expected.
Follow-up	내일 선생님을 만날 예정이에요.
Status	Open

A more advanced entry:

Field	Entry
Wrong sentence	고객님, 이 문제는 제가 생각해 볼게요.
Corrected sentence	고객님, 이 문제는 내부적으로 확인 후 다시 안내드리겠습니다.
Error type	Register / workplace service Korean
Explanation	생각해 볼게요 sounds personal and informal for customer-service context; 확인 후 안내드리겠습니다 is more institutional and appropriate.
Follow-up	요청하신 내용은 담당 부서 확인 후 안내드리겠습니다.
Status	Practiced

The second example is not a basic grammar correction. It is exactly the kind of error advanced learners need to track.
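
The spreadsheet fields above map naturally onto a small record type. The following is a minimal sketch, assuming Python and a CSV file as the storage format; the class name `ErrorEntry`, the helper `to_csv`, the snake_case field names, the default values, and the sample date and context are illustrative choices, not part of the schema itself.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ErrorEntry:
    """One row of the corpus; field names mirror the spreadsheet columns."""
    id: int
    date: str                    # ISO date string, e.g. "2026-05-21"
    source: str                  # tutor, teacher, chat, essay, ...
    context: str                 # what you were trying to say and to whom
    wrong_sentence: str
    corrected_sentence: str
    error_type: str
    explanation: str
    severity: str = "low"        # low / medium / high (default is an assumption)
    recurrence: str = "first time"
    follow_up_example: str = ""
    review_date: str = ""
    status: str = "open"         # open / practiced / improved / stable


def to_csv(entries):
    """Serialize entries to CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ErrorEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


entry = ErrorEntry(
    id=1,
    date="2026-05-21",                           # hypothetical date
    source="tutor",
    context="telling a tutor about yesterday",   # hypothetical context
    wrong_sentence="저는 어제 친구에 만났어요.",
    corrected_sentence="저는 어제 친구를 만났어요.",
    error_type="particle",
    explanation="만나다 takes the person met as an object, so 친구를 is expected.",
    follow_up_example="내일 선생님을 만날 예정이에요.",
)
csv_text = to_csv([entry])
```

The resulting text opens in any spreadsheet program, so the learner can stay in a spreadsheet and use a script only for analysis.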

Build categories that produce action

A bad error category is too vague:

grammar

A useful error category points toward a drill:

이/가 vs 은/는 in contrastive contexts

에 vs 에서 with activity-location verbs

formal email closing formula

noun-modifying clause length too long

spacing around bound nouns: 수, 것, 때, 중

Categories should help you decide what to practice next.

Use a two-level taxonomy (error type, then subtype), and attach the pattern, trigger, and practice task to each category:

Error type: Particle
Subtype: location particle
Pattern: 에 vs 에서
Trigger: activity verb misread as destination verb
Practice: 20 sentences with 가다/오다/있다/공부하다/회의하다

or:

Error type: Register
Subtype: workplace request
Pattern: too direct for superior/client
Trigger: literal translation from English “Please send...”
Practice: rewrite with 확인 부탁드립니다 / 공유해 주시면 감사하겠습니다 / 가능하실 때
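
One way to keep categories honest is to check that each one actually names a pattern and a practice task. The sketch below assumes a hypothetical `Category` record with the five labels used above; the rule that subtype, pattern, and practice must all be filled in is an illustrative threshold, not a fixed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """An error category with the five labels from the taxonomy above."""
    error_type: str
    subtype: str = ""
    pattern: str = ""
    trigger: str = ""
    practice: str = ""

    def is_actionable(self) -> bool:
        # A category points toward a drill only if it names a subtype,
        # a concrete pattern, and a practice task.
        return bool(self.subtype and self.pattern and self.practice)


vague = Category(error_type="grammar")
specific = Category(
    error_type="Particle",
    subtype="location particle",
    pattern="에 vs 에서",
    trigger="activity verb misread as destination verb",
    practice="20 sentences with 가다/오다/있다/공부하다/회의하다",
)

for category in (vague, specific):
    label = "actionable" if category.is_actionable() else "too vague"
    print(f"{category.error_type}: {label}")
```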

From collection to diagnosis

After 30–50 entries, start analyzing.

Ask:

Which error type appears most often?
Which errors recur after correction?
Which errors happen only in writing?
Which errors happen only in speech?
Which errors appear under pressure?
Which errors are low-level but persistent?
Which errors create social or professional risk?

A learner may discover:

Pattern	Evidence	Action
Particle errors cluster around motion and location	에/에서/으로 mistakes in 12 entries	Build location-particle drill set
Honorifics fail in workplace writing	inconsistent 드리다/주시다/계시다	Make role-map templates
Writing is grammatically correct but translationese	too many explicit subjects and passive forms	Mine native Korean examples from notices/emails
Listening errors come from sound change	못 읽다, 한국말, 연락해요 misheard	Build audio cards for nasalization, liquidization, and tensification

The corpus turns vague frustration into a queue.
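
Several of the diagnostic questions above reduce to counting and grouping. A minimal sketch, assuming each entry is a dictionary with an `error_type` key and a `mode` key (writing or speaking); the sample entries and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical entries; a real corpus would be loaded from the spreadsheet.
entries = [
    {"error_type": "particle", "mode": "writing"},
    {"error_type": "particle", "mode": "speaking"},
    {"error_type": "particle", "mode": "writing"},
    {"error_type": "spacing", "mode": "writing"},
    {"error_type": "spacing", "mode": "writing"},
    {"error_type": "pronunciation", "mode": "speaking"},
]


def count_by_type(entries):
    """Which error type appears most often?"""
    return Counter(e["error_type"] for e in entries)


def only_in(entries, mode):
    """Which error types occur in this mode and no other?"""
    modes = defaultdict(set)
    for e in entries:
        modes[e["error_type"]].add(e["mode"])
    return sorted(t for t, seen in modes.items() if seen == {mode})


print(count_by_type(entries).most_common())
print("writing only:", only_in(entries, "writing"))
print("speaking only:", only_in(entries, "speaking"))
```

Questions about pressure or social risk need extra fields (for example a severity column) before they can be counted the same way.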

Weekly remediation loop

Use a 45-minute weekly review.

Sort new errors by type.
Pick the top two recurring patterns.
Write a plain-language rule for each.
Find or create five correct examples.
Produce five new sentences or short spoken responses.
Get at least one checked if possible.
Mark the original errors as practiced, not solved.
Re-test after one week.

Do not try to fix everything. The point of a corpus is prioritization.
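
The first two steps of the weekly review (sort new errors, pick the top two patterns) can be sketched as a date filter plus a frequency count. This assumes each entry stores a `date` as a Python `date` object and a free-text `pattern` label; the seven-day window, the function name `top_patterns`, and the sample data are assumptions for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def top_patterns(entries, today, days=7, limit=2):
    """Return the most frequent patterns logged in the last `days` days."""
    cutoff = today - timedelta(days=days)
    recent = [e["pattern"] for e in entries if e["date"] > cutoff]
    return [pattern for pattern, _ in Counter(recent).most_common(limit)]


# Hypothetical log for one week plus one older entry.
entries = [
    {"date": date(2026, 5, 1), "pattern": "은/는 vs 이/가"},
    {"date": date(2026, 5, 15), "pattern": "에 vs 에서"},
    {"date": date(2026, 5, 16), "pattern": "에 vs 에서"},
    {"date": date(2026, 5, 17), "pattern": "할 수 있다 spacing"},
    {"date": date(2026, 5, 18), "pattern": "에 vs 에서"},
    {"date": date(2026, 5, 19), "pattern": "할 수 있다 spacing"},
    {"date": date(2026, 5, 20), "pattern": "formal email closing"},
]

this_week = top_patterns(entries, today=date(2026, 5, 21))
print(this_week)
```

Capping the result at two patterns is the point: everything else waits for a later week.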

Privacy and safety

A personal error corpus may contain private chats, workplace details, health information, immigration details, or identifying names. Treat it carefully.

Rules:

Replace real names with placeholders: [친구], [회사], [병원].
Do not store private messages without consent if you plan to share the corpus.
Avoid saving full screenshots when a typed excerpt is enough.
For speech recordings, store only short clips needed for analysis.
Keep medical, legal, immigration, and employment content in a private file.
Never publish corrections from another person’s message without permission.

A useful corpus does not need to expose your life.

Sample error entries by level

Beginner / lower intermediate

Wrong:

저는 한국어 공부해요.

Better:

저는 한국어를 공부해요.

Note: Object marker omission may be acceptable in some casual contexts, but at this stage the learner needs to learn the transitive frame 공부하다 + 을/를.

Intermediate

Wrong:

제가 시간이 있으면 연락할게요.

Better, depending on intent:

시간이 되면 연락드리겠습니다.

시간이 괜찮으시면 연락드리겠습니다.

Note: The original sentence may sound speaker-centered. The repair depends on intent: 시간이 되면 promises contact once the speaker is free, while 시간이 괜찮으시면 defers to the listener's availability before contacting them.

Advanced

Wrong:

본 서비스는 사용자의 정보를 모읍니다.

Better:

본 서비스는 서비스 제공을 위해 이용자의 정보를 수집·이용합니다.

Note: 모으다 is too general and informal for privacy-policy register. 수집·이용하다 is domain-appropriate.

Suggested interactive/tool module

Tool name: Korean Personal Error Corpus Builder

Core functions:

Form-based entry for wrong sentence, correction, context, and error type.
Tag system: 조사, 어미, 띄어쓰기, 높임법, 연어, 사용역, 발음, 담화.
Dashboard showing recurring error types.
Drill generator that turns corrected sentences into cloze, rewrite, and production tasks.
Privacy toggle that masks names and sensitive fields.
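
As one possible shape for the drill generator, a cloze item can be derived by comparing the wrong and corrected sentences word by word and blanking whatever changed. The sketch below assumes whitespace-separated tokens (eojeol) and uses the standard-library `difflib`; the function name `make_cloze` is hypothetical, and a rewritten sentence with many changes would need a different drill type.

```python
from difflib import SequenceMatcher


def make_cloze(wrong, corrected, blank="____"):
    """Blank out the corrected tokens that differ from the wrong sentence."""
    wrong_tokens = wrong.split()
    corrected_tokens = corrected.split()
    matcher = SequenceMatcher(a=wrong_tokens, b=corrected_tokens, autojunk=False)
    cloze, answers = [], []
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            cloze.extend(corrected_tokens[j1:j2])
        elif j2 > j1:
            # Replaced or inserted tokens become blanks; deletions add nothing.
            cloze.extend([blank] * (j2 - j1))
            answers.extend(corrected_tokens[j1:j2])
    return " ".join(cloze), answers


prompt, answers = make_cloze(
    "저는 어제 친구에 만났어요.",
    "저는 어제 친구를 만났어요.",
)
print(prompt)
print(answers)
```

Because the blank covers a whole eojeol, the learner has to recall noun and particle together, which suits particle errors.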

Good tool behavior:

Encourage short, high-quality entries over hoarding.
Distinguish “needs teacher confirmation” from “confirmed correction.”
Let learners mark uncertainty rather than forcing fake certainty.

Use Korean Learners’ Corpus resources as a model for how learner errors can be categorized, but keep the personal workflow simple.
Cross-reference NIKL dictionary examples when explaining collocation and register.
Treat automated grammar correction as a weak signal unless verified by a teacher, native speaker, or strong corpus evidence.

QA checklist

Does the article turn mistakes into a practice system, not shame?
Does it include register and pragmatic errors, not only grammar?
Does it provide a real schema?
Does it include privacy guidance?
Does it show how to convert corpus patterns into drills?

Remediation and upgrade layer: errors are data only if they are structured

The first draft introduces the personal error corpus. The upgrade must prevent the most common failure: learners collect corrections like souvenirs and never turn them into behavior change.

An error corpus should answer four questions:

What did I produce?
What was the better version?
What kind of error was it?
What drill will make this less likely next time?

If the corpus cannot answer the fourth question, it is not yet a remediation tool.

Add a stricter error taxonomy

The article should keep the categories practical. Too many categories make the system unusable; too few hide patterns.

Error type	Korean label	Example problem	Repair action
Particle	조사 오류	학교를 가요 for 학교에 가요	collect verb-place frames
Ending	어미 오류	만나면 좋겠어요 used where 만나서 좋았어요 is needed	contrast cause/condition/result endings
Word choice	단어 선택	시키다 used where 하게 하다 or 주문하다 is needed	build object-type cards
Collocation	연어 오류	강한 비 instead of 많은 비 / 폭우	collect adjective+noun pairs
Spacing	띄어쓰기	할수있다 for 할 수 있다	use spelling/norm checker + pattern notes
Honorifics	높임법 오류	선생님이 말했어요 in a context needing 말씀하셨어요	role-map drill
Speech level	말높임 오류	반말 mixed into workplace email	rewrite by audience
Register	사용역 오류	너무 casual word in formal document	tag source genre
Word order / focus	어순·초점 오류	English-shaped topic order	rewrite from predicate outward
Omission	정보 누락	missing object, time, or relation because English context supplied it	context completeness check
Pronunciation/listening	발음·청취 오류	못 들어요 vs 못 들었어요; batchim liaison missed	record and compare

This taxonomy should be introduced as a default, not a prison. Learners can merge categories if they do not generate enough data.

Add a minimum viable corpus schema

A spreadsheet is enough. The article should explicitly discourage starting with a complicated database unless the learner already likes that kind of tool.

id
date
source_context
mode: writing / speaking / chat / tutor / self-recording
wrong_version
corrected_version
error_type
subtype
explanation
trigger: why I made the mistake
native_or_teacher_evidence
follow_up_drill
next_review_date
status: open / drilling / improved / retired
privacy_level

Example entry:

Field	Entry
wrong_version	저는 어제 친구를 만났으면 카페에 갔어요.
corrected_version	저는 어제 친구를 만나서 카페에 갔어요.
error_type	ending
subtype	condition vs sequence/cause
trigger	transferred English “when/after meeting” too loosely
follow_up_drill	make 10 sentences contrasting -면, -고, -아서/어서
status	open

The key is the trigger field. Without it, the learner only knows what was wrong, not why it keeps happening.

Add a weekly remediation loop

The article should give a rhythm.

Monday: collect errors without overanalyzing.
Tuesday: classify the top five.
Wednesday: find native examples for the top two patterns.
Thursday: build drills or cards.
Friday: produce new sentences using the repaired pattern.
Weekend: review whether the same error reappeared.

This turns mistakes into a cycle: collect → classify → verify → drill → produce → recheck.

Add a “do not hoard errors” warning

A personal error corpus can become emotionally toxic if learners treat it as a wall of failure. The article should say plainly: the goal is not to preserve every mistake. The goal is to find recurring patterns and remove them.

Retire an error when:

it has not appeared in a month of similar output;
the learner can explain the repair without looking;
the learner has produced three correct examples in real contexts;
the error no longer blocks communication or register control.
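
The retirement criteria can be written as a single check. This sketch assumes all four conditions must hold together and reads "a month" as 30 days; the `PatternHistory` record, its field names, and the thresholds are hypothetical, and two of the fields are self-assessments the learner has to fill in honestly.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PatternHistory:
    """What the learner knows about one recurring error pattern."""
    last_seen: date                    # most recent occurrence in similar output
    can_explain_repair: bool           # self-assessed, without looking
    correct_real_uses: int             # correct examples produced in real contexts
    still_blocks_communication: bool   # includes loss of register control


def can_retire(history, today, quiet_days=30, required_uses=3):
    """Return True only when every retirement condition is met."""
    quiet_long_enough = (today - history.last_seen) >= timedelta(days=quiet_days)
    return (
        quiet_long_enough
        and history.can_explain_repair
        and history.correct_real_uses >= required_uses
        and not history.still_blocks_communication
    )


stable = PatternHistory(date(2026, 4, 1), True, 4, False)
recent = PatternHistory(date(2026, 5, 10), True, 4, False)
today = date(2026, 5, 21)
print(can_retire(stable, today), can_retire(recent, today))
```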

Archive, do not worship.

Add a pattern-analysis example

After 40 entries, a learner sorts by type and sees:

Error type	Count	Interpretation	Action
particle	14	not random; place/event/object frames unstable	study verbs with required particles
spacing	8	mechanical but recurring	run spelling review and card common chunks
register	6	too much textbook/casual mixing	tag source genre before writing
honorifics	5	role mapping weak	create address-term scenarios
word choice	7	English glosses too broad	use dictionary examples and collocations

This is the moment the corpus becomes useful. The learner no longer says “my Korean is bad.” The learner says “my next month is particles and register.”
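
The counts in the table can be turned into shares and a ranking with a few lines of arithmetic. A minimal sketch using the numbers above; the variable names are illustrative.

```python
# Counts copied from the pattern-analysis table above.
counts = {
    "particle": 14,
    "spacing": 8,
    "register": 6,
    "honorifics": 5,
    "word choice": 7,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
ranked = sorted(counts, key=counts.get, reverse=True)

for name in ranked:
    print(f"{name}: {counts[name]} entries, {shares[name]}% of {total}")
```

Frequency is only one input. By count alone, spacing and word choice outrank register, so a learner who picks register as a target is presumably also weighing social or professional risk.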

Add privacy and consent rules

Because the corpus may include chat messages, tutoring corrections, names, workplaces, or personal stories, the article needs a stronger privacy section.

Minimum privacy rules:

Replace names with placeholders: [친구], [회사], [선생님].
Do not store other people's private messages without permission.
Do not upload sensitive learner data into public AI tools or public notebooks.
Separate language examples from personal details.
Mark entries as private, anonymized, or publishable.

Example anonymization:

Before:

민지 씨한테 삼성동 사무실 계약서 보내줬어요.

After:

[동료]에게 [장소] 사무실 계약서를 보내 줬어요.

The grammar remains useful. The private content disappears.
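
Placeholder masking can be as simple as a lookup table applied before an entry is saved. The sketch below assumes the learner maintains the mapping by hand; the function name `anonymize` and the mapping are hypothetical. It only masks, so particles and spacing stay exactly as typed, and any polishing of the sentence is a separate step.

```python
def anonymize(sentence, replacements):
    """Replace each sensitive string with its placeholder."""
    # Longer keys go first so a short key never clobbers part of a longer one.
    for original in sorted(replacements, key=len, reverse=True):
        sentence = sentence.replace(original, replacements[original])
    return sentence


# Hand-maintained mapping of private strings to placeholders (hypothetical).
replacements = {
    "민지 씨": "[동료]",
    "삼성동": "[장소]",
}

masked = anonymize("민지 씨한테 삼성동 사무실 계약서 보내줬어요.", replacements)
print(masked)
```

A lookup table only catches strings the learner has listed, so a quick manual read before sharing is still needed.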

Add a “correction quality” field

Not all corrections are equal. Tutors, friends, AI tools, grammar checkers, and native speakers may correct different things. The article should teach learners to mark correction reliability.

Source	Strength	Risk
trained teacher	explanation and level fit	may simplify for pedagogy
native friend	naturalness judgment	may not explain rule
grammar checker	spelling/spacing	misses context and register
corpus/dictionary	usage evidence	requires interpretation
AI tool	fast alternatives	may hallucinate or overcorrect
published source	authentic model	may not match learner's intended context

Add fields:

correction_source:
confidence: high / medium / low
needs_verification: yes/no
verified_with:
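
A tool could prefill `needs_verification` from the other fields. The sketch below follows the guidance elsewhere in this article that automated correction is a weak signal until a teacher, native speaker, or corpus evidence confirms it; the set `AUTOMATED_SOURCES`, the function names, and the choice to default unverified automated corrections to low confidence are assumptions, not fixed rules.

```python
# Sources treated as automated; the exact labels are an assumption.
AUTOMATED_SOURCES = {"grammar checker", "AI tool"}


def needs_verification(correction_source, verified_with=""):
    """Automated corrections need a second opinion until one is recorded."""
    return correction_source in AUTOMATED_SOURCES and not verified_with.strip()


def annotate(entry):
    """Return a copy of the entry with the verification fields filled in."""
    flagged = needs_verification(entry["correction_source"], entry.get("verified_with", ""))
    updated = dict(entry)
    updated["needs_verification"] = "yes" if flagged else "no"
    if flagged:
        # Unverified automated output is kept, but marked as low confidence.
        updated["confidence"] = "low"
    return updated


raw = {"correction_source": "AI tool", "confidence": "medium", "verified_with": ""}
checked = {"correction_source": "AI tool", "confidence": "high", "verified_with": "tutor"}
print(annotate(raw))
print(annotate(checked))
```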

Module name: Korean Error Corpus Builder

Core functions:

Add wrong and corrected sentence.
Select error type and subtype.
Store context and register.
Prompt the learner for a trigger hypothesis.
Generate a follow-up drill template.
Show recurring patterns by month.
Allow anonymization fields.

Dashboard views:

Error frequency by type.
Error frequency by source context: chat, essay, speech, email, reading summary.
Open errors vs retired errors.
Top three remediation targets.
Examples needing verification.

Drill generation examples:

Particle error → generate verb-frame table.
Ending error → generate contrast sentences.
Register error → generate casual/formal rewrites.
Honorific error → generate role-map scenarios.
Collocation error → generate native-example search prompts.

Related reading

When CJK Comparison Helps Korean Learners and When It Becomes Noise

Using Speech Recognition Carefully for Korean Pronunciation

Hanja Beneath Hangul: The Hidden Sino-Korean Layer

Sentence Rhythm in Korean: Eojeol, Particles, and Breath Groups

Causatives in Korean: Authority, Permission, and Agency

How to Mine Korean Sentences Without Collecting Translationese