Inkuntri
Korean Research, tools & pedagogy

Designing a Korean Error Corpus From Your Own Mistakes

Turn your personal Korean mistakes into a searchable error corpus that reveals patterns and supports targeted improvement.

Published May 21, 2026 Korean

Why this matters

Most learners waste their mistakes.

A tutor corrects a sentence. A language partner rewrites a message. A teacher marks an essay. A speech-recognition tool fails to understand a phrase. The learner nods, feels briefly enlightened, and then the correction vanishes into a chat log, notebook margin, or memory fog.

The mistake was useful for ten seconds. Then it stopped being data.

A personal Korean error corpus changes that. It is a structured record of your recurring mistakes: wrong sentence, corrected sentence, context, error type, explanation, date, and follow-up practice. The goal is not to collect shame. The goal is to detect patterns.

One particle error may be random. Twenty particle errors sorted by context reveal a grammar problem. One awkward email sentence is annoying. Ten awkward email sentences tagged by register reveal a workplace Korean problem. One misheard phrase is normal. Fifty misheard phrases categorized by sound change reveal a listening problem.

What counts as an error?

An error is not only a red-marked grammar mistake. For serious Korean study, include several types.

Error type | Korean label | Example problem
Particle | 조사 오류 | 은/는 vs 이/가, 을/를 omission, 에 vs 에서
Ending | 어미 오류 | wrong connective, stiff final ending, mismatched politeness
Word choice | 단어 선택 | using 연령 where 나이 fits, or 고치다 where 수정하다 fits
Spacing | 띄어쓰기 | 할수 있다 instead of 할 수 있다
Honorifics | 높임법 | subject honorific missing or overused
Speech level | 말높임 | switching 해요체 and 합쇼체 inconsistently
Pronunciation | 발음 | liaison, tensification, batchim, vowel contrast
Collocation | 연어 | grammatical but unnatural word pairing
Word order | 어순 | English-shaped sequencing
Omitted information | 정보 누락 | sentence assumes context the listener does not have
Register | 사용역 | casual word in formal document, bureaucratic word in friendly chat
Pragmatics | 화용 | sentence is grammatically correct but socially wrong

A personal corpus should preserve why the sentence failed, not merely what the correct sentence was.

The minimum viable corpus

Start with a spreadsheet. Do not overengineer the first version.

Use these fields:

Field | Purpose
ID | Unique number for the error
Date | When it occurred
Source | Tutor, teacher, chat, essay, speaking, self-recording, corpus check
Context | What you were trying to say and to whom
Wrong sentence | Your original Korean
Corrected sentence | Better Korean
Error type | Particle, ending, word choice, etc.
Explanation | Why the correction matters
Severity | Low, medium, high
Recurrence | First time, repeated, fossilized
Follow-up example | A new sentence using the repaired pattern
Review date | When to revisit it
Status | Open, practiced, improved, stable

A simple entry might look like this:

Field | Entry
Wrong sentence | 저는 어제 친구에 만났어요.
Corrected sentence | 저는 어제 친구를 만났어요.
Error type | Particle
Explanation | 만나다 takes the person met as an object, so 친구를 is expected.
Follow-up | 내일 선생님을 만날 예정이에요.
Status | Open

A more advanced entry:

Field | Entry
Wrong sentence | 고객님, 이 문제는 제가 생각해 볼게요.
Corrected sentence | 고객님, 이 문제는 내부적으로 확인 후 다시 안내드리겠습니다.
Error type | Register / workplace service Korean
Explanation | 생각해 볼게요 sounds personal and informal for customer-service context; 확인 후 안내드리겠습니다 is more institutional and appropriate.
Follow-up | 요청하신 내용은 담당 부서 확인 후 안내드리겠습니다.
Status | Practiced

The second example is not a basic grammar correction. It is exactly the kind of error advanced learners need to track.

Build categories that produce action

A bad error category is too vague:

grammar

A useful error category points toward a drill:

이/가 vs 은/는 in contrastive contexts

에 vs 에서 with activity-location verbs

formal email closing formula

noun-modifying clause length too long

spacing around bound nouns: 수, 것, 때, 중

Categories should help you decide what to practice next.

Use a two-level taxonomy (error type, then subtype), and attach the pattern, trigger, and practice to each subtype:

Error type: Particle
Subtype: location particle
Pattern: 에 vs 에서
Trigger: activity verb misread as destination verb
Practice: 20 sentences with 가다/오다/있다/공부하다/회의하다

or:

Error type: Register
Subtype: workplace request
Pattern: too direct for superior/client
Trigger: literal translation from English “Please send...”
Practice: rewrite with 확인 부탁드립니다 / 공유해 주시면 감사하겠습니다 / 가능하실 때

From collection to diagnosis

After 30–50 entries, start analyzing.

Ask:

  • Which error type appears most often?
  • Which errors recur after correction?
  • Which errors happen only in writing?
  • Which errors happen only in speech?
  • Which errors appear under pressure?
  • Which errors are low-level but persistent?
  • Which errors create social or professional risk?

A learner may discover:

Pattern | Evidence | Action
Particle errors cluster around motion and location | 에/에서/으로 mistakes in 12 entries | Build location-particle drill set
Honorifics fail in workplace writing | inconsistent 드리다/주시다/계시다 | Make role-map templates
Writing is grammatically correct but translationese | too many explicit subjects and passive forms | Mine native Korean examples from notices/emails
Listening errors come from sound change | 못 읽다, 한국말, 연락해요 misheard | Build audio cards for assimilation (nasalization, liquidization) and tensification

The corpus turns vague frustration into a queue.

Weekly remediation loop

Use a 45-minute weekly review.

  1. Sort new errors by type.
  2. Pick the top two recurring patterns.
  3. Write a plain-language rule for each.
  4. Find or create five correct examples.
  5. Produce five new sentences or short spoken responses.
  6. Get at least one checked if possible.
  7. Mark the original errors as practiced, not solved.
  8. Re-test after one week.
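Steps 7 and 8 amount to a status change plus a re-test date. The sketch below is one minimal way to record that, assuming each entry is a plain dict with `status` and `review_date` keys that mirror the spreadsheet fields; the function name `mark_practiced` is hypothetical.

```python
from datetime import date, timedelta


def mark_practiced(entry: dict, today: date) -> dict:
    """Return a copy of the entry marked as practiced, with a re-test one week out."""
    updated = dict(entry)  # leave the original row untouched
    updated["status"] = "practiced"  # practiced, not solved
    updated["review_date"] = today + timedelta(weeks=1)
    return updated


# Illustrative entry; the id and date are made up for the example.
entry = {"id": 1, "error_type": "Particle", "status": "open", "review_date": None}
reviewed = mark_practiced(entry, date(2026, 5, 21))
print(reviewed["status"], reviewed["review_date"])
```

Returning a copy rather than editing in place keeps the original row available if you want to log when the status changed.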

Do not try to fix everything. The point of a corpus is prioritization.

Privacy and safety

A personal error corpus may contain private chats, workplace details, health information, immigration details, or identifying names. Treat it carefully.

Rules:

  • Replace real names with placeholders: [친구], [회사], [병원].
  • Do not store private messages without consent if you plan to share the corpus.
  • Avoid saving full screenshots when a typed excerpt is enough.
  • For speech recordings, store only short clips needed for analysis.
  • Keep medical, legal, immigration, and employment content in a private file.
  • Never publish corrections from another person’s message without permission.

A useful corpus does not need to expose your life.

Sample error entries by level

Beginner / lower intermediate

Wrong:

저는 한국어 공부해요.

Better:

저는 한국어를 공부해요.

Note: Object marker omission may be acceptable in some casual contexts, but at this stage the learner needs to learn the transitive frame 공부하다 + 을/를.

Intermediate

Wrong:

제가 시간이 있으면 연락할게요.

Better, depending on intent:

시간이 되면 연락드리겠습니다.

시간이 괜찮으시면 연락드리겠습니다.

Note: The original sentence may sound speaker-centered. The repair depends on whether the condition is the speaker's own availability (시간이 되면) or the listener's (시간이 괜찮으시면).

Advanced

Wrong:

본 서비스는 사용자의 정보를 모읍니다.

Better:

본 서비스는 서비스 제공을 위해 이용자의 정보를 수집·이용합니다.

Note: 모으다 is too general and informal for privacy-policy register. 수집·이용하다 is domain-appropriate.

Suggested interactive/tool module

Tool name: Korean Personal Error Corpus Builder

Core functions:

  • Form-based entry for wrong sentence, correction, context, and error type.
  • Tag system: 조사, 어미, 띄어쓰기, 높임법, 연어, 사용역, 발음, 담화.
  • Dashboard showing recurring error types.
  • Drill generator that turns corrected sentences into cloze, rewrite, and production tasks.
  • Privacy toggle that masks names and sensitive fields.
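The cloze part of the drill generator can be sketched by blanking whatever changed between the wrong and corrected sentence. This is a rough illustration, not a specification of the tool: `make_cloze` is a hypothetical name, and splitting on whitespace is an assumption that blanks the whole space-delimited unit (noun plus particle) rather than the particle alone.

```python
from difflib import SequenceMatcher

BLANK = "____"


def make_cloze(wrong: str, corrected: str) -> str:
    """Blank out the tokens of the corrected sentence that differ from the wrong one."""
    wrong_tokens = wrong.split()
    fixed_tokens = corrected.split()
    matcher = SequenceMatcher(a=wrong_tokens, b=fixed_tokens, autojunk=False)
    cloze = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            cloze.extend(fixed_tokens[j1:j2])  # unchanged tokens stay visible
        else:
            cloze.extend(BLANK for _ in range(j1, j2))  # changed tokens become blanks
    return " ".join(cloze)


print(make_cloze("저는 어제 친구에 만났어요.", "저는 어제 친구를 만났어요."))
# 저는 어제 ____ 만났어요.
```

A finer-grained version would need a Korean morphological analyzer to isolate the particle; this sketch deliberately avoids that dependency.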

Good tool behavior:

  • Encourage short, high-quality entries over hoarding.
  • Distinguish “needs teacher confirmation” from “confirmed correction.”
  • Let learners mark uncertainty rather than forcing fake certainty.
  • Use Korean Learners’ Corpus resources as a model for how learner errors can be categorized, but keep the personal workflow simple.
  • Cross-reference NIKL dictionary examples when explaining collocation and register.
  • Treat automated grammar correction as a weak signal unless verified by a teacher, native speaker, or strong corpus evidence.

QA checklist

  • Does the article turn mistakes into a practice system, not shame?
  • Does it include register and pragmatic errors, not only grammar?
  • Does it provide a real schema?
  • Does it include privacy guidance?
  • Does it show how to convert corpus patterns into drills?

Remediation and upgrade layer: errors are data only if they are structured

The first draft introduces the personal error corpus. The upgrade must prevent the most common failure: learners collect corrections like souvenirs and never turn them into behavior change.

An error corpus should answer four questions:

  1. What did I produce?
  2. What was the better version?
  3. What kind of error was it?
  4. What drill will make this less likely next time?

If the corpus cannot answer the fourth question, it is not yet a remediation tool.

Add a stricter error taxonomy

The article should keep the categories practical. Too many categories make the system unusable; too few hide patterns.

Error type | Korean label | Example problem | Repair action
Particle | 조사 오류 | 학교를 가요 for 학교에 가요 | collect verb-place frames
Ending | 어미 오류 | 만나면 좋겠어요 used where 만나서 좋았어요 is needed | contrast cause/condition/result endings
Word choice | 단어 선택 | 시키다 used where 하게 하다 or 주문하다 is needed | build object-type cards
Collocation | 연어 오류 | 강한 비 instead of 많은 비 / 폭우 | collect adjective+noun pairs
Spacing | 띄어쓰기 | 할수있다 for 할 수 있다 | use spelling/norm checker + pattern notes
Honorifics | 높임법 오류 | 선생님이 말했어요 in a context needing 말씀하셨어요 | role-map drill
Speech level | 말높임 오류 | 반말 mixed into workplace email | rewrite by audience
Register | 사용역 오류 | 너무 casual word in formal document | tag source genre
Word order / focus | 어순·초점 오류 | English-shaped topic order | rewrite from predicate outward
Omission | 정보 누락 | missing object, time, or relation because English context supplied it | context completeness check
Pronunciation/listening | 발음·청취 오류 | 못 들어요 vs 못 들었어요; batchim liaison missed | record and compare

This taxonomy should be introduced as a default, not a prison. Learners can merge categories if they do not generate enough data.

Add a minimum viable corpus schema

A spreadsheet is enough. The article should explicitly discourage starting with a complicated database unless the learner already likes that kind of tool.

id
date
source_context
mode: writing / speaking / chat / tutor / self-recording
wrong_version
corrected_version
error_type
subtype
explanation
trigger: why I made the mistake
native_or_teacher_evidence
follow_up_drill
next_review_date
status: open / drilling / improved / retired
privacy_level

Example entry:

Field | Entry
wrong_version | 저는 어제 친구를 만났으면 카페에 갔어요.
corrected_version | 저는 어제 친구를 만나서 카페에 갔어요.
error_type | ending
subtype | condition vs sequence/cause
trigger | transferred English “when/after meeting” too loosely
follow_up_drill | make 10 sentences contrasting -면, -고, -아서/어서
status | open

The key is the trigger field. Without it, the learner only knows what was wrong, not why it keeps happening.
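For learners who later move from a spreadsheet to a script, the schema above maps directly onto a small record type. The following is a sketch under assumptions: the class name `ErrorEntry`, the helper `to_csv`, the all-optional defaults, and the id value are illustrative choices, and fields the example entry does not specify are left empty rather than guessed.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ErrorEntry:
    # Field names follow the schema list above; every default is an assumption.
    id: int = 0
    date: str = ""
    source_context: str = ""
    mode: str = ""  # writing / speaking / chat / tutor / self-recording
    wrong_version: str = ""
    corrected_version: str = ""
    error_type: str = ""
    subtype: str = ""
    explanation: str = ""
    trigger: str = ""  # why I made the mistake
    native_or_teacher_evidence: str = ""
    follow_up_drill: str = ""
    next_review_date: str = ""
    status: str = "open"  # open / drilling / improved / retired
    privacy_level: str = "private"


def to_csv(entries: list) -> str:
    """Serialize entries to CSV text with one column per schema field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(ErrorEntry)], lineterminator="\n"
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


example = ErrorEntry(
    id=1,  # illustrative id
    wrong_version="저는 어제 친구를 만났으면 카페에 갔어요.",
    corrected_version="저는 어제 친구를 만나서 카페에 갔어요.",
    error_type="ending",
    subtype="condition vs sequence/cause",
    trigger="transferred English “when/after meeting” too loosely",
    follow_up_drill="make 10 sentences contrasting -면, -고, -아서/어서",
)
print(to_csv([example]))
```

The CSV output opens in any spreadsheet program, so the script and the spreadsheet workflow can coexist.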

Add a weekly remediation loop

The article should give a rhythm.

Monday: collect errors without overanalyzing.
Tuesday: classify the top five.
Wednesday: find native examples for the top two patterns.
Thursday: build drills or cards.
Friday: produce new sentences using the repaired pattern.
Weekend: review whether the same error reappeared.

This turns mistakes into a cycle: collect → classify → verify → drill → produce → recheck.

Add a “do not hoard errors” warning

A personal error corpus can become emotionally toxic if learners treat it as a wall of failure. The article should say plainly: the goal is not to preserve every mistake. The goal is to find recurring patterns and remove them.

Retire an error when:

  • it has not appeared in a month of similar output;
  • the learner can explain the repair without looking;
  • the learner has produced three correct examples in real contexts;
  • the error no longer blocks communication or register control.

Archive, do not worship.
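The retirement criteria can be expressed as a single check. The sketch below makes two assumptions the list does not state: that all four conditions must hold together, and that "a month" means 30 days since the error was last seen. `RetirementEvidence` and `can_retire` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetirementEvidence:
    last_seen: date  # last time the error appeared in similar output
    can_explain_unaided: bool  # learner can explain the repair without looking
    correct_real_uses: int  # correct examples produced in real contexts
    still_blocks_communication: bool  # error still harms communication or register


def can_retire(evidence: RetirementEvidence, today: date) -> bool:
    """Return True only when all four retirement criteria are met (assumed AND)."""
    month_clear = today - evidence.last_seen >= timedelta(days=30)
    return (
        month_clear
        and evidence.can_explain_unaided
        and evidence.correct_real_uses >= 3
        and not evidence.still_blocks_communication
    )


# Illustrative dates only.
evidence = RetirementEvidence(date(2026, 4, 1), True, 3, False)
print(can_retire(evidence, date(2026, 5, 21)))
```

If the criteria are meant as alternatives rather than a joint test, the `and` chain would become `or`; the source list leaves that open.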

Add a pattern-analysis example

After 40 entries, a learner sorts by type and sees:

Error type | Count | Interpretation | Action
particle | 14 | not random; place/event/object frames unstable | study verbs with required particles
spacing | 8 | mechanical but recurring | run spelling review and card common chunks
register | 6 | too much textbook/casual mixing | tag source genre before writing
honorifics | 5 | role mapping weak | create address-term scenarios
word choice | 7 | English glosses too broad | use dictionary examples and collocations

This is the moment the corpus becomes useful. The learner no longer says “my Korean is bad.” The learner says “my next month is particles and register.”
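The sorting step behind that table is a frequency count. A minimal sketch, using the counts from the table above as stand-in data (the list of labels simulates the `error_type` column of 40 entries; `rank_error_types` is a hypothetical helper name):

```python
from collections import Counter


def rank_error_types(labels: list) -> list:
    """Return (error_type, count, share_of_total) tuples, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


# Simulated error_type column matching the table: 14 + 8 + 7 + 6 + 5 = 40 entries.
labels = (
    ["particle"] * 14
    + ["spacing"] * 8
    + ["word choice"] * 7
    + ["register"] * 6
    + ["honorifics"] * 5
)
ranking = rank_error_types(labels)
for label, n, share in ranking:
    print(f"{label:<12}{n:>3}  {share:.1%}")
```

Note that the learner in the example chooses particles and register even though spacing ranks second by count, which suggests weighing frequency against social or professional risk rather than sorting by count alone.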

Because the corpus may include chat messages, tutoring corrections, names, workplaces, or personal stories, the article needs a stronger privacy section.

Minimum privacy rules:

  • Replace names with placeholders: [친구], [회사], [선생님].
  • Do not store other people's private messages without permission.
  • Do not upload sensitive learner data into public AI tools or public notebooks.
  • Separate language examples from personal details.
  • Mark entries as private, anonymized, or publishable.

Example anonymization:

Before:

민지 씨한테 삼성동 사무실 계약서 보내줬어요.

After:

[동료]에게 [장소] 사무실 계약서를 보내 줬어요.

The grammar remains useful. The private content disappears.
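The masking step itself can be automated with a lookup of private strings. This is a sketch under assumptions: `anonymize` is a hypothetical helper, and the mapping is supplied by the learner. It only swaps in placeholders; any particle or spacing changes, like those in the After sentence, would be separate corrections.

```python
def anonymize(sentence: str, replacements: dict) -> str:
    """Replace each private string with its placeholder, longest strings first."""
    ordered = sorted(replacements.items(), key=lambda kv: len(kv[0]), reverse=True)
    for private, placeholder in ordered:
        sentence = sentence.replace(private, placeholder)
    return sentence


masked = anonymize(
    "민지 씨한테 삼성동 사무실 계약서 보내줬어요.",
    {"민지 씨": "[동료]", "삼성동": "[장소]"},
)
print(masked)
# [동료]한테 [장소] 사무실 계약서 보내줬어요.
```

Replacing longer strings first keeps a short name from being masked inside a longer one that also appears in the mapping.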

Add a “correction quality” field

Not all corrections are equal. Tutors, friends, AI tools, grammar checkers, and native speakers may correct different things. The article should teach learners to mark correction reliability.

Source | Strength | Risk
trained teacher | explanation and level fit | may simplify for pedagogy
native friend | naturalness judgment | may not explain rule
grammar checker | spelling/spacing | misses context and register
corpus/dictionary | usage evidence | requires interpretation
AI tool | fast alternatives | may hallucinate or overcorrect
published source | authentic model | may not match learner's intended context

Add fields:

correction_source:
confidence: high / medium / low
needs_verification: yes/no
verified_with:

Module name: Korean Error Corpus Builder

Core functions:

  • Add wrong and corrected sentence.
  • Select error type and subtype.
  • Store context and register.
  • Prompt the learner for a trigger hypothesis.
  • Generate a follow-up drill template.
  • Show recurring patterns by month.
  • Allow anonymization fields.

Dashboard views:

  1. Error frequency by type.
  2. Error frequency by source context: chat, essay, speech, email, reading summary.
  3. Open errors vs retired errors.
  4. Top three remediation targets.
  5. Examples needing verification.
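Views 2, 3, and 5 reduce to simple aggregations over the entry list. The sketch below assumes entries are dicts using the `mode`, `status`, and `needs_verification` field names from the schema sections, and treats every status other than `retired` as open; the `dashboard` function and the sample rows are illustrative only.

```python
from collections import Counter


def dashboard(entries: list) -> dict:
    """Compute three dashboard views from a list of entry dicts."""
    return {
        # View 2: error frequency by source context / mode
        "by_mode": Counter(e["mode"] for e in entries),
        # View 3: open vs retired (assumption: anything not retired counts as open)
        "open_vs_retired": Counter(
            "retired" if e["status"] == "retired" else "open" for e in entries
        ),
        # View 5: ids of entries whose correction still needs verification
        "needs_verification": [
            e["id"] for e in entries if e["needs_verification"] == "yes"
        ],
    }


# Made-up sample rows for illustration.
entries = [
    {"id": 1, "mode": "chat", "status": "open", "needs_verification": "yes"},
    {"id": 2, "mode": "writing", "status": "retired", "needs_verification": "no"},
    {"id": 3, "mode": "chat", "status": "drilling", "needs_verification": "no"},
]
views = dashboard(entries)
print(views["by_mode"], views["open_vs_retired"], views["needs_verification"])
```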

Drill generation examples:

  • Particle error → generate verb-frame table.
  • Ending error → generate contrast sentences.
  • Register error → generate casual/formal rewrites.
  • Honorific error → generate role-map scenarios.
  • Collocation error → generate native-example search prompts.
