Designing a Korean Error Corpus From Your Own Mistakes
The reader can turn personal Korean mistakes into a searchable error corpus that reveals patterns and supports targeted improvement.
Why this matters
Most learners waste their mistakes.
A tutor corrects a sentence. A language partner rewrites a message. A teacher marks an essay. A speech-recognition tool fails to understand a phrase. The learner nods, feels briefly enlightened, and then the correction vanishes into a chat log, notebook margin, or memory fog.
The mistake was useful for ten seconds. Then it stopped being data.
A personal Korean error corpus changes that. It is a structured record of your recurring mistakes: wrong sentence, corrected sentence, context, error type, explanation, date, and follow-up practice. The goal is not to collect shame. The goal is to detect patterns.
One particle error may be random. Twenty particle errors sorted by context reveal a grammar problem. One awkward email sentence is annoying. Ten awkward email sentences tagged by register reveal a workplace Korean problem. One misheard phrase is normal. Fifty misheard phrases categorized by sound change reveal a listening problem.
What counts as an error?
An error is not only a red-marked grammar mistake. For serious Korean study, include several types.
| Error type | Korean label | Example problem |
|---|---|---|
| Particle | 조사 오류 | 은/는 vs 이/가, 을/를 omission, 에 vs 에서 |
| Ending | 어미 오류 | wrong connective, stiff final ending, mismatched politeness |
| Word choice | 단어 선택 | using 연령 where 나이 fits, or 고치다 where 수정하다 fits |
| Spacing | 띄어쓰기 | 할수 있다 instead of 할 수 있다 |
| Honorifics | 높임법 | subject honorific missing or overused |
| Speech level | 말높임 | switching between 해요체 and 합쇼체 inconsistently |
| Pronunciation | 발음 | liaison, tensification, batchim, vowel contrast |
| Collocation | 연어 | grammatical but unnatural word pairing |
| Word order | 어순 | English-shaped sequencing |
| Omitted information | 정보 누락 | sentence assumes context the listener does not have |
| Register | 사용역 | casual word in formal document, bureaucratic word in friendly chat |
| Pragmatics | 화용 | sentence is grammatically correct but socially wrong |
A personal corpus should preserve why the sentence failed, not merely what the correct sentence was.
The minimum viable corpus
Start with a spreadsheet. Do not overengineer the first version.
Use these fields:
| Field | Purpose |
|---|---|
| ID | Unique number for the error |
| Date | When it occurred |
| Source | Tutor, teacher, chat, essay, speaking, self-recording, corpus check |
| Context | What you were trying to say and to whom |
| Wrong sentence | Your original Korean |
| Corrected sentence | Better Korean |
| Error type | Particle, ending, word choice, etc. |
| Explanation | Why the correction matters |
| Severity | Low, medium, high |
| Recurrence | First time, repeated, fossilized |
| Follow-up example | A new sentence using the repaired pattern |
| Review date | When to revisit it |
| Status | Open, practiced, improved, stable |
A simple entry might look like this:
| Field | Entry |
|---|---|
| Wrong sentence | 저는 어제 친구에 만났어요. |
| Corrected sentence | 저는 어제 친구를 만났어요. |
| Error type | Particle |
| Explanation | 만나다 takes the person met as an object, so 친구를 is expected. |
| Follow-up | 내일 선생님을 만날 예정이에요. |
| Status | Open |
A more advanced entry:
| Field | Entry |
|---|---|
| Wrong sentence | 고객님, 이 문제는 제가 생각해 볼게요. |
| Corrected sentence | 고객님, 이 문제는 내부적으로 확인 후 다시 안내드리겠습니다. |
| Error type | Register / workplace service Korean |
| Explanation | 생각해 볼게요 sounds personal and informal for customer-service context; 확인 후 안내드리겠습니다 is more institutional and appropriate. |
| Follow-up | 요청하신 내용은 담당 부서 확인 후 안내드리겠습니다. |
| Status | Practiced |
The second example is not a basic grammar correction. It is exactly the kind of error advanced learners need to track.
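A spreadsheet needs no code. If you later want to script it, though, the fields in the table above map directly onto a record type. The sketch below is one possible Python version, assuming the sheet is exported as CSV. The class name `ErrorEntry`, the snake_case column names, and the sample dates and context are illustrative, not a standard format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ErrorEntry:
    """One corpus row; fields mirror the minimum viable table (names are illustrative)."""
    id: int
    date: str                 # ISO date string, e.g. "2024-05-01"
    source: str               # tutor, teacher, chat, essay, speaking, ...
    context: str              # what you were trying to say and to whom
    wrong_sentence: str
    corrected_sentence: str
    error_type: str           # particle, ending, word choice, ...
    explanation: str
    severity: str             # low / medium / high
    recurrence: str           # first time / repeated / fossilized
    follow_up_example: str
    review_date: str
    status: str               # open / practiced / improved / stable


def to_csv(entries):
    """Serialize entries to CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ErrorEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# The simple particle entry from above, with hypothetical dates and context.
entry = ErrorEntry(
    id=1,
    date="2024-05-01",
    source="tutor",
    context="telling a tutor about yesterday",
    wrong_sentence="저는 어제 친구에 만났어요.",
    corrected_sentence="저는 어제 친구를 만났어요.",
    error_type="particle",
    explanation="만나다 takes the person met as an object.",
    severity="medium",
    recurrence="first time",
    follow_up_example="내일 선생님을 만날 예정이에요.",
    review_date="2024-05-08",
    status="open",
)
print(to_csv([entry]))
```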
Build categories that produce action
A bad error category is too vague, like “grammar.”
A useful error category points toward a drill:
- 이/가 vs 은/는 in contrastive contexts
- 에 vs 에서 with activity-location verbs
- formal email closing formulas
- noun-modifying clauses that run too long
- spacing around bound nouns: 수, 것, 때, 중
Categories should help you decide what to practice next.
Use a two-level taxonomy (error type, then subtype), and attach the specific pattern, its trigger, and a practice task:
- Error type: Particle
- Subtype: location particle
- Pattern: 에 vs 에서
- Trigger: activity verb treated like a destination verb, so 에 appears where 에서 is needed
- Practice: 20 sentences with 가다/오다/있다/공부하다/회의하다
or:
- Error type: Register
- Subtype: workplace request
- Pattern: too direct for superior/client
- Trigger: literal translation from English “Please send...”
- Practice: rewrite with 확인 부탁드립니다 / 공유해 주시면 감사하겠습니다 / 가능하실 때
From collection to diagnosis
After 30–50 entries, start analyzing.
Ask:
- Which error type appears most often?
- Which errors recur after correction?
- Which errors happen only in writing?
- Which errors happen only in speech?
- Which errors appear under pressure?
- Which errors are low-level but persistent?
- Which errors create social or professional risk?
A learner may discover:
| Pattern | Evidence | Action |
|---|---|---|
| Particle errors cluster around motion and location | 에/에서/으로 mistakes in 12 entries | Build location-particle drill set |
| Honorifics fail in workplace writing | inconsistent 드리다/주시다/계시다 | Make role-map templates |
| Writing is grammatically correct but translationese | too many explicit subjects and passive forms | Mine native Korean examples from notices/emails |
| Listening errors come from sound change | 못 읽다, 한국말, 연락해요 misheard | Build audio cards for nasalization, lateralization, and other sound changes |
The corpus turns vague frustration into a queue.
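Several of the diagnostic questions reduce to simple set operations once each entry records its mode. A minimal Python sketch for “Which errors happen only in writing?” and “only in speech?”, assuming entries have been reduced to `(error_type, mode)` pairs. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def modes_by_type(entries):
    """Map each error type to the set of modes it has appeared in."""
    seen = defaultdict(set)
    for error_type, mode in entries:
        seen[error_type].add(mode)
    return seen


def only_in(entries, mode):
    """Error types that have appeared in the given mode and in no other."""
    return sorted(t for t, modes in modes_by_type(entries).items() if modes == {mode})


# Hypothetical (error_type, mode) pairs pulled from a corpus.
sample = [
    ("particle", "writing"), ("particle", "speaking"),
    ("spacing", "writing"), ("spacing", "writing"),
    ("pronunciation", "speaking"),
    ("register", "writing"),
]
print(only_in(sample, "writing"))   # ['register', 'spacing']
print(only_in(sample, "speaking"))  # ['pronunciation']
```

Particle errors drop out of both lists because they show up in writing and speech alike, which is itself a diagnosis: that pattern is not mode-specific.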
Weekly remediation loop
Use a 45-minute weekly review.
- Sort new errors by type.
- Pick the top two recurring patterns.
- Write a plain-language rule for each.
- Find or create five correct examples.
- Produce five new sentences or short spoken responses.
- Get at least one checked if possible.
- Mark the original errors as practiced, not solved.
- Re-test after one week.
Do not try to fix everything. The point of a corpus is prioritization.
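The first and last steps of the loop (pick the top recurring patterns, then schedule a re-test) are mechanical enough to script. A sketch in Python, assuming each entry is a dict with hypothetical `pattern`, `recurrence`, `status`, and `review_date` keys. “Recurring” is read here as `repeated` or `fossilized` from the Recurrence field.

```python
from collections import Counter
from datetime import date, timedelta

RECURRING = {"repeated", "fossilized"}


def weekly_targets(entries, k=2):
    """Return the k most frequent patterns among entries that have recurred."""
    recurring = [e["pattern"] for e in entries if e["recurrence"] in RECURRING]
    return [pattern for pattern, _ in Counter(recurring).most_common(k)]


def mark_practiced(entries, patterns, today):
    """Mark matching entries as practiced (not solved) and set a re-test one week out."""
    for e in entries:
        if e["pattern"] in patterns:
            e["status"] = "practiced"
            e["review_date"] = today + timedelta(weeks=1)
    return entries


# Hypothetical week of entries.
entries = [
    {"pattern": "에 vs 에서", "recurrence": "repeated", "status": "open"},
    {"pattern": "에 vs 에서", "recurrence": "fossilized", "status": "open"},
    {"pattern": "bound-noun spacing", "recurrence": "repeated", "status": "open"},
    {"pattern": "email closing", "recurrence": "first time", "status": "open"},
    {"pattern": "email closing", "recurrence": "first time", "status": "open"},
]
targets = weekly_targets(entries)
print(targets)  # ['에 vs 에서', 'bound-noun spacing']
mark_practiced(entries, targets, today=date(2024, 5, 6))
```

The email-closing entries appear twice but are skipped: both are first-time errors, so raw frequency alone would have picked the wrong target.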
Privacy and safety
A personal error corpus may contain private chats, workplace details, health information, immigration details, or identifying names. Treat it carefully.
Rules:
- Replace real names with placeholders: [친구], [회사], [병원].
- Do not store private messages without consent if you plan to share the corpus.
- Avoid saving full screenshots when a typed excerpt is enough.
- For speech recordings, store only short clips needed for analysis.
- Keep medical, legal, immigration, and employment content in a private file.
- Never publish corrections from another person’s message without permission.
A useful corpus does not need to expose your life.
Sample error entries by level
Beginner / lower intermediate
Wrong:
저는 한국어 공부해요.
Better:
저는 한국어를 공부해요.
Note: Object marker omission may be acceptable in some casual contexts, but at this stage the learner needs to learn the transitive frame 공부하다 + 을/를.
Intermediate
Wrong:
제가 시간이 있으면 연락할게요.
Better, depending on intent:
시간이 되면 연락드리겠습니다.
시간이 괜찮으시면 연락드리겠습니다.
Note: The original sentence may sound speaker-centered. The repair depends on whether the speaker is promising contact when free (시간이 되면) or deferring to the listener's availability (시간이 괜찮으시면).
Advanced
Wrong:
본 서비스는 사용자의 정보를 모읍니다.
Better:
본 서비스는 서비스 제공을 위해 이용자의 정보를 수집·이용합니다.
Note: 모으다 is too general and informal for privacy-policy register. 수집·이용하다 is domain-appropriate.
Suggested interactive/tool module
Tool name: Korean Personal Error Corpus Builder
Core functions:
- Form-based entry for wrong sentence, correction, context, and error type.
- Tag system: 조사, 어미, 띄어쓰기, 높임법, 연어, 사용역, 발음, 담화.
- Dashboard showing recurring error types.
- Drill generator that turns corrected sentences into cloze, rewrite, and production tasks.
- Privacy toggle that masks names and sensitive fields.
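The drill generator is the least obvious of these functions. A minimal Python sketch of the cloze and rewrite steps, assuming the learner marks which chunk of the corrected sentence is the repair. `make_cloze` and `make_rewrite` are hypothetical names, not part of any existing tool.

```python
def make_cloze(corrected, target, blank="____"):
    """Blank out the first occurrence of the repaired chunk in a corrected sentence."""
    if target not in corrected:
        raise ValueError(f"{target!r} not found in {corrected!r}")
    return corrected.replace(target, blank, 1), target


def make_rewrite(wrong, audience):
    """Ask the learner to rewrite the original sentence for a stated audience."""
    return f"Rewrite for {audience}: {wrong}"


prompt, answer = make_cloze("저는 어제 친구를 만났어요.", "를")
print(prompt)  # 저는 어제 친구____ 만났어요.
print(answer)  # 를
print(make_rewrite("고객님, 이 문제는 제가 생각해 볼게요.", "a customer-service reply"))
```

Because only the first match is blanked, pick a target chunk that occurs once in the sentence; a single syllable such as 는 or 를 may appear in more than one place.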
Good tool behavior:
- Encourage short, high-quality entries over hoarding.
- Distinguish “needs teacher confirmation” from “confirmed correction.”
- Let learners mark uncertainty rather than forcing fake certainty.
- Use Korean Learners’ Corpus resources as a model for how learner errors can be categorized, but keep the personal workflow simple.
- Cross-reference NIKL dictionary examples when explaining collocation and register.
- Treat automated grammar correction as a weak signal unless verified by a teacher, native speaker, or strong corpus evidence.
QA checklist
- Does the article turn mistakes into a practice system, not shame?
- Does it include register and pragmatic errors, not only grammar?
- Does it provide a real schema?
- Does it include privacy guidance?
- Does it show how to convert corpus patterns into drills?
Remediation and upgrade layer: errors are data only if they are structured
The sections above introduce the personal error corpus. This layer addresses the most common failure: learners collect corrections like souvenirs and never turn them into behavior change.
An error corpus should answer four questions:
- What did I produce?
- What was the better version?
- What kind of error was it?
- What drill will make this less likely next time?
If the corpus cannot answer the fourth question, it is not yet a remediation tool.
Add a stricter error taxonomy
Keep the categories practical. Too many categories make the system unusable; too few hide patterns.
| Error type | Korean label | Example problem | Repair action |
|---|---|---|---|
| Particle | 조사 오류 | 학교를 가요 for 학교에 가요 | collect verb-place frames |
| Ending | 어미 오류 | 만나면 좋겠어요 used where 만나서 좋았어요 is needed | contrast cause/condition/result endings |
| Word choice | 단어 선택 | 시키다 used where 하게 하다 or 주문하다 is needed | build object-type cards |
| Collocation | 연어 오류 | 강한 비 instead of 많은 비 / 폭우 | collect adjective+noun pairs |
| Spacing | 띄어쓰기 | 할수있다 for 할 수 있다 | use spelling/norm checker + pattern notes |
| Honorifics | 높임법 오류 | 선생님이 말했어요 in a context needing 말씀하셨어요 | role-map drill |
| Speech level | 말높임 오류 | 반말 mixed into workplace email | rewrite by audience |
| Register | 사용역 오류 | overly casual word in a formal document | tag source genre |
| Word order / focus | 어순·초점 오류 | English-shaped topic order | rewrite from predicate outward |
| Omission | 정보 누락 | missing object, time, or relation because English context supplied it | context completeness check |
| Pronunciation/listening | 발음·청취 오류 | 못 들어요 vs 못 들었어요; batchim liaison missed | record and compare |
Treat this taxonomy as a default, not a prison. Learners can merge categories that do not generate enough data.
Add a minimum viable corpus schema
A spreadsheet is enough. Do not start with a complicated database unless you already like that kind of tool.
- id
- date
- source_context
- mode: writing / speaking / chat / tutor / self-recording
- wrong_version
- corrected_version
- error_type
- subtype
- explanation
- trigger: why I made the mistake
- native_or_teacher_evidence
- follow_up_drill
- next_review_date
- status: open / drilling / improved / retired
- privacy_level
Example entry:
| Field | Entry |
|---|---|
| wrong_version | 저는 어제 친구를 만났으면 카페에 갔어요. |
| corrected_version | 저는 어제 친구를 만나서 카페에 갔어요. |
| error_type | ending |
| subtype | condition vs sequence/cause |
| trigger | transferred English “when/after meeting” too loosely |
| follow_up_drill | make 10 sentences contrasting -면, -고, -아서/어서 |
| status | open |
The key is the trigger field. Without it, the learner only knows what was wrong, not why it keeps happening.
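If the sheet grows, a small validator keeps the controlled columns consistent. A Python sketch, assuming rows arrive as dicts of column to text (as a CSV reader would produce). The mode and status values come from the schema above; the privacy values (private, anonymized, publishable) and the choice of required columns, including `trigger`, are assumptions.

```python
from enum import Enum


class Mode(Enum):
    WRITING = "writing"
    SPEAKING = "speaking"
    CHAT = "chat"
    TUTOR = "tutor"
    SELF_RECORDING = "self-recording"


class Status(Enum):
    OPEN = "open"
    DRILLING = "drilling"
    IMPROVED = "improved"
    RETIRED = "retired"


class Privacy(Enum):  # assumed values
    PRIVATE = "private"
    ANONYMIZED = "anonymized"
    PUBLISHABLE = "publishable"


# Columns that must be filled in; trigger is included because it carries the diagnosis.
REQUIRED = ("id", "date", "wrong_version", "corrected_version", "error_type", "trigger")


def validate(row):
    """Return a list of problems in one spreadsheet row (dict of column -> text)."""
    problems = [f"missing {name}" for name in REQUIRED if not row.get(name, "").strip()]
    for column, allowed in (("mode", Mode), ("status", Status), ("privacy_level", Privacy)):
        try:
            allowed(row.get(column, ""))  # Enum lookup by value raises ValueError if unknown
        except ValueError:
            problems.append(f"bad {column}: {row.get(column)!r}")
    return problems


good = {
    "id": "7", "date": "2024-05-02", "mode": "writing",
    "wrong_version": "저는 어제 친구를 만났으면 카페에 갔어요.",
    "corrected_version": "저는 어제 친구를 만나서 카페에 갔어요.",
    "error_type": "ending", "trigger": "transferred English “when/after meeting” too loosely",
    "status": "open", "privacy_level": "private",
}
print(validate(good))                                        # []
print(validate(dict(good, trigger="", status="solved")))     # ['missing trigger', "bad status: 'solved'"]
```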
Add a weekly remediation loop
Give the week a rhythm:
- Monday: collect errors without overanalyzing.
- Tuesday: classify the top five.
- Wednesday: find native examples for the top two patterns.
- Thursday: build drills or cards.
- Friday: produce new sentences using the repaired pattern.
- Weekend: review whether the same error reappeared.
This turns mistakes into a cycle: collect → classify → verify → drill → produce → recheck.
Add a “do not hoard errors” warning
A personal error corpus can become emotionally toxic if learners treat it as a wall of failure. The goal is not to preserve every mistake. The goal is to find recurring patterns and remove them.
Retire an error when:
- it has not appeared in a month of similar output;
- the learner can explain the repair without looking;
- the learner has produced three correct examples in real contexts;
- the error no longer blocks communication or register control.
Archive, do not worship.
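The retirement rule can be written as a single check. A Python sketch that reads the list as four conditions that must all hold and approximates “a month” as 30 days since the error was last seen. It does not verify that you actually produced similar output during that month. `RetireCheck` and `should_retire` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetireCheck:
    last_seen: date                   # last time the error appeared in similar output
    can_explain_unaided: bool         # repair explained without looking
    correct_real_examples: int        # correct uses produced in real contexts
    still_blocks_communication: bool  # still blocks communication or register control


def should_retire(check, today, quiet_period=timedelta(days=30)):
    """True only when all four retirement criteria hold (30 days is an assumed 'month')."""
    return (
        today - check.last_seen >= quiet_period
        and check.can_explain_unaided
        and check.correct_real_examples >= 3
        and not check.still_blocks_communication
    )


check = RetireCheck(date(2024, 5, 1), True, 3, False)
print(should_retire(check, today=date(2024, 6, 15)))  # True
print(should_retire(check, today=date(2024, 5, 20)))  # False: only 19 quiet days
```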
Add a pattern-analysis example
After 40 entries, a learner sorts by type and sees:
| Error type | Count | Interpretation | Action |
|---|---|---|---|
| particle | 14 | not random; place/event/object frames unstable | study verbs with required particles |
| spacing | 8 | mechanical but recurring | run spelling review and card common chunks |
| register | 6 | too much textbook/casual mixing | tag source genre before writing |
| honorifics | 5 | role mapping weak | create address-term scenarios |
| word choice | 7 | English glosses too broad | use dictionary examples and collocations |
This is the moment the corpus becomes useful. The learner no longer says “my Korean is bad.” The learner says “my next month is particles and register.”
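The sorting step behind the table is a frequency count. A Python sketch that rebuilds the table's 40 entries as one error-type tag each and ranks them; `rank_error_types` is a hypothetical helper.

```python
from collections import Counter


def rank_error_types(tags):
    """Return (error_type, count, share) tuples, most frequent first."""
    counts = Counter(tags)
    total = sum(counts.values())
    return [(t, n, n / total) for t, n in counts.most_common()]


# The 40 entries from the table, flattened to one error-type tag per entry.
tags = (["particle"] * 14 + ["spacing"] * 8 + ["register"] * 6
        + ["honorifics"] * 5 + ["word choice"] * 7)
for error_type, count, share in rank_error_types(tags):
    print(f"{error_type:<12}{count:>3}  {share:.0%}")
```

By count alone, register ranks fourth. The learner still picks it as a target because it carries social and professional risk, which frequency does not capture.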
Add privacy and consent rules
Because the corpus may include chat messages, tutoring corrections, names, workplaces, or personal stories, it needs firm privacy rules.
Minimum privacy rules:
- Replace names with placeholders: [친구], [회사], [선생님].
- Do not store other people's private messages without permission.
- Do not upload sensitive learner data into public AI tools or public notebooks.
- Separate language examples from personal details.
- Mark entries as private, anonymized, or publishable.
Example anonymization:
Before:
민지 씨한테 삼성동 사무실 계약서 보내줬어요.
After:
[동료]에게 [장소] 사무실 계약서를 보내 줬어요.
The grammar remains useful. The private content disappears.
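Masking can be partly scripted if you keep your own list of names and places to hide. A minimal Python sketch using plain string replacement; the `anonymize` helper and the replacement map are hypothetical. It only masks what you list, so it will not catch names you forgot. It also leaves grammar untouched: the hand-edited “After” line also changes 한테 to 에게 and fixes the particle and spacing, which this code does not do.

```python
def anonymize(sentence, replacements):
    """Swap private details for placeholders, longest match first to avoid partial overlaps."""
    for private in sorted(replacements, key=len, reverse=True):
        sentence = sentence.replace(private, replacements[private])
    return sentence


masked = anonymize(
    "민지 씨한테 삼성동 사무실 계약서 보내줬어요.",
    {"민지 씨": "[동료]", "삼성동": "[장소]"},
)
print(masked)  # [동료]한테 [장소] 사무실 계약서 보내줬어요.
```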
Add a “correction quality” field
Not all corrections are equal. Tutors, friends, AI tools, grammar checkers, and native speakers may correct different things, so mark how reliable each correction is.
| Source | Strength | Risk |
|---|---|---|
| trained teacher | explanation and level fit | may simplify for pedagogy |
| native friend | naturalness judgment | may not explain rule |
| grammar checker | spelling/spacing | misses context and register |
| corpus/dictionary | usage evidence | requires interpretation |
| AI tool | fast alternatives | may hallucinate or overcorrect |
| published source | authentic model | may not match learner's intended context |
Add fields:
- correction_source:
- confidence: high / medium / low
- needs_verification: yes/no
- verified_with:
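One way to fill `needs_verification` automatically, sketched in Python. The rule is an assumption, not part of the schema. It follows the earlier advice to treat automated correction as a weak signal until a teacher, native speaker, or strong corpus evidence confirms it, and it also flags any low-confidence correction. The source labels are taken from the table above; `needs_verification` as a function is hypothetical.

```python
# Sources the table flags as fast but risky for context and register (assumed policy).
WEAK_SOURCES = {"grammar checker", "ai tool"}


def needs_verification(correction_source, confidence, verified_with=""):
    """Flag a correction unless a reliable source has already confirmed it."""
    if verified_with.strip():
        return False
    return correction_source.strip().lower() in WEAK_SOURCES or confidence == "low"


print(needs_verification("AI tool", "high"))                       # True
print(needs_verification("trained teacher", "high"))               # False
print(needs_verification("native friend", "low"))                  # True
print(needs_verification("AI tool", "medium", "trained teacher"))  # False
```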
Module name: Korean Error Corpus Builder
Core functions:
- Add wrong and corrected sentence.
- Select error type and subtype.
- Store context and register.
- Prompt the learner for a trigger hypothesis.
- Generate a follow-up drill template.
- Show recurring patterns by month.
- Allow anonymization fields.
Dashboard views:
- Error frequency by type.
- Error frequency by source context: chat, essay, speech, email, reading summary.
- Open errors vs retired errors.
- Top three remediation targets.
- Examples needing verification.
Drill generation examples:
- Particle error → generate verb-frame table.
- Ending error → generate contrast sentences.
- Register error → generate casual/formal rewrites.
- Honorific error → generate role-map scenarios.
- Collocation error → generate native-example search prompts.
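The drill-generation mapping is essentially a lookup from error type to a prompt template. A Python sketch with hypothetical template wording; a real tool would let the learner edit these, and types without a template fall back to a generic production task.

```python
DRILL_TEMPLATES = {
    "particle": "Verb-frame table: list five verbs like the one in “{s}” with the particle each requires.",
    "ending": "Contrast: write three sentences that differ from “{s}” only in the connective ending.",
    "register": "Rewrite “{s}” once for a casual chat and once for a formal email.",
    "honorific": "Role map: who is the subject, listener, and referent in “{s}”? Rewrite for a new role.",
    "collocation": "Search native examples for the word pairing in “{s}” and copy three.",
}


def drill_for(error_type, corrected):
    """Fill the template for this error type, falling back to a generic production task."""
    template = DRILL_TEMPLATES.get(error_type.strip().lower())
    if template is None:
        return f"Write a new sentence that uses the repaired pattern in “{corrected}”."
    return template.format(s=corrected)


print(drill_for("register", "요청하신 내용은 담당 부서 확인 후 안내드리겠습니다."))
print(drill_for("spacing", "할 수 있다"))
```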
Related reading
When CJK Comparison Helps Korean Learners and When It Becomes Noise
The reader can decide when Chinese/Japanese comparison accelerates Korean learning and when it creates false friends, grammar transfer, register mistakes, or institutional confusion.
Using Speech Recognition Carefully for Korean Pronunciation
The reader can use speech recognition as a Korean pronunciation aid without treating it as an objective pronunciation judge.
Hanja Beneath Hangul: The Hidden Sino-Korean Layer
The reader can recognize the Sino-Korean layer behind Hangul words without needing to become a full Hanja reader on day one.
Sentence Rhythm in Korean: Eojeol, Particles, and Breath Groups
The reader can understand Korean sentence rhythm through eojeol grouping, particles, verb endings, and breath units.
Causatives in Korean: Authority, Permission, and Agency
The reader can analyze Korean causatives as statements about agency, authority, permission, and responsibility.
How to Mine Korean Sentences Without Collecting Translationese
The reader can collect Korean example sentences that reflect native usage, context, and register rather than mechanically translated English structures.