2019-11-29 19:53:50 -07:00

Indonesian tN issues

The correct .md file format is documented at

Cleanup steps:

  1. Initial inspection of the id_tn_l3 data. Preliminary report to team on 11/12/19.
  2. Conference call on 11/20/19. (Chuck, John, Craig, Max, Tabitha, Christine)
  3. Documented correct file format at
  4. More extensive analysis and documentation of deviations.
  5. Forked the Indonesian repository to This is the WACS workspace for data cleanup for this project.

Automated cleanup steps:

  1. Removed the first line of every note file other than the files. Also removed the second line if blank.

    • Consistently, notes files other than had two extraneous lines at the top.
    • RISK: may have deleted some valid data.
    • 23,055 files affected
  2. Removed the first two lines from files that start with some variation of "# Pendahuluan"

    • 446 files affected
  3. Removed lines containing empty HTML comments, and the lines following, if blank.

    • All HTML comment tags found were empty comments.
    • 469 files affected
  4. Converted instances of the HTML entity &nbsp; to a single space.

    • 11,694 files affected
  5. Removed top line of file if blank. Consolidated consecutive blank lines elsewhere in file.

    • 842 files affected
  6. Removed instances of <o:p></o:p> and <o:p> </o:p>

    • They had no apparent purpose or meaning.
    • 620 files affected
  7. Removed blank headers and the blank line following (if any).

    • 1,858 files affected
  8. Fixed language code in tA links. Replaced rc://en/ with rc://id/

    • 20,065 files affected
  9. Removed blank lines between list items.

    • 427 files affected
  10. Removed the hash marks from higher-level headings in files showing the first classic pattern of corrupted heading levels.

    • Classic pattern means: First heading at level 1. Subsequent headings alternate between a higher level and level 1.
    • Ends with a higher-level heading. No untagged text lines anywhere.
    • 681 files affected
  11. Promoted headings to level 1 in files showing the second classic pattern of corrupted heading levels.

    • Classic pattern means: First heading at level 2 or higher. Subsequent headings are always at the same level.
    • Plain text lines alternate with headings. Ends with a plain text line.
    • 1,665 files affected
  12. Removed top two lines of files meeting these criteria:

    • At least 5 lines long
    • First line contains a verse reference (space followed by digits, colon, and digits)
    • Second line is blank, and third line starts with hash mark
    • 3,394 files affected
    • RISK: may have deleted some valid data.
  13. Removed "Kata-kata Terjemahan" section from files that had it.

    • Those are from an older tN version.
    • 4,817 files affected
  14. Reapplied #11.

    • 475 files affected
  15. Reapplied #10.

    • 241 files affected
  16. (asked permission to...) Remove links specific to V-MAST resources that no longer exist.
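
The simpler text-level steps above (3, 4, 5, 6 and 8) could be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function name `clean_note` is hypothetical, the steps are not applied in the documented order, and the exact matching rules (for example, what counts as an empty HTML comment) are assumptions.

```python
import re

EMPTY_COMMENT = re.compile(r"<!--\s*-->")   # assumed shape of an "empty HTML comment"
EMPTY_O_P = re.compile(r"<o:p>\s?</o:p>")   # <o:p></o:p> and <o:p> </o:p>


def clean_note(text: str) -> str:
    """Apply a few of the text-level cleanups to the contents of one note file."""
    # Step 6: drop empty <o:p> tags.
    text = EMPTY_O_P.sub("", text)
    # Step 4: turn each &nbsp; entity into a single space.
    text = text.replace("&nbsp;", " ")
    # Step 8: fix the language code in tA links.
    text = text.replace("rc://en/", "rc://id/")

    kept = []
    after_comment = False
    for line in text.split("\n"):
        is_blank = line.strip() == ""
        # Step 3: remove a line containing an empty comment,
        # plus the line right after it if that line is blank.
        if EMPTY_COMMENT.search(line):
            after_comment = True
            continue
        if after_comment and is_blank:
            after_comment = False
            continue
        after_comment = False
        # Step 5: drop blank lines at the top and collapse runs of blank lines.
        if is_blank and (not kept or kept[-1].strip() == ""):
            continue
        kept.append(line)
    return "\n".join(kept)
```

In use, each note file would be read, passed through `clean_note`, and written back only if the text changed, which also gives a "files affected" count like the ones listed above.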

Remaining issues are documented in issues.txt.
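
The pattern-based automated steps (10, 11 and 12) depend on recognizing a file's overall shape before changing it. A minimal sketch of those checks follows, under several assumptions: the function names are hypothetical, a heading is taken to be one or more hash marks followed by whitespace, blank lines are ignored when matching a pattern, and step 10 is read as stripping the hash marks from the higher-level headings so that they become plain text.

```python
import re

HEADING = re.compile(r"^(#+)\s+(.*)$")
VERSE_REF = re.compile(r" \d+:\d+")   # space, digits, colon, digits


def level(line: str) -> int:
    """Heading level of a line, or 0 for plain text and blank lines."""
    match = HEADING.match(line)
    return len(match.group(1)) if match else 0


def retag(line: str, prefix: str) -> str:
    """Replace the hash marks of a heading line with the given prefix."""
    return prefix + HEADING.match(line).group(2)


def fix_heading_levels(text: str) -> str:
    """Repair a file that matches either classic pattern; otherwise return it unchanged."""
    lines = text.split("\n")
    levels = [level(line) for line in lines if line.strip()]
    if not levels or len(levels) % 2:
        return text
    titles, bodies = levels[0::2], levels[1::2]   # 1st, 3rd, ... and 2nd, 4th, ...
    # First pattern: level 1 alternating with higher-level headings, no plain text.
    if all(n == 1 for n in titles) and all(n > 1 for n in bodies):
        return "\n".join(retag(l, "") if level(l) > 1 else l for l in lines)
    # Second pattern: same-level headings (level 2 or higher) alternating with plain text.
    if (titles[0] >= 2 and all(n == titles[0] for n in titles)
            and all(n == 0 for n in bodies)):
        return "\n".join(retag(l, "# ") if level(l) > 1 else l for l in lines)
    return text


def has_stray_reference_lines(text: str) -> bool:
    """Step 12 criteria: True if the top two lines qualify for removal."""
    lines = text.split("\n")
    return (len(lines) >= 5
            and VERSE_REF.search(lines[0]) is not None
            and lines[1].strip() == ""
            and lines[2].startswith("#"))
```

Because a file is changed only when every non-blank line fits one of the two patterns, anything that deviates is returned unchanged rather than guessed at.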

Remaining steps:

  1. Manually edit the files identified in issues.txt to conform to the required markdown format.