'A' to Ä converter

12

I'm currently scanning a bunch of handwritten documents and converting them to .txt files. Since I have terrible handwriting, the .jpg->.txt converter converts some of my umlauts to the "normal" letter enclosed in ' characters.

Task

Write a program or a function that:

  • Is given a string
    • you can choose any I/O codepage as long as
      • it supports the characters AEIOUaeiouÄËÏÖÜäëïöü'.
      • the Input and Output codepages are the same.
    • the input will (besides spaces) only contain printable characters from your codepage.
      • There will only be one possible solution, so ambiguous inputs like 'a'e' won't appear
  • Converts all characters in the following set AEIOUaeiou to ÄËÏÖÜäëïöü
    • If, and only if, they are surrounded by ' characters:
      • Example: 'a''e' -> äë
    • If the from string is a single letter.
      • For example, 'AE' does not change at all and is output as-is.
    • If the from character is not one of AEIOUaeiou, that character won't change.

Note: The from character / from string is the one between '.
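
The rules above can be sketched as a short, ungolfed reference implementation. This is only an illustration in Python and not part of the original challenge; the names UMLAUTS and convert are made up for this sketch.

```python
import re

# Hypothetical helper names; the mapping itself comes from the task statement.
UMLAUTS = dict(zip("AEIOUaeiou", "ÄËÏÖÜäëïöü"))

def convert(text):
    # A vowel is converted only when exactly one character sits between two quotes.
    return re.sub(r"'([AEIOUaeiou])'", lambda m: UMLAUTS[m.group(1)], text)

print(convert("H'a''a'slich isn't a German word"))  # Hääslich isn't a German word
```

Quoted strings longer than one letter ('AE') and quoted non-vowels ('w') never match the pattern, so they pass through untouched.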

Testcases

Input
Output
<empty line>

'A'sthetik
Ästhetik

Meinung ist wichtig!
Meinung ist wichtig!

Ich sagte: "Er sagte: 'Ich habe Hunger'"
Ich sagte: "Er sagte: 'Ich habe Hunger'"

Ich sagte: "Er sagte: ''A'sthetik'"
Ich sagte: "Er sagte: 'Ästhetik'"

Hämisch rieb er sich die H'a'nde
Hämisch rieb er sich die Hände

H'a''a'slich isn't a German word
Hääslich isn't a German word

since it's really called h'a'sslich
since it's really called hässlich

Roman Gräf

Posted 2017-05-02T16:54:19.377

Reputation: 2 915

6The active ingredient in all of your testcases is either 'A' or 'a'... not what I consider good testcases. – Leaky Nun – 2017-05-02T17:53:22.330

1Can you add an example with 'w' (as w is not one of AEIOUaeiou)? – jimmy23013 – 2017-05-02T18:07:20.143

8

Combining diacriticals had unknown status, then were allowed, then were disallowed. This invalidated at least 4 answers. Boo! Hiss! I've changed my upvote to a downvote :(

– Digital Trauma – 2017-05-02T18:36:12.807

1@DigitalTrauma I'm very sorry for that. – Roman Gräf – 2017-05-02T18:45:59.923

4Add testcase: 'q'e'd' – Display Name – 2017-05-03T05:57:30.493

1What about strings like 'a'u'? This can result in äu or äü. Which of those is allowed and which is not? – 12431234123412341234123 – 2017-05-04T09:54:12.757

1Isn't it häßlich? – Magic Octopus Urn – 2017-07-07T17:02:11.307

This says it is hässlich. In the first example the error(?) is intended. – Roman Gräf – 2017-07-08T05:43:34.720

Answers

11

JavaScript (ES6), 81 70 68 bytes

s=>s.replace(/'[aeiou]'/gi,c=>"ï   ÖÄöä ËÜëüÏ "[c.charCodeAt(1)%15])

Try It

f=
s=>s.replace(/'[aeiou]'/gi,c=>"ï   ÖÄöä ËÜëüÏ "[c.charCodeAt(1)%15])
i.addEventListener("input",_=>o.innerText=f(i.value))
console.log(f("'A'sthetik")) // Ästhetik
console.log(f("Meinung ist wichtig!")) // Meinung ist wichtig!
console.log(f(`Ich sagte: "Er sagte: 'Ich habe Hunger'"`)) // Ich sagte: "Er sagte: 'Ich habe Hunger'"
console.log(f(`Ich sagte: "Er sagte: ''A'sthetik'"`)) // Ich sagte: "Er sagte: 'Ästhetik'"
console.log(f("Hämisch rieb er sich die H'a'nde")) // Hämisch rieb er sich die Hände
console.log(f("H'a''a'slich isn't a German word")) // Hääslich isn't a German word
console.log(f("since it's really called h'a'sslich")) // since it's really called hässlich
<input id=i><pre id=o>

Explanation

  • s=> Anonymous function taking the input string as an argument via parameter "s".
  • s.replace(x,y) Returns the string with "x" replaced by "y".
  • /'[aeiou]'/gi Case insensitive regular expression that matches all occurrences of a vowel enclosed by single quotes.
  • c=> Passes each match of the regular expression to an anonymous function via parameter "c".
  • "ï   ÖÄöä ËÜëüÏ "[n] Returns the nth character (0-indexed) in the string "ï   ÖÄöä ËÜëüÏ ", similar to "ï   ÖÄöä ËÜëüÏ ".charAt(n).
  • c.charCodeAt(1)%15 Gets the remainder of the character code of the second character in "c" (i.e. the vowel character) when divided by 15.
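
To see why the remainder modulo 15 works as an index, here is a small Python sketch of the same arithmetic. The names TABLE and lookup are made up for this illustration; the table is assembled from pieces so that the run of three spaces cannot be lost.

```python
# 15 slots; spaces fill the slots that no vowel maps to.
TABLE = "ï" + " " * 3 + "ÖÄöä" + " " + "ËÜëüÏ" + " "

def lookup(vowel):
    # Same arithmetic as c.charCodeAt(1)%15 in the answer.
    return TABLE[ord(vowel) % 15]

for vowel in "AEIOUaeiou":
    print(vowel, ord(vowel), ord(vowel) % 15, lookup(vowel))

print("".join(lookup(v) for v in "AEIOUaeiou"))  # ÄËÏÖÜäëïöü
```

The ten vowels happen to land on ten distinct remainders (for example A is 65, and 65 % 15 is 5), which is what makes a 15-character table sufficient.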

Alternative, 40/52 36/48 bytes (35/47 characters)

The following was my answer before combining diacritics were disallowed (Boo-urns!) - better viewed in this Fiddle

s=>s.replace(/'([aeiou])'/gi,"$1̈")

However, ETHproductions suggests that adding .normalize(), at a cost of 12 extra bytes, would make this valid.

s=>s.replace(/'([aeiou])'/gi,"$1̈").normalize()
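
As a rough illustration of what the normalization step changes, here is a Python sketch of the same idea. It assumes NFC, which is the default form used by JavaScript's normalize(); the function names are invented for this example.

```python
import re
import unicodedata

COMBINING_DIAERESIS = "\u0308"

def convert_combining(text):
    # Drop the quotes and append U+0308 to the vowel: two code points per letter.
    return re.sub(r"'([AEIOUaeiou])'",
                  lambda m: m.group(1) + COMBINING_DIAERESIS, text)

def convert_normalized(text):
    # NFC composes base letter + combining mark into a single code point
    # wherever a precomposed character exists.
    return unicodedata.normalize("NFC", convert_combining(text))

print(len(convert_combining("'a'")))   # 2
print(len(convert_normalized("'a'")))  # 1
```

Note that normalizing the whole string also recomposes any decomposed characters that were already present in the input, so this is not strictly limited to the quoted vowels.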

Shaggy

Posted 2017-05-02T16:54:19.377

Reputation: 24 623

OP still hasn't answered https://codegolf.stackexchange.com/users/59183/dzaima

– Adám – 2017-05-02T17:40:46.477

No, if combining diacritics are allowed. – Adám – 2017-05-02T18:02:44.133

Combining diacritics are now prohibited. – Adám – 2017-05-02T18:30:16.327

I believe you can make this valid by adding .normalize() to the end of the function. – ETHproductions – 2017-05-02T18:33:08.600

Are you sure, @ETHproductions? If combining diacritics are prohibited, are they not prohibited from appearing in an answer at all? – Shaggy – 2017-05-02T19:29:41.213

8

Perl 5, 25 bytes

s/'(\w)'/chr 1+ord$1/age

24 bytes, plus 1 for -pe instead of -e

This makes use of the rule that "you can choose any I/O codepage as long as it supports the characters AEIOUaeiouÄËÏÖÜäëïöü'". It also makes use of the /a flag on regexes, which causes \w to refer to precisely the characters in abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789 no matter how they're encoded.

The chosen I/O codepage for my script is this:

 1  a
 2  ä
 3  e
 4  ë
 5  i
 6  ï
 7  o
 8  ö
 9  u
10  ü
11  A
12  Ä
13  E
14  Ë
15  I
16  Ï
17  O
18  Ö
19  U
20  Ü
21  '

(I can't test this script on the test cases in the question, as they include some really weird characters, like t.)
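
The core of the trick is that, in this codepage, every umlaut sits exactly one code above its plain vowel. The following Python sketch models only that idea on lists of code numbers; it does not reproduce Perl's regex semantics, and all names (SYMBOLS, ENCODE, DECODE, convert_codes, roundtrip) are hypothetical.

```python
# Hypothetical model of the 21-symbol codepage above: code 1 is 'a', code 2 is 'ä', ...
SYMBOLS = "aäeëiïoöuüAÄEËIÏOÖUÜ'"
ENCODE = {symbol: code for code, symbol in enumerate(SYMBOLS, start=1)}
DECODE = {code: symbol for symbol, code in ENCODE.items()}
QUOTE = ENCODE["'"]  # 21

def convert_codes(codes):
    out = []
    i = 0
    while i < len(codes):
        has_triple = i + 2 < len(codes)
        # Plain vowels are exactly the odd codes below the quote's code.
        if (has_triple and codes[i] == QUOTE and codes[i + 2] == QUOTE
                and codes[i + 1] % 2 == 1 and codes[i + 1] < QUOTE):
            out.append(codes[i + 1] + 1)  # the umlaut is the next code up
            i += 3
        else:
            out.append(codes[i])
            i += 1
    return out

def roundtrip(text):
    # Encode into the custom codepage, convert, and decode again.
    return "".join(DECODE[c] for c in convert_codes([ENCODE[ch] for ch in text]))

print(roundtrip("'a''E'"))  # äË
```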


Thanks to Grimy for saving me three bytes. Earlier, I had s/'([a-z])'/chr 1+ord$1/gie, which made use of (the encoding and) the interesting fact that [a-z] is special-cased in Perl to match precisely abcdefghijklmnopqrstuvwxyz no matter the encoding. My earlier answer is, IMO, more interesting, but this one is shorter, so, what the heck, I'll take it.

msh210

Posted 2017-05-02T16:54:19.377

Reputation: 3 094

1I carefully checked the "loopholes that are forbidden by default" list before posting this, and inventing a codepage wasn't among them. That, plus especially the fact that the question invited use of "any I/O codepage", seem to allow this answer. And then the a-z trick makes the answer actually interesting instead of merely a cheat. (IMO, anyway.) – msh210 – 2017-05-02T23:35:30.030

3This is the kind of trick that’s only funny once, but I believe you’re the first one to use it, so it works (= – Grimmy – 2017-05-03T07:48:52.300

1You could save 3 bytes by using \w instead of [a-z], as well as /a instead of /i. If the "/a" modifier is in effect, \w matches the characters [a-zA-Z0-9_], regardless of how they’re encoded. – Grimmy – 2017-05-03T07:51:33.197

@Grimy, thanks! I'll edit.... – msh210 – 2017-05-03T13:00:12.913

6

Vim, 33 bytes

:s/\c'\([aeiou]\)'/<C-v><C-k>\1:/g
ii<esc>D@"

Try it online! in the backwards compatible V interpreter.

James

Posted 2017-05-02T16:54:19.377

Reputation: 54 537

4

Japt, 29 bytes

r"'%v'"@"ï   ÖÄöä ËÜëüÏ "gXc1

Try it online!

Explanation

r"'%v'"@"ï   ÖÄöä ËÜëüÏ "gXc1

r"'%v'"@                       // Replace each match X of /'<vowel>'/ in the input with
        "ï   ÖÄöä ËÜëüÏ "g     //   the character in this string at index
                          Xc1  //     X.charCodeAt(1).
                               //   Values larger than the length of the string wrap around,
                               //   so this is effectively equal to " ... "[n%15].
                               // Implicit: output result of last expression

ETHproductions

Posted 2017-05-02T16:54:19.377

Reputation: 47 880

1Using combining diacritics is controversial. – Leaky Nun – 2017-05-02T17:50:49.693

Beat me to it. Your solution is much shorter than mine though... Well done. – Luke – 2017-05-02T17:56:00.757

@LeakyNun Controversial for this question or in general? – Digital Trauma – 2017-05-02T18:15:28.797

Controversial for this question because you raised it in the comments but it was never addressed. – Leaky Nun – 2017-05-02T18:15:54.290

@Adám Beat you by 38 seconds ;-) – ETHproductions – 2017-05-02T18:30:23.693

@ETHproductions Ninja'ed :-D – Adám – 2017-05-02T18:31:11.077

@LeakyNun Fixed now. – ETHproductions – 2017-05-02T18:47:17.227

Can you add an explanation please? – Luke – 2017-05-02T18:49:50.083

@Luke Added now, let me know if you have any questions. – ETHproductions – 2017-05-02T18:53:45.863

It's not too different from what I expected. I started writing a solution which doesn't use the wrapping of string indexing (since I didn't know Japt had that), but that is much longer. – Luke – 2017-05-02T18:59:20.763

4

JavaScript, 67 bytes

s=>s.replace(/'.'/g,c=>"äëïöüÄËÏÖÜ"['aeiouAEIOU'.indexOf(c[1])]||c)

Try it online!

Replaces each single character between quotes with the corresponding umlauted character, or with the match itself if the character is not one of those that need changing.
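
A Python sketch of the same "match anything, fall back to the match" idea might look like this. The names are made up for illustration. One difference worth noting: in JavaScript an index of -1 yields undefined, which triggers the ||c fallback, whereas in Python index -1 would silently pick the last character, so the check has to be explicit.

```python
import re

PLAIN = "aeiouAEIOU"
UMLAUT = "äëïöüÄËÏÖÜ"

def convert_any_quoted(text):
    def swap(match):
        position = PLAIN.find(match.group(0)[1])  # -1 when the character is not a vowel
        return UMLAUT[position] if position >= 0 else match.group(0)
    # '.' matches any single character (except a newline) between two quotes.
    return re.sub(r"'.'", swap, text)

print(convert_any_quoted("h'a'sslich"))  # hässlich
```

Because matches cannot overlap, this pattern consumes quoted non-vowels too: an input such as 'q'e'd' (raised in the comments on the question) comes back unchanged here, while a vowel-only pattern would convert the 'e' in the middle.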

Yair Rand

Posted 2017-05-02T16:54:19.377

Reputation: 381

3

Jelly, 36 bytes

œṣ⁹Ṫ¤j
“.ạẏụ’D196;+\Ọż⁾''jЀØc¤;@Wç/

Try it online!

This seems pretty complicated for Jelly!

How?

Note: since the characters are not on the code page but are within the range of a byte in Unicode, I think they must be created from ordinals, so that is what I have done.

œṣ⁹Ṫ¤j - Link 1, Replace: char list S [...], list R [char T, char list F]
œṣ     - split S at sublists equal to:
    ¤  -   nilad followed by link(s) as a nilad:
  ⁹    -     link's right argument, R
   Ṫ   -     tail - yield char list F and modify R to become [T]
     j - join with R (now [T])
       - all in all split S at Rs and join back up with [T]s.

“.ạẏụ’D196;+\Ọż⁾''jЀØc¤;@Wç/ - Main link: char list S
       196;                   - 196 concatenate with:
“.ạẏụ’                        -   base 250 literal 747687476
      D                       -   to decimal list [7,4,7,6,8,7,4,7,6]
           +\                 - cumulative reduce with addition: [196,203,207,214,220,228,235,239,246,252]
             Ọ                - cast to characters: ÄËÏÖÜäëïöü
                       ¤      - nilad followed by link(s) as a nilad:
               ⁾''            -   literal ["'", "'"]
                     Øc       -   vowel yield: AEIOUaeiou
                  jЀ         -   join mapped:  ["'A'", "'E'", ...]
              ż               - zip together
                          W   - wrap S in a list
                        ;@    - concatenate (swap @rguments)
                           ç/ - reduce with last link (1) as a dyad
                              - implicit print
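
The ordinal construction in the main link can be checked with a few lines of Python. This is only a sketch of the arithmetic and of the reduce-over-replacements structure, not a translation of the Jelly program; the variable and function names are invented here.

```python
from functools import reduce
from itertools import accumulate

# Digits of the base-250 literal's value, used as gaps between code points.
deltas = [int(d) for d in str(747687476)]

# Running sum starting at 196 (the code point of Ä).
code_points = list(accumulate([196] + deltas))
umlauts = "".join(chr(cp) for cp in code_points)

# Pair each quoted vowel with its umlaut, like the zip in the main link.
pairs = list(zip(["'" + v + "'" for v in "AEIOUaeiou"], umlauts))

def convert_by_reduce(text):
    # Apply the ten replacements one after another, carrying the string along.
    return reduce(lambda s, pair: s.replace(pair[0], pair[1]), pairs, text)

print(code_points)  # [196, 203, 207, 214, 220, 228, 235, 239, 246, 252]
print(umlauts)      # ÄËÏÖÜäëïöü
```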

Jonathan Allan

Posted 2017-05-02T16:54:19.377

Reputation: 67 804

3

V, 24 bytes

Óã'¨[aeiou]©'/±:
éiD@"

Try it online!

Hexdump:

00000000: d3e3 27a8 5b61 6569 6f75 5da9 272f 160b  ..'.[aeiou].'/..
00000010: b13a 0ae9 6944 4022                      .:..iD@"

This is just a direct translation of my vim answer so that I can beat all of the golfing languages. :P

James

Posted 2017-05-02T16:54:19.377

Reputation: 54 537

2

Ruby, 62+1 = 63 bytes

Uses the -p flag for +1 byte.

gsub(/'([aeiou])'/i){$1.tr"AEIOUaeiou","ÄËÏÖÜäëïöü"}

Try it online!

Value Ink

Posted 2017-05-02T16:54:19.377

Reputation: 10 608

1

APL (Dyalog), 53 bytes

(v←'''[AEIOUaeiou]''')⎕R{'  ÄËÏÖÜäëïöü'[v⍳2⊃⍵.Match]}

Try it online!

Uses PCRE Replace (saving the RegEx as v) to apply the following function to quoted vowels:

{ anonymous function

'  ÄËÏÖÜäëïöü'[] index the string (note two spaces corresponding to '[) with:

  ⍵.Match the matched string

  2⊃ pick second letter (the vowel)

  v⍳ find index in v

}

Adám

Posted 2017-05-02T16:54:19.377

Reputation: 37 779

1

///, 67 bytes

/~/'\///`/\/\/~/'A~Ä`E~Ë`I~Ï`O~Ö`U~Ü`a~ä`e~ë`i~ï`o~ö`u~ü/

Try it online!

This works by replacing undotted letters surrounded by single quotes ('A') with the dotted version of the same letter, without the single quotes (Ä). A single replacement looks like this (before golfing): /'A'/Ä/.

The golf abbreviates the two most frequently repeated sequences: ~ expands to '/ and ` expands to //', so /'A'/Ä//'E'/Ë/ shrinks to /'A~Ä`E~Ë/.
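
The abbreviation step can be checked with plain string replacement in Python. This sketch models only the text expansion, not a /// interpreter, and the leading slash on golfed_tail is added here for simplicity; all names are hypothetical.

```python
PLAIN = "AEIOUaeiou"
UMLAUT = "ÄËÏÖÜäëïöü"

# Ungolfed rule list: one /'X'/umlaut/ substitution per vowel, back to back.
ungolfed = "".join("/'" + p + "'/" + u + "/" for p, u in zip(PLAIN, UMLAUT))

# Rule list as written in the golfed program, using ~ and ` as abbreviations.
golfed_tail = "/'A~Ä`E~Ë`I~Ï`O~Ö`U~Ü`a~ä`e~ë`i~ï`o~ö`u~ü/"

# Expand ~ first, then `, mirroring the order of the two leading rules.
expanded = golfed_tail.replace("~", "'/").replace("`", "//'")

print(expanded == ungolfed)  # True
```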

Comrade SparklePony

Posted 2017-05-02T16:54:19.377

Reputation: 5 784

1

SOGL, 43 35 (UTF-8) bytes

L∫:ÆW ':h++;"äëïöü”:U+Wŗ

Explanation:

L∫                        repeat 10 times, pushing current iteration (0-based)
  :                       duplicate the iteration
   ÆW                     get the index (1-based) in "aeiouAEIOU"
      ':h++               quote it
           ;              put the copy (current iteration) on top
            "äëïöü”       push "äëïöü"
                   :      duplicate it
                    U     uppercase it
                     +    join together, resulting in "äëïöüÄËÏÖÜ"
                      W   get the index (1-based) in it
                       ŗ  replace [in the input] the current char from "aeiouAEIOU" with
                          the corresponding char in "äëïöüÄËÏÖÜ"

dzaima

Posted 2017-05-02T16:54:19.377

Reputation: 19 048

3Heh, one could think that ̈+ is a function in SOGL. – Adám – 2017-05-02T18:12:27.360

Combining diacritics are now prohibited. – Adám – 2017-05-02T18:31:40.873

1

Swift - 201 bytes

import Foundation;func g(s:String){var e=s;var r="aeiouAEIOUäëïöüÄËÏÖÜ".characters.map{String($0)};for i in r[0...9]{e=e.replacingOccurrences(of:"'\(i)'",with:r[r.index(of:i)!+10])};print(e)}

Usage: g("'A'sthetik") // => Ästhetik

Mr. Xcoder

Posted 2017-05-02T16:54:19.377

Reputation: 39 774

1characters.map{blah blah} and replacingOccurrences() really kill the fun :(( – Mr. Xcoder – 2017-05-02T18:18:42.473

1

AWK, 99 bytes

{split("AEIOUaeiou",p,"")
for(i=1;i<=split("ÄËÏÖÜäëïöü",r,"");i++)gsub("'"p[i]"'",r[i])}1

Try it online!

I tried to come up with some clever regex within a gensub but failed :(

Robert Benson

Posted 2017-05-02T16:54:19.377

Reputation: 1 339

1

Python 3.6, 98 92 characters

import re;a=lambda i,p="'([AEIOUaeiou])'":re.sub(p,lambda x:'ÄËÏÖÜäëïöü'[p.index(x[1])-3],i)

It's a function, not a complete program.

Formatted for readability:

import re

a = lambda i, p="'([AEIOUaeiou])'":\
    re.sub(p, lambda x: 'ÄËÏÖÜäëïöü'[p.index(x[1]) - 3], i)

Thanks to @ValueInk for clever tips for further golfing.

Display Name

Posted 2017-05-02T16:54:19.377

Reputation: 654

Does not run for me. Stops with a TypeError. – totallyhuman – 2017-05-03T11:40:23.447

@totallyhuman are you sure? It seems to be working for me. You need to call the a function with the string you want to replace. – numbermaniac – 2017-05-03T11:41:40.853

Try it here. – totallyhuman – 2017-05-03T11:45:57.440

@totallyhuman it works for me. Which Python version do you have? – Display Name – 2017-05-03T17:49:29.757

1The Python docs report that match.__getitem__(g) is new in Python 3.6, so it should probably be specified in your header. Also, if you change your regex to '([AEIOUaeiou])', you save a byte by changing x[0][1] to x[1] and using -3 instead of -2. – Value Ink – 2017-05-03T19:58:41.020

1Actually, it's even shorter to go import re;a=lambda i,p="'([AEIOUaeiou])'":re.sub ... since you cut out quite a bit of overhead from no longer needing a return statement! – Value Ink – 2017-05-03T20:11:29.417

1

05AB1E, 30 29 24 bytes

-6 bytes thanks to Emigna

žMDu«S''«''ì"äëïöü"Du«S:

05AB1E conveniently has the characters äëïöü in its code page.

Try it online!

(old code)

žMDu«Svy''.ø})"äëïöü"Du«¹ŠS:

Explanation (outdated):

žM                             Push aeiou                    ['aeiou']
  D                            Duplicate                     ['aeiou', 'aeiou']
   u                           Uppercase                     ['aeiou', 'AEIOU']
    «                          Concatenate                   ['aeiouAEIOU']
     vy                        For each...
       ''                        Push '
         .ø                      Surround a with b (a -> bab)
           }                   End loop
            )                  Wrap stack to array           [["'a'", "'e'", "'i'", "'o'", "'u'", "'A'", "'E'", "'I'", "'O'", "'U'"]]
             "äëïöü"           String literal.               [["'a'", "'e'", "'i'", "'o'", "'u'", "'A'", "'E'", "'I'", "'O'", "'U'"], 'äëïöü']
                    Du«        Duplicate, uppercase, concat  [["'a'", "'e'", "'i'", "'o'", "'u'", "'A'", "'E'", "'I'", "'O'", "'U'"], 'äëïöüÄËÏÖÜ']
                       ¹       Push first input
                        Š      Push c, a, b                  ["'A'sthetik", ["'a'", "'e'", "'i'", "'o'", "'u'", "'A'", "'E'", "'I'", "'O'", "'U'"], 'äëïöüÄËÏÖÜ']
                          S    Convert to char list          ["'A'sthetik", ["'a'", "'e'", "'i'", "'o'", "'u'", "'A'", "'E'", "'I'", "'O'", "'U'"], ['ä', 'ë', 'ï', 'ö', 'ü', 'Ä', 'Ë', 'Ï', 'Ö', 'Ü']]
                           :   Replace all                   ['Ästhetik']
                               Implicit print

Try it online!

Okx

Posted 2017-05-02T16:54:19.377

Reputation: 15 025

You could replace with Š. – Emigna – 2017-05-03T10:36:55.123

You could save a few more bytes with žMDu«S''«''ì"äëïöü"Du«S: – Emigna – 2017-05-03T10:42:25.393

@Emigna Thanks again. – Okx – 2017-05-03T10:44:41.277

You also don't need the I at the beginning :) – Emigna – 2017-05-03T10:45:24.453

0

Retina, 39 bytes

iT`A\EI\OUaei\ou'`ÄËÏÖÜäëïöü_`'[aeiou]'

Try it online!

Neil

Posted 2017-05-02T16:54:19.377

Reputation: 95 035