Introduction

Try to convert words to two different unicode fonts.

Challenge

Your task is to transform your input string into the Math Sans and Math Sans Bold Unicode characters.

All uppercase words should become lowercase Math Sans Bold words.

For example: WORD -> 𝘄𝗼𝗿𝗱

All lowercase words should become Math Sans words.

For example: other words -> 𝗈𝗍𝗁𝖾𝗋 𝗐𝗈𝗋𝖽𝗌

All mixed case words should remain unchanged

For example: Mixed Case Words -> Mixed Case Words

Periods and spaces should remain unchanged.

Words are separated by spaces or periods

For example, the words in the following sentence are Hello, This, is, a, word, S, O, are, and these:

Hello. This is a word. S.O. are these

Inputs: A string containing letters, spaces, and periods ([A-Za-z .]+)
Output: The formatted string

As this is a golfing challenge, the lowest number of bytes wins

Example Input and Output

Input:

This is an example STRING that c.o.U.L.d. be INPUTTED. It can CONTAIN multiple sentences.

Output:

This 𝗂𝗌 𝖺𝗇 𝖾𝗑𝖺𝗆𝗉𝗅𝖾 𝘀𝘁𝗿𝗶𝗻𝗴 𝗍𝗁𝖺𝗍 𝖼.𝗈.𝘂.𝗹.𝖽. 𝖻𝖾 𝗶𝗻𝗽𝘂𝘁𝘁𝗲𝗱. It 𝖼𝖺𝗇 𝗰𝗼𝗻𝘁𝗮𝗶𝗻 𝗆𝗎𝗅𝗍𝗂𝗉𝗅𝖾 𝗌𝖾𝗇𝗍𝖾𝗇𝖼𝖾𝗌.

Reference

Math Sans Bold: 𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇 (characters 120302 through 120327)

Math Sans: 𝖺𝖻𝖼𝖽𝖾𝖿𝗀𝗁𝗂𝗃𝗄𝗅𝗆𝗇𝗈𝗉𝗊𝗋𝗌𝗍𝗎𝗏𝗐𝗑𝗒𝗓 (characters 120250 through 120275)
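
For anyone who wants something to check an answer against, here is a minimal ungolfed sketch in Python. It is not part of the original challenge; the names convert and convert_word are made up for illustration, and it assumes the input matches [A-Za-z .]+ as stated above. It derives the two offsets directly from the code point ranges in this section.

```python
import re

SANS_BOLD_START = 120302  # first Math Sans Bold character listed above
SANS_START = 120250       # first Math Sans character listed above


def convert_word(word):
    """Restyle one word; mixed-case words are returned unchanged."""
    if word.isupper():
        # Uppercase input maps onto the (lowercase) Math Sans Bold range.
        return "".join(chr(ord(c) - ord("A") + SANS_BOLD_START) for c in word)
    if word.islower():
        return "".join(chr(ord(c) - ord("a") + SANS_START) for c in word)
    return word


def convert(text):
    # Words are maximal runs of letters; spaces and periods pass through.
    return re.sub(r"[A-Za-z]+", lambda m: convert_word(m.group()), text)
```

Subtracting ord("A") and ord("a") from the range starts gives the constants 120237 and 120153 that most of the answers add directly.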

pfg

Posted 2018-01-25T23:17:14.877

Reputation: 735

(+3) Welcome to PPCG! – Laikoni – 2018-01-25T23:26:10.507

♫ Philosophy's just rigor, sense, and practicality... ♫ – Esolanging Fruit – 2018-01-26T05:12:11.153

(+13) Whoa! You got a title in bold in the sidebar? What? I... don't understand.... is the internet breaking? Have you broken the internet? – Zizouz212 – 2018-01-26T06:38:00.423

(+26) https://i.stack.imgur.com/R4V3C.png I came here thinking this challenge was about stacking boxes, bar charts or something... – Matteo Italia – 2018-01-26T07:29:55.700

This shouldn't have been close hammered. This problem is significantly more difficult than a simple character transliteration. The leading answer in the cited challenge can not easily nor competitively be transferred using that same method (afaict, my retina isn't great) – Conor O'Brien – 2018-01-26T15:20:08.887

Answers

QuadR, 45 43 bytes

-2 thanks to ngn.

\w+
⎕UCS a+(2>≢b)×120153+84×⊃b←∪96>a←⎕UCS⍵M

Since TIO scrambles Unicode output from QuadR, here is a screenshot of using QuadR as an APL library in an interactive session:

\w+ replace words with the result of applying the following code to them:

⍵M the found word
⎕UCS the Universal Character Set code points of that
a← store that in a
96> 0 or 1 for whether 96 is greater than each of those
∪ take just the unique; [0] or [1] or [0,1] or [1,0]
b← store that in b
⊃ pick the first from that
84× multiply 84 with that
120153+ add 120153 to that
(…)× multiply the following with that:
≢b the tally (length) of b (1 if single-case, 2 if mixed-case)
2> 0 or 1 for whether two is greater than that (1 if single-case, 0 if mixed-case)
a+ the original code points added to that
⎕UCS convert the resulting code points back to characters
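
For readers who don't speak APL, the arithmetic above can be sketched in Python roughly as follows. This is an illustrative paraphrase, not a translation of the QuadR program; the function name restyle is made up, and it handles a single non-empty word, as the \w+ match guarantees.

```python
def restyle(word):
    a = [ord(c) for c in word]            # a←⎕UCS⍵M
    flags = [96 > cp for cp in a]         # 96>a: True for uppercase letters
    b = list(dict.fromkeys(flags))        # ∪: unique values in first-seen order
    single_case = 2 > len(b)              # 2>≢b
    # Booleans act as 0/1, so the offset is 0, 120153 or 120153+84.
    offset = single_case * (120153 + 84 * b[0])
    return "".join(chr(cp + offset) for cp in a)
```

The point of the trick is that 120237 - 120153 = 84, so one base constant plus a conditional 84 covers both alphabets, and multiplying by the single-case flag zeroes the offset for mixed-case words.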

Adám

Reputation: 37 779

APL (Dyalog Unicode), 63 57 53 bytes

-6 thanks to Erik the Outgolfer. -4 thanks to ngn.

Anonymous tacit prefix function.

'\w+'⎕R{⎕UCS a+(2>≢b)×120153+84×⊃b←∪96>a←⎕UCS⍵.Match}

Since TIO scrambles Unicode output from Dyalog APL, here is a screenshot of the code in action:

'\w+'⎕R PCRE Replace words with the result of applying the following…

{...} anonymous lambda:

⍵.Match the found word

⎕UCS the Universal Character Set code points of that

a← store that in a

96> 0 or 1 for whether 96 is greater than each of those

∪ take just the unique; [0] or [1] or [0,1] or [1,0]

b← store that in b

⊃ pick the first from that

84× multiply 84 with that

120153+ add 120153 to that

(…)× multiply the following with that:

≢b the tally (length) of b (1 if single-case, 2 if mixed-case)

2> 0 or 1 for whether two is greater than that (1 if single-case, 0 if mixed-case)

a+ the original code points added to that

⎕UCS convert the resulting code points back to characters

Adám

Reputation: 37 779

57 bytes: '\b([A-Z]+|[a-z]+)\b'⎕R{⎕UCS(⎕UCS+120153+84×∊∘⎕A)⍵.Match} – Erik the Outgolfer – 2018-02-20T12:55:40.333

@EriktheOutgolfer Thanks. Why didn't I think of going tacit‽ – Adám – 2018-02-20T14:36:13.763

I don't know, but it happens to me when I'm tired. :) – Erik the Outgolfer – 2018-02-20T14:39:39.847

@EriktheOutgolfer Actually, I think I wrote this one from home using my wife's computer without APL keyboard layout… – Adám – 2018-02-20T14:47:52.340

@Adám that regex is too long; you're better off using \w+ and computing the amount to add to the codepoints in the dfn: '\w+'⎕R{⎕UCS a+(2>≢b)×120153+84×⊃b←∪96>a←⎕UCS⍵.Match} – ngn – 2018-02-20T20:27:04.887

@ngn Separate answer-worthy? – Adám – 2018-02-20T20:28:41.553

@Adám I can't decide which is worth less: wasting some minutes of my life on explanations, or the potential for 9 upvotes :) – ngn – 2018-02-20T20:56:20.800

@ngn I'll explain if you want. – Adám – 2018-02-20T20:56:59.513

@ngn Btw, it is -2 for QuadR too. – Adám – 2018-02-20T20:58:11.553

@Adám go ahead then :) no need to credit - people can see my comments anyway – ngn – 2018-02-20T20:59:56.797

@ngn Comments are temporary, credit is forever! – Adám – 2018-02-21T09:46:45.407

Clean, 268 265 232 224 bytes

As a neat bonus, this works with strings containing any character. Including nulls.

import StdLib,StdInt,StdBool,Text.Unicode,Text.Unicode.UChar
u=isUpper
l=isAlpha
$c|l c=fromInt(toInt c+120153+if(u c)84 0)=c
?[h,s:t]=[if(u h<>isLower s)($c)c\\c<-[h,s:t]]
?[h]=[$h]
@s=[y\\x<-groupBy(\a b=l a&&l b)s,y<- ?x]

Try it online!

Defines the function @, taking a UString and returning a UString
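
The grouping idea (split the input into runs of letters and runs of everything else, then restyle the letter runs) can be sketched in Python with itertools.groupby. This is a loose illustration of the approach rather than a translation of the Clean code; the name restyle is made up, and it assumes ASCII input as in the challenge.

```python
from itertools import groupby


def restyle(text):
    out = []
    # groupby splits the text into alternating runs of letters and non-letters.
    for is_word, run in groupby(text, key=str.isalpha):
        run = "".join(run)
        if is_word and (run.isupper() or run.islower()):
            # 120153 for lowercase letters, 84 more for uppercase ones.
            out.append("".join(
                chr(ord(c) + 120153 + (84 if c.isupper() else 0)) for c in run))
        else:
            out.append(run)
    return "".join(out)
```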

Οurous

Reputation: 7 916

(+3) Is it also a clean bonus? :D – Conor O'Brien – 2018-01-26T17:22:38.010

C, 292 characters, 448 bytes (in UTF-8)

char*t;s,i,k;p(l){for(l=s=*t/96,i=k=strlen(t);i--;)t[i]/96-s&&++l;for(l=l-s&&write(1,t,k);!l&++i<k;)write(1,s?"𝖺𝖻𝖼𝖽𝖾𝖿𝗀𝗁𝗂𝗃𝗄𝗅𝗆𝗇𝗈𝗉𝗊𝗋𝗌𝗍𝗎𝗏𝗐𝗑𝗒𝗓"+t[i]*4-388:"𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇"+t[i]*4-260,4);}f(char*s){char b[strlen(s)];for(t=b;*s;++s)*s<47?(*t=0),p(t=b),putchar(*s):(*t++=*s);*t=0;p(t=b);}

Try it online!

Unrolled:

char*t;
s,i,k;

p(l)
{
    for (l=s=*t/96, i=k=strlen(t); i--;)
        t[i]/96-s && ++l;

    for (l=l-s&&write(1, t, k); !l&++i<k;)
        write(1, s ? "𝖺𝖻𝖼𝖽𝖾𝖿𝗀𝗁𝗂𝗃𝗄𝗅𝗆𝗇𝗈𝗉𝗊𝗋𝗌𝗍𝗎𝗏𝗐𝗑𝗒𝗓"+t[i]*4-388
                   : "𝗮𝗯𝗰𝗱𝗲𝗳𝗴𝗵𝗶𝗷𝗸𝗹𝗺𝗻𝗼𝗽𝗾𝗿𝘀𝘁𝘂𝘃𝘄𝘅𝘆𝘇"+t[i]*4-260, 4);
}

f(char*s)
{
    char b[strlen(s)];

    for (t=b; *s; ++s)
        *s<47 ? (*t=0), p(t=b), putchar(*s) : (*t++=*s);

    *t = 0;
    p(t=b);
}
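
The indexing in the two write calls may be easier to follow in a sketch. It assumes the two string literals hold the 26 Math Sans and 26 Math Sans Bold letters, each 4 bytes long in UTF-8 (which is consistent with the 292 characters / 448 bytes in the header). The Python below is an illustration only; SANS, BOLD and glyph_bytes are made-up names.

```python
# Assumed tables: the 26 glyphs of each alphabet, 4 UTF-8 bytes apiece.
SANS = "".join(chr(120250 + i) for i in range(26)).encode("utf-8")
BOLD = "".join(chr(120302 + i) for i in range(26)).encode("utf-8")


def glyph_bytes(c):
    """Return the 4 UTF-8 bytes that write(1, table + offset, 4) would emit."""
    code = ord(c)
    if code // 96:                 # same test as s = *t/96: 1 for lowercase
        start = code * 4 - 388     # 'a' is 97 and 97*4 == 388
        return SANS[start:start + 4]
    start = code * 4 - 260         # 'A' is 65 and 65*4 == 260
    return BOLD[start:start + 4]
```

So the constants 388 and 260 are just the byte offsets that make 'a' and 'A' land on the first glyph of their table.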

Steadybox

Reputation: 15 798

Java 8, 221 219 203 201 bytes

s->{StringBuffer r=new StringBuffer();for(String x:s.split("(?<=[\\. ])|(?=[\\. ])"))x.codePoints().forEach(c->r.appendCodePoint(c+(x.matches("[A-Z]+")?120237:x.matches("[a-z]+")?120153:0)));return r;}

Unfortunately, I have to use a StringBuffer instead of a regular String in order to use .appendCodePoint.

Explanation:

Try it online.

s->{                           // Method with String parameter and StringBuffer return-type
  StringBuffer r=new StringBuffer();
                               //  Resulting StringBuffer
  for(String x:s.split("(?<=[\\. ])|(?=[\\. ])"))
                               //  Split by space or dot, and keep them as separate items,
                               //  and loop over all those substrings
   x.codePoints().forEach(c->  //   Inner loop over the codepoints of that substring
      r.appendCodePoint(       //    Convert int to char, and append it to the result:
        c                      //     The next codepoint of the substring
        +(x.matches("[A-Z]+")? //     If the word is fully uppercase:
           120237              //      Add 120237 to convert it to Math Sans Bold
          :x.matches("[a-z]+")?//     Else-if the word is fully lowercase:
           120153              //      Add 120153 to convert it to Math Sans
          :                    //     Else (mixed case, or a dot/space)
           0)));               //      Leave the codepoint (and thus the character) as is
  return r;}                   //  Return the resulting StringBuffer
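
The lookaround split is the part that tends to surprise people, so here is a rough Python sketch of the same idea. It is not the Java code; restyle is a made-up name, and it assumes Python 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re


def restyle(text):
    # Zero-width split: break before and after every period or space,
    # so each separator comes out as its own one-character piece.
    pieces = re.split(r"(?<=[. ])|(?=[. ])", text)
    out = []
    for piece in pieces:
        if re.fullmatch(r"[A-Z]+", piece):
            offset = 120237          # fully uppercase: Math Sans Bold
        elif re.fullmatch(r"[a-z]+", piece):
            offset = 120153          # fully lowercase: Math Sans
        else:
            offset = 0               # mixed case, separator, or empty piece
        out.append("".join(chr(ord(c) + offset) for c in piece))
    return "".join(out)
```

Because nothing is consumed by the split, joining the pieces back together reproduces the separators exactly.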

Kevin Cruijssen

Reputation: 67 575

Haskell, 172 170 bytes

(s#w)r=[x|all(`elem`s)w,c<-w,(x,k)<-zip r s,c==k]
t[]=[]
t w=filter(>[])[['A'..'Z']#w$['𝗮'..],['a'..'z']#w$['𝖺'..],w]!!0
f s|(a,b:c)<-span(>'.')s=t a++b:f c|1>0=t s

Try it online!

Fairly straightforward. The # operator takes the set s of characters (upper or lower case), the word w, and the math sans set r. It returns the word in the math sans font if all the characters in the word are in s, or the empty list otherwise. The t function takes a word and tries all three possibilities (all upper, all lower, or mixed), returning the first one that isn't empty. The f function finds the first word by using span, transforming it with t and concatenating it with the separator (either . or space) and recurring on the rest of the string. The alternate case is for if span can't find a separator; we just transform the string.

Edit: Thanks to @Laikoni for taking off 2 bytes! I'm not used to the whole "operator that takes three arguments" thing.
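
The "try all three and keep the first non-empty result" idea carries over to other languages. A rough Python sketch, with made-up names restyle_if (playing the role of #) and t, might look like this; it is an illustration, not a translation of the Haskell.

```python
import string


def restyle_if(alphabet, word, first_target):
    """Restyle word if all its letters are in alphabet, else return ''."""
    if all(c in alphabet for c in word):
        return "".join(chr(first_target + alphabet.index(c)) for c in word)
    return ""


def t(word):
    candidates = [
        restyle_if(string.ascii_uppercase, word, 120302),  # Math Sans Bold
        restyle_if(string.ascii_lowercase, word, 120250),  # Math Sans
        word,                                              # mixed case
    ]
    # First non-empty candidate; an empty word falls through to "".
    return next((c for c in candidates if c), "")
```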

user1472751

Reputation: 1 511

(+1) (['A'..'Z']#w)['𝗮'..] can be ['A'..'Z']#w$['𝗮'..]. – Laikoni – 2018-01-27T00:37:49.290

Jelly, 34 bytes

e€ØBŒg
ṁÇµŒl,Œuiị“¡ẓƬ“¡ẓġ“’×Ç+OỌµ€

Try it online!

Full program.

Erik the Outgolfer

Reputation: 38 134

(+2) It's probably obvious to professional Jellyists, but could you add a brief explanation to show ẇƬƒ is going on here? – Mick Mnemonic – 2018-01-26T23:21:27.230

@MickMnemonic sorry, right now I don't have time to – Erik the Outgolfer – 2018-01-26T23:29:53.980

Japt, 34 33 32 31 bytes

Includes an unprintable (charcode 153) after the last #.

rV="%b%A+%b"Èc+#x#í
rVv Èc+#x#

Try it

Explanation

                        :Implicit input of string U
r                       :Replace
   "%b%A+%b"            :/\b[A-Z]+\b/g
 V=                     :Assign ^that to variable V
            È           :Run each match through a function
             c          :Map over the codepoints of the current match
              +#x#í     :  Add 120237
\n                      :Assign the result of that replacement to variable U
rVv                     :Another replacement, this time with V lowercased to give us the RegEx /\b[a-z]+\b/g
    Èc+#x#              :And, again, map over the codepoints of each match, this time adding 120153 to each

Original 32 Byte Japt v2 Solution

r/\b(\A+|\a+)\b/Èc_+#x#+#T*(X¶u

Try it

r                                     :Replace
 /\b(\A+|\a+)\b/                      :...all matches of this RegEx (\A=[A-Z], \a=[a-z])
                È                     :Pass each match through a function, with X being the current match
                 c_                   :Pass the codepoints of X through a function
                   +                  :Add to the current codepoint
                    #x#               :120153 (there's an unprintable after the second #)
                        +#T           :Plus 84
                           *          :  Multiplied by
                            (X¶u      :  Is X equal to its uppercase self
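
The single-pass version boils down to one regex replacement with a callback that adds 120153, plus 84 when the match equals its own uppercase form. A rough Python equivalent, for illustration only (restyle and repl are made-up names):

```python
import re


def restyle(text):
    def repl(match):
        word = match.group()
        # A bool is 0 or 1 in arithmetic, so 84 is added only for uppercase words.
        offset = 120153 + 84 * (word == word.upper())
        return "".join(chr(ord(c) + offset) for c in word)

    # Only all-uppercase or all-lowercase words match; mixed-case words are skipped.
    return re.sub(r"\b([A-Z]+|[a-z]+)\b", repl, text)
```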

Shaggy

Reputation: 24 623

(+1) Mind adding an XXD dump? – Stan Strum – 2018-01-27T05:15:49.860

A reversible hexdump? For the unprintables. – Stan Strum – 2018-01-27T22:26:10.923

Python 3, 173 122 120 bytes

lambda s:''.join(chr(ord(c)+120153*t.islower()+120237*t.isupper())for t in re.split(r'\b(\w+)\b',s)for c in t)
import re

-51 bytes from ShreevatsaR

-2 bytes from abccd

Try it online!

Splits on word boundaries (re.split(r'\b(\w+)\b',s)), then maps lowercase words to Math Sans (+120153*t.islower()), and uppercase words to Math Sans Bold (+120237*t.isupper()), and leaves mixed-case words alone, then joins the words back up.

Ungolfed and un-lambda-ed:

import re

def f(s):
    # The capture group makes re.split keep the words alongside the separators.
    words = re.split(r'\b(\w+)\b', s)
    ret = ''
    for word in words:
        for char in word:
            if word.isupper():
                ret += chr(ord(char) + 120237)
            elif word.islower():
                ret += chr(ord(char) + 120153)
            else:
                ret += char
    return ret

pizzapants184

Reputation: 3 174

could it be less bytes if you set a variable to either 120237 or 120153 depending on if it was upper or lower? It looks like it might – pfg – 2018-01-26T19:08:57.057

@pfg Indeed, can easily shave off 13 bytes (down to 160). – ShreevatsaR – 2018-01-26T20:10:24.807

@pfg Actually, replacing the map-lambda with (easier to read) comprehensions brings it down to 149 bytes. – ShreevatsaR – 2018-01-26T20:18:14.307

140 – ShreevatsaR – 2018-01-26T21:09:58.377

(+5) 122 :-) I'll stop here; quite proud of how golfing it further has made it easier to read. Only in Python! – ShreevatsaR – 2018-01-26T21:20:30.697

(+2) -2 by getting rid of spaces before for – abccd – 2018-01-27T06:00:34.837

Retina, 84 bytes

/\b[A-Z]+\b/_(`.
ĵ$&
)T`L`ۮ-܇
/\b[a-z]+\b/_(`.
ĵ$&
)T`l`ں-ۓ
T`ÿ-߿`퟿-

Try it online! Explanation: Retina is a .NET application and therefore works in UTF-16 internally. Unfortunately as the Math Sans characters aren't in the BMP I can't directly transliterate them because the number of code points differs. Worse, I can't use unpaired surrogates at all. Instead, I shift the appropriate words into characters in the range 0xFF-0x7FF which conveniently only take two bytes to encode, plus I also prefix them with the 0x135 character. Finally I map that range onto a range that overlaps the unpaired surrogates, creating valid BMP pairs.
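
To see where numbers like 0x135 come from, it helps to work out the UTF-16 surrogate pairs of the target characters. The Python sketch below does that with the standard formula. The final subtraction assumes that the last stage shifts characters up by 0xD700 (U+00FF to U+D7FF), which is my reading of the code rather than something stated in the answer; surrogate_pair is a made-up helper name.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    v = code_point - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


for name, first in (("Math Sans", 120250), ("Math Sans Bold", 120302)):
    high, low = surrogate_pair(first)
    # Assumed shift of 0xD700: the intermediate characters that would
    # land on each surrogate after the final transliteration.
    print(name, hex(high), hex(low), hex(high - 0xD700), hex(low - 0xD700))
```

Both alphabets share the high surrogate 0xD835, which under that assumption corresponds to the 0x135 prefix character mentioned above.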

Neil

Reputation: 95 035

JavaScript (ES6), 99 114 113 bytes

s=>s.replace(/\b([A-Z]+|[a-z]+)\b/g,e=>e.replace(/./g,f=>String.fromCodePoint(f.charCodeAt()+120153+(f<'a')*84)))

(Thanks to @pfg for pointing out an important flaw in my first solution.)

-1 bytes thanks to @Neil.

Snippet:

let f = 

s=>s.replace(/\b([A-Z]+|[a-z]+)\b/g,e=>e.replace(/./g,f=>String.fromCodePoint(f.charCodeAt()+120153+(f<'a')*84)))

d.innerHTML=f('This is an example STRING that c.o.U.L.d. be INPUTTED. It can CONTAIN multiple sentences.');

<p id="d">

Rick Hitchcock

Reputation: 2 461

This only works with the HTML because of the &#, to do it with pure JS you would need to use String.fromCodePoint(120237) which would increase the size – pfg – 2018-01-26T21:30:11.597

Don't yet grok that, but I'll come back to it later, thanks. – Rick Hitchcock – 2018-01-26T21:33:08.737

let a = s=>s.replace(/\b([A-Z]+|[a-z]+)\b/g,e=>e.replace(/./g,f=>String.fromCodePoint(f.charCodeAt(0)+120153+(f<'a')*84))) works pure JS but adds many extra bytes – pfg – 2018-01-26T21:33:55.987

Ah, I understand! Daggum. – Rick Hitchcock – 2018-01-26T21:38:34.357

(+3) Save 1 byte by using charCodeAt() without the 0. – Neil – 2018-01-27T00:22:42.427

05AB1E, 33 bytes

€aγ€g£εÐ.u•1Ù„•*s.l•1Ùm•*+sÇ+çJ}J

Try it online!

Erik the Outgolfer

Reputation: 38 134