Is case sensitivity important? Part II: reality check

6

In this question Tom learned that, in general, there are many motivations to choose to include case sensitivity in his programming language, because the possible combinations for a variable name are much more than what's possible with a case insensitive programming language for a name of the same length.

But, as a more meticulous person than him pointed out, even with case sensitivity, many combinations like A_18__cDfG are wasted anyway because no one will ever use them, so Tom decided to be an even more meticulous person and to redo the calculation of the difference of possibilities of variable names between a case sensitive programming language and a case insensitive programming language only counting the variable names that are likely to be used.

The Challenge

Write a function (or a full program if your language doesn't support them) that takes an input n and returns (or outputs) the difference between the number of possible, valid combinations for a variable name of length n with case sensitivity and a variable name of the same length without case sensitivity.

Valid names

You must only count names in which:

  • The first character is either an alphabet letter or an underscore, but never a digit;

  • If there are digits, they must appear at the end of the name, so that x1 and arraysize12 are valid but iam6feettall isn't;

  • If underscores are present, they must appear at the beginning of the name, like in __private and _VERSION but unlike in __init__ or some_variable;

  • For case sensitive names, variables like HeLlObOyS aren't counted. A name must be either all uppercase (HELLOBOYS) or all lowercase (helloboys);

  • There must be at least one alphabetic character. This means _________a9999999999 is valid but __ or _4 arent't.

Rules

  • Regexs are, of course, allowed;

  • Blame Tom's meticulousness.

Testcases

Input (lenght of the varable name) -> Output (differences in valid combinations)
0 -> 0 (no combinations for both cases: 0 - 0 = 0)
1 -> 26
2 -> 962
3 -> 27898
4 -> 754234
5 -> 19898970
6 -> 520262106

Scoring

It's again code golf. Shortes program in bytes wins.

Reference, non-competing Lua implementation

Since the test cases may be wrong, I've been asked to include it, so here it is:

local case = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_0123456789"

local function isvalid(s)  --checks if a string s is valid AND only suitable for case sensitive
    if s:upper() == s then --if the string is all uppercase, it's also valid for case unsensitive
        return false
    end
    return (s:match"^_*[%w]+[%d]*$" --checks if it matchs the underscore-character-digit sequence
            and s:lower() == s)     --checks if it's all lowercase
            and true                --trasforms to boolean value
end

local function each(s, n) --recursive function called for character at position 2..n
    if n == 0 then
        if isvalid(s) then
            return 1
        end
        return 0
    else
        local total = 0
        for i = 1, #case do
            total = total + each(s..case:sub(i, i), n - 1)
        end
        return total
    end
end

local function diff(n) --the actual function
    local total = 0
    if n > 0 then
        for s = 1, #case - 10 do --loops for character at position 1, which can't be a digit
            total = total + each(case:sub(s, s), n - 1)
        end
    end
    print(string.format("%.0f", total)) --prints the result in non-scientific notation
end

user6245072

Posted 2016-07-14T09:08:27.150

Reputation: 775

@Zeta Tom's programming language's naming conventions don't allow that :-p it's to keep the challenge not too complicated. – user6245072 – 2016-07-14T09:43:39.733

I'm not sure whether the difference for 2 is correct. Valid names are either _<character>, <character><digit> or <character><character>. Without case sensitivity, we have 26 + 26 * 10 + 26 * 26 = 962 combinations. With case sensitivity, we have 52 + 2 * 26 * 26 + 52 * 10 = 1924 combinations. The difference is 962, but you state 926. Typo? – Zeta – 2016-07-14T10:04:45.050

@Zeta actually, yes. Thanks for correcting me. – user6245072 – 2016-07-14T10:10:16.097

It's computationally impossible to have the execution time stay less than 2^n minutes as n increases, since each character adds more than double the previous number of names. – PurkkaKoodari – 2016-07-14T10:17:21.893

@Pietu1998 I'm removing the limit – user6245072 – 2016-07-14T10:30:18.507

@user6245072: I'm still not sure about your test cases. There are 27898 valid case-insensitive combinations for n = 3, and 55796 for case-sensitive ones. See answer below. Perhaps you want to add your own solution as reference? – Zeta – 2016-07-14T11:12:01.633

1%w matches alphanumerics, so your pattern allows for e.g. iam6feettall. Use %a for letters. – PurkkaKoodari – 2016-07-14T12:15:06.657

@Pietu1998 * facepalm * – user6245072 – 2016-07-14T12:27:51.117

@Pietu1998 That fixed everything. Thanks. – user6245072 – 2016-07-14T12:30:54.107

@Pietu1998: Err, it's easily possible to have the execution time less than 2^n minutes. Well, unless you start to create every valid string… – Zeta – 2016-07-14T16:56:35.913

@Zeta well, now that I think of it, does seem simple with the mathematical approach. – PurkkaKoodari – 2016-07-14T16:58:10.780

Answers

9

Haskell, 41 bytes

d n=sum[26^a*10^k|a<-[1..n],k<-[0..n-a]]

Runtime: O(k*n²), where k depends on the size of the resulting numbers.

Results

ghci> forM_ [0..100] $ \n -> printf "%3d: %d\n" n (d n)
  0: 0
  1: 26
  2: 962
  3: 27898
  4: 754234
  5: 19898970
  6: 520262106
  7: 13555703642
  8: 352737183578
  9: 9174055661914
 10: 238554336098650
 11: 6202701627453786
 12: 161273131202687322
 13: 4193130300158759258
 14: 109021676693016629594
 15: 2834566482907321258330
 16: 73698757444479241605466
 17: 1916167982445349170631002
 18: 49820370432467967325294938
 19: 1295329660133056039346557274
 20: 33678571452348345911899378010
 21: 875642860649945882598272717146
 22: 22766714405787481836443979534682
 23: 591934574839363416636432356790618
 24: 15390298948712337721436130165444954
 25: 400147772695409669646228273190457690
 26: 10403842090369540299690823991840788826
 27: 270499894352496936680850312676749398362
 28: 7032997253193809242590997018484373246298
 29: 182857928583327929196254811369482593292634
 30: 4754306143169415047991513984495436314497370
 31: 123611959722433680136668252485770233065820506
 32: 3213910952783564572442263453518914948600222042
 33: 83561684772375567772387738680380677552494661978
 34: 2172603804081793650970970094578786505253750100314
 35: 56487698906126923814134111347937338025486391497050
 36: 1468680171559302908056375783935259677551535067812186
 37: 38185684460541904498354659271205640505228800652005722
 38: 992827795974089805846110029940235542024837705841037658
 39: 25813522695326337840887749667335012981534669240755867994
 40: 671151590078484812751970380239599226408790289148541456730
 41: 17449941342040605420440118775118468775517436406750966763866
 42: 453698474893055743820331977041969077052342235464414024749402
 43: 11796160347219449368217520291980084892249787010963653532373338
 44: 306700169027705683862544416480371096087383351173943880730595674
 45: 7974204394720347783315043717378537387160856019411429787884376410
 46: 207329314262729042395080025540730860955071145393586063373882675546
 47: 5390562170830955102560969552947891273720738669122126536609838453082
 48: 140154616441604832669474097265534062005628094286064178840744688669018
 49: 3644020027481725649435215417792774501035219340326557538748250794283354
 50: 94744520714524866885604489751501025915804591737379384896343409540256090
 51: 2463357538577646539028605622427915562699808274060752896193817536935547226
 52: 64047296003018810014772635072014693519083904014468464189928144849213116762
 53: 1665229696078489060384377400761270920385070393265068957827020654968429924698
 54: 43295972098040715569996701308681932818900719113780681792391425918068066931034
 55: 1125695274549058604819943122914619142180307585847186615491065962758658629095770
 56: 29268077138275523725318810084668986585576886120915740891656603920614013245378906
 57: 760970005595163616858291951090282540113887928032698152071960590824853233268740442
 58: 19785220145474254038315619617236234931849975017739040842759864250335072953876140378
 59: 514415723782330604996206398937030997116988239350103950800645359397600785689668538714
 60: 13374808818340595729901369261251694813930583111991591609705668233226509316820270895450
 61: 347745029276855488977435629681432954051084049800670270741236262952778131126215932170586
 62: 9041370761198242713413326660606145694217074183706315928161031725661120298170503125324122
 63: 235075639791154310548746496064648676938532817665253103021075713756078016641321970147316058
 64: 6111966634570012074267408926569754489290742148185469567436857446546917321563260112719106394
 65: 158911132498820313930952632379702505610448184741711097642247182499108739249533651819585655130
 66: 4131689444969328162204768444761154034760541692173377427587315633865716109376763836198115922266
 67: 107423925569202532217323979592678893792662972885396702006159095369397507732684748630039902867802
 68: 2793022064799265837650423469698540127498126183909203141049025368493224089938692353269926363451738
 69: 72618573684780911778911010215050932203840169670528170556163548469712715227294890073906974338634074
 70: 1888082915804303706251686265620213126188733300322621323349141149101419484798556030810470221693374810
 71: 49090155810911896362543842906414430169795954697277043295966558765525795493651345689961114652916633946
 72: 1276344051083709305426139915569664073303583711018092014584019416792559571723823876827877869864721371482
 73: 33184945328176441941079637804840154794782065375359281268073393725495437753708309686413713505371644547418
 74: 862808578532587490468070582926132913553222588648230201858797125751770270485304940735645440028551647121754
 75: 22433023041847274752169835156082344641272676193742874137217614158434915921506817348015670329631231714054490
 76: 583258599088029143556415714058169849561978469926203616456546857008196702848066139937296317459300913454305626
 77: 15164723576288757732466808565512704977500329106970182916759107171102003162938608527258593142830712638700835162
 78: 394282812983507701044137022703333218303897445670113644724625675337540971125292710597612310602487417495110603098
 79: 10251353137571200227147562590286692564790222476311843651729156447664954138146499364426808964553561743761764569434
 80: 266535181576851205905836627347454295573434673272996823833846956528177696480697872363985921967281494226694767694170
 81: 6929914720998131353551752311033814573798190393986806308568909758621508997387033570352522860038207738782952848937306
 82: 180177782745951415192345560086879207807641839132545852911680542613048122820951761718054483249882290097245662961258842
 83: 4684622351394736795000984562258859691887576706335081064592582996828140082233634693558305453385828431417276125881618778
 84: 121800181136263156670025598618730354877965883253600996568296046806420531026963390921404830676920428105738068161810977114
 85: 3166804709542842073420665564086989255716001853482514799664586105855822695589937052845414486488820019638078661095974293850
 86: 82336922448113893908937304666261720937504937079434273680168127641140278974227252262869665537598209399478934077384220528986
 87: 2140759983650961241632369921322804747264017252954180004573260207558536142218797447723500192866442333275341174900878622642522
 88: 55659759574924992282441617954392923457753337465697569007793654285410828586577622529699893903416389554047759436311733077594458
 89: 1447153748948049799343482066814216010190475662997025683091523900309570432139907074661086130377715017294130634232993948906344794
 90: 37625997472649294782930533737169616267841256126811556649268510296937720124526472830077128278709479338536285378946731560453853530
 91: 978275934288881664356193877166410022992761548185989361769870156609269612126577182470894224135335351690832308741503909460689080666
 92: 25435174291510923273261040806326660598100689141724612294905512960729898804179895633132138716407608032850528916167990534866804986202
 93: 661314531579284005104787060964493175553506806573728808556432225867866257797566175350324495515486697743002640709256642795425818530138
 94: 17194177821061384132724463585076822564420065859805837911356126761453411591625609447997325772291543030206957547329561601569960170672474
 95: 447048623347595987450836053211997386675210601243840674584148184686677590271154734536819358968469007674269785119457490529707853326373210
 96: 11623264207037495673721737383511932053558364521228746428076741690742506235938911986846192222069083088419903301994783642661293075374592346
 97: 302204869382974887516765171971310233392546366440836296018884172848194051023300600546889886662685049187806374740753263598082508848628289882
 98: 7857326603957347075435894471254066068206494416350632585379877382941934215494704503108025942118700167771854632148473742439034118953224425818
 99: 204290491702891023961333256252605717773371743714005336108765700845379178491751205969697563383975093250957109324749206192303775981672723960154
100: 5311552784275166622994664662567748662107694225453027627716797110868747529674420244101025536872241313413773731332368249888787064412379711852890
(0.72 secs, 632,190,360 bytes)

Ungolfed

difference :: Integer -> Integer
difference n = sum [ 26 ^ alpha * 10 ^ digits
                   | alphas <- [1 .. n]
                   , digits <- [0 .. n - alphas]
                   ]

Explanation

All valid names have the structure, namely [underscore]alpha[alpha][digit]. We have at least one alphabet letter, arbitrary many underscores, and arbitrary many digits (as long as the resulting string has the correct length). The position of those is fixed by Tom's rules.

The number of valid case-insenstive combinations can be calculated as follows (a is the number of alphabet letters, u the number of underscores):

$$
c(n) = \sum_{a=1}^n \sum_{u=0}{n-a} 26^a \cdot 10^{n - a - u}
$$

count of case-insensitive cariants

The number of valid case-sensitive combinations can be calculated similarly, we just add \sum_{\{L,U\}}:

$$
C(n) = \sum_{\{L,U\} \sum_{a=1}^n \sum_{u=0}{n-a} 26^a \cdot 10^{n - a - u} \\
     = 2 \sum_{a=1}^n \sum_{u=0}{n-a} 26^a \cdot 10^{n - a - u} 
$$

count of case-sensitive variants

The difference C(n) - c(n) is… c(n):

difference

So all we have to do is to find out is the count of case-insensitive variants.

Zeta

Posted 2016-07-14T09:08:27.150

Reputation: 681

I posted the reference implementation. It loops over each case sensitive possibility. If it is all lowercase (like __varaiablename123) then it's only a case sensitive variable. + 1 to the total. If it's all uppercase(VARIABLENAME) it can be either case sensitive and case insensitive. + 1 to the total, - 1 to the total. If it's mixed, it's not suitable for either case sensitive variables and case unsensitive. So + 0 to the total. That's what I thought when implementing it. – user6245072 – 2016-07-14T12:11:09.303

exactly. If I remember correctly, string.sub(str, i) matches everything from i to the end of the string, so I use string.sub(str, i, i) to match only one. – user6245072 – 2016-07-14T12:26:09.957

+1 for the overly complicated math proof. Correct me if I'm wrong but isn't it sufficient to say that since the case-sensitive names have to be all uppercase or all lowercase, there are just twice as many combinations possible? – Patrick Roberts – 2016-07-14T23:48:36.133

@PatrickRoberts: Ayup. The original proof was "well, if case doesn't matter, we only have to count lower case, if case matters, they have to choose one of two cases, therefore, there are twice as many". However, the numbers in the original version didn't agree, and I suspected an error on my side, rather than on OPs.

– Zeta – 2016-07-15T06:04:58.133

4

JavaScript (ES7), 55 bytes

n=>[...Array(n)].reduce(r=>r+26**++i*(10**n---1)/9,i=0)

I get the same results as @Zeta does. I started along similar lines although I simplified sum(10**[0..n-i]) to (10**(n+1-i)-1)/9.

Neil

Posted 2016-07-14T09:08:27.150

Reputation: 95 035

How could you get the same results? JavaScript's integer precision is only guaranteed for 53 bits. – Patrick Roberts – 2016-07-14T23:51:54.030

@PatrickRoberts At the time, Zeta disagreed with the test cases that were there, and I agreed with him rather than them. – Neil – 2016-07-14T23:58:18.043

Oh, alright. That makes more sense – Patrick Roberts – 2016-07-14T23:59:43.187

3

Pyth, 25 18 16 bytes

sm*^26ds^LTh-QdS

Try it online. Test suite.

I also get the same results as @Zeta.

PurkkaKoodari

Posted 2016-07-14T09:08:27.150

Reputation: 16 699

3

05AB1E, 18 14 bytes

Lv1¹y->×26ym*O

Explanation

Lv               # for each n in range(1,input)
  1¹y->×         # push 1 repeated (input-n+1) nr of times
        26ym     # 26^n
            *    # multiply the ones by the power of 26
             O   # sum and implicitly print

Try it online

Emigna

Posted 2016-07-14T09:08:27.150

Reputation: 50 798

2

Racket, 91 82 Bytes

Old solution (based off @Zeta's solution):

(λ(n)(for*/sum([i(in-range 1(+ 1 n))][j(in-range(+ 1(- n i)))])(*(expt 26 i)(expt 10 j))))

New solution (thanks to Neil's solution):

(λ(n)(for/sum([i(in-range 1(+ 1 n))])(*(expt 26 i)(/(-(expt 10(-(+ n 1)i))1)9))))

Steven H.

Posted 2016-07-14T09:08:27.150

Reputation: 2 841

1

Emacs lisp, 106 bytes

(defmath f(n)(let((r 0))(for((k 1(1+ n)))(setq r(+ r(*(^ 26 k)(/(1-(^ 10(- n k -1)))9)))))(calc-eval r)))

Creates a function called calcFunc-f. Pretty much the first (Emacs) lisp I've written, uses Neil's optimization.

Zeta

Posted 2016-07-14T09:08:27.150

Reputation: 681

Wow. That's a lot different than what I saw written in Common Lisp. – user6245072 – 2016-07-14T19:44:51.727

@user6245072 I have no idea whether it's idiomatic. I use the (included) calc package, since it supports arbitrary large numbers. – Zeta – 2016-07-15T05:19:14.350