Check if all vowels exist in pairs of strings

6

1

Your program should find the number of string pairs (pairs of 2) that contain all vowels (a e i o u), when given an integer N and N strings.

There are easy ways to do this, but I'm looking for the quickest possible solution.

Example:

INPUT:

4
password
unique
orbiting
ointmental

OUTPUT:

2

EXPLANATION:

password has a,o

unique has u,e,i

orbiting has o,i

ointmental has o,i,e,a


The pairs

password and unique

ointmental and unique

are the 2 successful combinations.


Here's a sample DataSet where N = 10000

Viswalahiri

Posted 2019-03-04T07:57:26.180

Reputation: 163

Question was closed 2019-03-04T16:47:24.347

6Welcome to PPCG! I think this is a promising first challenge! For the example input, any solution will be limited by the IO speed. Do you have a larger test set? Are all input words random English dictionary words, or do they need to meet certain criteria? For what input are we measuring fastest-code? You could have a specific list of N words, or you could take the average when selecting N random words a few times. On what machine are we measuring? Can we use multiple cores to speed up calculations? – maxb – 2019-03-04T08:32:18.020

@maxb Inputs don't necessarily have to be of any format. They can be words or even random, meaningless bunches of characters. – Viswalahiri – 2019-03-04T08:40:58.927

3Then that needs to be specified. A solution where <1% of all pairs contain all vowels can look very different from a solution where 50% of all pairs are positives. We will try to optimize for the challenge. To refine what you have, look at some previous fastest-code challenges and what information they contain. As a small example, if all input strings are 1000 random characters, almost every input contains all vowels, so the algorithm to find the number of collisions would be different compared to having English words as input. – maxb – 2019-03-04T08:51:49.860

How long will the words be? – Adám – 2019-03-04T08:59:06.277

Will there ever be words that contain all five vowels? E.g. education – Adám – 2019-03-04T09:10:16.250

@ Adám It doesn't have to be a word. The inputs can be just meaningless strings like absdfterc or any other combination. – Viswalahiri – 2019-03-04T09:16:24.870

1I'd recommend having a large test case to test against, as well as specifying the specs of the computer the code will be running on. – Jo King – 2019-03-04T09:29:15.290

4I've updated the question to have a dataset of 10000 10 lettered words. – Viswalahiri – 2019-03-04T10:06:42.480

1@ViswalahiriHejeebu Thank you for the large dataset, but what is the answer for it? – Adám – 2019-03-04T10:29:03.943

1I'm sorry I didn't mention. The answer is 1968503. – Viswalahiri – 2019-03-04T10:36:38.580

1What is the maximum length for a string (if any)? Is 'aeiou', 'aeiou' a valid pair? – None – 2019-03-04T10:36:55.453

Also, CPU specs very much needed. – None – 2019-03-04T10:38:19.840

The maximum length for each string is a 1000 characters, and 'aeiou', 'aeiou' can be considered as a valid pair. – Viswalahiri – 2019-03-04T10:59:54.973

Answers

4

Java openjdk8

Using bit arrays,

Coding each word on an int where if a is present the lowest bit is 1, if e the second is 1, etc.

For example password has a and o, so its code is 1|8 == 9 == 0b01001. As we only need the number of pairs we can store the number of words for each code in an array of size 32.

Then for each pair of different indexes except the last (31) where first is less than second and where bitwise or of indexes is 31, we multiply the number of words with code, and the last is particular because it's the only one that can make a pair with a word of the same code, and also with any word.

public static void main( String[] args ) {
    int[] a = new int[ 32 ];
    int count = 0;
    try (Scanner scan = new Scanner( System.in )) {
        while ( scan.hasNext() ) {
            String s = scan.next();
            a[ code( s ) ]++;
        }
    }

    int t = 0;
    for ( int i = 0 ; i < 31 ; ++i ) {
        count += a[ i ];
        for ( int j = i + 1 ; j < 31 ; ++j ) {
            if ( ( i | j ) == 31 ) {
                t += a[ i ] * a[ j ];
            }
        }
    }

    t += a[ 31 ] * count + a[ 31 ] * ( a[ 31 ] - 1 ) / 2;
    System.out.println( t );
}

// a e i o u
// 1 2 4 8 16
public static int code( String s ) {
    int r = 0;
    for ( char c : s.toCharArray() ) {
        r |= c == 'a' ? 1 : c == 'e' ? 2 : c == 'i' ? 4 : c == 'o' ? 8 : c == 'u' ? 16 : 0;
    }
    return r;
}

TIO

C++

translating to C++ is straightforward

test with 100 first words

test on TIO with 10000 words of dataset

1968503

Real time: 0.419 s
User time: 0.315 s
Sys. time: 0.099 s
CPU share: 98.83 %
Exit code: 0

Nahuel Fouilleul

Posted 2019-03-04T07:57:26.180

Reputation: 5 582

Fails on ["education","automobile"] – Adám – 2019-03-04T09:31:51.107

@adam, thank you the case a[31] was count twice, fixed – Nahuel Fouilleul – 2019-03-04T10:01:19.333

The C++ implementation was great. – Viswalahiri – 2019-03-05T20:54:47.640

@VHejeebu I've just noticed the getline can return an empty string so a check ìf(!input.empty()) must be done arround a[code(input)]++ – Nahuel Fouilleul – 2019-03-06T08:38:45.640

C++ link updated – Nahuel Fouilleul – 2019-03-06T08:45:42.833

3

APL (Dyalog Unicode)

Two solutions designed by my esteemed colleague Marshall, master of speeding up APL. The formulas used are both in '70s APL, so they works in APL\360 on an IBM/360 mainframe. They are anonymous prefix functions taking a character matrix as argument. N is not needed but may be supplied as optional (and ignored) left argument.

The first solution is O(n2)

A naive approach:

{0.5×(+/,(⍉i)∧.∨i)-+/∧⌿i←∨/'aeiou'∘.=⍵}

Try it online!

Processes* the large data set in about 4.5 ms. TIO takes about twice that.

{} "dfn"; the argument is :

'aeiou'∘.=⍵ equality 3D array with vowels along the first axis, words along the middle axis and characters along the last axis

∨/ OR-reduction along the last axis; this gives us a 5-row table with one column per word

i← store that in i (for in)

∧⌿ AND-reduction along the first axis; this gives us a mask of words that contain all vowels

+/ sum; count of words containing all vowels (we need to discount their "self-pairs")

()- subtract that from the following:

  (⍉i)∧.∨i the Boolean matrix where the OR of corresponding masks is all-true.

  , ravel (flatten)

  +/ sum; this gives us the count of ordered pairs

0.5× halve; this gives us the count of unordered pairs

The second solution is O(n)

Breaks even with the above O(n2) solution at about 150 words of 10 evenly distributed characters. It requires 0-based indexing, and a pre-computed 32-by-32 Boolean vowel signature pairing table:

{
    a←0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    a[2⊥∨/'aeiou'∘.=⍵]+←1
    (0.5×lׯ1+l←a[31])++/,p×a∘.×a
}

Processes* the large dataset in 0.09 ms. TIO's timing is too inconsistent to make any conclusions, but it is likely that it takes about twice the time here too.


* 64-bit Dyalog APL 17.0 on 2.6 Ghz i7-4720HQ

Adám

Posted 2019-03-04T07:57:26.180

Reputation: 37 779