Uniquify Identifiers

31

4

Introduction

By definition, unique identifiers should be unique. Having multiple identifiers that are the same causes one to retrieve unexpected data. But with data arriving concurrently from multiple sources, it can be difficult to ensure uniqueness. Write a function the uniquifies a list of identifiers.

This is possibly the worst puzzle fluff I have written, but you get the idea.

Requirements

Given a list of zero or more positive integers, apply the following rules to each number from first to last:

  • If the number is the first of its kind, keep it.
  • If the number has previously been encountered, replace it with the lowest positive integer not found anywhere in the entire input list or any existing output.

For the solution:

  • The solution may be a program or a function.
  • The input may be a string, an array, passed as arguments, or keyboard input.
  • The output may be a string, an array, or printed to the screen.
  • All numbers in the output list are distinct.

Assumptions

  • The input list is clean. It only contains positive integers.
  • A positive integer has the range from 1 to 231-1.
  • Less than 256 MB of memory is available for your program's variables. (Basically, no 2,147,483,648-element arrays are permitted.)

Test Cases

Input:  empty
Output: empty

Input:  5
Output: 5

Input:  1, 4, 2, 5, 3, 6
Output: 1, 4, 2, 5, 3, 6

Input:  3, 3, 3, 3, 3, 3
Output: 3, 1, 2, 4, 5, 6

Input:  6, 6, 4, 4, 2, 2
Output: 6, 1, 4, 3, 2, 5

Input:  2147483647, 2, 2147483647, 2
Output: 2147483647, 2, 1, 3

Scoring

Just a simple code golf. Lowest byte count by this time next week wins.

Hand-E-Food

Posted 2017-05-03T07:02:19.240

Reputation: 7 912

4Add test case: 6, 6, 1, 2, 3, 4, 56, 7, 1, 2, 3, 4, 5 – Adám – 2017-05-03T08:16:55.603

2@Adám Shouldn't 6, 6, ... give 6, 1, ...? – xnor – 2017-05-03T08:30:43.520

@xnor No, *If the number has previously been encountered, replace it with the lowest positive integer not found anywhere in the set.* – Adám – 2017-05-03T08:31:34.947

5@Adám I think you're right about that, but the wording could be clearer, Does "set" refer to the elements encountered, all elements of the input list, or elements in the list as it now stands? – xnor – 2017-05-03T08:39:09.493

@xnor Yes, could be clearer, but it should have said not already used if that was the case. – Adám – 2017-05-03T08:40:31.770

3@xnor The 6, 6, 4, 4, 2, 2 test case confirms Adám's interpretation: the expected output is 6, 1, 4, 3, 2, 5, and not 6, 1, 4, 2, 3, 5. – Fatalize – 2017-05-03T09:04:24.220

2Does 0 count as a positive integer for this challenge? – Luke – 2017-05-03T10:38:16.553

@Luke If it did, it'd be the first new number to show up in the test cases, and it doesn't. – Ørjan Johansen – 2017-05-03T20:12:46.897

Obviously, the test cases don't use it. However, it may still be acceptable if an answer uses 0. – Luke – 2017-05-03T21:18:57.207

@Luke 0 is not usually considered positive in the English language. See e.g. this Math SE question.

– Ørjan Johansen – 2017-05-03T22:03:05.417

It's declared under Assumptions that positive integers start from 1, not 0. – Hand-E-Food – 2017-05-03T23:44:37.593

I've clarified "not found anywhere in the set." It is now "not found anywhere in the entire input list or any existing output." – Hand-E-Food – 2017-05-03T23:51:06.857

Answers

11

Brachylog, 8 bytes

{|∧ℕ₁}ᵐ≠

Try it online!

Explanation

{    }ᵐ     Map on the Input:
              Input = Output…
 |            …or…
  ∧ℕ₁         …Output is in [1,+inf)
       ≠    All elements of the output must be different
            (Implicit labeling)

Fatalize

Posted 2017-05-03T07:02:19.240

Reputation: 32 976

8

Java 8, 158 144 bytes

a->{int m=0;String r="",c=",",b=r;for(int x:a)b+=x+c;for(int x:a)if(r.contains(x+c)){for(;(r+b).contains(++m+c););r+=m+c;}else r+=x+c;return r;}
  • .contains(m+c);m++) to .contains(++m+c);) to save 1 byte, and simultaneously converted to Java 8 to save 13 more bytes.

Explanations:

Try it here.

a->{                      // Method with integer-array parameter and String return-type
  int m=0;                //  Lowest integer
  String r="",            //  Result-String
         c=",",           //  Comma delimiter for result String
         b=r;for(int x:a)b+=x+c;
                          //  Input array as String
  for(int x:a)            //  Loop (2) over the integers in the array
    if(r.contains(x+c)){  //   If the result already contains this integer
      for(;(r+b).contains(++m+c););
                          //    Inner (3) as long as either the result-String or array-String contains the lowest integer
                          //     and raise the lowest integer before every iteration by 1
      r+=m+c;             //    Append the result with this lowest not-present integer
    }else                 //   Else:
      r+=x+c;             //    Append the result-String with the current integer
                          //  End of loop (2) (implicit / single-line body)
  return r;               //  Return the result-String
}                         // End of method

Kevin Cruijssen

Posted 2017-05-03T07:02:19.240

Reputation: 67 575

7

Ruby, 63 bytes

->a{r=*0..a.size;r.map{|i|[a[i]]-a[0,i]==[]?a[i]=(r-a)[1]:0};a}

Try it online!

Explanation

->a{                                    # Anonymous function with one argument
    r=*0..a.size;                       # Numbers from 0 to array size
    r.map{|i|                           # For all numbers in range:
        [a[i]]                          #  Get array with just a[i]
              -a[0,i]                   #  Remove elements from array that are
                                        #    also in a[0..i-1]
                    ==[]?               #  Check if result is an empty array
                        a[i]=           #  If true, set a[i] to:
                             (r-a)      #   Remove elements from number range
                                        #     that are also in input array
                                  [1]   #   Get second element (first non-zero)
                        :0};            #  If false, no-op
                            a}          # Return modified array

Value Ink

Posted 2017-05-03T07:02:19.240

Reputation: 10 608

7

JavaScript (ES6), 49 bytes

a=>a.map(g=(e,i)=>a.indexOf(e)-i?g(++n,-1):e,n=0)

Neil

Posted 2017-05-03T07:02:19.240

Reputation: 95 035

6

05AB1E, 17 16 18 bytes

vy¯yåi¹gL¯K¹K¬}ˆ}¯

Try it online!

Explanation

v                    # for each y in input
 y                   # push y
  ¯yåi               # if y exist in global list
      ¹gL            # push [1 ... len(input)]
         ¯K          # remove any number that is already in global list
           ¹K        # remove any number that is in the input
             ¬       # get the first (smallest)
              }      # end if
               ˆ     # add to global list
                }¯   # end loop, push and output global list

Emigna

Posted 2017-05-03T07:02:19.240

Reputation: 50 798

I should probably use reduce instead of map... let me see if that helps – Leaky Nun – 2017-05-03T07:35:01.250

@LeakyNun: Reduce or map are often the way to go :) – Emigna – 2017-05-03T07:36:15.880

1Gives wrong result on 6, 6, 1, 2, 3, 4, 5. – Adám – 2017-05-03T08:17:39.690

3Gives [6, '1', '2', '3', '4', '5', '7']. Should give [6, '7', '1', '2', '3', '4', '5']. – Adám – 2017-05-03T08:22:13.973

1@Adám: Thanks for the catch! Fixed now :) – Emigna – 2017-05-03T08:31:41.030

v¯yåi¯Z>LsK0èëy}ˆ for 17 bytes (I have no idea how it prints). – Magic Octopus Urn – 2017-05-11T17:11:01.557

@carusocomputing: It implicitly prints the global list as the stack is empty. Unfortunately it has the same problem as my previous version, had as noted by Adám above. – Emigna – 2017-05-11T21:37:37.030

6

PHP, 121 Bytes

<?$n=array_diff(range(0,count($g=$_GET)),$g);sort($n);$r=[];foreach($g as$v)$r[]=in_array($v,$r)?$n[++$i]:$v;print_r($r);

Online Version

Expanded

$n=array_diff(range(0,count($g=$_GET)),$g); # create array of ascending values which are not in input array plus zero
sort($n); # minimize keys
$r=[];  # empty result array
foreach($g as$v) # loop input array
  $r[]=in_array($v,$r)?$n[++$i]:$v; # if value is not in result array add value else take new unique value skip zero through ++$i
print_r($r); # output result array

Jörg Hülsermann

Posted 2017-05-03T07:02:19.240

Reputation: 13 026

5

Haskell, 79 76 bytes

EDIT:

  • -3 bytes: @nimi saw that head could be replaced by a pattern match.

([]#) is an anonymous function taking and returning a list. Use like ([]#)[2147483647, 2, 2147483647, 2].

(?)=notElem
s#(x:y)|z:_<-[x|x?s]++filter(?(s++y))[1..]=z:(z:s)#y
_#n=n
([]#)

Try it online!

How it works

  • ? is an abbreviated operator for checking if an element is absent from a list.
  • s#l handles the list of integers l, given a list s of integers already used.
    • x is the next integer to look at, y the remaining ones.
    • z is the integer chosen for the next spot. It's x if x is not an element of s, and the first positive integer neither in s nor in y otherwise.
    • (z:s)#y then recurses with z added to the used integer list.
    • n is an empty list, since nonempty lists have been handled in the previous line.
  • The main function ([]#) takes a list, and calls # with it as second argument, and an empty list for the first argument.

Ørjan Johansen

Posted 2017-05-03T07:02:19.240

Reputation: 6 914

|z:_<-[x|...]... – nimi – 2017-05-03T21:50:34.493

5

Python 2, 77 79 bytes

a=input();u=[];j=1
for x in a:
 u+=[[x,j][x in u]]
 while j in u+a:j+=1
print u

Takes keyboard input, like [3, 3, 3, 3, 3, 3].

Just keep track of the smallest positive integer j not used so far. For each element x of the input, output x if x hasn't already been used, otherwise output j. Finally, update j each time you output something.

EDITED: to fix a mistake handling input of [6, 6, 4, 4, 2, 2]. Thanks to @Rod for pointing out the mistake as well as a fix. The mistake was that, in the event of a duplicate entry, it would output the smallest number unused to that point in the list, even if that output appeared later in the input. (This was wrong, as clarified in the post and the comments, but I still messed it up somehow.) Anyway, the fix was to simply add the input list, a, to the set of values that could not be output in that case.

mathmandan

Posted 2017-05-03T07:02:19.240

Reputation: 943

doesn't work for [6,6,4,4,2,2], you can (probably) fix it by adding +a to the while j in u: -> while j in u+a: – Rod – 2017-05-03T16:33:33.843

@Rod You're right, my mistake. (Somehow I still messed this up despite the comments about this--thanks for bringing it to my attention--and I also didn't test my solution well enough against the test cases. Embarrassing.) OK, I have incorporated your very nice fix, and verified it against the test cases. Thanks! – mathmandan – 2017-05-03T18:06:02.487

4

APL (Dyalog 16.0), 34 bytes

(s↑v~⍨⍳⌈/v+s←+/b←(⍳≢v)≠⍳⍨v)@{b}v←⎕

Adám

Posted 2017-05-03T07:02:19.240

Reputation: 37 779

2There must be a better way. – Adám – 2017-05-03T09:12:08.660

3

Pyth, 21 20 bytes

.b?}NPY@-UhlQQ=hZNQ._
m?}edPd@-SlQQ~hZed._

Test suite

I'll add an explanation when I have the time.

Leaky Nun

Posted 2017-05-03T07:02:19.240

Reputation: 45 011

3

C, 169 bytes 133 bytes

input = array a, output = modified array a

i=1,j,k,l;void f(int*a,int n){for(;i<n;i++)for(j=i-1;j>=0;j--)if(a[i]==a[j]){l=1;for(k=0;k<n;)if(l==a[k])k=l++?0:0;else k++;a[i]=l;}}

formatted

int i, j, k, l;
void f(int* a, int n)
{
    for (i = 1; i<n; i++)
        for (j = i - 1; j >= 0; j--)
            if (a[i] == a[j])
            {
                l = 1;
                for (k = 0; k<n;)
                    if (l == a[k])
                        k = l++ ? 0 : 0;
                    else
                        k++;
                a[i] = l;
            }
}

Too many bytes wasted for these loop. Does anybody think of shorten the code by inventing a new algorithm (which use less loop)? I was thinking but havent found one.

hucancode

Posted 2017-05-03T07:02:19.240

Reputation: 199

3

C#, 135 bytes


Golfed

(int[] i)=>{for(int x=1,v,m=0,l=i.Length,y;x<l;x++){v=i[x];for(y=0;y<l&&v==i[x]?y<x:y<l;y++)if(i[y]==v){v=++m;y=-1;}i[x]=v;}return i;};

Ungolfed

( int[] i ) => {
   for( int x = 1, v, m = 0, l = i.Length, y; x < l; x++ ) {
      v = i[ x ];

      for( y = 0; y < l && v == i[ x ] ? y < x : y < l ; y++ )
         if( i[ y ] == v ) {
            v = ++m;
            y = -1;
         }

      i[ x ] = v;
   }

   return i;
};

Ungolfed readable

// Takes an array of Int32 objects ( 32-byte signed integers )
( int[] i ) => {

   // Cycles through each element on the array
   //   x: Scan position, starts at the 2nd element
   //   v: Value being processed
   //   m: The next minimum value to replace
   //   l: Size of the array, to save some byte count
   for( int x = 1, v, m = 0, l = i.Length, y; x < l; x++ ) {

      // Hold the value
      v = i[ x ];

      // Re-scan the array for a duplicate value up the the current position ( 'x' ) IF
      //   ... the currently hold value hasn't been modified
      //   ... otherwise, re-scans the entire array to find a suitable value to replace
      for( y = 0; y < l && v == i[ x ] ? y < x : y < l ; y++ )

         // Check if 'v' shares the same value with other element
         if( i[ y ] == v ) {

            // Set 'v' to the minimum value possible
            v = ++m;

            // Reset the scan position to validate the new value
            y = -1;
         }

      // Set the 'v' to the array
      i[ x ] = v;
   }

   // Return the array
   return i;
};

Full code

using System;
using System.Collections.Generic;

namespace Namespace {
   class Program {
      static void Main( String[] args ) {
         Func<Int32[], Int32[]> f = ( int[] i ) => {
            for( int x = 1, v, m = 0, l = i.Length, y; x < l; x++ ) {
               v = i[ x ];

               for( y = 0; y < l && v == i[ x ] ? y < x : y < l ; y++ )
                  if( i[ y ] == v ) {
                     v = ++m;
                     y = -1;
                  }

               i[ x ] = v;
            }

            return i;
         };

         List<Int32[]>
            testCases = new List<Int32[]>() {
               new Int32[] { },
               new Int32[] { 5 },
               new Int32[] { 1, 4, 2, 5, 3, 6 },
               new Int32[] { 3, 3, 3, 3, 3, 3 },
               new Int32[] { 6, 6, 4, 4, 2, 2 },
               new Int32[] { 2147483647, 2, 2147483647, 2 },
            };

         foreach( Int32[] testCase in testCases ) {
            Console.WriteLine( $" Input: {String.Join( ",", testCase )}\nOutput: {string.Join( ",", f( testCase ) )}\n" );
         }

         Console.ReadLine();
      }
   }
}

Releases

  • v1.0 - 135 bytes - Initial solution.

Notes

  • None

auhmaan

Posted 2017-05-03T07:02:19.240

Reputation: 906

3

Python 2, 101 bytes

x=input()
for i,n in enumerate(x):x[i]=[n,list(set(range(1,len(x)+9))-set(x))[0]][n in x[:i]]
print x

Try it online! or Try all test cases

Rod

Posted 2017-05-03T07:02:19.240

Reputation: 17 588

3

R, 39 46 bytes

Creates a vector from the input, then replaces the duplicated values with a range from 1 to a million that has the input values removed. Returns a numeric vector. No input will return the empty vector numeric(0).

i[duplicated(i)]=(1:1e6)[-(i=scan())];i

Try it online!

This will throw a warning about the length of the replacement vector

                           i=scan()     # set i as input
                 (1:1e6)                # 1 to a million (could go higher)
                 (1:1e6)[-(i=scan())]   # without input values
  duplicated(i)                         # duplicate values in i
i[duplicated(i)]=(1:1e6)[-(i=scan())]   # set duplicate i values to reduced range vector
                                     ;i # return result

MickyT

Posted 2017-05-03T07:02:19.240

Reputation: 11 735

2

Clojure, 72 bytes

#(reduce(fn[r i](conj r(if((set r)i)(nth(remove(set r)(range))1)i)))[]%)

A basic reduction. If i is contained in the output list so far, we'll take the 2nd element (1 when 0-indexed) from the infinite list of integers (range) from which we have removed those numbers that have already been used. Range starts from zero so we cannot take the first element but the second.

NikoNyrh

Posted 2017-05-03T07:02:19.240

Reputation: 2 361

2

C# 7, 116 bytes

int[]f(int[]c){int j=0;int h()=>c.Contains(++j)?h():j;return c.Select((e,i)=>Array.IndexOf(c,e)<i?h():e).ToArray();}

Indented

int[] f(int[] c)
{
    int j = 0;
    int h() => c.Contains(++j) ? h() : j;
    return c
        .Select((e, i) => Array.IndexOf(c, e) < i ? h() : e)
        .ToArray();
}

Explained

  • first occurrence of a number is always left as-is
  • consecutive occurrences of a number are drawn from [1, 2, 3, ...], skipping values present in the input.

Online Version

user1655772

Posted 2017-05-03T07:02:19.240

Reputation: 41

1

R, 74 bytes

reads the list from stdin; returns NULL for an empty input.

o=c();for(i in n<-scan())o=c(o,`if`(i%in%o,setdiff(1:length(n),o)[1],i));o

explanation:

o=c()                                #initialize empty list of outputs
for(i in n<-scan())                  # loop through the list after reading it from stdin
    o=c(o,                           # set the output to be the concatenation of o and
      `if`(i%in%o,                   # if we've seen the element before
           setdiff(1:length(n),o)[1] # the first element not in 1,2,...
           ,i))                      # otherwise the element
o                                    # print the output

1:length(n) may be used since we are guaranteed to never need a replacement from outside that range.

Try it online!

Giuseppe

Posted 2017-05-03T07:02:19.240

Reputation: 21 077

0

Axiom, 169 bytes

f a==(r:List INT:=[];for i in 1..#a repeat(~member?(a.i,r)=>(r:=concat(r,a.i));for j in 1..repeat if~member?(j,r)and(~member?(j,a)or j=a.i)then(r:=concat(r,j);break));r)

ungolf and result

ff(a)==
  r:List INT:=[]
  for i in 1..#a repeat
      ~member?(a.i,r)=>(r:=concat(r,a.i))
      for j in 1.. repeat
            if~member?(j,r)and(~member?(j,a) or j=a.i)then(r:=concat(r,j);break)
  r

(3) -> f([])
   (3)  []
                                                       Type: List Integer
(4) -> f([5])
   (4)  [5]
                                                       Type: List Integer
(5) -> f([1,4,2,5,3,6])
   (5)  [1,4,2,5,3,6]
                                                       Type: List Integer
(6) -> f([3,3,3,3,3,3])
   (6)  [3,1,2,4,5,6]
                                                       Type: List Integer
(7) -> f([6, 6, 4, 4, 2, 2])
   (7)  [6,1,4,3,2,5]
                                                       Type: List Integer
(8) -> f([2147483647, 2, 2147483647, 2])
   (8)  [2147483647,2,1,3]
                                                       Type: List Integer

RosLuP

Posted 2017-05-03T07:02:19.240

Reputation: 3 036