1
To give you a bit of background, I am a university student doing an assignment in Data Science. I have a dataset in Excel, based on football/soccer players from around the world. Currently, each player has an ID and each record includes their ID, name, club, photo, age, etc etc;
The trouble I am having is when I design the tables in Access, I cannot import the data from Excel AND normalize it due to some repeating data.
I have assigned each player an ID, and I want to break it up so that I have a table for (Club, Player, Player_Attributes, League,) but because I have over 17,000 players data and their club - I would have to manually assign each unique club an ID in Access then delete ALL repeating clubs from my excel spreadsheet so that only one instance of a club appears in the 'Club' table and a correlating 'clubID' column in the 'Player' table that links to the club table. Can I automatically delete all repeating instances of something in Excel AND ALSO replace that repetition with, let's say, a number?
Screenshot of my top 4 records I have within Excel. The spreadsheet contains over 17,000 records.
You may like to show some sample data of your input table, what transitive dependency you have identified and how you expect your output tables (that you would export to Access) look like. With some sample input and output tables, it may be easier to provide a possible solution. – patkim – 2017-11-14T20:15:28.640
Transitive dependancy - {player} -> {club} essentially, but I will be removing all transitive dependencies and they will be relational through the use of foreign key's. If you see my image, you will notice some of the statistics that I have for the players. The problem lies in the 'club' section, as well as 'League' and also 'Club logo' as multiple players can play for 1 given club, they share the same League, Club, and Club photo, but they are separate players. – MC123 – 2017-11-15T09:34:00.433
[Table1: Club] -> ClubID, ClubName, ClubLogo,LeagueID), [Table 2: League] -> LeagueID, LeagueName, [Table 3: Players] -> PlayerID, PlayerName,...,...,...,...,..,...,...,.. ClubID] [Table 4: PlayerAttributes] -> (PlayerAttID, PlayerID,...,...,...,...,...,...,..,..) (I have excluded non-relevant columns, there are too many to include here) – MC123 – 2017-11-15T09:42:28.663