5

I read about the FHS and started wondering about the file system of Wikipedia. On the one hand, I feel it is a security risk to let everyone know the layout. On the other hand, developers need to know it. For example, is there some rule for finding where all the sitemaps and their indices are located? So:

How is the file system of Wikipedia designed?

6 Answers

10

MediaWiki, and thus Wikipedia, uses MySQL to store all data shown on the site. You can see the database schema here: http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png
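
To make "pages live in tables, not files" concrete, here is a minimal sketch of how a page title resolves to its current text. It assumes heavily simplified versions of the `page`, `revision`, and `text` tables from the schema diagram (the real tables have many more columns), and it uses an in-memory SQLite database instead of MySQL purely so the example runs anywhere. The helper name `current_text` and the sample rows are hypothetical.

```python
import sqlite3

# Simplified stand-ins for three MediaWiki tables. SQLite replaces MySQL
# here only so that the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_title TEXT,
        page_latest INTEGER   -- rev_id of the newest revision
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,     -- page_id this revision belongs to
        rev_text_id INTEGER   -- old_id of the row holding the wikitext
    );
    CREATE TABLE "text" (
        old_id INTEGER PRIMARY KEY,
        old_text TEXT
    );
""")

# Hypothetical sample data: one page with two revisions.
conn.execute("""INSERT INTO "text" VALUES (1, 'First draft')""")
conn.execute("""INSERT INTO "text" VALUES (2, 'Second draft')""")
conn.execute("INSERT INTO revision VALUES (10, 1, 1)")
conn.execute("INSERT INTO revision VALUES (11, 1, 2)")
conn.execute("INSERT INTO page VALUES (1, 'Main_Page', 11)")


def current_text(conn, title):
    """Follow page -> revision -> text and return the newest wikitext."""
    row = conn.execute(
        """
        SELECT t.old_text
        FROM page p
        JOIN revision r ON r.rev_id = p.page_latest
        JOIN "text" t ON t.old_id = r.rev_text_id
        WHERE p.page_title = ?
        """,
        (title,),
    ).fetchone()
    return row[0] if row else None


print(current_text(conn, "Main_Page"))
```

Older revisions stay in the same tables, which is why no per-article file ever needs to exist on disk.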

Adam Gibbins
6

Uhm. It's a database, not a filesystem. You can get the source code here

Thomas
  • Is your reply contradictory to the reply by James F? –  Jun 27 '09 at 22:32
  • @SimpleThings: No. While Wikipedia (et al.) pages are not stored in files, the database containing those pages is itself stored on a filesystem. – Richard Jun 28 '09 at 11:10
4

The FHS is not specific to Wikipedia or Mediawiki. It's just a suggested way to lay out the filesystems of any *nix-like system.

You could host Mediawiki (the software that runs Wikipedia) on any system that can run PHP and MySQL, regardless of what the underlying filesystem looked like.

Where in that filesystem your MySQL data and indices are stored depends on whether you built MySQL from source or installed a distribution package (in which case it's wherever the package builder decided to put them).
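
On a running server, `SELECT @@datadir;` reports the directory MySQL is actually using. As a rough offline alternative, the sketch below reads the `datadir` setting out of the text of a MySQL option file. The function name `find_datadir` and the sample file contents are hypothetical, and the parser deliberately ignores `!include` directives and command-line overrides, so treat it as an illustration rather than a complete resolver.

```python
def find_datadir(cnf_text):
    """Return the datadir set in the [mysqld] section, or None if unset."""
    section = None
    datadir = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and !include/!includedir directives.
        if not line or line.startswith(("#", ";", "!")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "mysqld" and "=" in line:
            key, value = line.split("=", 1)
            if key.strip().replace("-", "_") == "datadir":
                datadir = value.strip()  # a later setting overrides an earlier one
    return datadir


# Hypothetical option file, similar in shape to what a distribution
# package might ship.
SAMPLE_CNF = """
[client]
port = 3306

[mysqld]
# Data files and indices live here.
datadir = /var/lib/mysql
skip-external-locking
"""

print(find_datadir(SAMPLE_CNF))
```

If the function returns `None`, the server falls back to its compiled-in default, which is exactly the part that differs between a source build and a package.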

James F
2

The master database servers run MySQL and store the article metadata. Text is stored on separate database instances running on Apache servers, to avoid consuming expensive database disk space.

Source

Gregor
  • Is it possible to tell from the database schema when they are using different database instances? I am not sure what the word "instance" means here: is it just a separate database? You can see the schema posted by Adam Gibbins. –  Jun 27 '09 at 22:29
  • I didn't find anything about the database instances. But if you're interested you could ask them: it's an open project and they would be glad to answer (as I understand it ;-)). However, as I understand it, they have one database as a master and a lot of slaves that replicate the data and serve it to the end user. – Gregor Jun 28 '09 at 06:48
0

Wikipedia uses Hadoop.
http://hadoop.apache.org/

Quandary
0

Releasing the file system/database layout would not be much of a security issue (security that is worth more than $0.02 still works when you know how it is done).

Just check out any crypto forum and you will see that the security model they use assumes that the attacker already knows how the system is constructed and which algorithms are in use. The reason is pretty simple: if your security depends on keeping the layout secret, then anybody who finds the layout can break your security. Like the OP said, the people working on the system know the layout (and they don't have jobs for life). Any copy will expose the secret to everyone. In the crypto world, security is based on the idea that without the secret key, every copy must be broken using methods that are little faster than brute force.
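
A small sketch of that idea, using HMAC-SHA256 from the Python standard library as one example of a fully public algorithm: everything about the construction is known, yet forging a valid tag still requires the key. The helper names `sign` and `verify` and the sample message are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag; the algorithm is public, the key is not."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(key, message), tag)


key = secrets.token_bytes(32)           # the only secret in the system
message = b"layout: see public schema"  # hypothetical message
tag = sign(key, message)

# An attacker who knows the algorithm but guesses a different key fails.
attacker_key = secrets.token_bytes(32)

print(verify(key, message, tag))
print(verify(attacker_key, message, tag))
```

Publishing this code gives an attacker nothing useful; only leaking `key` would, and a leaked key can be replaced far more easily than a leaked design.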

Walter
  • Do you mean "Real-time adaptive security": http://en.wikipedia.org/wiki/Real-time_adaptive_security ? Is it the same as "automated security"? –  Jun 29 '09 at 17:21
  • By the phrase "automated security", I mean security that tries to be as independent of human factors as possible, adaptive in this sense. –  Jun 29 '09 at 17:22
  • Nope, I mean "real security" as in security that is worthwhile: something that stands up to an attack by people with at least half a clue. I changed the answer to be clearer on this subject. – Walter Jun 30 '09 at 00:00