
MySQL or Oracle?

McFluffin
Sun Aug 13 2006, 04:03PM
McFluffin Registered Member #119 Joined: Fri Feb 10 2006, 06:26AM
Location: USA
Posts: 114
I have a fairly large database (the MySQL data file has reached 47GB) composed of Medline data that a company is going to use for text mining. An Oracle expert joined our team and has us switching over to Oracle. Besides the fact that Oracle is like a 10 minute walk from my house, which allows me to yell at them if I get annoyed, how much of an improvement can I really expect from using Oracle? I am told that Oracle handles large databases better, but I am not sure how much better, or whether it is even worth the trouble since I will have to learn a new system. Also, it has to be on a Windows system (Server 2003), which I think is a performance loss for both DBMSs, so that is a factor.
I installed Oracle on a server, and it now has problems starting up. The "Windows Server 2003" screen that fades from light to dark with a loading bar under it now freezes most of the time, and it takes a couple of tries to get it to boot into Server 2003. I have a Server 2000 partition, which I prefer to use anyway and which is fine, but I am still not happy with what happened to the 2003 partition. Finally, Oracle launches multiple Perl scripts that take up a lot of memory and all of the processor, which is really odd since I don't have any data in the database yet.
So, which would I be better off using? It seems that a lot of stuff is going to go to Oracle, but I think I will still have the option of maintaining some stuff in MySQL. Luckily, the code I've written for this so far uses Java and database drivers, so it should be fairly simple to switch the driver to an Oracle driver.
Carbon_Rod
Sun Aug 13 2006, 11:27PM
Carbon_Rod Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155
Database design is the backbone of a company. No “New” guy should be given access to critical business operations. Expert or not is irrelevant.

Oracle, in my opinion, is overpriced bloatware. It does have some excellent features, like recursive triggers, better 4GB BLOB support, and the full-featured HORRIFYING forms builder in Java. Oracle made huge changes from 8 -> 9 and from 9.x onward too. I would suspect your “new” guy is not an expert on all three major systems (countless configurations and PL/SQL subtleties too, etc.)

MySQL can partition a database across multiple systems to decrease query times. The clustering support also seems rather brilliant these days. If a highly volatile transaction table is being used, one could create 4GB RAM drives and custom build a kernel with bloated IO buffers and RAID 5/0 support on each node. Performance can increase substantially. (Note that UPS protection and journaling file systems would then become mandatory.)

Database design should include an archiving system to move old data to slower systems. Otherwise the performance will decay even if you have a modular design in excellent normal form. In this case one would try to take each standard query/insertion table structure and match it to a node to increase concurrent transaction performance (lower congestion). One can only hope the 47GB is not some sort of monolithic set of files.
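As a rough illustration of the archiving idea, here is a minimal sketch that moves rows older than a cutoff date out of a "hot" table into an archive table. It uses Python's built-in sqlite3 as a stand-in for MySQL, and both tables live in one in-memory database only to keep the example self-contained; in practice the archive would sit on the slower system. The table names, column names, and sample rows are all hypothetical.

```python
import sqlite3


def archive_old_rows(conn, cutoff):
    """Move rows published before `cutoff` (ISO date string) to the archive table.

    Table and column names are hypothetical, chosen only for this sketch.
    """
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute(
            "INSERT INTO abstracts_archive "
            "SELECT * FROM abstracts WHERE published < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM abstracts WHERE published < ?", (cutoff,))
    return cur.rowcount  # number of rows removed from the hot table


conn = sqlite3.connect(":memory:")
schema = "(id INTEGER PRIMARY KEY, published TEXT, body TEXT)"
conn.execute("CREATE TABLE abstracts " + schema)
conn.execute("CREATE TABLE abstracts_archive " + schema)
conn.executemany(
    "INSERT INTO abstracts VALUES (?, ?, ?)",
    [
        (1, "1998-03-01", "old abstract"),
        (2, "2005-07-15", "recent abstract"),
        (3, "2006-01-20", "recent abstract"),
    ],
)

moved = archive_old_rows(conn, "2000-01-01")
hot_count = conn.execute("SELECT COUNT(*) FROM abstracts").fetchone()[0]
archive_count = conn.execute("SELECT COUNT(*) FROM abstracts_archive").fetchone()[0]
```

The hot table then only holds the rows that queries are likely to touch, which is the point of the exercise; the cutoff rule itself would depend on how the data is actually used.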

That being said, Oracle does just about everything one can imagine. It just costs $40000, versus under a grand for unlimited commercial operation; for the same price one could run over 40 nodes on 4 networks (even without RAID the seek times would drop by an order of magnitude.)

Just a thought,
=)
McFluffin
Mon Aug 14 2006, 12:00AM
McFluffin Registered Member #119 Joined: Fri Feb 10 2006, 06:26AM
Location: USA
Posts: 114
The guy who joined our team has, I think, been with Oracle since near its beginning and sets Oracle systems up professionally. He specifically requested Oracle 10 (the newest version), so I am guessing he has experience with newer versions. Right now we have a single IBM eServer x235 with a single 2.4GHz Xeon processor (if you happen to know where to get better ones or another one, PM me; it can take up to 3.2GHz but they are so darn expensive) and 2.5GB of RAM. We have a single 1GB DDR stick, otherwise we'd be up another GB because of DDR's limitations. I have another x235 (single 2.4) and a dual 2.2 Xeon that could be used, I suppose, if I wanted to test some things out.
If I understand what you are talking about with parsing the data to slower systems, you mean reducing the data to more general forms or moving it to cheap media, as in a large data warehouse. The problem with this is that we do not yet know which parts of the database will be used and which won't. If we have a huge table that is never really accessed (say we don't care about the author column for the abstracts, for example), will that slow down queries? As for clustering, I talked to a guy about clustering MySQL a couple of weeks ago and he thought that on Windows there was a limitation that forced you to load the entire database into RAM, which pretty much defeats the whole point of the cluster. I have not verified this, however. In any case, we only have 1 development server right now, so clustering is probably out of the question. I'd imagine that getting a second processor would be a good idea to boost performance. The main clustering option that might be a good idea is to have the data mining output on one server and the data mining source on another, as I would think that would speed things up a lot if network bandwidth wasn't an issue. This server has Gb ethernet, but I am not sure what sort of switch it will be on.
As far as backup, I am not sure if it will have a UPS. The server is going to be moved to Stanford University sometime this week. I did, however, set up RAID1 to mirror its 320GB SATA drive. It has hot swap slots for SCSI drives, but they are way too expensive for a reasonable size, so we just went with SATA.
I think I'll just keep working on my MySQL table then since I've been fairly happy with that and see what happens with the whole Oracle thing. How is it that MySQL costs money? I thought it was free software and only the support cost you money?
Carbon_Rod
Mon Aug 14 2006, 04:40AM
Carbon_Rod Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155

Look up “elevator scheduling” for an explanation of why even random data across multiple smaller drives is better (often cheaper & more reliable too.)
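For readers who have not met the term, the sketch below compares total head movement for first-come-first-served servicing against a simple elevator sweep (the LOOK variant: sweep upward, then reverse). It is only an illustration of the scheduling idea, not of any particular drive or kernel; the cylinder numbers are a common textbook-style example and the function names are made up for this sketch.

```python
def total_travel(start, order):
    """Sum of head movements (in cylinders) when servicing `order` from `start`."""
    travel, position = 0, start
    for cylinder in order:
        travel += abs(cylinder - position)
        position = cylinder
    return travel


def elevator_order(start, requests):
    """Service order for one elevator sweep: everything above the head on the
    way up, then everything below it on the way back down."""
    upward = sorted(c for c in requests if c >= start)
    downward = sorted((c for c in requests if c < start), reverse=True)
    return upward + downward


head = 53
pending = [98, 183, 37, 122, 14, 124, 65, 67]

fcfs_travel = total_travel(head, pending)                        # arrival order
sweep_travel = total_travel(head, elevator_order(head, pending))  # elevator order
```

With these numbers the arrival-order schedule moves the head 640 cylinders and the sweep moves it 299. Splitting the same requests over several smaller drives shortens each drive's queue and sweep further, which is presumably the point being made above.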

Single processor is also a bad idea – a dual processor at half the speed can be 20 times more efficient under certain conditions (safe concurrent threading is big in database implementations.)

Linux/Unix file systems are going to be faster under certain circumstances too. Notably, you don’t even have to load a GUI or other support processes. Choose a distro like Red Hat or Debian that implements a Mandatory Access Control policy.

IIRC the Win32 version of MySQL does some funky things with the permission tables just to get it to work.

MySQL is free for non-commercial use – but there are some catches depending on the version you run – see here for the current set of rules:
Link2

A word of warning about Oracle – read the uninstall instructions and follow them to the letter. It can leave your system inoperable.
McFluffin
Mon Aug 14 2006, 06:00AM
McFluffin Registered Member #119 Joined: Fri Feb 10 2006, 06:26AM
Location: USA
Posts: 114
Thanks for all of your comments. I really appreciate it.
I don't have a choice about the OS. I don't like Server 2003 and would have gone with Server 2000 at the very least, if not some Linux distro (probably a Fedora-based one since I am most familiar with those).
That aside, I could enable hyperthreading if you think that might help as a substitute for multiple processors. However, I have never been clear on how it helps. When I looked up hyperthreading before, people weren't very impressed with it because it splits up the processor, which isn't good for most apps since they are single threaded. The old server things ran on was my dual 2.2 Xeon. Would it be advisable to put the two processors from that into the new machine instead of using its one? I have seen pairs of 2.2s go for considerably less than 2.4s (I have seen 150USD a pair on eBay and from local sources). I am assuming that when you say multiple processors can give a huge increase, you are assuming that we will have quite a few simultaneous connections. If we only have a couple of concurrent connections for the time being, will I really see such a large performance increase with the same total processing power spread out across two physical processors? A thread or two will be doing text mining, which will be processor intensive. Although I could change some of the algorithms fairly easily to be multithreaded, they are currently single threaded and would suffer from slower processors.
With the hard drives and the elevator stuff, I think I could do RAID 5 if I wanted to, but I am not familiar with it and we already have some hard drives installed, so I am probably not going to do anything about that right now. However, the controller is 4 channel and we are only using two channels right now, so it is an option for the future.
I followed the Oracle uninstallation instructions as best I could. I deleted all files and removed all registry keys that it stated. Still had problems and ended up putting Oracle back on since the problems didn't go away and Oracle was still requested. As this is a startup company, any option that might cost more but be nice is going to be discarded since no revenue has yet been generated and everything is being paid for out of pocket.
I read that page and I am still confused about the MySQL licensing, but perhaps that is because I don't understand the GPL well enough. If I don't actually modify the source code but rather use their database internally in the company and don't actually sell a product that encompasses that database (such as being able to search our database on the web site, selling services, not a product), there is no GPL interaction since I am not directly selling a MySQL related product. Please clarify this if it is wrong.
Once again, thanks for your help.
Carbon_Rod
Mon Aug 14 2006, 07:44AM
Carbon_Rod Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155
Pfftt -- "hyperthreading" only makes the pipeline go a bit faster if and only if your compiler supports it. I would say it will not make a difference in Win32/64.

If it's the IBM I think it is, then it should already have dual CPU slots at the least.

If you want to see the best in the world, try Google's design
Link2
Even they could bump it up a bit more if they wanted.

I think it should be OK, but legal stuff you want to get right the first time =o] Better phone MySQL with a "personal" request to ask questions of "if".

One can always lease dedicated servers with firewall and VPN for around $150/month, and it comes with licensed MySQL on Fedora. That’s about $1800 a year per node parked on the Internet, but with no hardware maintenance cost. Still several times cheaper than Oracle, but it really does depend on what you plan on doing. It may be a better tax write off.
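The arithmetic behind that comparison can be written out as a small sketch. The $150/month lease and the $40000 Oracle figure are the rough numbers quoted in this thread, not actual price quotes, and the function names are invented for illustration.

```python
LEASE_PER_NODE_PER_MONTH = 150   # rough figure quoted in this thread, in USD
ORACLE_LICENSE = 40_000          # rough figure quoted earlier in this thread, in USD


def yearly_lease_cost(nodes, per_month=LEASE_PER_NODE_PER_MONTH):
    """Yearly cost in USD of leasing `nodes` dedicated servers."""
    return nodes * per_month * 12


def node_years_for_license(license_cost=ORACLE_LICENSE,
                           per_month=LEASE_PER_NODE_PER_MONTH):
    """How many node-years of leasing the quoted license price would pay for."""
    return license_cost / (per_month * 12)


one_node = yearly_lease_cost(1)        # 1800
four_nodes = yearly_lease_cost(4)      # 7200
break_even = node_years_for_license()  # a little over 22 node-years
```

This ignores hardware, support contracts, and staff time on both sides, so it is only a first-order comparison.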

Bjørn
Mon Aug 14 2006, 06:21PM
Bjørn Registered Member #27 Joined: Fri Feb 03 2006, 02:20AM
Location: Hyperborea
Posts: 2058
Hyperthreading helps in exactly the same way as having two processors. It adds a new processor to the system to let two threads run at the same time. The two processors share cache and ALUs.

The performance gain will depend on the application. Unless you know the exact memory access patterns and instruction mix of all threads you can only find the performance gain by testing.

The smallest increase I have had was -2% and the largest increase was 53%. On normal applications it is closer to 10% than 50%.
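Since the gain can only be found by testing, a small timing harness is enough to get a number. The sketch below is one possible way to do it, assuming the "increase" means the gain in throughput relative to the baseline run; the function names are made up, and `workload` stands for whatever job is run once with hyperthreading off and once with it on.

```python
import time


def best_time(workload, repeats=5):
    """Fastest wall-clock time in seconds over several runs of workload()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return min(timings)


def percent_gain(baseline_seconds, candidate_seconds):
    """Throughput gain of the candidate over the baseline, in percent.

    A negative result means the candidate configuration was slower.
    """
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


# Example with made-up timings: a job that took 10 s before and 8 s after.
example_gain = percent_gain(10.0, 8.0)   # 25.0
example_loss = percent_gain(10.0, 12.5)  # about -20
```

Taking the best of several runs reduces noise from other processes; for a database, the workload should be a realistic mix of the queries that matter.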
Carbon_Rod
Tue Aug 15 2006, 12:39AM
Carbon_Rod Registered Member #65 Joined: Thu Feb 09 2006, 06:43AM
Location:
Posts: 1155
Right, so cache access (and look aside buffers) could cause contention with concurrent requests and an increase in dirty page-faults even if Belady’s anomaly is non-contributing. Link2
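For anyone unfamiliar with Belady’s anomaly: with FIFO page replacement, adding frames can increase the number of page faults. The sketch below counts faults for the classic reference string usually used to demonstrate it. It is a toy simulation of FIFO replacement only and says nothing about how a real CPU cache or TLB behaves; the function name is made up.

```python
from collections import deque


def fifo_page_faults(reference_string, frame_count):
    """Count page faults under FIFO replacement with `frame_count` frames."""
    queue = deque()      # resident pages, oldest first
    resident = set()     # same pages, for fast membership tests
    faults = 0
    for page in reference_string:
        if page in resident:
            continue                 # hit: FIFO order is not updated
        faults += 1
        if len(queue) == frame_count:
            resident.remove(queue.popleft())   # evict the oldest page
        queue.append(page)
        resident.add(page)
    return faults


REFERENCES = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]

faults_with_3_frames = fifo_page_faults(REFERENCES, 3)   # 9
faults_with_4_frames = fifo_page_faults(REFERENCES, 4)   # 10
```

Here the fourth frame makes things worse, 10 faults instead of 9, which is the anomaly.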

Hype-r-threading I am sure has its place with limited concurrent operations. ;o)

Symmetric multiprocessing has different behaviours, limitations, and benefits. But again, getting the most out of it requires a good operating system and properly designed programs (Bad thread design can happen anywhere.)

Cheers,
McFluffin
Tue Aug 15 2006, 02:25AM
McFluffin Registered Member #119 Joined: Fri Feb 10 2006, 06:26AM
Location: USA
Posts: 114
Thanks for all the advice. I'll run some tests with a couple of setups and see what happens. I'll play around with some of these ideas on my dev server and if I like the results push things to the company server.
There will probably be sporadic high use times from the threads that will not be time critical, but it would be nice for them to finish in a timely manner. The end users will only rely on a single table, for which queries should be fairly fast. There will be a lot of data in the table (probably on the order of at least one million records, I would guess). Since people tend to be a lot less patient than machines, I guess I'd eventually like to optimize things for those queries if the server gets real busy. However, right now there will probably be a lot of data processing for setup, so it might be better to optimize for operations that are probably going to be single threaded. I'll give some updates after I do some benchmarks.
On a side note, I did do some testing before comparing an Intel Xeon with a non-Xeon Intel processor on a single threaded application. My desktop used a 2.2GHz Xeon with RDRAM and my laptop used an Intel 2.8GHz to do the same task. The 2.2GHz processor outperformed the laptop by some 40% maybe.
Steve Conner
Tue Aug 15 2006, 11:47AM
Steve Conner Registered Member #30 Joined: Fri Feb 03 2006, 10:52AM
Location: Glasgow, Scotland
Posts: 6706
I got the impression that hyperthreading was kind of "meh". My understanding is that it's still a single CPU, but with extra circuitry to let it do fast hardware context switches between two threads. Then a bunch of trickery makes it look to the OS like two separate CPUs. If you watch the CPU load meter in Windows, you'll see that the loads on the two virtual CPUs never add up to more than 100%.

This is supposed to help with situations where one thread is idle I guess. The CPU can context switch in hardware a lot faster than the OS can do it in software. Also, subsystems within the CPU that aren't being used by one thread can be given to the other one. In other words, while the two CPU loads are never going to add to more than 100%, I suppose they'll sometimes add to a higher figure than the same CPU without HT could achieve. It's the kind of feature that an optimizing compiler, or a programmer working in assembler, could really beat on to get things done quicker.

The DSPs I program on have a hardware context switch that I use all the time. They have two complete sets of registers that you swap between with a single instruction. I divide my program into an interrupt service routine with all the signal processing stuff, and a main loop that does everything else, and swap between them hundreds to thousands of times per second. I guess it's ghetto hyperthreading.

They also have multifunction instructions that let you do up to three things per clock cycle. But because it hurts my head so much trying to figure out how to use these, I tend to only use them in the worst "hot spots" of the signal processing, and then only in the ways that the maker's handbook shows for doing FIR filtering and suchlike. I think that can be a problem with fancy CPU designs. Most of the time I'd rather have a CPU that does a few easy-to-understand instructions really fast, and I suspect optimizing compilers feel the same.
 
Legal Information
This site is powered by e107, which is released under the GNU GPL License. All work on this site, except where otherwise noted, is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. By submitting any information to this site, you agree that anything submitted will be so licensed. Please read our Disclaimer and Policies page for information on your rights and responsibilities regarding this site.