h a l f b a k e r y
Reformatted to fit your screen.


Supercore

Unparallelize a core and speed up processing
  (+2)

Modern processors have a problem. It's much more efficient to put several cores on a chip than to devote that silicon to branch prediction, cache, and other accessories for a single core. But like it or not, a certain amount of any program can only be executed one step at a time, no matter how cleverly you program for multiple cores.

At first I considered proposing two kinds of core, with one larger and faster for tasks that cannot be made parallel. But that would be too costly. My proposal is to take one of the (for example) four cores on a chip and devote it entirely to caring for another one. It would do branch prediction, cache handling, memory swapping, and memory lookup. One core would therefore be able to work more efficiently, at the cost of only three cores being available on the chip. I think this would result in an overall faster chip than a design that simply uses four identical cores.

Voice, Jan 10 2017
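
The trade-off in the idea above (three usable cores, one of them assisted, versus four plain cores) can be roughed out with Amdahl's law. This is only a sketch under loud assumptions: the names `run_time` and `supercore_wins` are hypothetical, the helper core is assumed to speed up only the serial share by a factor `serial_speedup`, and the parallel share is assumed to run at three plain-core equivalents. None of these numbers come from the idea itself.

```python
def run_time(serial_fraction, parallel_cores, serial_speedup=1.0):
    """Normalized run time under Amdahl's law (1.0 = one plain core).

    serial_fraction: share of the work that cannot be parallelized (0..1).
    parallel_cores:  cores available for the parallel share.
    serial_speedup:  ASSUMED factor by which a helper core speeds up the
                     serial share (1.0 means no helper at all).
    """
    parallel_fraction = 1.0 - serial_fraction
    return serial_fraction / serial_speedup + parallel_fraction / parallel_cores


def supercore_wins(serial_fraction, serial_speedup):
    """True if 3 cores plus a helper finish sooner than 4 plain cores."""
    plain = run_time(serial_fraction, parallel_cores=4)
    assisted = run_time(serial_fraction, parallel_cores=3,
                        serial_speedup=serial_speedup)
    return assisted < plain
```

Under these assumptions, `supercore_wins(0.5, 1.5)` is `True` while `supercore_wins(0.1, 1.5)` is `False`: the supercore only pays off when the serial share of the workload is large enough to outweigh the lost fourth core.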

Branch prediction at wikipedia https://en.wikipedi...ki/Branch_predictor
[beanangel, Jan 11 2017]

Geranium https://en.wikipedia.org/wiki/Geranium
Mentioned in my anno [notexactly, Jan 12 2017]

       So you want to do branch prediction, caching, and all that stuff that's usually done by hardware or microcode in software instead? I imagine it'll go like software vs. hardware video effects/CGI rendering: it'll be a lot more accurate, but it'll also be a lot slower and consume a lot more energy. I suggest making it a feature that can turn on and off as necessary.   

       [+]
notexactly, Jan 10 2017
  

       It's already the case that one logical processing unit will have multiple dedicated subunits for dealing with the stuff which is relatively slow but needed frequently.   

       Asymmetrical cores also already exist, for example ARM's big.LITTLE architecture.
Loris, Jan 10 2017
  

       I used to wonder if two square waves of different voltages could travel through a CPU, with one channel being read at some logic gates and the other, higher-voltage channel being read at different logic gates. I think I remember the primitive solution being the voltage drop across a single diode, which drops the low-voltage bits to effectively zero and leaves only the higher channel.   

       I remember thinking this could be proved more efficient: if you send a compressed version of some data to, say, Venus, and then uncompress it there, that is faster than sending the raw data. So for planet-spanning microprocessors there would be an advantage to a clock-synchronized yet compressed data channel travelling through the same wires, and it is only a question of how big a microprocessor has to be before this improves efficiency.   

       This kind of relates to your idea, as a big.LITTLE-style multicore could decompress the data and contribute to calculations.
beanangel, Jan 10 2017
  

       Another potential speed-up would be to have a processor which can be programmed to do several operations at once to optimize tight loops. Several ALUs would be required. They would be programmatically "connected" to specified registers, and an operation would be selected for each to perform on every clock cycle. For example:
one could add the contents of one register to a value in RAM,
another could increment an address stored in EDI,
another would count the loops (up or down),
another would compare the loop count with the exit value,
and another would compare two values for an alternative exit condition.

Obviously, if all of these interdependent operations execute at once, operands would often be modified too early or too late. To remedy this, one would use two sets of registers: one set would hold the values as they stood at the beginning of the cycle, while the other would receive the freshly modified values. At the end of each cycle, the sets would swap their designations (original versus modified values).
Another likely necessity is the ability to undo the result of the last cycle, since the processor might not know beforehand whether that cycle will be one too many.
The CPU would be configured for handling a loop this way before entering the loop. It would revert to normal operation immediately after exiting the loop.
Alvin, Jan 11 2017
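
A minimal software sketch of the two-register-set scheme [Alvin] describes, for anyone who wants to poke at it. Everything here is an assumption made for illustration: the names `ram_read` and `run_loop`, the register names `acc`, `addr` and `count`, and the choice to model an out-of-range speculative read as returning 0. Only four of the five ALUs are modelled; the alternative exit condition is left out.

```python
def ram_read(ram, addr):
    """Model a RAM read; out-of-range reads return 0 instead of faulting."""
    return ram[addr] if 0 <= addr < len(ram) else 0


def run_loop(ram, limit):
    """Sum ram[0:limit] with every hypothetical ALU firing once per cycle.

    Returns (sum, cycles_used).
    """
    # "Original" set: register values as they stood at the start of the cycle.
    old = {"acc": 0, "addr": 0, "count": 0}
    cycles = 0
    while True:
        cycles += 1
        # "Modified" set: each ALU reads only from `old` and writes only here,
        # so the order in which the ALUs are evaluated does not matter.
        new = {
            "acc": old["acc"] + ram_read(ram, old["addr"]),  # ALU 1: register + RAM value
            "addr": old["addr"] + 1,                          # ALU 2: increment the address
            "count": old["count"] + 1,                        # ALU 3: count the loops
        }
        exit_now = old["count"] >= limit                      # ALU 4: compare with exit value
        if exit_now:
            # This cycle was one too many: undo it by discarding `new`.
            return old["acc"], cycles
        # End of cycle: the two sets swap designations.
        old = new
```

For example, `run_loop([1, 2, 3, 4], 3)` returns `(6, 4)`: three useful cycles plus one that is computed and then thrown away, which is the "undo" step from the description.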
  

       [Alvin], what you described is different (and niftier), but [Voice] mentioned branch prediction, which Wikipedia describes as follows: "The branch that is guessed to be the most likely is then fetched and speculatively executed. If it is later detected that the guess was wrong then the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay." (Wikipedia, branch prediction). AMD Ryzen uses branch prediction.
beanangel, Jan 11 2017
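
To make the guess-then-discard behaviour in that quote concrete, here is a small sketch of a 2-bit saturating counter, a common textbook prediction scheme. It is not something the thread or the quote specifies, and the function name `simulate` and the starting state are assumptions for illustration only.

```python
def simulate(outcomes, state=2):
    """Count mispredictions for a sequence of branch outcomes.

    outcomes: iterable of booleans, True meaning the branch was taken.
    state:    2-bit counter; 0-1 predict "not taken", 2-3 predict "taken".
    """
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            # Wrong guess: real hardware would discard the speculative work
            # and restart the pipeline, which is the delay the quote mentions.
            mispredictions += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions
```

A loop branch that is taken nine times and then falls through, `simulate([True] * 9 + [False])`, costs a single misprediction, whereas a strictly alternating branch such as `simulate([True, False] * 4)` is guessed wrong four times out of eight.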
  

       // AMD Ryzen uses branch prediction //   

       I would think just about every modern architecture uses branch prediction.   

       I use branch prediction while performing everyday tasks.
notexactly, Jan 11 2017
  

       I, too, am at least as intelligent as a two-square-centimeter piece of silicon and geranium.
Voice, Jan 12 2017
  

       Are you sure it's not a pelargonium? Apparently they're often confused with geraniums: [link]
notexactly, Jan 12 2017
  


 
