There is an article reporting that a new Intel processor architecture to succeed the Lake processors
(Sky, Cannon, Ice and Tiger)
will be "faster and leaner" and, more interestingly, might not be entirely
compatible with older software.
The original source is
I suppose it is curious that the Lake processors form a double tick-tock, or now process-architecture-optimization (PAO), cycle, but skipped Kaby and Cannon.
Both the Bridge (Sandy and Ivy) and Well (Has and Broad) processors each had only
one tick-tock pair.
Naturally, I cannot resist commenting on this.
In the really old days, processor architecture and instruction set architecture (ISA) were somewhat
the same thing.
The processor implemented the instruction set, so that was the architecture.
I am excluding the virtual-architecture concept, where lower-cost versions
would not implement the complete instruction set in hardware.
The Pentium Pro was a significant step away from this,
with micro-architecture and instruction set architecture now
largely different topics.
The Pentium Pro has its own internal instructions, called µops (micro-operations).
The processor dynamically decodes X86 instructions to the "native" µops.
This was one of the main concepts that allowed Intel to borrow
many of the important technologies from RISC.
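As a rough illustration of the decode idea (a sketch only: Intel's actual internal µop encoding is undocumented, so the µops below are invented RISC-like stand-ins), a read-modify-write X86 instruction cracks into several simple µops:

```python
# Illustrative sketch only: the "muops" here are invented stand-ins,
# since Intel's real internal encoding is not public.

def decode_x86(instruction):
    """Crack one X86-style instruction into simple RISC-like muops."""
    if instruction == "ADD [mem], eax":
        # A read-modify-write X86 instruction becomes three muops,
        # each doing one simple thing, as a RISC pipeline expects.
        return [
            "LOAD  tmp0, [mem]",      # read the memory operand
            "ADD   tmp0, tmp0, eax",  # perform the arithmetic
            "STORE [mem], tmp0",      # write the result back
        ]
    # A simple register-register instruction maps to a single muop.
    return [instruction]

for muop in decode_x86("ADD [mem], eax"):
    print(muop)
```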
The Pentium 4 processor, codename Willamette, had a trace cache
that was a cache for decoded instructions.
This may not have been present in the Core 2 architecture that
followed Pentium 4.
My recollection is that the Pentium Pro had 36 physical registers,
of which only 8 are visible to the X86 ISA.
The processor would rename the ISA registers as necessary to support out-of-order execution.
Pentium 4 increased this to 128 physical registers.
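A minimal sketch of the renaming idea, with invented numbers rather than the actual P6 mechanism: each write to the one architecturally visible EAX lands in a fresh physical register, so independent computations are not falsely serialized:

```python
# Minimal sketch of register renaming, not the actual P6 implementation.
# Each write to an ISA register is given a fresh physical register, so a
# later, independent write need not wait on earlier readers.

free_list = [f"p{i}" for i in range(36)]  # pool of physical registers
rename_map = {}                           # ISA register -> physical register

def write(isa_reg):
    """Allocate a new physical register for a write to isa_reg."""
    phys = free_list.pop(0)
    rename_map[isa_reg] = phys
    return phys

def read(isa_reg):
    return rename_map[isa_reg]

p1 = write("eax")                   # eax = a + b, renamed to p1
print(f"add   {p1}, a, b")
print(f"store [x], {read('eax')}")  # reader of the first eax sees p1
p2 = write("eax")                   # eax = c + d, renamed to p2,
print(f"add   {p2}, c, d")          # with no false dependence on p1
```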
The Nehalem micro-architecture diagrams
do not mention a decoded µop cache
(somehow the acronym is DSB, for Decoded Stream Buffer), but Sandy Bridge
and subsequent processors do.
This is curious because both Willamette and Nehalem
are Oregon designs, while Core 2 and Sandy Bridge are Haifa designs.
The other stream that comes into this topic involves
the Intel Itanium adventure.
The original plan for Itanium was to have a hardware (silicon) X86 unit.
Naturally, this would not be comparable to the then-contemporary
X86 processors, which for Merced would have been the Pentium III,
codename Coppermine, at 900MHz.
So by implication, X86 execution would probably be comparable to
something several years old, a Pentium II at 266MHz with luck,
and Itanium was not lucky.
By the time of Itanium 2, the sophistication of software CPU emulation
was sufficiently advanced that the hardware X86 unit was discarded.
In its place was the IA-32 Execution Layer.
Also see the
IEEE Micro paper on this topic.
My recollection was that the Execution Layer emulation was not great, but not bad either.
The two relevant technologies are: one, the processor having native µops
instead of the visible X86 instructions,
and two, the Execution Layer for non-native code.
With this, why is the compiler still generating X86 binaries
(OK, Intel wants to call these IA-32 and Intel 64 instructions)?
Why not make the native processor µops visible to the compiler?
When the processor detects a binary with native micro-instructions,
could it not bypass the decoder?
Also, make the full set of physical registers visible to the compiler?
If Hyper-Threading is enabled, then the compiler should know to
use only the correct fraction of the registers.
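Purely as a thought experiment (every flag, count, and function below is invented to illustrate the proposal, not any real Intel interface), the dispatch this implies might look like:

```python
# Thought-experiment sketch: the binary format flag, register counts,
# and functions here are all hypothetical, invented to illustrate the
# proposal above, not any real Intel interface.

PHYSICAL_REGISTERS = 128

def visible_registers(hyperthreading_enabled):
    # The compiler would be told how many physical registers it may
    # target: the full set, or half when two threads share the core.
    return PHYSICAL_REGISTERS // (2 if hyperthreading_enabled else 1)

def dispatch(binary):
    if binary["format"] == "native-muop":
        # Fast path: muops feed the pipeline directly, bypassing the
        # X86 decoders.
        return "execute muops directly"
    # Legacy path: old X86 code runs through a software translation
    # layer, in the spirit of the IA-32 Execution Layer.
    return "translate via execution layer"

print(visible_registers(hyperthreading_enabled=True))  # 64
print(dispatch({"format": "native-muop"}))
print(dispatch({"format": "x86"}))
```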
I am inclined to also say that the more the compiler knows about the underlying
hardware, the better it can generate binaries to fully utilize available resources,
with less reliance on the processor doing dynamic scheduling for parallelism.
But of course, that was essentially what Itanium was,
and we would need to understand why Itanium did not succeed.
My opinion was that EPIC (Explicitly Parallel Instruction Computing) was really better suited
to scientific computing than to logic-heavy server applications.
Have one or two generations of overlap,
in which Microsoft and the Linux players produce a native micro-op operating system.
Then ditch the hardware X86 decoders.
Any old code would then run on the Execution Layer,
which may not be 100% compatible.
But we need a clean break from old baggage
or it will sink us.
Off topic, but who thinks legacy baggage is sinking the Windows operating system?
Of course, I still think that one major issue is that Intel is stretching
their main-line processor core over too broad a spectrum.
The Core is used in both high-performance and high-efficiency modes.
For high performance, it is capable of well over 4GHz,
probably limited more by power than by transistor switching speed.
For power efficiency, the core is throttled to 2 or even 1GHz.
If Intel wants to do this in a mobile processor, it is probably not
that big a deal.
However, in the big server chips, with 24 cores in Xeon v4
and possibly 32 cores in the next generation (v5),
it becomes a significant matter.
The theory, sometimes called Pollack's Rule, is that if a given core is designed to operate at
a certain level, then doubling the logic should achieve roughly a 40%
increase in performance.
So if Intel is deliberately de-rating the core in the Xeon HCC die,
then they could build a different core targeting one half
the original performance at perhaps one quarter the complexity.
So it should be possible to have 100 cores, each with half the performance
of the 4GHz-capable Broadwell core,
i.e., equivalent to Broadwell at 2GHz?
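Working the numbers, taking the square-root relationship at face value (a back-of-envelope sketch, not measured data):

```latex
% Pollack's Rule: performance scales roughly as the square root
% of complexity (logic/area).
\[ \text{performance} \propto \sqrt{\text{complexity}} \]
% Doubling the logic: \sqrt{2} \approx 1.41, i.e. about a 40% gain.
% Inverting: a core targeting half the performance needs only
% (1/2)^2 = 1/4 of the complexity.
\[ \left(\tfrac{1}{2}\right)^{2} = \tfrac{1}{4} \]
% So 100 quarter-size mini-cores occupy roughly the area of
% 25 full cores, close to the 24-core Xeon v4 die, while giving
% 100 x 0.5 = 50 big-core-equivalents of throughput on
% embarrassingly parallel work, about double the 24-core chip.
\[ 100 \times \tfrac{1}{4} = 25 \text{ (area)}, \qquad
   100 \times \tfrac{1}{2} = 50 \text{ (throughput)} \]
```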
If this supposed core were very power efficient, then perhaps we could
even fit 100 mini-cores within the thermal envelope?
Of course, not every application is suitable for wide parallelism.
I would like to see Intel do a processor with mixed cores.
Perhaps 2 or 4 high-performance cores and 80 or so mini-cores?
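A back-of-envelope model of why this mix could pay off (all the numbers below are assumptions for illustration, not Intel data): the serial part of a workload runs on a fast core at full speed, while the parallel part spreads across everything:

```python
# Back-of-envelope model with assumed numbers (not Intel data): a few
# fast cores handle the serial phase, mini-cores add parallel throughput.

def run_time(serial_frac, big_perf=1.0, mini_perf=0.5, n_big=4, n_mini=80):
    parallel_frac = 1.0 - serial_frac
    # Serial phase on one fast core; parallel phase across all cores.
    throughput = n_big * big_perf + n_mini * mini_perf
    return serial_frac / big_perf + parallel_frac / throughput

for serial in (0.01, 0.10, 0.50):
    mixed = run_time(serial)                        # 4 big + 80 mini cores
    all_big = run_time(serial, n_big=24, n_mini=0)  # 24-core Xeon v4 style
    print(f"serial={serial:.0%}: mixed {mixed:.4f} vs 24 big {all_big:.4f}")
```

On this toy model the mixed chip wins whenever there is enough parallel work, and it never loses on the serial phase, which is the Amdahl's Law argument for keeping a few fast cores.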
A really neat trick would be if the GPU were programmable,
but do the graphics vendors already have things along this line?