mirror of
https://review.haiku-os.org/haiku
synced 2025-01-31 10:47:14 +01:00
d1ecc0955f
These are better here than in my bash history... Change-Id: Iab8940f4efed950e26a8bad29cb8954464270e8f Reviewed-on: https://review.haiku-os.org/c/haiku/+/4645 Reviewed-by: waddlesplash <waddlesplash@gmail.com>
370 lines
14 KiB
ReStructuredText
370 lines
14 KiB
ReStructuredText
The SPARC port
|
|
##############
|
|
|
|
The SPARC port targets various machines from Sun product lineup. The initial effort is on the
|
|
Ultra 60 and Ultra 5, with plans to latter add the Sun T5120 and its newer CPU. This may change
|
|
depending on hardware donations and developer interest.
|
|
|
|
Support for 32-bit versions of SPARC is currently not planned.
|
|
|
|
SPARC ABI
|
|
=========
|
|
|
|
The SPARC architecture has 32 integer registers, divided as follows:
|
|
|
|
- global registers (g0-g7)
|
|
- input (i0-i7)
|
|
- local (l0-l7)
|
|
- output (o0-o7)
|
|
|
|
Parameter passing and return is done using the output registers, which are
|
|
generally considered scratch registers and can be corrupted by the callee. The
|
|
caller must take care of preserving them.
|
|
|
|
The input and local registers are callee-saved, but we have hardware assistance
|
|
in the form of a register window. There is an instruction to shift the registers
|
|
so that:
|
|
|
|
- o registers become i registers
|
|
- local and output registers are replaced with fresh sets, for use by the
|
|
current function
|
|
- global registers are not affected
|
|
|
|
Note that as a side-effect, o7 is moved to i7, this is convenient because these
|
|
are usually the stack and frame pointers, respectively. So basically this sets
|
|
the frame pointer for free.
|
|
|
|
Simple enough functions may end up using just the o registers, in that case
|
|
nothing special is necessary, of course.
|
|
|
|
When shifting the register window, the extra registers come from the register
|
|
stack in the CPU. This is not infinite, however, most implementations of SPARC
|
|
will only have 8 windows available. When the internal stack is full, an overflow
|
|
trap is raised, and the handler must free up old windows by storing them on the
|
|
stack, likewise, when the internal stack is empty, an underflow trap must fill
|
|
it back from the stack-saved data.
|
|
|
|
Misaligned memory access
|
|
========================
|
|
|
|
The SPARC CPU is not designed to gracefully handle misaligned accesses.
|
|
You can access a single byte at any address, but 16-bit access only at even
|
|
addresses, 32bit access at multiple of 4 addresses, etc.
|
|
|
|
For example, on x86, such accesses are not a problem, it is allowed and handled
|
|
directly by the instructions doing the access. So there is no performance cost.
|
|
|
|
On SPARC, however, such accesses will cause a SIGBUS. This means a trap handler
|
|
has to catch the misaligned access and do it in software, byte by byte, then
|
|
give back control to the application. This is, of course, very slow, so we
|
|
should avoid it when possible.
|
|
|
|
Fortunately, gcc knows about this, and will normally do the right thing:
|
|
|
|
- For usual variables and structures, it will make sure to lay them out so that
|
|
they are aligned. It relies on stack alignment, as well as malloc returning
|
|
sufficiently aligned memory (as required by the C standard).
|
|
- On packed structure, gcc knows the data is misaligned, and will automatically
|
|
use the appropriate way to access it (most likely, byte-by-byte).
|
|
|
|
This leaves us with two undesirable cases:
|
|
|
|
- Pointer arithmetics and casting. When computing addresses manually, it's
|
|
possible to generate a misaligned address and cast it to a type with a wider
|
|
alignment requirement. In this case, gcc may access the pointer using a
|
|
multi byte instruction and cause a SIGBUS. Solution: make sure the struct
|
|
is aligned, or declare it as packed so unaligned access are used instead.
|
|
- Access to hardware: it is a common pattern to declare a struct as packed,
|
|
and map it to hardware registers. If the alignment isn't known, gcc will use
|
|
byte by byte access. It seems volatile would cause gcc to use the proper way
|
|
to access the struct, assuming that a volatile value is necessarily
|
|
aligned as it should.
|
|
|
|
In the end, we just need to be careful about pointer math resulting in unalined
|
|
access. -Wcast-align helps with that, but it also raises a lot of false positives
|
|
(where the alignment is preserved even when casting to other types). So we
|
|
enable it only as a warning for now. We will need to ceck the sigbus handler to
|
|
identify places where we do a lot of misaligned accesses that trigger it, and
|
|
rework the code as needed. But in general, except for these cases, we're fine.
|
|
|
|
The Ultrasparc MMUs
|
|
============================
|
|
|
|
First, a word of warning: the MMU was different in SPARCv8 (32bit)
|
|
implementations, and it was changed again on newer CPUs.
|
|
|
|
The Ultrasparc-II we are supporting for now is documented in the Ultrasparc
|
|
user manual. There were some minor changes in the Ultrasparc-III to accomodate
|
|
larger physical addresses. This was then standardized as JPS1, and Fujitsu
|
|
also implemented it.
|
|
|
|
Later on, the design was changed again, for example Ultrasparc T2 (UA2005
|
|
architecture) uses a different data structure format to enlarge, again, the
|
|
physical and virtual address tags.
|
|
|
|
For now te implementation is focused on Ultrasparc-II because that's what I
|
|
have at hand, later on we will need support for the more recent systems.
|
|
|
|
Ultrasparc-II MMU
|
|
-----------------
|
|
|
|
There are actually two separate units for the instruction and data address
|
|
spaces, known as I-MMU and D-MMU. They each implement a TLB (translation
|
|
lookaside buffer) for the recently accessed pages.
|
|
|
|
This is pretty much all there is to the MMU hardware. No hardware page table
|
|
walk is provided. However, there is some support for implementing a TSB
|
|
(Translation Storage Buffer) in the form of providing a way to compute an
|
|
address into that buffer where the data for a missing page could be.
|
|
|
|
It is up to software to manage the TSB (globally or per-process) and in general
|
|
keep track of the mappings. This means we are relatively free to manage things
|
|
however we want, as long as eventually we can feed the iTLB and dTLB with the
|
|
relevant data from the MMU trap handler.
|
|
|
|
To make sure we can handle the fault without recursing, we need to pin a few
|
|
items in place:
|
|
|
|
In the TLB:
|
|
|
|
- TLB miss handler code
|
|
- TSB and any linked data that the TLB miss handler may need
|
|
- asynchronous trap handlers and data
|
|
|
|
In the TSB:
|
|
|
|
- TSB-miss handling code
|
|
- Interrupt handlers code and data
|
|
|
|
So, from a given virtual address (assuming we are using only 8K pages and a
|
|
512 entry TSB to keep things simple):
|
|
|
|
VA63-44 are unused and must be a sign extension of bit 43
|
|
VA43-22 are the 'tag' used to match a TSB entry with a virtual address
|
|
VA21-13 are the offset in the TSB at which to find a candidate entry
|
|
VA12-0 are the offset in the 8K page, and used to form PA12-0 for the access
|
|
|
|
Inside the TLBs, VA63-13 is stored, so there can be multiple entries matching
|
|
the same tag active at the same time, even when there is only one in the TSB.
|
|
The entries are rotated using a simple LRU scheme, unless they are locked of
|
|
course. Be careful to not fill a TLB with only locked entries! Also one must
|
|
take care of not inserting a new mapping for a given VA without first removing
|
|
any possible previous one (no need to worry about this when handling a TLB
|
|
miss however, as in that case we obviously know that there was no previous
|
|
entry).
|
|
|
|
Entries also have a "context". This could for example be mapped to the process
|
|
ID, allowing to easily clear all entries related to a specific context.
|
|
|
|
TSB entries format
|
|
------------------
|
|
|
|
Each entry is composed of two 64bit values: "Tag" and "Data". The data uses the
|
|
same format as the TLB entries, however the tag is different.
|
|
|
|
They are as follow:
|
|
|
|
Tag
|
|
***
|
|
|
|
Bit 63: 'G' indicating a global entry, the context should be ignored.
|
|
Bits 60-48: context ID (13 bits)
|
|
Bits 41-0: VA63-22 as the 'tag' to identify this entry
|
|
|
|
Data
|
|
****
|
|
|
|
Bit 63: 'V' indicating a valid entry, if it's 0 the entry is unused.
|
|
Bits 62-61: size: 8K, 64K, 512K, 4MB
|
|
Bit 60: NFO, indicating No Fault Only
|
|
Bit 59: Invert Endianness of accesses to this page
|
|
Bits 58-50: reserved for use by software
|
|
Bits 49-41: reserved for diagnostics
|
|
Bits 40-13: Physical Address<40-13>
|
|
Bits 12-7: reserved for use by software
|
|
Bit 6: Lock in TLB
|
|
Bit 5: Cachable physical
|
|
Bit 4: Cachable virtual
|
|
Bit 3: Access has side effects (HW is mapped here, or DMA shared RAM)
|
|
Bit 2: Privileged
|
|
Bit 1: Writable
|
|
Bit 0: Global
|
|
|
|
TLB internal tag
|
|
****************
|
|
|
|
Bits 63-13: VA<63-13>
|
|
Bits 12-0: context ID
|
|
|
|
Conveniently, a 512 entries TSB fits exactly in a 8K page, so it can be locked
|
|
in the TLB with a single entry there. However, it may be a wise idea to instead
|
|
map 64K (or more) of RAM locked as a single entry for all the things that needs
|
|
to be accessed by the TLB miss trap handler, so we minimize the use of TLB
|
|
entries.
|
|
|
|
Likewise, it may be useful to use 64K pages instead of 8K whenever possible.
|
|
The hardware provides some support for mixing the two sizes but it makes things
|
|
a bit more complex. Let's start out with simpler things.
|
|
|
|
Software floating-point support
|
|
===============================
|
|
|
|
The SPARC instruction set specifies instruction for handling long double
|
|
values, however, no hardware implementation actually provides them. They
|
|
generate a trap, which is expected to be handled by the softfloat library.
|
|
|
|
Since traps are slow, and gcc knows better, it will never generate those
|
|
instructions. Instead it directly calls into the C library, to functions
|
|
specified in the ABI and used to do long double math using softfloats.
|
|
|
|
The support code for this is, in our case, compiled into both the kernel and
|
|
libroot. It lives in src/system/libroot/os/arch/sparc/softfloat.c (and other
|
|
support files). This code was extracted from FreeBSD, rather than the glibc,
|
|
because that made it much easier to get it building in the kernel.
|
|
|
|
Openboot bootloader
|
|
===================
|
|
|
|
Openboot is Sun's implementation of Open Firmware. So we should be able to share
|
|
a lot of code with the PowerPC port. There are some differences however.
|
|
|
|
Executable format
|
|
-----------------
|
|
|
|
PowerPC uses COFF. Sparc uses a.out, which is a lot simpler. According to the
|
|
spec, some fields should be zeroed out, but they say implementation may chose
|
|
to allow other values, so a standard a.out file works as well.
|
|
|
|
It used to be possible to generate one with objcopy, but support was removed,
|
|
so we now use elf2aout (imported from FreeBSD).
|
|
|
|
The file is first loaded at 4000, then relocated to its load address (we use
|
|
202000 and executed there)
|
|
|
|
Openfirmware prompt
|
|
-------------------
|
|
|
|
To get the prompt on display, use STOP+A at boot until you get the "ok" prompt.
|
|
On some machines, if no keyboard is detected, the ROM will assume it is set up
|
|
in headless mode, and will expect a BREAK+A on the serial port.
|
|
|
|
STOP+N resets all variables to default values (in case you messed up input or
|
|
output, for example).
|
|
|
|
Useful commands
|
|
---------------
|
|
|
|
Disable autoboot to get to the openboot prompt and stop there
|
|
|
|
.. code-block:: text
|
|
|
|
setenv auto-boot? false
|
|
|
|
Configuring for keyboard/framebuffer io
|
|
|
|
.. code-block:: text
|
|
|
|
setenv screen-#columns 160
|
|
setenv screen-#rows 49
|
|
setenv output-device screen:r1920x1080x60
|
|
setenv input-device keyboard
|
|
|
|
Configuring openboot for serial port
|
|
|
|
.. code-block:: text
|
|
|
|
setenv ttya-mode 38400,8,n,1,-
|
|
setenv output-device ttya
|
|
setenv input-device ttya
|
|
reset
|
|
|
|
Boot from network
|
|
-----------------
|
|
|
|
The openboot bootloader supports network booting. See the
|
|
`Network booting guide <https://www.haiku-os.org/guides/network_booting/>`_
|
|
for general information about the general network booting process. This page
|
|
documents the parts specific to the openboot bootloader configuration.
|
|
|
|
In openboot, booting from the network is done simply by using the "net:" device
|
|
alias in the boot command line. This lets openboot load our bootloader, which
|
|
then uses the openboot ability to send and receive data over the network to load
|
|
the filesystem (and kernel contained in it) over the network. The two parts are
|
|
independent: it's also possible to load the bootloader from the network but boot
|
|
a local filesystem, or use the local bootloader and load the filesystem from the
|
|
network.
|
|
|
|
The bootloader needs to be placed in a tftp server, I use atftpd in Debian,
|
|
which serve files from /srv/tftp/ (so "somefile" in the example below will look
|
|
for /srv/tftp/somefile).
|
|
|
|
static ip
|
|
*********
|
|
|
|
This currently works best, because rarp does not let the called binary know the
|
|
IP address. We need the IP address if we want to mount the root filesystem using
|
|
remote_disk server.
|
|
|
|
.. code-block:: text
|
|
|
|
boot net:192.168.1.2,somefile,192.168.1.89
|
|
|
|
The first IP is the server from which to download (using TFTP), the second is
|
|
the client IP to use. Once the bootloader starts, it will detect that it is
|
|
booted from network and look for a the remote_disk_server on the same machine.
|
|
|
|
rarp
|
|
****
|
|
|
|
This needs a reverse ARP server (easy to setup on any Linux system). You need
|
|
to list the MAC address of the SPARC machine in /etc/ethers on the server. The
|
|
machine will get its IP, and will use TFTP to the server which replied, to get
|
|
the boot file from there.
|
|
|
|
.. code-block:: text
|
|
|
|
boot net:,somefile
|
|
|
|
(net is an alias to the network card and also sets the load address: /pci@1f,4000/network@1,1)
|
|
|
|
This currently does not work completely: the server address is not forwarded to
|
|
the bootloader, and as a result, remote filesystems will not be available. The
|
|
bootloader needs to be updated to know where to find the address in this case
|
|
(it is done for PowerPC, I think).
|
|
|
|
dhcp
|
|
****
|
|
|
|
This needs a DHCP/BOOTP server configured to send the info about where to find
|
|
the file to load and boot.
|
|
|
|
.. code-block:: text
|
|
|
|
boot net:dhcp
|
|
|
|
|
|
|
|
Debugging
|
|
---------
|
|
|
|
The openboot environment provide several useful commands to assist in debugging:
|
|
|
|
.. code-block:: text
|
|
|
|
202000 dis (disassemble starting at 202000 until next return instruction)
|
|
4000 1000 dump (dump 1000 bytes from address 4000)
|
|
.registers (show global registers)
|
|
.locals (show local/windowed registers)
|
|
%pc dis (disassemble code being exectuted)
|
|
ctrace (backtrace)
|
|
|
|
The backtrace provides addresses and register values (allowing to know the
|
|
function arguments), there is no symbols and function names printed.
|
|
objdump (on the build machine) can be used to disassemble the kernel or
|
|
bootloader and find the corresponding code:
|
|
|
|
.. code-block:: text
|
|
|
|
./cross-tools-sparc/bin/sparc64-unknown-haiku-objdump -d objects/haiku/sparc/release/system/kernel/kernel_sparc |c++filt|less
|
|
./cross-tools-sparc/bin/sparc64-unknown-haiku-objdump -d objects/haiku/sparc/release/system/boot/openfirmware/boot_loader_openfirmware |c++filt|less
|