

    From: cvs at opencores.org<cvs@o...>
    Date: Mon Nov 28 16:39:45 CET 2005
    Subject: [cvs-checkins] MODIFIED: simpcon ...
    Date: 00/05/11 28:16:39

    Added: simpcon/doc simpcon.tex
    Log:
    Add document sources to the project


    Revision Changes Path
    1.1 simpcon/doc/simpcon.tex

    http://www.opencores.org/cvsweb.shtml/simpcon/doc/simpcon.tex?rev=1.1&content-type=text/x-cvsweb-markup

    Index: simpcon.tex
    ===================================================================
    \documentclass[a4paper,12pt]{scrartcl}
    \usepackage{pslatex} % -- times instead of computer modern

    \usepackage[colorlinks=true,linkcolor=black,citecolor=black]{hyperref}
    \usepackage{booktabs}
    \usepackage{graphicx}

    \usepackage[latin1]{inputenc}

    \newcommand{\code}[1]{{\textsf{#1}}}
    \newcommand{\sign}[1]{{\texttt{#1}}}


    \begin{document}

    \title{SimpCon -- a Simple SoC Interconnect\\Draft}
    \author{Martin Schoeberl\\ martin@j...}
    \maketitle \thispagestyle{empty}

    \begin{abstract}

    This document proposes a simple interconnection standard for
    system-on-chip (SoC) components. It is intended to provide pipelined
    access to devices such as on-chip peripherals and on-chip memory
    controllers with minimum hardware resources.


    \end{abstract}

    \section{Introduction}

    The intention of the following SoC interconnect standard is to be
    simple and efficient with respect to implementation resources and
    transaction latency.

    SimpCon is a fully synchronous standard for on-chip
    interconnections. It is a point-to-point connection between a master
    and a slave. The master starts either a read or a write transaction.
    Master commands last a single cycle, which frees the master to
    continue with internal operations during an outstanding transaction.
    The slave has to register the address when it is needed for more than
    one cycle. The
    slave also registers the data on a read and provides it to the
    master for more than a single cycle. This property allows the master
    to delay the actual read if it is busy with internal operations.

    The slave signals the end of the transaction through a novel
    \emph{ready counter} to provide an early notification. This early
    notification simplifies the integration of peripherals into
    pipelined masters.

    Slaves can also provide several levels of pipelining. This feature
    is announced by two static output ports (one for the read and one for
    the write pipeline level).

    Off-chip connections (e.g.\ main memory) are device specific and
    need a slave to perform the translation. Peripheral interrupts are
    not covered by this specification.

    \subsection{Features}

    \begin{itemize}
    \item Master/slave point-to-point connection
    \item Synchronous operation
    \item Read and write transactions
    \item Early pipeline release for the master
    \item Pipelined transactions
    \item Open-source specification
    \item Low implementation overheads
    \end{itemize}

    \subsection{Basic Read Transaction}

    Figure~\ref{fig:sc:basic:rd} shows a basic read transaction for a
    slave with one cycle latency. The acknowledge signals are omitted
    from the figure. In the first cycle, the address phase, the
    \sign{rd} signal tells the slave to start the read transaction. The
    address is registered by the slave. During the following cycle, the
    read phase, the slave performs the read and registers the data. Due
    to the register in the slave, the data is available in the third
    cycle, the result phase. To simplify the master, the read data stays
    valid until the response to the next read request.

    \begin{figure}
    \centering
    \includegraphics{figures/sc_basic_rd}
    \caption{Basic read transaction}
    \label{fig:sc:basic:rd}
    \end{figure}
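The three phases of the basic read (address, read, result) can be replayed with a small cycle-based model. This is only an illustrative Python sketch, not part of the SimpCon definition: the class name, the `run` helper, and the dictionary used as backing store are assumptions made for the example.

```python
class OneCycleReadSlave:
    """Toy model of a slave with one cycle read latency (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory    # hypothetical word-addressed backing store
        self.addr_reg = None    # address registered at the end of the address phase
        self.pending = False    # True while the read phase is in progress
        self.rd_data = 0        # registered read data, held until the next read

    def clock(self, rd=False, address=0):
        """Rising clock edge that ends the current cycle."""
        if self.pending:
            # End of the read phase: register the data.
            self.rd_data = self.memory[self.addr_reg]
            self.pending = False
        if rd:
            # End of the address phase: register the address.
            self.addr_reg = address
            self.pending = True


def run(slave, stimulus):
    """Return rd_data as visible to the master in each cycle (cycle 1 first)."""
    seen = []
    for rd, address in stimulus:
        seen.append(slave.rd_data)   # value visible during this cycle
        slave.clock(rd, address)     # edge that ends this cycle
    return seen


# Cycle 1: address phase, cycle 2: read phase, cycle 3: result phase.
trace = run(OneCycleReadSlave({4: 0xCAFE}),
            [(True, 4), (False, 0), (False, 0), (False, 0)])
print(trace)
```

In this model the data first appears in the third cycle and is held in the fourth, which mirrors the registered `rd_data` output described above.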

    \subsection{Basic Write Transaction}

    A write transaction consists of a single-cycle address/command phase,
    started by the assertion of \sign{wr}, during which the address and
    the write data are valid. \sign{address} and \sign{wr\_data} are
    usually registered by the slave. The slave signals the end of the
    write cycle to the master with \sign{rdy\_cnt}. See
    Section~\ref{sec:ack} and the example in Figure~\ref{fig:sc:wr:ws}.

    \section{SimpCon Signals}

    This section defines the signals used by the SimpCon connection. Some
    of the signals are optional and may not be present on a peripheral
    device.

    All signals are single-direction point-to-point connections between a
    master and a slave. The signal details are grouped by the device that
    drives the signal. Table~\ref{tab:sc:signals} lists the signals that
    define the SimpCon interface. The column Direction indicates whether
    the signal is driven by the master or the slave.

    \begin{table}
    \centering
    \begin{tabular}{lrlll}
    \toprule
    Signal & Width & Direction & Required & Description \\
    \midrule
    \sign{address} & 1--32 & Master & No & Address lines from the master\\
     & & & & to the slave port\\
    \sign{wr\_data} & 32 & Master & No & Data lines from the master\\
     & & & & to the slave port\\
    \sign{rd} & 1 & Master & No & Start of a read transaction \\
    \sign{wr} & 1 & Master & No & Start of a write transaction \\
    \sign{rd\_data} & 32 & Slave & No & Data lines from the slave\\
     & & & & to the master port\\
    \sign{rdy\_cnt} & 2 & Slave & Yes & Transaction end signalling \\
    \sign{rd\_pipeline\_level} & 2 & Slave & No & Maximum pipeline level\\
     & & & & for read transactions \\
    \sign{wr\_pipeline\_level} & 2 & Slave & No & Maximum pipeline level\\
     & & & & for write transactions \\
    \bottomrule
    \end{tabular}
    \caption{SimpCon port signals}
    \label{tab:sc:signals}
    \end{table}

    \subsection{Master Signal Details}

    This section describes the signals that are driven by the master to
    initiate a transaction.

    \subsubsection{address}

    Master addresses represent word addresses as offsets in the slave's
    address range. \sign{address} is valid for a single cycle, either with
    \sign{rd} for a read transaction or with \sign{wr} and
    \sign{wr\_data} for a write transaction. The number of bits for
    \sign{address} depends on the slave's address range. For a
    single-port slave \sign{address} can be omitted.

    \subsubsection{wr\_data}

    The \sign{wr\_data} signals carry the data for a write transaction.
    They are valid for a single cycle together with \sign{address} and
    \sign{wr}. The signal is typically 32 bits wide. Slaves can ignore
    upper bits when the slave port is less than 32 bits wide.

    \subsubsection{rd}

    The \sign{rd} signal is asserted for a single clock cycle to start a
    read transaction. \sign{address} has to be valid in the same cycle.

    \subsubsection{wr}

    The \sign{wr} signal is asserted for a single clock cycle to start a
    write transaction. \sign{address} and \sign{wr\_data} have to be
    valid in the same cycle.

    \subsubsection{sel\_byte}

    The \sign{sel\_byte} signal is reserved for future versions of the
    SimpCon specification to add individual byte enables.

    \subsection{Slave Signal Details}

    This section describes the signals that are driven by the slave as a
    response to a transaction initiated by the master.

    \subsubsection{rd\_data}

    The \sign{rd\_data} signals carry the result of a read transaction.
    The data is valid when \sign{rdy\_cnt} reaches 0 and stays valid
    until a new read result is available. The signal is typically 32 bits
    wide. Slaves that provide less than 32 bits should pad the upper bits
    with 0.

    \subsubsection{rdy\_cnt}

    The \sign{rdy\_cnt} signal provides the number of cycles until the
    pending transaction will finish. A 0 means that either read data is
    available or a write transaction has finished. Values of 1 and 2
    mean that the transaction will finish in at least 1 or 2 cycles. The
    maximum value is 3 and means that the transaction will finish in 3
    or \emph{more} cycles.
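The saturating behaviour of the two-bit counter can be written down in a few lines. The following Python sketch assumes a slave that knows its remaining wait cycles exactly. The function names and the idea of listing one value per cycle are illustrative assumptions, not part of the specification.

```python
RDY_CNT_MAX = 3  # rdy_cnt is two bits wide, so 3 is the largest value


def rdy_cnt(remaining_cycles):
    """Encode the cycles left in a transaction as a two-bit rdy_cnt value."""
    if remaining_cycles < 0:
        raise ValueError("remaining_cycles must be non-negative")
    # 3 means "3 or more"; 0 means the transaction has finished.
    return min(remaining_cycles, RDY_CNT_MAX)


def rdy_cnt_trace(wait_cycles):
    """rdy_cnt per cycle, from the cycle after the command until it reaches 0.

    Assumes a slave with a fixed, known number of wait cycles.
    """
    return [rdy_cnt(n) for n in range(wait_cycles, -1, -1)]


print(rdy_cnt_trace(3))  # three wait cycles
print(rdy_cnt_trace(5))  # longer access: the counter stays at 3 first
```

For three wait cycles the trace counts 3, 2, 1, 0. For five wait cycles it holds the value 3 for three cycles before counting down.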

    Note that not all values have to be used in a transaction. Each
    monotonic sequence of \sign{rdy\_cnt} values is legal.

    \subsubsection{rd\_pipeline\_level}

    The static \sign{rd\_pipeline\_level} provides the master with the
    read pipeline level of the slave. The signal has to be constant to
    enable the synthesizer to optimize the pipeline-level-dependent state
    machine in the master.

    \subsubsection{wr\_pipeline\_level}

    The static \sign{wr\_pipeline\_level} provides the master with the
    write pipeline level of the slave. The signal has to be constant to
    enable the synthesizer to optimize the pipeline-level-dependent state
    machine in the master.

    \section{Slave Acknowledge}
    \label{sec:ack}

    Flow control between the slave and the master is usually done by a
    single signal in the form of \emph{wait} or \emph{acknowledge}. The
    \sign{ack} signal, e.g.\ in the Wishbone specification, is set when
    the data is available or the write operation has finished. However,
    for a pipelined master it can be of interest to know \emph{earlier}
    when a transaction will finish.

    For many slaves, e.g.\ an SRAM interface with fixed wait states, this
    information is available inside the slave. In the SimpCon interface
    this information is communicated to the master through the two-bit
    signal \sign{rdy\_cnt}. \sign{rdy\_cnt} signals the number of cycles
    until the read data will be available or the write transaction will
    be finished. Value 0 is equivalent to an \emph{ack} signal, and 1, 2,
    and 3 are equivalent to a wait request, with the distinction that the
    master knows how long the wait request will last.

    To avoid too many signals at the interconnect, \sign{rdy\_cnt} has a
    width of two bits. Therefore, the maximum value of 3 has the special
    meaning that the transaction will finish in 3 or \emph{more} cycles.
    As a result the master can only use the values 0, 1, and 2 to release
    actions in its pipeline. Idle slaves will keep the former value of 0
    for \sign{rdy\_cnt}.
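How a pipelined master might use the early notification can be sketched as follows. The helper is hypothetical and not part of the specification. It takes one sampled \sign{rdy\_cnt} value per cycle and a lead of 0, 1, or 2 cycles, and it compares with `<=` rather than `==` because, as noted above, not every value has to appear in a transaction.

```python
def release_index(trace, lead):
    """Index of the first cycle in which the transaction is known to finish
    within `lead` cycles, or None if that never happens in `trace`.

    `trace` holds one sampled rdy_cnt value per cycle. `lead` must be
    0, 1 or 2, because a value of 3 only means "3 or more".
    """
    if lead not in (0, 1, 2):
        raise ValueError("only rdy_cnt values 0, 1 and 2 carry usable timing")
    for index, value in enumerate(trace):
        if value <= lead:   # '<=' tolerates slaves that skip values
            return index
    return None


countdown = [3, 2, 1, 0]                 # slave with known wait states
print(release_index(countdown, lead=0))  # plain ack behaviour
print(release_index(countdown, lead=2))  # released two cycles earlier

skipping = [3, 3, 0]                     # slave that omits 2 and 1
print(release_index(skipping, lead=2))
```

With the full countdown a lead of 2 releases the master two cycles before the plain acknowledge would. With the skipping slave the release simply falls back to the cycle in which 0 appears.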

    Slaves that do not know in advance how many wait states are needed
    for the transaction can produce sequences that omit any of the
    numbers 3, 2, and 1. The master has to handle these situations.

    Figure~\ref{fig:sc:rd:ws} shows an example of a slave that needs
    three cycles for the read to be processed. In cycle 1 the read
    command and the address are set by the master. The slave registers
    the address and sets \sign{rdy\_cnt} to 3 in cycle 2. The read takes
    three cycles (2--4) during which \sign{rdy\_cnt} is decremented. In
    cycle 4 the data is available inside the slave and gets registered.
    It is available to the master in cycle 5, when \sign{rdy\_cnt} is
    finally 0. Both \sign{rd\_data} and \sign{rdy\_cnt} keep their values
    until a new transaction is requested.

    \begin{figure}
    \centering
    \includegraphics{figures/sc_rd_ws}
    \caption{Read transaction with wait states}
    \label{fig:sc:rd:ws}
    \end{figure}

    Figure~\ref{fig:sc:wr:ws} shows an example of a slave that needs
    three cycles for the write to be processed. The address, the data to
    be written, and the write command are valid during cycle 1. The slave
    registers the address and write data during cycle 1 and performs the
    write operation during cycles 2--4. \sign{rdy\_cnt} is decremented,
    and a non-pipelined slave can accept a new command after cycle 4.

    \begin{figure}
    \centering
    \includegraphics{figures/sc_wr_ws}
    \caption{Write transaction with wait states}
    \label{fig:sc:wr:ws}
    \end{figure}

    \section{Pipelining}

    Figure~\ref{fig:sc:pipe:level} shows a read transaction for a slave
    with four cycles latency. Without any pipelining the next read
    transaction will start in cycle 7, after the data from the former
    read transaction has been read by the master. The three bottom lines
    show when new read transactions will be started for different
    pipeline levels. With pipeline level 1 a new transaction can start in
    the same cycle in which the former read data is available (in this
    example in cycle 6).
    Higher levels mean that the next read will start earlier, as shown
    for levels 2 and 3.

    \begin{figure}
    \centering
    \includegraphics[width=\textwidth]{figures/sc_pipe_level}
    \caption{Different pipeline levels for a read transaction}
    \label{fig:sc:pipe:level}
    \end{figure}

    Implementation of level 1 in the slave is trivial (just two more
    transitions in the state machine). It is recommended to provide at
    least level 1 for read transactions. Level 2 is a little bit more
    complex, but usually no additional address or data registers are
    needed.

    To implement level 3 pipelining in the slave, at least an additional
    address register is needed. However, to use level 3 the master has to
    issue the request in the same cycle as \sign{rdy\_cnt} goes to 2.
    That means this transition is combinatorial. We see in
    Figure~\ref{fig:sc:pipe:level} that a \sign{rdy\_cnt} value of 3
    means three or more cycles until the data is available and can
    therefore not be used to trigger a new transaction.

    \section{Multiple Masters}

    SimpCon defines no signals for the communication between a master and
    an arbiter. However, it is possible to build a multi-master system
    with SimpCon. The SimpCon interface can be used as the interconnect
    between the masters and the arbiter and between the arbiter and the
    slaves. In this case the arbiter acts as a slave for the masters and
    as a master for the peripheral devices.

    The missing arbitration protocol in SimpCon results in the need to
    queue $n-1$ requests in an arbiter for $n$ masters. However, for this
    additional hardware we get zero overhead for the bus request. The
    master that gets the bus will start the slave transaction in the same
    cycle.
    \\
    \\
    TODO: add a timing diagram to explain this concept.

    \section{Examples}

    This section provides some examples for the application of the
    SimpCon definition.

    \subsection{IO Port}

    TODO: Show how simple an IO port can be with SimpCon. We need no
    addresses and can tie \sign{rdy\_cnt} to 0.
    We only need the \sign{rd} or \sign{wr} signal to enable the port.

    \subsection{SRAM interface}

    The following example is taken from an implementation of SimpCon for
    a Java processor. The processor is clocked at 100\,MHz and the main
    memory consists of 15\,ns static RAMs. Therefore the minimum access
    time for the RAM is two cycles. The slack time of 5\,ns forces us to
    use output registers for the RAM address and write data and input
    registers for the read data in the IO cells of the FPGA. These
    registers fit nicely with the intention of SimpCon to use registers
    inside the slave.

    Figure~\ref{fig:sc:sram} shows the interface for a non-pipelined read
    access followed by a write access. Four signals are driven by the
    master and two signals by the slave. The lower half of the figure
    shows the signals at the FPGA pins where the RAM is connected.

    \begin{figure}
    \centering
    \includegraphics[width=\textwidth]{figures/sc_sram}
    \caption{Static RAM interface without pipelining}
    \label{fig:sc:sram}
    \end{figure}

    In cycle 1 the read transaction is started by the master and the
    slave registers the address. The slave also sets the registered
    control signals \sign{ncs} and \sign{noe} during cycle 1. Due to the
    IO cell registers, the address and control signals are valid at the
    FPGA pins very early in cycle 2. At the end of cycle 3 (15\,ns after
    \sign{address}, \sign{ncs} and \sign{noe} are stable) the data from
    the RAM is available and can be sampled with the rising edge for
    cycle 4.

    The master reads the data in cycle 4 and starts a write transaction
    in cycle 5. Address and data are again registered by the slave and
    are available for the RAM at the beginning of cycle 6. To perform a
    write in two cycles, the \sign{nwr} signal is registered by a
    negative-edge-triggered flip-flop.

    In Figure~\ref{fig:sc:sram:prd} we see a pipelined read from the RAM
    with pipeline level 2. With this pipeline level and the two-cycle
    read access time of the RAM we get the maximum bandwidth possible.
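The two-cycle access time and the 5 ns slack quoted for the SRAM example follow directly from the clock period and the RAM speed grade. The short Python sketch below reproduces that arithmetic. The function name and its parameters are illustrative assumptions.

```python
import math


def sram_timing(clock_mhz, access_ns):
    """Return (cycles per access, slack in ns) for an asynchronous RAM."""
    period_ns = 1000.0 / clock_mhz               # 100 MHz -> 10 ns
    cycles = math.ceil(access_ns / period_ns)    # whole cycles needed
    slack_ns = cycles * period_ns - access_ns    # time left for IO delays
    return cycles, slack_ns


cycles, slack_ns = sram_timing(clock_mhz=100, access_ns=15)
print(cycles, slack_ns)  # prints "2 5.0"
```

The slack is what remains for the clock-to-output delay of the address registers and the setup time of the data registers, which is why the example places those registers in the IO cells.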

    \begin{figure}
    \centering
    \includegraphics[width=\textwidth]{figures/sc_sram_prd}
    \caption{Pipelined read from a static RAM}
    \label{fig:sc:sram:prd}
    \end{figure}

    We can see the start of the second read transaction in cycle 3,
    during the read of the first data from the RAM. The new address is
    registered in the same cycle and is available for the RAM in the
    following cycle 4. Although we have a pipeline level of 2, we need no
    additional address or data register. The read data is available for
    two cycles (\sign{rdy\_cnt} 2 or 1 for the next read) and the master
    is free to select one of the two cycles to read the data.

    \subsection{Master Multiplexing}

    To add several slaves to a single master, \sign{rd\_data} and
    \sign{rdy\_cnt} have to be multiplexed. Because all \sign{rd\_data}
    signals are registered by the slaves, a single pipeline stage will be
    enough for a large multiplexer. The selection of the multiplexer is
    also known at the transaction start but is needed at the earliest in
    the next cycle. Therefore it can be registered to further speed up
    the multiplexer.
    \\
    \\
    TODO: add a schematic for the master \sign{rd\_data} multiplexer.

    \section{Status}

    \begin{itemize}
    \item First timing diagrams drawn
    \item SimpCon SRAM interface for JOP on Cyclone and Spartan-3 is
    available
    \item Project at opencores.org accepted
    \end{itemize}
    %
    Next steps:
    %
    \begin{itemize}
    \item Continue this document
    \item Provide more SimpCon examples (e.g.\ a UART)
    \item Change JOP's IO interface to SimpCon
    \item Provide Wishbone bridges
    \end{itemize}
    %
    To clarify:
    %
    \begin{itemize}
    \item Use transaction or transfer in this document?
    \item Use address phase or better command cycle?
    \end{itemize}

    \end{document}

    \section{Notes}

    \subsection{Group comment}

    After implementing the Wishbone interface for main memory access from
    JOP I see several issues with the Wishbone specification that make it
    not the best choice for a SoC interconnect.

    The Wishbone interface specification is still in the tradition of
    microcomputer or backplane buses. However, for a SoC interconnect,
    which is usually point-to-point, this is not the best approach.

    The master is required to hold the address and data valid through the
    whole read or write cycle. This complicates the connection to a
    master that has the data valid only for one cycle. In this case the
    address and data have to be registered \emph{before} the Wishbone
    connection, or an expensive (in time and resources) MUX has to be
    used. A register results in one additional cycle of latency. A better
    approach would be to register the address and data in the slave. Then
    there is also time to perform address decoding in the slave (before
    the address register).

    There is a similar issue for the output data from the slave: as it is
    only valid for a single cycle, it has to be registered by the master
    when the processor is not reading it immediately. Therefore, the
    slave should keep the last valid data at its output even when
    \emph{wb.stb} is not asserted anymore (which is no issue for the
    hardware complexity).

    The Wishbone connection for JOP resulted in an unregistered Wishbone
    memory interface and registers for the address and data in the
    Wishbone master. However, for fast address and control output
    ($t_{co}$) and short setup time ($t_{su}$) we want to place the
    registers in the IO pads of the FPGA. With the registers buried in
    the WB master, it takes some effort to set the right constraints for
    the synthesizer to implement such IO registers.

    The same issue applies to the control signals. The translation from
    the \emph{wb.cyc}, \emph{wb.stb} and \emph{wb.we} signals to
    \emph{ncs}, \emph{noe} and \emph{nwe} for the SRAM is on the critical
    path.

    The \emph{ack} signal is too late for a pipelined master. We would
    need to know \emph{earlier} when the next data will be available, and
    this is possible, as we know in the slave when the data from the SRAM
    will arrive.
    A workaround is a non-WB-conforming early ack signal.

    Because the data registers are not inside the WB interface, we need
    an extra WB interface for the Flash/NAND interface (on the Cyclone
    board). We cannot afford the address decoding and a MUX in the data
    read path without registers. This would result in an extra cycle for
    the memory read due to the combinational delay.

    In the WB specification (AFAIK) there is no way to perform a
    pipelined read or write. However, for blocked memory transfers
    (e.g.\ a cache load) this is the usual way to get good performance.

    In conclusion, I would prefer:
    \begin{itemize}
    \item Address and data (in/out) registers in the slave
    \item A way to know earlier when data will be available (or a write
    has finished)
    \item Pipelining in the slave
    \end{itemize}

    As a result of this experience I am working on a new SoC interconnect
    definition (working name SimpCon) that should avoid the mentioned
    issues while still keeping the master and the slave easy to
    implement.

    As there are so many projects available that implement the WB
    interface, I will provide bridges between SimpCon and WB. For IO
    devices the former arguments do not apply to the same extent, as the
    pressure for low-latency access and pipelining is not high.
    Therefore, a bridge to WB IO devices can be a practical solution for
    design reuse.

    \subsubsection{additional comments}

    The idea for (some) pipeline support is twofold:
    \begin{enumerate}
    \item The slave will provide more information than a single
    \emph{ack} or wait states. It will (if it is capable of doing so)
    signal to the master the number of clock cycles remaining until the
    read data is available (or the write has finished). This feature
    allows the pipelined master to prepare for the upcoming read.
    \item If the slave can provide pipelining, the master can use
    overlapped wr or rd requests. The slave has a static output port that
    tells how many pipeline stages are available.
    \end{enumerate}
    I call this the `pipeline level':
    \begin{description}
    \item[0] means non-overlapping requests.
    \item[1] means a new rd/wr request can be issued in the same cycle in
    which the former data is read.
    \item[2] means the request can be issued one cycle earlier.
    \item[3] is the maximum level, where you get full pipelining on the
    basic read cycle with one wait state (command - read - read - result).
    \end{description}

    At the moment the draft of the spec is a few sketches on real paper;
    it takes some time to draw all the diagrams for a document.

    I have a first implementation of SimpCon on JOP to test the ideas: a
    master in JOP and a slave for SRAM access.
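The relation between the pipeline level and the cycle in which the next request may be issued can be sketched in Python. The numbers reuse the four-cycle-latency read from the Pipelining section, where the data is available in cycle 6. Both function names are hypothetical, and the formulas are inferred from the level descriptions rather than taken from a normative definition.

```python
def next_request_cycle(data_cycle, level):
    """Earliest cycle for the next rd/wr request at a given pipeline level.

    `data_cycle` is the cycle in which the former read data is available.
    Level 0 waits until the cycle after that. Each higher level moves the
    request one cycle earlier.
    """
    if level not in (0, 1, 2, 3):
        raise ValueError("pipeline level must be 0, 1, 2 or 3")
    return data_cycle + 1 - level


def trigger_rdy_cnt(level):
    """rdy_cnt value at which a master at `level` may issue the next request.

    Level 0 has no early trigger, so None is returned.
    """
    if level not in (0, 1, 2, 3):
        raise ValueError("pipeline level must be 0, 1, 2 or 3")
    return None if level == 0 else level - 1


for level in range(4):
    print(level, next_request_cycle(6, level), trigger_rdy_cnt(level))
```

For data available in cycle 6 this gives request cycles 7, 6, 5, and 4 for levels 0 to 3. A level 3 master issues its request when the counter shows 2, which is why a value of 3 ("three or more") cannot serve as a trigger.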

     