Making a new post to prevent too much editing.
Carrying on from my previous post, I am adding some items to this list and changing it slightly.
I will keep this post updated and edited as my work on both DS-lite & NAT64 develops, until it is complete.
DS-lite / NAT64 Implementation Considerations
1. Scalability
Scalability is all about throughput control & service uptime. Can I service this number of customers with a single box, how many initial flows per second is it capable of, & can I swap out parts in case of failure or a change in the traffic? Carrier-grade vendors for DS-lite are few and far between; this is the problem I faced at least, with only seven known and stable vendors on DS-lite and only two capable of delivering the throughput needed, 80 gig plus. So can I service n customers with x flows per second & y throughput? And if this changes, can I swap, move or reposition a node, and upgrade it without having to swap the complete chassis?
- Also, what is its future? Will I be able to add 100 gig or higher forwarding performance later?
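To make the "can I service n customers with x flows & y throughput" question concrete, here is a rough back-of-envelope check; every figure in it is a made-up placeholder, not a vendor spec:

```python
# Rough AFTR capacity sanity check. All numbers below are hypothetical
# placeholders -- substitute your own measured per-subscriber averages
# and your vendor's datasheet figures.

def node_can_serve(customers, flows_per_customer_sec, mbps_per_customer,
                   max_flows_sec, max_throughput_gbps):
    """Return True if one node covers the aggregate demand with headroom."""
    needed_flows = customers * flows_per_customer_sec
    needed_gbps = customers * mbps_per_customer / 1000.0
    # keep 20% headroom for peaks and failover
    return (needed_flows <= 0.8 * max_flows_sec and
            needed_gbps <= 0.8 * max_throughput_gbps)

# e.g. 300k subscribers averaging 1.5 new flows/s and 0.1 Mbps each,
# against a box rated at 750k flows/s and 80 Gbps:
print(node_can_serve(300_000, 1.5, 0.1, 750_000, 80))   # True
```

The point of the headroom factor is the "swap, move or reposition" question: if the answer only comes back True at 100% of the rated figures, you have no room to absorb a failure or a traffic shift.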
2. Redundancy
Logical and physical redundancy is a must. These platforms will serve most of your new customers, and in time a huge percentage of all of them, so this is a high-risk implementation: you must get it right. And when we don't, or out-of-our-control hardware failures or software bugs cause issues, you need to know that traffic has a path, or you start to lose income.
Now this includes the biggy: stateful syncing of the NAT cache. Don't neglect that this needs to be a multi-box cluster, so not just active and standby but three or four boxes all with synced caches, dependent on your architecture of course. If one node goes down, you want to be able to share the traffic between the other nodes and guarantee that the sessions will not drop, with failover as fast as your routing convergence (take network convergence out of the equation as much as possible, and test well; it can be a major sticking point depending on your network architecture).
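A quick way to think about sizing that multi-box cluster: if the survivors have to absorb a failed node's traffic without sessions dropping, each node has a hard utilization ceiling. A minimal sketch:

```python
# For an N-node AFTR cluster with synced NAT caches, each node must run
# with enough headroom that the survivors can absorb a failed node's load.

def max_safe_utilization(nodes, failures=1):
    """Highest per-node utilization that still survives `failures` losses."""
    survivors = nodes - failures
    return survivors / nodes

# 3-node cluster tolerating one failure: each node must stay under ~67%.
print(round(max_safe_utilization(3), 2))   # 0.67
```

This is part of why three or four boxes beat active/standby: a 2-node cluster caps you at 50% utilization per box, while a 4-node cluster lets you run each at 75%.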
3. Particular features
So what features do we want? Well, some are discussed below, but let's go over the major ones: future or present delivery of PCP (port forwarding control), buffering on the SI NPU, MSS clamping (a must for TCP), NetFlow or Syslog for your monitoring, management and data retention, compatible ALGs where and if needed, a CPE to fit your AFTR architecture, and the tunnel gateway DHCPv6 option from the AFTR to the CPE.
4. CPE compatibility
As mentioned above, make sure your DS-lite CPE L3! (the ! is supposed to be there) device is up to the job. Say you have a 12 meg service but your CPE can't do a throughput of more than 40 meg on IPv6; add the node latency of the extra DS-lite processing (encapsulating the IPv4 into IPv6, then forwarding it on) and you might be well over 1 ms, which is quite high. You don't want to, nor is it necessary with DS-lite to, degrade your present customers' service.
5. PCP development
PCP is a nice tech for port forwarding on DS-lite, and kinda sexy. Make sure your vendors, both CPE (this is necessary) and AFTR, are developing their products with PCP and are willing to use it.
6. ALG requirements
There aren't very many, and I won't distribute the list of workable and non-workable protocols here as it is to some degree vendor dependent, but make sure you know what you will miss, especially if you don't use private NAT4x4 in your network at present.
7. Performance
Node latency, bandwidth, peak performance, number of concurrent customers per node or blade, number of initial & secondary (some of the NAT entry already in the cache) flows per second, & whether any of this depends on utilization. Are all the components line rate? Until recently, UDP traffic was the only way to validate high rates of ten gig plus throughput on any node, but that has changed, with some test vendors developing full stateful TCP session duplication on their platforms. This is young technology and the automation isn't fully there yet, but you can now validate whether your AFTR vendor's figures are correct and it really can do 750,000 session initializations per second on TCP at the full line rate of the box, 40 gig, with a node latency under 100 microseconds.
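To put those test figures in perspective, a little arithmetic; the average packet size here is an assumption, pick your own:

```python
# Back-of-envelope for the test figures above: 40 Gbps line rate and
# 750k TCP session initializations per second.

LINE_RATE_BPS = 40e9
AVG_PKT_BYTES = 500          # assumed IMIX-ish average, not a measurement

pps = LINE_RATE_BPS / (AVG_PKT_BYTES * 8)    # packets per second at line rate
ns_per_new_session = 1e9 / 750_000           # time budget per session setup

print(f"{pps / 1e6:.1f} Mpps, {ns_per_new_session:.0f} ns per new session")
```

Ten million packets per second and roughly 1.3 microseconds to build each new NAT entry is why "line rate" claims need validating with stateful TCP test gear rather than taken on faith.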
8. Testing
This goes without question, but to a certain degree this is greenfield tech, so make sure you test until you are blue in the face. No use in getting slapped later with TTL 10 or less not being able to transit your AFTR/CGN.
9. Logging for DR & network measurement
This is a tricky one, with RADIUS, NetFlow v9, IPFIX, XML & Syslog being the front runners in this race. But if we are talking DR, you need a timestamp and a 1:1 flow record for end destinations on a full cone deployment, and the amount of data is reasonably huge. Test your output heavily before deciding between your options, and let your lawyer/security teams tell you the full DR requirement for each country you operate in. You might not need timestamps, or we might, we oh so hope, get a few concessions from the men in grey.
Note that RADIUS is a good option if the government is reasonably sane in its requirements, but it has restrictions. Some eastern European countries may legally force you into implementing a full 1:1 solution. This has been developed with NetFlow v9 and Syslog at present, with Syslog eating into performance. Bringing the v4 and v6 templates together seems to provide a nice, scalable and non-performance-impacting solution.
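To get a feel for just how huge "reasonably huge" is, a quick sizing sketch; the per-record size and flow rate are assumptions, measure your own:

```python
# Rough sizing of 1:1 flow-record logging for data retention.
# Record size and per-subscriber flow rate are assumed placeholders.

def log_bytes_per_day(customers, flows_per_customer_sec, record_bytes):
    """Raw log volume per day for 1:1 flow records, before compression."""
    return customers * flows_per_customer_sec * record_bytes * 86_400

# 500k subscribers, 1 new flow/s each, ~60-byte NetFlow v9 record:
gb_per_day = log_bytes_per_day(500_000, 1.0, 60) / 1e9
print(f"{gb_per_day:.0f} GB/day")   # 2592 GB/day
```

Multiple terabytes per day per half-million subscribers, before you add timestamps or retention multipliers, is why the collector and storage side needs testing as hard as the AFTR itself.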
10. SNMP OIDs for monitoring/possibly XML
You need to monitor your DS-lite and NAT64. You have little choice here, and your standard OIDs won't do it, so consider what you will need to monitor the service. You don't want to be costing your company 80×n cents a minute to deliver a call to 6,000 customers because you have no idea what is wrong or when it went down. Watch for the 32-bit counter issues & missing explicit traps. Also consider extending a script to these nodes to walk them & deliver a report each day; even on large networks there shouldn't be so many AFTRs that this causes flooding, so it is a viable option.
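A minimal sketch of such a daily walk-and-report script. The hostnames and OIDs are pure placeholders for whatever your vendor's NAT/DS-lite MIB actually exposes, and it assumes net-snmp's snmpwalk binary is installed:

```python
# Daily AFTR walk-and-report sketch. Hostnames and OIDs below are
# hypothetical placeholders -- substitute your vendor's MIB entries.

import subprocess

AFTRS = ["aftr1.example.net", "aftr2.example.net"]     # hypothetical hosts
OIDS = {
    "sessions":   "1.3.6.1.4.1.99999.1.1",   # placeholder OID
    "pool-usage": "1.3.6.1.4.1.99999.1.2",   # placeholder OID
}

def build_walk_cmd(host, oid, community="public"):
    # snmpwalk -v2c -c <community> <host> <oid>
    return ["snmpwalk", "-v2c", "-c", community, host, oid]

def daily_report():
    """Walk every OID on every AFTR and print the raw results."""
    for host in AFTRS:
        for name, oid in OIDS.items():
            out = subprocess.run(build_walk_cmd(host, oid),
                                 capture_output=True, text=True, timeout=30)
            print(f"== {host} {name} ==\n{out.stdout}")
```

Cron it once a day and mail yourself the output; that alone catches the "when did it go down" question the standard OIDs won't answer.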
11. Fragmentation, re-ordering & reassembly
OK, now this is one of the painful items. If your nodes don't support fragmentation, your packets will be dropped or you will get some interesting buffer occurrences. If they support fragmenting their own packets to fit the tunnel MTU, but don't support receiving fragments out of order, reassembly and reordering, then your packets will be dropped. Sure, we have packet-too-big messages for both IPv6 and IPv4, but you want less latency, not more. So study your MTU size for all three connections well, test them well, and validate that your vendor supports packet-too-big for both IP protocols (for UDP), TCP MSS clamping & fragmentation, reassembly & reordering. Reassembly & reordering are different, so consider both: reordering means the TCP info does not need to be swapped into another packet, and the AFTR/CPE just waits for the first of the series and then starts sending the packets on again. Watch your buffer thresholds here; CPEs & AFTRs can be taken down with high thresholds. And note that you don't get that many fragmented packets these days anyway for either the CPE or the AFTR to deal with.
To continue, one also has to consider not just receipt of a fragmented packet, but receipt of a packet that needs fragmentation on the node. You must consider ICMP/UDP/TCP. TCP can be 95% controlled by MSS clamping; preferably both the CPE and the AFTR will have this feature, for the obvious reason that if one does not, the packet might never reach the feature. But this is only 95%: C3 T4, etc. might be ignored by older VPNs, so you will need fragmentation coming into the AFTR or on the AFTR itself (preferable, especially in the case of an MP-BGP/LDP deployment). You have two ways of doing this. The first is to rely on RFC 6333: IPv6 tunnel fragmentation on ingress and reassembly on egress. However, this is certainly not preferred on the downstream, as it is an overlay to IPv6, which does not fragment in transit by default, so the code tends to add a heavier performance hit than on IPv4, where it is 1-5%, quite acceptable. IPv4 fragmentation of a transit packet has been around long enough for it not to hurt too much, which is the second option. You will still need to implement RFC 6333, or get your vendor to confirm they comply, just for the reassembly, to cover yourself in case the v4 payload is not fragmented on the upstream by the CPE you use and the vendor refuses to implement it. So your AFTR will need to be configurable for downstream IPv4 fragmentation & upstream RFC 6333 reassembly, or you choose a modem vendor that fragments on IPv4 and ignores the IETF-approved deployment, which may one day become useful. This also obviously handles your ICMP & UDP in conjunction.
So this covers all of your fragmentation needs end to end. If you consider that your "tunnel MTU" will be the IPv4 payload MTU, set at 1460 to give your IPv6 wrap enough space to fit through the 1500 standard spread across your network, then your end-to-end architecture can more than likely stay intact with guaranteed delivery for all types of events (not including the DSL market, which seems to enjoy preventing a seamless 1500). Do not neglect the security/banking/VPN app/stack/pass-through that ignores the ICMP responses saying your packet is too big. Note also to validate both your CPE and AFTR ICMP responses themselves, and the transit of those responses, in case the DF bit is set.
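The MTU arithmetic above in one place, assuming a fixed 40-byte IPv6 header and no extension headers or TCP options:

```python
# DS-lite MTU/MSS arithmetic: the IPv6 encapsulation header is a fixed
# 40 bytes, so with a 1500-byte MTU across the network the inner IPv4
# payload MTU becomes 1460, and the TCP MSS to clamp to sits another
# 40 bytes (IPv4 + TCP headers, no options) below that.

IPV6_HDR = 40
IPV4_HDR = 20
TCP_HDR = 20

link_mtu = 1500
tunnel_ipv4_mtu = link_mtu - IPV6_HDR                 # 1460
clamped_mss = tunnel_ipv4_mtu - IPV4_HDR - TCP_HDR    # 1420

print(tunnel_ipv4_mtu, clamped_mss)   # 1460 1420
```

If your access network can't carry a clean 1500 (the DSL caveat above), feed the real link MTU in and everything shifts down accordingly.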
12. Tunnel gateway propagation
Right, there is a new option being discussed and developed for the CPE to request the AFTR tunnel gateway address from the AFTR itself: option 64. This causes complexities with your present DHCPv6 setup, or potentially will, but note that this will be the preferred way for most of us to assign the anycast or standard unicast gateway address for the AFTR SI to the CPE.
13. NAT64 DNS hack
As discussed before, the NAT64 DNS hack is not in all DNS server vendors' products. You may have to ask for it, but I would separate it from your NAT64 LSN if you can.
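For reference, the "hack" (DNS64) is just synthesizing a AAAA record by embedding the IPv4 address into the low 32 bits of the NAT64 prefix. With the well-known 64:ff9b::/96 prefix, that looks like:

```python
# DNS64 AAAA synthesis: embed the IPv4 address in the low 32 bits of
# the NAT64 /96 prefix (64:ff9b::/96 is the well-known prefix).

import ipaddress

def synthesize_aaaa(ipv4_str, prefix="64:ff9b::"):
    v4 = int(ipaddress.IPv4Address(ipv4_str))
    v6 = int(ipaddress.IPv6Address(prefix)) | v4
    return str(ipaddress.IPv6Address(v6))

print(synthesize_aaaa("192.0.2.1"))   # 64:ff9b::c000:201
```

The NAT64 LSN then recognizes its own prefix on incoming IPv6 traffic and translates back to the embedded IPv4 destination, which is why the DNS side and the LSN side can (and I would argue should) be separate boxes.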
14. Does it fit in your network?
The main problem with this post is that my requirements will differ from yours in some manner, so make sure you match your topology. Do you require centralization to improve delivery cost, scalability & geo-traffic balancing, and if so, where are you placing your AFTRs logically? For example, a lot of vendors allow for MPLS/MP-BGP implemented NAT64 and DS-lite, which means placing your AFTR in the core is no longer a problem.
15. NAT64 - where it won't work
NAT64 is the saving grace.... if we can get it working across enough ALGs to make it marketably viable. Instead of buying 10 million CPEs for DS-lite, we can use a few LSNs strategically placed and deliver the services required. But there are a few drawbacks, and the main two are: all applications/clients/servers wanting to access the internet will only be able to reach an IPv4-only web service if the originating communication is initiated by a DNS request over IPv6. Secondly, and most importantly, any legacy device in the home that cannot be updated with an IPv6 stack and sits solely on IPv4 will have no internet access. NAT64 gives the new equipment and viable applications in the customer's home the chance to interact with the IPv4-only and late-to-IPv6 web services, etc., but those old IP TVs & fridges will never be able to use it.
16. AFTR address withdrawal
Under what conditions will your AFTR address be removed from the routing table? Is it configurable? Does it allow for maximum redundancy? Can it be based on shared throughput resources? For example, if you are using 4 NPUs (SI workload processors) within your AFTR, each at only 10% utilization with peaks of 15%, and you lose two, the AFTR address can still be propagated: the maximum load on the remaining two will reach only 30% peak each, so there is no need to withdraw the gateway for that AFTR as it can still handle its present requirement. Memory, flapping routes (dampening), outgoing route failure upstream on the v4 side, etc. all need to be considered as "watermarks" for your AFTR address withdrawal.
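The NPU-loss reasoning above as a tiny decision sketch; the thresholds and figures are illustrative only:

```python
# Withdraw the AFTR address only when the surviving NPUs cannot absorb
# the redistributed peak load. Figures are illustrative, not vendor logic.

def should_withdraw(total_npus, failed_npus, peak_util_per_npu, max_util=1.0):
    """True if the AFTR address should be pulled from the routing table."""
    survivors = total_npus - failed_npus
    if survivors == 0:
        return True
    # peak load of the whole box, spread over the surviving NPUs
    redistributed = peak_util_per_npu * total_npus / survivors
    return redistributed > max_util

# 4 NPUs peaking at 15% each, two fail -> survivors peak at 30%: keep it up.
print(should_withdraw(4, 2, 0.15))   # False
```

In practice you would run the same check against every watermark (sessions, memory, pool usage), not just NPU load, and withdraw on the first one that trips.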
17. Shared Resources
Your AFTR has four main shared logical resources & three physical resources. The logical resources are the AFTR SI gateway address, your sessions, your port blocks & your public IPv4 address pools. The three physical ones are memory, interface line rates & processing power. The physical resources must be able to share the logical ones. For example, with 4 NPUs you must be able to use a single SI gateway address for the AFTR, and that address/function must use ALL of the NPUs evenly (hashed load balancing) and be able to use any incoming and outgoing interface, or one interface for both. For that single AFTR address you must be able to use the memory as a single shared resource and load balance over numerous potential resources. Your IP pool must be treated as a single address pool or be able to be broken out, but it must be configurable. If the above are not features of your AFTR, you will end up having a single AFTR per NPU. So six NPUs (SI processors) would act like six AFTRs: no shared resources; one dies and blackholing occurs; six AFTR addresses, which then causes you problems with DHCP option 64; an IPv4 address pool that needs to be broken into six subnets; and you will never get any load balancing between them.
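A toy illustration of the hashed load balancing point: the same 5-tuple must always land on the same NPU so its NAT state stays local. Real boxes do this in hardware; this sketch is only to show the idea:

```python
# Deterministic hashed load balancing of flows across NPUs behind a
# single AFTR address. Illustrative only -- not any vendor's hash.

import hashlib

def npu_for_flow(src_ip, src_port, dst_ip, dst_port, proto, n_npus):
    """Map a 5-tuple to an NPU index; the same flow always hashes the same."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % n_npus

a = npu_for_flow("192.0.2.10", 40000, "198.51.100.5", 80, 6, 4)
b = npu_for_flow("192.0.2.10", 40000, "198.51.100.5", 80, 6, 4)
print(a == b, 0 <= a < 4)   # True True
```

With a hash like this spreading flows evenly, one gateway address, one pool and one logical memory space can sit across all NPUs, which is exactly the shared-resource behaviour to demand from the vendor.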
18. PCP port allocation
Let's focus on PCP for now. We need UPnP 2 support without a doubt, and UPnP and PMP both need to be supported. How much do we allocate, and what changes occur to our DR with separate primary flows being generated by PCP? We may have to push for static versus dynamic ratios here. So we need a solution that allows static block allocations, minimizing and mitigating the PCP-driven increase in primary logging, at maybe 30% of peak, and leaves the rest of the block allocations dynamic. This allows a single AFTR to log the static allocation only once, with the dynamic taking over under reasonably high load. You could even split it so that only primary block allocations are static and the rest dynamic. So we have a few options here. Note, however, that the more static block allocations you use, the less efficiency you will have with your public IPv4 allocations.
Regardless of logging, we need to allow for static port forwarding mechanisms like customers use today, directly on the modem in most cases... however... we need a PCP negotiation of sorts here, so a portal system will have to be developed per MSO to allow the customer to configure their own modem & let the AFTR know which port is allocated. Dynamic allocation for PCP is reasonably easy: assign a reservation above the standard reserve of 1024, let's say 4k, and allow PCP to use this allocation for UPnP and PMP.
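A sketch of how static block allocation eats into IPv4 efficiency; the 4k block size matches the example above, everything else is illustrative:

```python
# Static port-block allocation on one public IPv4 address. The 4k block
# size matches the example in the text; the scheme itself is illustrative.

PORT_FLOOR = 1024      # stay above the well-known/reserved port range

def static_block(subscriber_index, block_size=4096):
    """(start, end) static port block for a subscriber on one public IPv4."""
    start = PORT_FLOOR + subscriber_index * block_size
    end = start + block_size - 1
    if end > 65535:
        raise ValueError("no room left on this address, move to the next")
    return (start, end)

print(static_block(0))    # (1024, 5119)
print(static_block(14))   # (58368, 62463)
```

With 4k static blocks, only 15 subscribers fit per public IPv4, which is exactly the efficiency cost of static allocation: dynamic allocation can pack far more subscribers per address by oversubscribing, at the price of per-allocation logging.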
The problem comes in clustering and session redundancy. I hope to give you this solution towards the end of the month.
I will, if time permits, update you all by the end of the month with the final DS-lite requirements that you have to consider. Then we will go on to the potential NAT64 Windows embedded client solution as a backup for those MSOs who cannot afford to swap out their modems. But I think NAT44 bridged at the CPE will be the only solution for now. That puts you into a state of double NAT for customers running their own L3 devices, but I am not sure what other choice we can present that doesn't cause a huge service degradation.