The Oracle at Delphi: And now the $100,000 question.

Thursday, January 10, 2008

Will there be a switch to control when string = UnicodeString?

The current assumption about that is, no. Let me explain why.

DCU compatibility and linking problems - Suppose you built unit A with the switch set to Unicode mode, then built unit B with it off. You cannot link unit A with unit B because of the type mismatches. Sure, some things might work, but lots of things won't. Assigning event handlers, passing var/out parameters, and similar operations would fail with type mismatches. It gets even more confusing and frustrating for users when they look at the source code and see string declared everywhere, yet things don't work. Even if we delivered two complete sets of DCUs, one for AnsiString and one for UnicodeString, there would still be the problem of packages, design-time components, and third parties.

The IDE would be natively built in Unicode mode - This requires that all design-time components be built in Unicode mode. You could not install a component package or IDE enhancement built in Ansi mode. The upshot is that the design-time experience could be very different from the runtime experience.

Third-parties would have to deliver Ansi and Unicode versions of their components - I don't want to impose this kind of effort on our valuable and critical third-party market. From a source code standpoint, a lot (a majority) of code can be compiled either way, but that doesn't help with testing, packaging, and install size. Delivering two versions of the same library for the same version of Delphi doubles the testing effort.

Piecemeal moves to Unicode seem "safe" and are easier to "sell" to management, but there are real land mines - When you look at Unicode as merely a "way to display Kanji or Cyrillic characters on the screen", it is easy to think Unicode is only about display rendering. Unicode is really a storage and data format for human-readable text. Display rendering is only one part of it. If you relegate Unicode strictly to the visual portions of your application, you're ignoring the real "meat" of your application. A holistic approach is needed. If you take the seemingly "safe" route and convert only a portion of the application at a time, you run the risk of hiding an untold number of "pinch-points" throughout your application where implicit or even explicit conversions are taking place. If a Unicode code point does not map to the currently active Ansi codepage, that character (code point) is dropped during the conversion. By the time the text comes back out of the system and needs to be rendered, data has been lost. The trick is finding all those "pinch-points" and figuring out what to do with them.
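
To make the pinch-point concrete, here is a minimal sketch of a lossy round trip. It assumes a compiler in which an AnsiString type can be pinned to a specific code page; the Latin1String type, the RoundTrips helper, and the choice of code page 28591 (ISO-8859-1) are illustrative assumptions, not part of the RTL. What the unmappable character turns into is up to the conversion routine, so the sketch only checks whether the text survives the trip.

```pascal
program LossyRoundTrip;
{$APPTYPE CONSOLE}

type
  // Assumption: an AnsiString whose payload is always ISO-8859-1 (code page 28591).
  Latin1String = type AnsiString(28591);

// Hypothetical helper: pushes the text through a narrow string and back,
// then reports whether every character survived.
function RoundTrips(const Original: UnicodeString): Boolean;
var
  Narrow: Latin1String;
  Back: UnicodeString;
begin
  Narrow := Latin1String(Original);  // explicit pinch-point: UTF16 -> code page
  Back := UnicodeString(Narrow);     // and back out again
  Result := Back = Original;
end;

begin
  // Plain ASCII maps to the code page, so nothing is lost.
  WriteLn(RoundTrips('report.txt'));
  // U+0416 (Cyrillic Zhe) has no slot in ISO-8859-1, so the text comes back altered.
  WriteLn(RoundTrips('report' + #$0416));
end.
```

The same loss happens with implicit conversions; the explicit casts here only make the pinch-point visible in the source.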

Character indexing and normal string manipulation remain unchanged - The UnicodeString is still a 1-based, reference-counted, lifetime-managed entity. It is no different from AnsiString in this regard. The difference is that it has a payload of UTF16 (word-sized) elements instead of byte-sized elements. String assignments, indexing, implicit conversions, etc. all continue to work as expected. Length(UnicodeStringVar) returns the number of elements, just as Length(AnsiStringVar) does.
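
A small sketch of what "number of elements" means in practice. It assumes the UnicodeString type described above; the program and variable names are made up for illustration. U+1D11E (the musical G clef) lies outside the Basic Multilingual Plane, so in UTF16 it is stored as the two word-sized elements $D834 and $DD1E.

```pascal
program ElementCount;
{$APPTYPE CONSOLE}

var
  Plain, Clef: UnicodeString;
begin
  Plain := 'abc';
  // One code point (U+1D11E) written out as its two UTF16 elements.
  Clef := #$D834#$DD1E;

  WriteLn(Length(Plain));           // three elements, one per letter
  WriteLn(Length(Clef));            // two elements for a single code point
  WriteLn(Ord(Clef[1]) = $D834);    // indexing is 1-based and addresses elements
end.
```

Length counts storage elements, not characters as a reader would see them, which is the same contract AnsiString already has with multi-byte code pages.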

Code that must use AnsiStrings should be explicit - If your code absolutely must use AnsiStrings, you can explicitly change the declarations to AnsiString. You can do this right now with your existing code.

string is already a Unicode string - In the Delphi for .NET compiler, string has been equivalent to System.String which is a UTF16 element based string. Many of our customers have already had to deal with this fact, and have survived the transition very well.

An example.

As we've been working on making sure things compile with the new Unicode compiler, it has been surprising, even to us, how much of our code simply just works. SysUtils.pas contains a lot of filename manipulation functions that do a lot of string manipulation internally. Take ExtractFileExt(), for example:

function ExtractFileExt(const FileName: string): string;
var
  I: Integer;
begin
  I := LastDelimiter('.' + PathDelim + DriveDelim, FileName);
  if (I > 0) and (FileName[I] = '.') then
    Result := Copy(FileName, I, MaxInt)
  else
    Result := '';
end;

This function simply recompiled and worked as is. Granted, it is only a few lines of code, but what I'm not showing here is the code path for the LastDelimiter function. This function casts the FileName parameter to a PChar, then calls StrScan. Since the functions that take PChar parameters do not do implicit conversions, we've provided overloaded versions of these functions. So even if you do a lot of "PChar" buffer manipulation, we've got those functions covered.
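
The overloading approach can be sketched roughly as follows. CountChar is a hypothetical helper invented for this illustration, not an RTL function; the point is only that one name can serve both PAnsiChar and PWideChar buffers, so the caller's pointer type picks the right version without any implicit conversion.

```pascal
program PCharOverloads;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts occurrences of C in a null-terminated Ansi buffer.
function CountChar(P: PAnsiChar; C: AnsiChar): Integer; overload;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    if P^ = C then
      Inc(Result);
    Inc(P);  // advances by one byte-sized element
  end;
end;

// Same logic for a null-terminated UTF16 buffer.
function CountChar(P: PWideChar; C: WideChar): Integer; overload;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    if P^ = C then
      Inc(Result);
    Inc(P);  // advances by one word-sized element
  end;
end;

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'a.b.c';
  U := 'a.b.c';
  WriteLn(CountChar(PAnsiChar(A), AnsiChar('.')));  // resolves to the Ansi overload
  WriteLn(CountChar(PWideChar(U), WideChar('.')));  // resolves to the wide overload
end.
```

Code that must stay Ansi keeps working by declaring its strings as AnsiString, while code using plain string picks up the wide overload when it is recompiled.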

Beefed up warnings and hints.

Another thing we're doing to help folks identify sections of their code that may need inspection is adding more warnings. When the compiler sees certain code constructs, such as implicit string conversions, strange pointer casting, etc., extra diagnostic information will be output. Another compiler feature we've added is the ability to elevate any one or all warnings to errors. We've actually been going through our own code (and I'm a little embarrassed to say we haven't been particularly "warning free"), eliminating all warnings and then elevating the warnings to errors. Now our own build processes will literally fail when someone checks in code that generates a warning.

Illusions, Delusions, Fantasy and Reality.

I hold no delusions that this change will be bump-free and every lick of code out there will work without a hitch. There will be a class of applications and libraries that will be affected far more than others. Our goal is to ensure that the vast majority of our users out there will see as little disruption as possible. Also, for those cases where disruption is bound to happen, we're working on providing tooling and education to assist in this transition. The cold-hard reality is that this change is arguably late in coming. This has been a perennial request for at least the last 5 - 6 years (probably longer). Getting on track, focusing our efforts, and addressing a real need for a large segment of our customers is sometimes a little painful.

Maintaining a strong bias for backward compatibility has come at a price. There are segments of customers clamoring for major sweeping changes to things like VCL (add skinning, a new data binding model, XML streaming, etc.). We, too, fantasize regularly about things like "what if we could ignore the past and just pick up the pieces and go for it?" The cold-hard reality is that over the last 13 years we've built high expectations about what customers can rely on from release to release. I know that.

For many of you, your first exposure to Delphi was maybe version 3 or later. Many of you never experienced the largest transition in Delphi's history, the move from 16-bit Windows to 32-bit Windows in the Delphi 1 to 2 cycle. Lots of changes happened there. The Integer data type grew to a 32-bit entity. string became a managed, reference-counted, heap-allocated entity that also managed to maintain a lot of "value semantics" encapsulated in an underlying reference type. The change was embraced because finally a string could hold huge amounts of data. You could "cast" the string to a PChar and call a Windows API function since it was also null-terminated.

Today, I realize that the landscape is different. There are many, many years of history. The Internet has permeated our very existence. The world has shrunk on account of this new level of connectivity. Countless millions of lines of code have been written. Given all of that, the need to communicate using a common, unified and standard encoding is paramount.

As Delphi moves into emerging markets, especially in the Far East, strong, clear Unicode support is paramount if we are to continue to find acceptance and carve out our place. In many of those markets, governments are beginning to legislate and enforce how applications interact with character data. While this doesn't necessarily apply to the private sector, that sector does take a cue from those requirements and cannot afford to one day be shut out of certain jobs and markets. They too see the value and reasoning behind these rules and elect to follow suit.

Finally, I do not intend to fully shut the door on this issue as I know it will have (is having) a polarizing effect. I do, however, want to make sure people get as informed as possible. Agree or disagree, that's fine. One thing I learned early on in my career here at CodeGear (and Borland) was to truly think about a problem. Don't just pop-off with the first reason you can find about why something won't work. Also, continue to challenge your own conclusions and position. Don't be afraid to be wrong (and don't assume that is advice only for the "other guy" either). Get the facts. I'll help by presenting as many facts as I'm able. Let the games begin :-).

49 comments:

Jan Derk, January 10, 2008 at 7:28 AM
I totally understand your logic. I don't like my code breaking, unless it is worth it, and Unicode is certainly worth it. Still, I estimate that the effort won't be too large. ANSI applications can stay in D7, no problem. Plus I'd rather have you guys do Unicode well and then do some other useful thing like generics or win64 support than put a huge amount of effort into staying ANSI backward compatible.

The one thing that scares me is DevExpress. I need DevExpress to support D2008 Unicode or I cannot upgrade. Did you already talk to them?

Thorsten Engler, January 10, 2008 at 7:35 AM
You already have a switch that controls if string maps to string[255] (a short string) or AnsiString. Extend that switch (or introduce another one which switches Char/string between AnsiChar/AnsiString and UnicodeChar/UnicodeString). Feel free to make the Unicode state the default. But simply dumping compatibility completely is a really bad idea.

Nick Hodges, January 10, 2008 at 7:53 AM
Thorsten --

We really aren't "dumping compatibility". Your code can easily be made compatible by simply declaring all your strings to explicitly be AnsiStrings.

Nick

Ritsaert Hornstra, January 10, 2008 at 8:17 AM
@Nick:

But if you do that, then all VCL calls will contain conversion calls between AnsiString and string and become quite inefficient.

One question: Will the compiler understand:

OutputDebugString( PChar( UnicodeStr ) );

AND

OutputDebugString( PAnsiChar( AnsiStr ) );

?

Brad White, January 10, 2008 at 8:22 AM
I wasn't sold on the idea that Unicode was important enough for you to work on given the opportunity cost of other features that get left out. But you do have a good point that the far east is a big market that is just coming online.
The better Delphi does, the better for me, regardless of whether this feature affects me or not.
Thanks,
Brad.

Peter, January 10, 2008 at 8:28 AM
I always hate non-breaking Delphi releases. They have blocked the innovation. Please break as much as possible! I need progress, I need future, I need new features in the compiler !!! ;-)

DanB, January 10, 2008 at 8:32 AM
1) Have you talked with DevExpress, and are they going to support Unicode VCL in a timely manner in Delphi AND C++ Builder?

2) Have you talked with other component developers? Feel free to list the ones that have committed to supporting Unicode VCL in a timely manner.

3) One advantage of being so late to support Unicode is that other companies and products have made the switch. Learn from them. Microsoft went all Unicode from the ground up in .Net... This made sense, it was a new product with no legacy code to support and .Net has a lot of momentum... For C++/MFC, they took a different approach. Support for both side by side, with a compiler switch to make one the default. Maybe not as clean as a total switch, but a hell of a lot better for developers.

aam aam, January 10, 2008 at 8:44 AM
Now I see the fruits of using only OpenSource components.

Thank you JEDI, PascalScript, AsyncProf and VirtualTreeview.

George Shen, January 10, 2008 at 9:47 AM
Great post, I fully agree with you :)

snorkel, January 10, 2008 at 10:26 AM
What about things like tstringlist? Will there be a twidestringlist or something?

Pavel S, January 10, 2008 at 12:41 PM
Nick Hodges said :
'We really aren’t "dumping compatibility". Your code can easily be made compatible by simply declaring all your strings to explicitly be AnsiStrings.'
Problem is not changing it in my application. Problem is changing all toolboxes I use, many of them being pretty old and doing odd things. Too much work and testing. So I still see having a switch between ANSI and Unicode regimes a necessity.

Qian Xu, January 10, 2008 at 4:34 PM
Hi Allen, Unicode migration was a topic ten years ago. Just do what Visual C++ has done.

BTW: Please re-consider of the name of UnicodeString. Please create a new type BSTR for COM.

Most users use WideString (D4-D2007) everywhere in your code. Some part of them are used as BSTR and the others are used as UnicodeString. When users upgrade to the next release, they have to explicitly change those WideString, that should be defined as UnicodeString, to UnicodeString, otherwise you cannot benefit from the speedup of string-reference-counting.

The question is: BSTR and UnicodeString, which one is more frequently used in your existing projects.

For the full idea, please check my blog (section "UnicodeString versus WideString")
http://stanleyxu2005.blogspot.com/2008/01/random-thoughts-on-unicode_10.html

EPO, January 10, 2008 at 5:03 PM
I hope seriously that "a new data binding model" is part of "that some others [things] that [you] just cannot talk about."....

Eddy
Brussels

Keld R. Hansen, January 10, 2008 at 6:10 PM
I don't see the difference between {$H} specifying that "STRING" is either "STRING[255]" or "AnsiString" and {$H} specifying that "STRING" can be either "STRING[255]", "AnsiString" or "UniCodeString".

If you could do it when you made AnsiString, why can't you do it when you make UniCodeString?????

The problems and issues are the same...

Keld R. Hansen, January 10, 2008 at 6:12 PM
Will it be possible to re-declare "String" as in:

{$IFDEF UseAnsi }
TYPE STRING = AnsiString;
{$ELSE }
TYPE STRING = UniCodeString;
{$ENDIF }

If not, then you could make STRING a pre-defined type instead of a reserved word, then it would be fairly easy to do it in the UNITs that need it.

Georg Stegmueller, January 10, 2008 at 6:20 PM
Will the new warnings slow down the compiler like the .NET-related warnings (unsafe code et al.)? I hope not...

Steffen Friismose, January 10, 2008 at 6:39 PM
Will it be possible to declare a variable of type UnicodeString[100], and what effect will it have?

Concerned, January 10, 2008 at 6:51 PM
I have an idea...

Andreas Hausladen has found cool ways to patch VCL functions and classes with alternate ones of his own. Can't this concept be applied for the Ansi/Unicode VCL???

Make the VCL totally unicode but then via an extra Ansi compatibility unit (for example adding "AnsiCompat" unit as first item to uses clause in .dpr file), patches these with Ansi equivalent functions/classes instead, so that way the user can decide what to use. Add the unit and all built-in VCL units (or at least all the popular ones that most people use) behave as Ansi, leave out the unit (default) and they behave as Unicode.

One downside is codebloat as there will basically be extra versions of all VCL functions/classes, but that is a better alternative than breaking code. Maybe the Win32 generics feature can be applied to this to reduce this?

Also, for example, many third-party libraries use VCL classes like TStringList, etc. Having these suddenly become Unicode can cause problems for some apps and third-party libraries (some unsupported because they are old now, which we might not understand well enough to change even though we have the source), and simply replacing "string" with "AnsiString" is not enough.

Concerned, January 10, 2008 at 7:06 PM
Idea #2 :)

Include 2 versions of the delphi compiler & VCL...
(but you covered that already... well, not exactly :)
What if the other compiler and VCL are exactly the same as in D2007? That way D2007 Ansi .bpl's work in D2008 too, obviously without any new features (as it's exactly the same D2007 compiler & VCL), but at least your code compiles and you get to use the new D2008 IDE. When ready, you can move your app over to the D2008 compiler/VCL by setting an IDE switch that says "okay, I want to use the Unicode compiler/VCL now". So all D2008-specific .bpl's are assumed to be Unicode, and Ansi ones are D2007 compatible (as they are actually compiled using the D2007 compiler but called from the D2008 IDE).

Any third-party component vendor then still only needs to support one VCL from D2008, and if they support D2007, those .bpl's will work as is on D2008 (and D2007).

A switch is however needed to switch between the D2008 compiler and D2007 compiler, and might need to start the IDE in either D2007 compiler or D2008 compiler mode if it causes problems with loading of packages/bpl's?

Or something like that...

Kiaser Zohsay, January 10, 2008 at 10:06 PM
Data loss during a conversion to ANSI is a valid point, but what about UTF8? UTF8 is capable of storing all of the code points that UTF16 can represent, yet is still byte compatible with ANSI. You do potentially have to deal with multi-byte characters, so Length(Utf8Str) and StrLen(@Utf8Str[1]) could return different values. But implicit conversions would be much smoother.

Anthony Frazier, January 10, 2008 at 10:14 PM
@Concerned:

For all that trouble, just stay with D2007. That makes a lot more sense to me. If CodeGear was feeling particularly gracious, they could even include a copy of D2007 in the box.

Adrien Reboisson, January 11, 2008 at 12:59 AM
"If CodeGear was feeling particularly gracious, they could even include a copy of D2007 in the box."

I support that, really. I want to be able to write new Unicode apps, but I really need to support legacy apps without spending hours changing strings to AnsiString.

ahmoy, January 11, 2008 at 2:31 AM
1. I agree with what Keld R. Hansen said. If Delphi can support an option for AnsiString and fixed strings, then why can't we have the same thing for UnicodeString?

2. "Third-parties would have to deliver Ansi and Unicode versions of their components... Delivering two versions of the same library for the same version of Delphi just doubled the testing effort."
The same applies to supporting UnicodeString only, without a switch. I can't see how this will help the third parties deliver their libraries/components.

3. If the new version will be a breaking version, then we will live to see major changes to the VCL controls!

Greg Stevenson, January 11, 2008 at 4:11 AM
UTF-16 can use 2 or 4 bytes for any given character. In this way it is similar to UTF-8. Sounds like the CodeGear approach is not true UTF-16, but only valid for characters in the basic multilingual plane. Is this correct? Are there plans to also support true UTF-16 strings or will UTF-8 be the only way this will be supported? Granted most people should never need more than BMP, but then I remember Mr. Gates predicting that nobody will ever need more than 640K of memory.

Allen Bauer, January 11, 2008 at 4:24 AM
Greg,
My next post should answer that question. It is UTF16 and will allow surrogate pairs.

Allen.

Kryvich, January 11, 2008 at 5:02 AM
Great post!
While backward compatibility is good, VCL and Delphi should move forward to be competitive.

@Nick:

To leave some/all units of an application ANSI compatible there should be a possibility to declare the ANSI encoded constants as well as variables.
For example,
const
ch = 'A';
How can we tell to the compiler that ch should be ANSI encoded, not UTF8 encoded? May be
ch: AnsiChar = 'A';
or
ch= AnsiChar('A');

And how about resourcestrings? They always should be Unicode.

Kryvich, January 11, 2008 at 5:04 AM
Fix to previous post: "UTF8" - "UTF16".

Dan Palley, January 11, 2008 at 7:32 AM
While you're at it, can you make Tiburon compile my BP7 code with no changes as well?

Seriously, older versions of Delphi that were used to create these older applications will still continue to work once Tiburon is released. Older apps would continue to be compiled with the older Delphi version.

Alexandre Machado, January 11, 2008 at 8:05 AM
Three words Allen: Go for it!
I made that transition from Delphi 1 to Delphi 2, ten years ago. A major app was written in D1. And after all those sleepless nights reviewing new code, I can assure you: worth it!
We all have to admit: I've never been excited about a new version of Delphi since....long time. I really believe that CodeGear guys are doing their best.

Tobias Giesen, January 11, 2008 at 9:02 AM
Awesome work there! I'm not afraid of UnicodeString at all around here. For old apps I will always have Delphi 2007.

m. Th., January 11, 2008 at 5:48 PM
Hi Allen,

Thanks a lot for your insights about Unicode support in the next Delphi
release. It's really encouraging to see that the cooperation between
CodeGear and community is increasing. Way to go!

However, reading your blog posts we saw that the folks are a little bit
concerned :-) about the impact which such a global change will bring and
they asked about a compiler switch which will turn this feature on/off.
OTOH, you were so kind to explain us in this post that this isn't really
possible. Imho, your reasoning really stands and because it happens that
you are the CG's chief scientist perhaps you have a bare knowledge about
the internals of code base :-) so, perhaps we must take this as
something really verified on. (Btw, I really like your position in your
blog posts nowadays).

...but because we must please our community (isn't? ;-) ) and we must
mitigate the testing implied by the Unicode conversion, I'd humbly
propose a refactoring called 'Type replace' which will appear as a new
menu item under 'Refactoring' which would have three UI artifacts:
1. a combo box with caption "Area:" (with elements like 'Current
Project', 'Current Unit', 'Directory' (hummm.... how to specify it with
ease?), 'Open files', 'Selection' aso.)
2. an editable combo box "Source type:" (in which the user can write
anything but provide him some bare types like: string, integer, double etc.)
3. another editable combo box "Destination type:" (with same UI
functionality as above).

And when the user presses 'Ok' then the refactoring engine - which will
use the _syntactic_ lexical parser which you have already - in fact this
is the central point and the main difference with a simple Search &
Replace - will display all the occurrences and the user will choose what
to refactor. Related to our theme this can be easily used in order to
'turn off' the Unicode string by refactoring (all?) the code to
AnsiString and after this, gradually, the programmer will re-refactor
back to Unicode areas which he had tested/checked already. Imho, this is
way better than an on/off switch because when one wants to turn the
switch 'on' he must be ok with everything without having the possibility
to do this step by step (or to speak in our language, we must build
'blindly' all our unit tests which must suddenly work when we turn the
switch 'on'). Also this refactoring can be used also in other areas.
Perhaps one wants to change all his 'single's to 'double's all his
'TIntegerField' to 'TFloatField', TButton to TBitBtn etc. See, the last
two examples from above, implies also refactoring not only in .pas but
also in .dfm, isn't? :-) Also you can add a smaller 'brother' here a
'Replace hard cast occurrences' (which, sometimes, is, oh, so useful...)
with the same Presentation layer as 'Replace types' but will replace the
TSourceType(foo) / foo as TSourceType with TDestinationType(foo) / foo
as TDestinationType etc.

just my2c & hth

Qian Xu, January 11, 2008 at 7:41 PM
Are "string", "Char", "PChar" (possible to be) equivalent to generic-text mapping macros?

If they are, then there is no reason not to make UNICODE_MODE=ON/OFF switchable. No one will blame you when their code is broken with UNICODE_MODE=OFF. Third-parties can still deliver their one-version components with UNICODE_MODE=ON.

Patrick van Logchem, January 12, 2008 at 6:42 AM
As I see it, 'string' is just an alias, currently mapped to either AnsiString or ShortString. Adding another meaning to it would offer us the choice to compile code written with 'string' to UnicodeString, or not - keeping it at AnsiString (or ShortString, for people still using that type).

As I said in a previous comment, this could be done with a little extra effort from your side :

Would the RTL, VCL and IDE be refactored to use the UnicodeString and WideChar types explicitly, then surely 'string' wouldn't need to be mapped to UnicodeString now, or would it?

Users must be made aware of potential casting problems, of course, but as long as CodeGear's units (and 3rd party components) don't use 'string' and 'Char' anymore, then these types could still mean whatever we choose, or am I missing something important here?

I'd really like some feedback on this...

HS, January 14, 2008 at 5:16 AM
One thing that's still quite unclear to me:

Will D2008 compiled exe's be able to run under Win 95/98 at all?

If not, what will "happen" if I run that exe file anyway.

What about those exe's which only use AnsiString or don't access any of the Unicode Win API's?

Allen Bauer, January 14, 2008 at 5:52 AM
HS,

Most likely you'll get a Windows loader error saying that it could not locate all DLL imports from kernel32, user32, etc...

Allen.

HS, January 14, 2008 at 7:29 PM
Got it. So D2008 produced exe's will be a 100% switch over to Win2000/XP/Vista/NT4.0...

Fair enough!

Adem, January 14, 2008 at 11:01 PM
I too am excited about the new Delphi, but I don't agree with the name WideString.

Why can't you call it UTF16String?

This way, it would be a lot less confusing.

And, once you're at it, why can't we have UTF8String and UTF32String too?

One final thing --somewhat (un)related to this topic: Is it such a trade secret that CG does not allow even read-only access to its lexical parser tree?

If CG allowed this, it would open up a wide field for people to develop better and more usable refactoring tools (and code formatters) etc.

Qian Xu, January 16, 2008 at 6:29 AM
Hi Allen, I am still very confused. Is D2008 really non-breaking? For instance, I have a function:

function MyFunc(const S: string): Integer;
// string=AnsiString
begin
  Result := WindowsAPI(PChar(S), Length(S));
end;

If string=UnicodeString (not switchable), I have to change the function to

function MyFunc(const S: string): Integer;
begin
  if not OSIsUnicode then
    Result := WindowsAPI_A(PAnsiChar(AnsiString(S)), Length(S))
  else
    Result := WindowsAPI(PChar(S), Length(S))
end;

It will be also double effort on our side, won't it?

Allen Bauer, January 16, 2008 at 7:35 AM
Qian Xu,

The first example you cite will rebuild as-is and will call the Unicode version of the API. There will be no need for any of the contortions in your second example.

Another thing to understand is that just referencing a windows API in most cases causes a hard-reference to be placed in the resulting binary. This means that the Windows loader will still fail to load the binary even if you have run-time checks like you demonstrate above.

Allen.

HS, January 16, 2008 at 10:24 AM
Just for clarification:

Is there any way for D2008/VCL to work in Ansi mode and utilise the *_A (Ansi) variants within the Windows API? (save for explicitly calling the Windows API)

In principle the VCL could determine the _A or _W route by looking at the parameter type of the API call: AnsiString will call _A and String (UnicodeString) will call _W.

Or will all ANSI support be completely eradicated?

FWIW: I don't think supporting Unicode on 95/98 (via MSLU) is a major requirement. After all, end users who truly need Unicode applications should definitely move to a "real" Unicode OS in that case, like Win2000/XP etc.

However, I think it's perfectly reasonable for Delphi to be able to target 95/98 in plain old Ansi/code page mode.

Cutting the Delphi support at Win2000/NT full stop does seem a bit harsh. There are plenty of 98's out there still, and Ansi mode is a perfectly legitimate platform...

Kristofer Skaug, January 17, 2008 at 8:20 AM
I understand your arguments, and I too have seen the hordes in QC and other places clamoring for Unicode. But it's a shame, so much effort and pain for the CodeGear team and its existing customers to accommodate an "emerging market". Let's hope that it pays off for you, at least! We probably will still buy future Unicode-Delphi releases to have Vista support and other stuff. But I dread what the Unicode string support will do to some of my applications and libraries. I use LOTS of functions for casting byte-strings around, which are central to performance. AnsiStrings are a must-have data type in my domain, while Unicode is non-existent. We now seem to be facing the choice of "parking" most of this code with D2006, or taking an expensive review-and-rewrite-tour to save the stumps.

Maël Hörz, January 21, 2008 at 2:04 AM
"Another thing to understand is that just referencing a windows API in most cases causes a hard-reference to be placed in the resulting binary. This means that the Windows loader will still fail to load the binary even if you have run-time checks like you demonstrate above."

I do this quite a lot in my apps and it works. The reason is that Win9x has stubs of most Unicode-APIs. They do fail when called but the Windows program loader loads the EXEs just fine also under Win9x.

What would be great is if Windows.pas and similar files could be replaced by custom/user versions. This way you could add conditional switches to each Unicode WinAPI, and thereby still support Win9x.

For example, GetFileAttributesW would look something like this:

function GetFileAttributesW(lpFileName: PWideChar): DWORD;
begin
  if Win32PlatformIsUnicode then
    Result := Windows.GetFileAttributesW(lpFileName)
  else
    Result := Windows.GetFileAttributesA(PAnsiChar(AnsiString(lpFileName)));
end;

Maël Hörz, January 21, 2008 at 2:18 AM
The advantage would be that CodeGear wouldn't have to handle this, but users/the community/third parties could add such compatibility units that support older platforms. They would essentially replace the default API-declarations by a wrapper function like shown above that emulates missing functionality.

People who don't need to support legacy platforms could just stick with the default CodeGear implementation and avoid the overhead.

This would also enable you to painlessly drop other Windows versions in future since "compatibility units" could be added to ease the transition.

The compatibility layer would be in one central location, all the rest of the code would not have to be aware of that and wouldn't need to care about the platform.

I think breaking backwards compatibility with Win9x had to be done, but it would be nice to let people easily add a compatibility layer themselves.

Thanks for considering.

Allen Bauer, January 21, 2008 at 2:30 AM
Maël,

Hmm... interesting. My main concern was having to actively late-bind all the non-Unicode APIs, which would add a chunk of code.

The example you cite works fine for the cases where the passed-in string is a "read-only" string, but there are other cases such as GetModuleFileName() where the string buffer is a result. This would mean that this "stub" would also have to allocate its own local buffer and then translate on the way out. Possible, but still fraught with the potential for nasty errors.

If this could be done in the way you describe, where it is an optional thing, it may be worth pursuing. No guarantees, but an intriguing idea nonetheless.

Allen.

Maël Hörz, January 21, 2008 at 2:37 AM
Thanks for the response. I thought of stubs like those in the TNT controls, but integrated seamlessly by somehow substituting Windows.pas.

TMS Software might be interested in developing such compatibility units as they bought the TNT Unicode controls.

Again thanks for considering.

Chi SongTao, January 23, 2008 at 3:29 AM
OMZ............
I cannot speak more.
MSVC has supported both Ansi & Unicode for years. So CodeGear has no choice.

Unicode, Win64, Kylix are all we can expect from codegear. However, we do not like using only these features.

Bind BSD2007 as a patch for Tiburon.... if you cannot overcome it for technical reasons.

TK, March 8, 2008 at 2:11 AM
I really think that if string is made 16-bit Unicode by default, it should equal the previous WideString.

Then, fast reference counting should be for all long string types, i.e. AnsiString and string = WideString.

I hate that slow implementation of WideString by means of the Win32 API.

I do not see an objective reason to introduce a new symbol like UnicodeString.

Will ShortString be made Unicode aware?

Tassos Kyriakos, March 25, 2008 at 10:21 AM
It's the first time in years that I am excited about a new Delphi release.
Unicode support is going to be a must in a global world. This is already true for countries of the European Union, where it is very common to need to process descriptions or names in different languages (German, French, Greek, Cyrillic etc.) in the same application.
BUT,
I would prefer that string ** remains ** an ansi string and just all the new VCL/IDE uses UnicodeString **explicitly**.

Tomas, November 17, 2009 at 6:02 AM
Uaaaaa,

It's been two months since we bought the new version of Delphi, but we can't use it, because our code, the code of native Delphi components, and third-party components use string for binary data buffers, and that made our code totally incompatible. Great work, really... :( Next time you'll change the boolean type from true and false to yes, no and maybe.
In my case, I must review every line of our code and rewrite most algorithms, but maybe it's more effective to rewrite it in another language and forget Delphi...
I think Delphi is a great programming language with a super IDE, but incompatibilities between versions kill it.