UTF8 encoding

Hello,

I am saving different types (image, plain text, html text, sound and
pdf) of content into a database in Byte[] format.
Is UTF8 a correct encoding for all these content types?

Thanks,
Miguel
shapper
11/16/2009 3:37:23 PM
dotnet.languages.csharp

Hello,

> I am saving different types (image, plain text, html text, sound and
> pdf) of content into a database in Byte[] format.
> Is UTF8 a correct encoding for all these content types?

UTF-8 is a way to encode Unicode characters as bytes; it is not a format for
images, sound or PDF files. You should just store the bytes in your db without
any further encoding. If an encoding was needed, it should already have been
applied earlier; strictly speaking, all you have to do to store the content is
get it and save it unchanged in the db. If you tried something that doesn't
work, please be explicit about the problem you get...
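
A minimal C# sketch of that point, assuming the text resources are encoded as UTF-8 once and everything else is already a byte[]. The class and method names are made up for illustration, and the four bytes are only a stand-in for the start of a JPEG file:

```csharp
using System;
using System.Linq;
using System.Text;

public static class StoreBytesUnchanged
{
    // Returns true if the bytes come back identical after being decoded
    // to a string and encoded again, i.e. if they were valid UTF-8 text.
    public static bool SurvivesTextRoundTrip(byte[] data)
    {
        string asText = Encoding.UTF8.GetString(data);
        byte[] again = Encoding.UTF8.GetBytes(asText);
        return data.SequenceEqual(again);
    }

    public static void Main()
    {
        // Text has to be turned into bytes once; UTF-8 is one reasonable choice.
        byte[] textBytes = Encoding.UTF8.GetBytes("Ol\u00e1, welcome!");

        // Binary content is already bytes and must be stored as it is.
        byte[] imageBytes = { 0xFF, 0xD8, 0xFF, 0xE0 };

        Console.WriteLine(SurvivesTextRoundTrip(textBytes));  // True
        Console.WriteLine(SurvivesTextRoundTrip(imageBytes)); // False
    }
}
```

The second result is the reason not to push binary content through Encoding.UTF8: bytes that are not valid UTF-8 are replaced during decoding and cannot be recovered.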

Could it be that you save the content in a varchar or text column? With
SQL Server 2005 or later, varbinary(max) is likely the preferred data type for
blob data.

Another well-known option is to store the data outside of the db in the
file system and store its location inside the db. SQL Server 2008 also has
some support for doing this transparently (FILESTREAM)...

Some details could perhaps help us better understand the exact point on
which you need help.

--
Patrice 

Patrice
11/16/2009 3:52:15 PM
Hello,

In a C# web application I have a few global resources that I need to
save in the database.

Examples:
- Welcome Text (Plain Text)
- Contact Text (Html Text)
- Logo Image (Image JPEG)
- About the company file (PDF file)
- Catalog Ambient Sound (MP3 File)

These are isolated elements that are used around the web application.
So I would like to have a way to save them all in the same SQL table,
or in an XML file for small projects.

For Plain Text and Html Text I can read the bytes from the text.
For the other files, I think I can easily do the same in my C# code.

When saving to an XML file, I think I need to convert the byte[]
to a Base64 string.

In both cases, SQL and XML, I will have a column with the MIME type of
the content.
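
The XML variant described above could be sketched in C# as follows. The element name "content" and the attribute names "name" and "mimeType" are hypothetical and not part of any existing schema; the sketch only shows a byte[] going into an XML element as Base64 and coming back unchanged:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class XmlContentStore
{
    // Builds <content name="..." mimeType="...">BASE64</content>.
    public static XElement ToElement(string name, string mimeType, byte[] data)
    {
        return new XElement("content",
            new XAttribute("name", name),
            new XAttribute("mimeType", mimeType),
            Convert.ToBase64String(data));
    }

    // Decodes the element text back to the original bytes.
    public static byte[] ReadBytes(XElement element)
    {
        return Convert.FromBase64String(element.Value);
    }

    public static void Main()
    {
        byte[] html = Encoding.UTF8.GetBytes("<p>Contact us</p>");
        XElement element = ToElement("ContactText", "text/html", html);

        byte[] restored = ReadBytes(element);
        Console.WriteLine((string)element.Attribute("mimeType")); // text/html
        Console.WriteLine(Encoding.UTF8.GetString(restored));     // <p>Contact us</p>
    }
}
```

The MIME type travels next to the data, so the reading code can decide whether the restored bytes are text to decode or a file to pass along untouched.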

In the XML case I will never have more than 20 content elements.
In the SQL case I can have around 1000 elements at most.

I do know about file stream, and I do use it for saving files in a
Documents table.

But in this case the content itself can be a file, or plain text, or
plain HTML, etc.
Can I save plain text or plain HTML text that was converted to byte[]
to a file stream?

Does this make sense?

Thanks,
Miguel



shapper
11/16/2009 4:11:49 PM
> But in this case the content itself can be a file, or plain text or
> plain html, etc.
> Can I save to file stream a plain text or plain html text that was
> converted to byte[]?

Sure. The key point is that the encoding is not something you deal with when
you store this data. It has been dealt with earlier, i.e. if you save a sound,
a Word document, or a UTF-8 encoded HTML file, you'll just get this content as
bytes and save those bytes unchanged to the db...

--
Patrice 

Patrice
11/16/2009 4:21:28 PM
On Nov 16, 4:21 pm, "Patrice" <http://scribe-en.blogspot.com/> wrote:
> Sure the key point is that the encoding is not something you deal with when
> you store those data. This has been dealt earlier ie. if you save a sound, a
> word document, or an UTF-8 encoded HTML file you'll just get this content as
> bytes and will save those bytes unchanged to the db...

True, but if I need to get the content from an XML file where it was saved
earlier by converting the Byte[] to a Base64 string, don't I need to use:

Byte[] Content = Encoding.UTF8.GetBytes(MyContent)

If Content is a file, I use the Byte[] as it is in my C# application.
If Content is plain text or html text, can I then use MyContent
directly?

Thanks,
Miguel
shapper
11/16/2009 4:33:54 PM
>>True, but if I need to get the from a XML file where it was saved
>>before by converting the Byte[] to Base64String, don't I need to use:

>> Byte[] Content = Encoding.UTF8.GetBytes(MyContent)

No. Once the Base64 data is decoded, you have the same content that was stored
(i.e. it is already encoded the same way, if you stored encoded data).

For example, when you store a file on disk, the disk doesn't care what it is.
It just takes the bytes and saves them. An "encoding" is just a convention for
representing Unicode characters as bytes, and it only matters when you change
this convention (e.g. the HTML document you stored is encoded one way and you
want to display it using another encoding convention).
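
As a small C# sketch of this, assuming the text was converted to bytes with UTF-8 before it was saved as Base64 (the variable names are illustrative): reading the data back means Convert.FromBase64String, followed by Encoding.UTF8.GetString only if the content is text. Calling Encoding.UTF8.GetBytes on the Base64 string gives the bytes of the Base64 characters, not the stored content.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Base64RoundTrip
{
    public static void Main()
    {
        // What was saved: UTF-8 bytes of the text, written out as Base64.
        string original = "<p>Caf\u00e9</p>";
        byte[] stored = Encoding.UTF8.GetBytes(original);
        string base64 = Convert.ToBase64String(stored);

        // Reading back: decode the Base64, then decode the bytes as text.
        byte[] decoded = Convert.FromBase64String(base64);
        string text = Encoding.UTF8.GetString(decoded);

        // Not the content: these are the bytes of the Base64 characters.
        byte[] wrong = Encoding.UTF8.GetBytes(base64);

        Console.WriteLine(decoded.SequenceEqual(stored)); // True
        Console.WriteLine(text == original);              // True
        Console.WriteLine(wrong.SequenceEqual(stored));   // False
    }
}
```

For a file (image, PDF, MP3) the decoded byte[] is already the final result and no Encoding call is involved at all.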

Do you have problems if you just read back your data ?

--
Patrice



Patrice
11/16/2009 5:27:03 PM
I was doing something else and I believe I may suddenly have understood
your issue. Do you mean that the problem occurs when you convert the byte
array back to a string?

Also, .NET strings use UTF-16 internally. Is this a web app? Usually the
conversion happens when data is written to the Response output stream,
depending on the encoding defined in the web.config file...
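
To illustrate the difference between a .NET string and its encoded bytes, here is a short sketch using the euro sign as an arbitrary example character. The string holds UTF-16 code units; the number of bytes depends on the encoding chosen at the moment of conversion:

```csharp
using System;
using System.Text;

public static class StringVersusBytes
{
    public static void Main()
    {
        string s = "\u20ac"; // the euro sign

        // One UTF-16 code unit inside the .NET string.
        Console.WriteLine(s.Length);                            // 1

        // The same character as bytes, under two different encodings.
        Console.WriteLine(Encoding.UTF8.GetBytes(s).Length);    // 3
        Console.WriteLine(Encoding.Unicode.GetBytes(s).Length); // 2
    }
}
```

Converting a byte array back to a string therefore needs the same encoding that produced the bytes, otherwise the text comes out garbled.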

If that is still not it, putting together a small sample so that we can
understand the issue you currently have is likely best (I assume you do have
some problem currently? If not, first try the simplest option and see if you
have an issue, so that we can start from there).

--
Patrice 

Patrice
11/16/2009 6:28:56 PM